Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] opt: add branch builder push_chunk #568

Closed

Conversation

gabriele-0201
Copy link
Contributor

@gabriele-0201 gabriele-0201 commented Nov 18, 2024

I'm not a fan of having introduced RawSeparators, which include the old RawSeparator, but having both seems very redundant.
If you have other ideas, they are welcome!

Copy link
Contributor Author

gabriele-0201 commented Nov 18, 2024

@gabriele-0201 gabriele-0201 force-pushed the gm_branch_builder_push_chunk_opt branch 2 times, most recently from 3c1f04d to f049b80 Compare November 18, 2024 08:18
@gabriele-0201 gabriele-0201 force-pushed the gm_branch_builder_push_chunk_opt branch from f049b80 to 9e68d06 Compare November 19, 2024 03:42
@gabriele-0201
Copy link
Contributor Author

During the optimization of the branch_stage, I have just found out that it will likely be easier and more accurate to avoid having chunks intersect with non-compressed items. There are multiple reasons for this, and the decision is still pending, heavily dependent on the code I'm working on to introduce binary_search into the branch_stage. Therefore, I will move back this PR to WIP

@gabriele-0201 gabriele-0201 changed the title opt: add branch builder push_chunk [WIP] opt: add branch builder push_chunk Nov 19, 2024
@gabriele-0201 gabriele-0201 force-pushed the gm_branch_builder_push_chunk_opt branch from 9e68d06 to 3941ad8 Compare November 19, 2024 08:51
@@ -107,10 +114,31 @@ impl BranchNode {
self.as_mut_slice()[start..].view_bits_mut()[..prefix_len].copy_from_bitslice(prefix);
}

pub fn raw_separator(&self, i: usize) -> RawSeparator {
fn cells_mut(&self) -> &mut [[u8; 2]] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to be &mut

// and the relative bit length
//
// The raw bytes are always a multiple of 8 bytes in length
pub type RawSeparator<'a> = (&'a [u8], usize, usize);
pub type RawSeparators<'a> = (&'a [u8], usize, usize);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's odd to define this type as a plural but then use it like a singular in e.g. fn raw_separator or fn reconstruct_key

@@ -294,11 +387,305 @@ impl BranchNodeBuilder {
self.index += 1;
}

pub fn push_chunk(&mut self, base: &BranchNode, from: usize, to: usize) {
Copy link
Contributor

@rphmeier rphmeier Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this function, it looks to me like it internalizes too much complexity, especially around handling edge cases.

the contract of this function is not clear. the typical case I expect we would optimize for is basically just a single bitwise memcpy of the separators and a normal memcpy of the node pointers, which is what we get when the prefix hasn't changed.

I don't see a strong case that this would be more efficient than calling push repeatedly in the edge cases, and we've also pessimized the base case with a bunch of branches and additional logic.

It seems better to have alternative variants for each expected code path.

My suggestion:

  • Have a single push_chunk variant which assumes that the prefix has not changed. This would be used in our 'base case' as well as any uncompressed separators.
  • Call push in a loop, otherwise.

My gut feeling about optimization here is that we should be mostly memory-constrained, so it makes more sense to keep the code simple and have tight loops and externalize our branches as much as possible.

@rphmeier rphmeier changed the base branch from gm_bitwise_memcpy to graphite-base/568 November 22, 2024 20:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants