Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use skiplists to save on the bitmap size when compressing leaves #454

Draft
wants to merge 8 commits into
base: jsign-type-3
Choose a base branch
from

Conversation

gballet
Copy link
Member

@gballet gballet commented Sep 11, 2024

Instead of using a bitmap, use two bytes:

  • the first byte will indicate the length of the range of empty leaves
  • the second byte will indicate the length of the next range of non-empty leaves

There are ~200M accounts in the state, for which we use at most 6 bytes instead of 32 for a bitmap. This is expected to save (32 - 6) * 200 ~ 5GB

There are further optimizations that will not be handled by this PR, but that I'm writing down here so that I don't forget:

  • left-trim leaves
  • only store the C1/C2 if necessary (I need to find a bit to flag which ones are not present)

This PR also contains some fixes for the EoA encoding, that got broken after the Nyota costs.

@gballet gballet requested a review from jsign September 11, 2024 10:58
@@ -170,6 +170,7 @@ const (
leafType byte = 2
eoAccountType byte = 3
singleSlotType byte = 4
skipListType byte = 8
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is set in bit order, because I'm hoping to mix encodings in the future.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eoAccountType has value 3 which isn't respecting that idea. Is it worth changing now? If not, we're ina a half-baked idea.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could change it to 4, no long-term db has been using this.

// Code hash should be the empty code hash
isEoA = v != nil && bytes.Equal(v, EmptyCodeHash[:])
case 4:
// Code size must be 0
isEoA = v != nil && bytes.Equal(v, zero32[:])
default:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the EoA fix for Nyota.

proof_test.go Outdated
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that the changes in this PR are due to me building on top of two of your commits, which I need for my replay experiments.

k[StemSize] = ins.Suffix
keys = append(keys, k[:])
prevalues = append(prevalues, nil)
postvalues = append(postvalues, slices.Clone(ins.New[:]))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This, and line 410, are the fixes for the test that was broken.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird, so this means that stemdiff is somewhat mutated after affecting the returned Proof?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the area that is used to store the location of the iterator is reused over and over, and if we pass a reference to it, it will change as the iterator progress. This is not the first time we fall for this, and probably not the last time either, it's quite subtle and easy to forget.

@gballet gballet added this to the verkle-gen-devnet-8 milestone Sep 11, 2024
@gballet gballet changed the base branch from master to jsign-type-3 September 11, 2024 16:29
Copy link
Collaborator

@jsign jsign left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, left some comments for your consideration.

k[StemSize] = ins.Suffix
keys = append(keys, k[:])
prevalues = append(prevalues, nil)
postvalues = append(postvalues, slices.Clone(ins.New[:]))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird, so this means that stemdiff is somewhat mutated after affecting the returned Proof?

@@ -170,6 +170,7 @@ const (
leafType byte = 2
eoAccountType byte = 3
singleSlotType byte = 4
skipListType byte = 8
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eoAccountType has value 3 which isn't respecting that idea. Is it worth changing now? If not, we're ina a half-baked idea.

Comment on lines +1851 to +1854
for _, gap := range gaps {
if gap.Count == 0 {
break // skip the last gap as nothing follows
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we do:

for j:=0; j<=gapCount; j++ {

and avoid if L1852 ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not completely, because if Count == 0 it means that the group ends with a gap, wherease if Count != 0 then the group ends with a range. But yeah, no point in going over the whole list, I'll loop on j.

encoding.go Outdated
Comment on lines 144 to 145
rangecount := serialized[offset+1]
gapsize := serialized[offset]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename gapsize and valueIdx to rangeSkip and rangeCount, as to use the same names in the serialization Skip and Count?

Using gap or range worlds can be confusing. Using the same names as the serialization makes easier to understand the both ways of the algorithm.

tree.go Outdated Show resolved Hide resolved
Comment on lines +1856 to +1857
leafIdx += int(gap.Skip)
result = append(result, byte(gap.Count))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking there can't be a 0-length range, so we should encode Count-1 instead of forcing the wraparound, which might not work on every platform.

Suggested change
leafIdx += int(gap.Skip)
result = append(result, byte(gap.Count))
leafIdx += int(gap.Skip)
// count can be 256, bytes(256) == 0 which is used as a marker for a full leaf
result = append(result, byte(gap.Count))

Signed-off-by: Guillaume Ballet <[email protected]>
Signed-off-by: Guillaume Ballet <[email protected]>
@@ -56,6 +56,9 @@ const (
leafC1CommitmentOffset = leafCommitmentOffset + banderwagon.UncompressedSize
leafC2CommitmentOffset = leafC1CommitmentOffset + banderwagon.UncompressedSize
leafChildrenOffset = leafC2CommitmentOffset + banderwagon.UncompressedSize
leafSkipListCOffset = nodeTypeSize + StemSize
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: leafSkipListC0Offset

values[valueIdx] = serialized[offset : offset+leafSlotSize]

// shortcut: the leaf is full and so both values are 0.
if serialized[offset] == 0 && serialized[offset+1] == 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this worth it? If the node is full, doesn't sound we should serialize with skip list really. Sounds wasteful.

Signed-off-by: Guillaume Ballet <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants