Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

COW in LeafNodes #314

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from
Draft

COW in LeafNodes #314

wants to merge 1 commit into from

Conversation

jsign
Copy link
Collaborator

@jsign jsign commented Dec 13, 2022

This PR contains multiple performance improvements that will be explained in PR comments.

These changes were motivated by optimizing the benchmarks in this go-ethereum draft PR. It's recommended to see that PR first so you have a top-down approach to understanding the changes.

Of course, this had a noticeable impact on the benchmarks of this repo too:

name                                        old time/op    new time/op    delta
ProofCalculation-16                            1.87s ± 2%     0.08s ± 3%  -95.75%  (p=0.003 n=5+7)
ProofVerification-16                          26.9ms ± 2%    26.5ms ± 1%     ~     (p=0.171 n=5+8)
GroupToField/single-16                        1.53µs ± 2%    1.45µs ± 2%   -5.08%  (p=0.002 n=5+8)
GroupToField/multiple/1-16                    1.61µs ± 4%    1.57µs ± 3%   -2.94%  (p=0.048 n=5+7)
GroupToField/multiple/2-16                    1.83µs ± 4%    1.73µs ± 1%   -5.24%  (p=0.003 n=5+7)
GroupToField/multiple/4-16                    2.53µs ± 2%    2.38µs ± 3%   -5.88%  (p=0.002 n=5+8)
GroupToField/multiple/8-16                    3.96µs ± 4%    3.75µs ± 2%   -5.36%  (p=0.002 n=5+8)
GroupToField/multiple/16-16                   6.26µs ± 3%    6.01µs ± 1%   -4.10%  (p=0.003 n=5+8)
GroupToField/multiple/32-16                   11.2µs ± 4%    10.9µs ± 2%   -2.55%  (p=0.006 n=5+8)
GroupToField/multiple/64-16                   21.5µs ± 3%    20.9µs ± 2%   -2.99%  (p=0.006 n=5+8)
GroupToField/multiple/128-16                  40.2µs ± 2%    39.3µs ± 0%   -2.29%  (p=0.003 n=5+7)
GroupToField/multiple/256-16                  79.8µs ± 4%    77.8µs ± 1%   -2.54%  (p=0.010 n=5+7)
CommitLeaves/insert/leaves/1000-16            47.2ms ± 2%    11.9ms ± 3%  -74.71%  (p=0.002 n=5+8)
CommitLeaves/insertOrdered/leaves/1000-16     66.2ms ± 3%    66.5ms ± 4%     ~     (p=0.833 n=5+8)
CommitLeaves/insert/leaves/10000-16            471ms ± 2%      65ms ± 5%  -86.19%  (p=0.002 n=5+8)
CommitLeaves/insertOrdered/leaves/10000-16     626ms ± 1%     830ms ± 2%  +32.68%  (p=0.002 n=5+8)
CommitFullNode-16                             6.30ms ± 5%    5.12ms ± 2%  -18.66%  (p=0.003 n=5+7)
ModifyLeaves-16                                380ms ± 4%     169ms ± 3%  -55.40%  (p=0.002 n=5+8)

Note: this PR is optimistically sitting on top of another branch of go-ipa that has other optimizations. I'll update the go.mod to the official commit when that gets merged. Only after we do that, the replay block GH action will run correctly for go.mod editing reasons.

config_ipa.go Outdated Show resolved Hide resolved
encoding.go Outdated Show resolved Hide resolved
go.mod Outdated Show resolved Hide resolved
proof_test.go Show resolved Hide resolved
tree.go Show resolved Hide resolved
tree.go Outdated Show resolved Hide resolved
tree.go Outdated Show resolved Hide resolved
tree.go Outdated Show resolved Hide resolved
tree.go Outdated Show resolved Hide resolved
tree.go Outdated Show resolved Hide resolved
Copy link
Member

@gballet gballet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice. I'm mostly behind these changes, although I'd rather have a cow approach for the commitment calculation in LeafNode than what you did here with the nil commitments. I believe this will lead to even higher performance.

proof_test.go Show resolved Hide resolved
go.mod Outdated Show resolved Hide resolved
encoding.go Outdated Show resolved Hide resolved
config_ipa.go Outdated Show resolved Hide resolved
tree.go Outdated Show resolved Hide resolved
tree.go Outdated Show resolved Hide resolved
tree.go Outdated Show resolved Hide resolved
tree.go Outdated Show resolved Hide resolved
tree.go Outdated Show resolved Hide resolved
tree.go Outdated Show resolved Hide resolved
Copy link
Collaborator Author

@jsign jsign left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gballet, thanks for the review!

For completeness in this PR discussion, we've discussed a bit and decided to wait for #295 to be merged first. After that, I'll re-run the benchmarks, and we can plan on keep moving forward with this PR.

@gballet gballet force-pushed the master branch 8 times, most recently from 1d4d60c to a649f3c Compare January 9, 2023 09:44
@jsign jsign force-pushed the jsign/multiplefr branch 5 times, most recently from 0ecada1 to bf4de31 Compare January 11, 2023 20:06
@jsign jsign changed the title Multiple performance improvements COW in LeafNodes & batch Point->Frs & other improvements Jan 11, 2023
@jsign jsign force-pushed the jsign/multiplefr branch 3 times, most recently from 939ef88 to 3e8bae5 Compare January 12, 2023 14:22
Comment on lines +42 to +45
func TestMain(m *testing.M) {
_ = GetConfig()
os.Exit(m.Run())
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing GetConfig() in each test, we can make this part of the startup process before running the tests, so we don't have to repeat ourselves.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that is necessary: once the first test is called, it will initialize the cfg variable for all subsequent tests. Not sure what you are trying to achieve here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoiding adding GetConfig() in each test.

You can't rely on only doing that in the first test of the file, because if you later do go test ./... -run=TestSomeParticularOne, that will panic if isn't the first one in the file. So that means you have to do GetConfig() on every test, which can be avoiding on doing this in TestMain().

Makes sense to you?

root := New()
root.Insert(ffx32KeyTest, zeroKeyTest, nil)
root.Commit()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's quite important now to always do Commit() after doing changes, since we're trying to be "as lazy as possible" to do premature work if values keep changing before being ready to commit.

tree.go Outdated
Comment on lines 41 to 45
// Committer represents an object that is able to create the
// commitment to a polynomial.
type Committer interface {
CommitToPoly([]Fr, int) *Point
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, this interface isn't used anymore. This isn't because changes of this PR. If we delete this interface in current master, all compiles and work the same.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, go-verkle used to support two schemes and one of them got dropped. I guess we can remove it, but in the interest of keeping a clean git history, I would appreciate if you could make this another PR.

@@ -187,6 +183,7 @@ type (

commitment *Point
c1, c2 *Point
cow map[byte][]byte
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so this is the "main" change of this second version of the PR.
The idea is that we do the "COW" idea in LeafNodes.

If we've never calculated a (*LeafNode).commitment, whenever we do (*LeafNode).Commit() we can do the same as we did before. Taking all the slice of values, and calculate the commitment.

But, if we already have a (*LeafNode).commitment != nil, then every time we touch that leaf node, we keep track in (*LeafNode).cow which was the previous value, and update (*LeafNode).values with the new value. Whenever the user calls Commit(...), the code will notice that we already had a previous (*LeafNode).commitment != nil and a (*LeafNode).cow with some tracked changes. Then, we do the diff-updating.

In summary, instead of doing diff-updating as soon as the user calls Delete(...) or Insert*(...) which previously did the diff-updating straight; we now only keep track in (*LeafNode).cow and only do the real diff-updating when Commit() is called.

So if the client is updating multiple times the same leaf value, we avoid doing diff-updatings that are overwritten. We only do it once when Commit() is called.

Also, this centralized the logic of *LeafNode commitment updating in a single place: Commit(). (i.e: we don't have commitment calculations spread in NewLeafNode(..), Insert*(...) , and Delete(...). There's a single place where that work happens.

This changes the previous version of this PR pattern if n.commitment == nil { n.Commit() } that we wanted to change.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ^ is the complete picture of how COW works for leaf nodes. The idea of the comment is to give you the full picture. I'll dive into some extra details about where stuff happens.

tree.go Outdated
Comment on lines 472 to 474
// If we're at the root internal node, we fire goroutines exploiting
// available cores. Those goroutines will recursively take care of downstream
// layers, so we avoid creating too many goroutines.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this extra comment to explain this idea of n.depth == 0. Doesn't hurt to be a bit more explicit for future readers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is interesting, although I still need to think it over to convince myself that it will have an impact. I doubt that will be the case for InsertOrdered, for example, since by the time a depth of 0 is reached, most of the tree will have already been flushed.

It is an idea that, I think, would be great to implemen in FlushAtDepth, btw.

It is however very interesting when computing the root during block execution.

So all in all, I'm in favor of this, but just as before, I would appreciate if this was done in a separate PR, so that we can check its contribution independently, and also keep a somewhat cleaner git history.

Comment on lines -886 to 891
var c1, c2 *Point
var old1, old2 *Fr
for i, v := range values {
if len(v) != 0 && !bytes.Equal(v, n.values[i]) {
if i < 128 {
if c1 == nil {
c1, old1 = n.getOldCn(byte(i))
}
n.updateCn(byte(i), v, c1)
} else {
if c2 == nil {
c2, old2 = n.getOldCn(byte(i))
}
n.updateCn(byte(i), v, c2)
}

n.values[i] = v
for i := range values {
if values[i] != nil {
n.updateLeaf(byte(i), values[i])
}
}

if c1 != nil {
n.updateC(0, c1, old1)
}
if c2 != nil {
n.updateC(128, c2, old2)
}
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This updateMultipleLeaves(...) got simplified, so this logic disappeared.
Updating multiple leaves is simply calling updateLeaf(...) explained before (i.e: tracking value changes).

},
}

func (leaf *LeafNode) Commit() *Point {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so here we have the new version of (*LeafNode).Commit().
Here we do what we explained in the TL;DR before, but I'll repeat a bit in further comments.

Comment on lines +1084 to +949
// If we've never calculated a commitment for this leaf node, we calculate the commitment
// in a single shot considering all the values.
if leaf.commitment == nil {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As explained in the comment I added, if this leaf node never was Commit(..)ed before, we do the known logic to commit to all the leaf.values in the polynomial commitment.

We could in theory only do diff-updating starting from a "empty" commitment and do diff-updates on top. But that's quite slow... if we have a bunch of values that are all new (since we have never Commited before), we do all the calculation in a single poly commitment. The code of this "if" block is the same as the previous version of this PR, so nothing interesting is needed for reviewing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could that not be simplified by simply setting the commitment to zero and then applying the CoW logic below to it? This should be equivalent, unless I missed something?

Copy link
Collaborator Author

@jsign jsign Jan 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I went with that approach initially since it would avoid an "if" case here. Unfortunately, that was slower.

If you have a calculated commitment and 10 values to update with diffs-commits, you have to do 10 diff commitment which is slower than doing a single 10-values polynomial commitment.

In fact, at some point if your cow is showing that you changed 256 values, probably we shouldn't be doing diff-updating too. (Under the same logic) Might be useful to try to discover when diff-updating is slower than doing the full calculation again.

Comment on lines +1117 to +1006
} else if len(leaf.cow) != 0 {
// If we've already have a calculated commitment, and there're touched leaf values, we do a diff update.
var c1, c2 *Point
var old1, old2 *Fr
for i, oldValue := range leaf.cow {
if !bytes.Equal(oldValue, leaf.values[i]) {
if i < 128 {
if c1 == nil {
c1, old1 = leaf.getOldCn(i)
}
leaf.updateCn(i, oldValue, c1)
} else {
if c2 == nil {
c2, old2 = leaf.getOldCn(i)
}
leaf.updateCn(i, oldValue, c2)
}
}
}

if c1 != nil {
leaf.updateC(0, c1, old1)
}
if c2 != nil {
leaf.updateC(128, c2, old2)
}
leaf.cow = nil
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we have the new case. In this case we know that leaf.commitment != nil (so we already did Commit(...) in this leaf node), and we see that there're values in leaf.cow which means that values where changed.

What we do here is something similar to what we did before for diff updating. The only "meaningful" change is that we send oldValue toupdateCn(...) instead of new values (as explained in a previous comment). This is only because in leaf.values we have the new values, and the old ones in leaf.cow.

In L1143 we cleanup the leaf.cow map to reclaim some memory, and allow that to be created again if new values in this leaf change again.

tree.go Outdated
Comment on lines 1347 to 1360
// The length of the result is at a minimum the sum of:
// 1 = leafRLPType
// 31 = length of n.stem
// 32 = length of bitlist
// 2*32 = c1bytes and c2bytes
// And we add 4*32 bytes of extra capacity for an ~expected 4 children.
result := make([]byte, 1+31+32+2*32, 1+31+32+2*32+4*32)
result[0] = leafRLPType
copy(result[1:], n.stem[:31])
copy(result[1+31:], bitlist[:])
copy(result[1+31+32:], c1bytes[:])
copy(result[1+31+32+32:], c2bytes[:])
result = append(result, children...)
return result, nil
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a change that already existed in the previous version of this PR that avoids a lot of unnecessary capacity by the previous append(append(... pattern.

But, after rebasing to the latest master I had to fix it adding this extra c1bytes and c2bytes that are now part of the serialization (probably makes sense to you because you did that addition). The idea is the same, jus that some extra space is specifically allocated for that new stuff.

Also, in L1347-L1352 I added some comment to explain what's the idea here. I think without the comment was somewhat obvious that we're trying to allocate as close as possible to the space we need. But... it doesn't hurt to explain where const-numbers come from.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice, I also would like this to be a separate PR, though.

@jsign jsign marked this pull request as ready for review January 12, 2023 14:24
Copy link
Member

@gballet gballet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, that looks much better. The issue is that it doesn't seems to speed things up during the block replay... It makes no sense to me, but it's a fact.

profile047

I suggest breaking all the optimizations that are not cow-related into separate PRs, so that we can sift what works and what needs to be reworked. Good work in any case, sorry this isn't as rewarding as we hoped just yet.

tree.go Outdated
Comment on lines 41 to 45
// Committer represents an object that is able to create the
// commitment to a polynomial.
type Committer interface {
CommitToPoly([]Fr, int) *Point
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, go-verkle used to support two schemes and one of them got dropped. I guess we can remove it, but in the interest of keeping a clean git history, I would appreciate if you could make this another PR.

@@ -219,23 +216,6 @@ func NewLeafNode(stem []byte, values [][]byte) *LeafNode {
c2: Generator(),
}

// Initialize the commitment with the extension tree
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you remove the commitment calculation, shouldn't the cow field be initialized with values at this point?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because commitment is nil in this case. In that case, the Commit() function already sees that as a signal that there's no previous commitment that we can do diff-updating; so it will do the usual full computation in the vector.

That means we don't need to initialize cow here. cow is initialized whenever we have a previous commitment calculation, and any value of the leaf is changed. In that case, we track in cow since will do a diff-update in the next Commit().

tree.go Outdated
Comment on lines 472 to 474
// If we're at the root internal node, we fire goroutines exploiting
// available cores. Those goroutines will recursively take care of downstream
// layers, so we avoid creating too many goroutines.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is interesting, although I still need to think it over to convince myself that it will have an impact. I doubt that will be the case for InsertOrdered, for example, since by the time a depth of 0 is reached, most of the tree will have already been flushed.

It is an idea that, I think, would be great to implemen in FlushAtDepth, btw.

It is however very interesting when computing the root during block execution.

So all in all, I'm in favor of this, but just as before, I would appreciate if this was done in a separate PR, so that we can check its contribution independently, and also keep a somewhat cleaner git history.

Comment on lines +1084 to +949
// If we've never calculated a commitment for this leaf node, we calculate the commitment
// in a single shot considering all the values.
if leaf.commitment == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could that not be simplified by simply setting the commitment to zero and then applying the CoW logic below to it? This should be equivalent, unless I missed something?

tree.go Outdated
Comment on lines 1347 to 1360
// The length of the result is at a minimum the sum of:
// 1 = leafRLPType
// 31 = length of n.stem
// 32 = length of bitlist
// 2*32 = c1bytes and c2bytes
// And we add 4*32 bytes of extra capacity for an ~expected 4 children.
result := make([]byte, 1+31+32+2*32, 1+31+32+2*32+4*32)
result[0] = leafRLPType
copy(result[1:], n.stem[:31])
copy(result[1+31:], bitlist[:])
copy(result[1+31+32:], c1bytes[:])
copy(result[1+31+32+32:], c2bytes[:])
result = append(result, children...)
return result, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice, I also would like this to be a separate PR, though.

Comment on lines +42 to +45
func TestMain(m *testing.M) {
_ = GetConfig()
os.Exit(m.Run())
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that is necessary: once the first test is called, it will initialize the cfg variable for all subsequent tests. Not sure what you are trying to achieve here?

@jsign
Copy link
Collaborator Author

jsign commented Jan 13, 2023

I suggest breaking all the optimizations that are not cow-related into separate PRs, so that we can sift what works and what needs to be reworked. Good work in any case, sorry this isn't as rewarding as we hoped just yet.

Sounds good to me. I can break up changes in different PRs, and probably that will help.

Signed-off-by: Ignacio Hagopian <[email protected]>
@jsign jsign changed the title COW in LeafNodes & batch Point->Frs & other improvements COW in LeafNodes Jan 19, 2023
@jsign
Copy link
Collaborator Author

jsign commented Mar 1, 2023

@gballet, I still want to reconsider running this idea again after some other performance improvements we'll include soon.

Moving to draft to just avoid noise in the review PR order; moving to "paused mode" here for some weeks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants