-
Notifications
You must be signed in to change notification settings - Fork 877
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
7582: Add naive RLP caching for BlockHeader, Transaction, and Withdrawal #7988
base: main
Are you sure you want to change the base?
7582: Add naive RLP caching for BlockHeader, Transaction, and Withdrawal #7988
Conversation
Signed-off-by: Matilda Clerke <[email protected]>
…ng-during-sync # Conflicts: # ethereum/core/src/main/java/org/hyperledger/besu/ethereum/core/BlockHeader.java
Signed-off-by: Matilda Clerke <[email protected]>
I see the potential benefit of the caching, but since this is a tradeoff between compute and memory, have you done benchmarks to quantify the performance benefit of the change, and the possible impact on memory? |
Fair point. I modified the BlockHeadersMessageTest locally to produce 10000 headers and timed the looped writeTo calls in BlockHeadersMessage. The original code performed the 10000 writeTo calls in 18 to 23ms, while the updated code performed the 10000 writeTo calls in 13 to 18ms. In terms of memory, it's actually just using indexes to the original underlying array, so it should only be a small amount of extra memory used. |
So the micro benchmark test confirm that there is a gain on the compute side, and that is worth exploring more, for that I suggest to run some real sync on mainnet with this change and compare the results against some control instances. |
Agreed, but those references will keep strong links to the underlying byte arrays preventing it from garbage collection. I would like to have a rationale on what this feature is going to improve. It is an interesting improvement but is there is real use case for it ? and if so, we need to evaluate, how much is the latency improvement for how much overhead in terms of memory ? |
On the implementation side, have you thought about different approaches to the caching? |
From the micro benchmark, we'd expect a time saving of around 10 seconds for 10 million block headers. Realistically, not a noticeable gain on our current 20+ hour sync. However, it should give us a small saving of CPU utilisation. Memory breakdown is as follows: Regarding garbage collection: If the headers are currently getting garbage collected (I'd expect they are, but not sure), these references won't prevent the underlying array from being garbage collected. For this PR, I was looking just for some potential quick wins while continuing to focus mainly on my refactoring of peer tasks. @pinges is working on a more compilcated scheme which manages to avoid decoding the RLP for large portions of the block body in addition to avoiding re-encoding as we're doing here. |
Signed-off-by: Matilda-Clerke <[email protected]>
Signed-off-by: Matilda Clerke <[email protected]>
Signed-off-by: Matilda Clerke <[email protected]>
Backed out the withdrawal changes as we discussed and don't believe the changes will ever be used. |
Consider using JOL (Java Object Layout) or a heap dump to have more accurate numbers. With CompressedOOPs enabled, each Java object has a header of 12 Bytes if the JVM heap < 32 GiB or 16 Bytes if JVM heap >= 32 GiB. Depending on the total number of bytes for each object, there is internal and external padding to make the size a multiple of 8.
I think we should focus on the RLP raw data, the underlying byte array, not the headers. The current calculation focuses more on the shallow size because it is deterministic, and takes into account only the headers and the references. With the RLP stored in the transaction, we need to evaluate the impact on the transactionPool in terms of memory usage and garbage collection. |
In this screenshot below, we can see that the block header is having a reference to a ~2 MBytes byte array (rlp) which I guess is the RLP of the block, but that would be a big block, as we can see the offset and the length inside that AbstractWrappingBytes.
I guess all the transactions of that specific block, and the header are referencing the same underlying byte array. Now, we need to evaluate the real memory overhead. To do that, we need to have the number of live blocks in memory during sync and the average of the size of the RLP. You can share a heap dump during sync with your PR. I will extract the numbers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a suggestion to init RLP during decoding the transaction from the RLP. I think overall, there is no much overhead when I look the heap dump as the underlying byte array is already referenced by the payload of the transaction, and to address.
return Transaction.builder() | ||
.copiedFrom(transaction) | ||
.rawRlp(Optional.of(transactionRlp.raw())) | ||
.build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
calling again a builder to add RLP, I would do it directly in each transaction decoder just after the payload, as they're very similar fields :
Line 52 in 2909ea4
.payload(input.readBytes()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the transaction level, as the payload is already referencing the same underlying byte array, the only overhead is the reference of the rlp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only concern in applying this to the transaction encoder/decoder classes is that they seem to only populate a subset of the Transaction fields, so if we try to instead supply in the encoder the full original RLP, it may be significantly larger than it is currently.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There will be no difference, as rawRlp and payload are just references to the same byte array with an offset and a length. It doesn't change the size of the transaction object. With copiedFrom, we lose the first reference of the transaction as we're creating another one.
The question I didn't investigate is wether we have the (whole) rawRlp of the transaction at this level.
Signed-off-by: Matilda-Clerke <[email protected]>
Signed-off-by: Matilda Clerke <[email protected]>
Signed-off-by: Matilda Clerke <[email protected]>
It seems these latest changes aren't quite working right. In particular, a block encoded and written to blockchainStorage is causing RLP errors when being read and decoded. I'm parking this briefly for now to progress some other issues. |
and handle empty lists in AbstractRLPInput Signed-off-by: Matilda Clerke <[email protected]>
Signed-off-by: Matilda Clerke <[email protected]>
…7582-avoid-unnecessary-rlp-encoding-during-sync
Signed-off-by: Matilda Clerke <[email protected]>
Signed-off-by: Matilda Clerke <[email protected]>
…7582-avoid-unnecessary-rlp-encoding-during-sync
…r classes Signed-off-by: Matilda Clerke <[email protected]>
Signed-off-by: Matilda Clerke <[email protected]>
Signed-off-by: Matilda-Clerke <[email protected]>
…an exception in test Signed-off-by: Matilda Clerke <[email protected]>
Signed-off-by: Matilda Clerke <[email protected]>
PR description
Add naive RLP caching for BlockHeader, Transaction, and Withdrawal. This PR currently ignores Block, BlockBody, and TransactionReceipt due to complexity.
Issue
#7582