# RFC-0112: Compress the State Response Message in State Sync

|                 |                                                                                   |
| --------------- | --------------------------------------------------------------------------------- |
| **Start Date**  | 14 August 2024                                                                     |
| **Description** | Compress the state response message to reduce data transfer during state syncing  |
| **Authors**     | Liu-Cheng Xu                                                                       |
## Summary

This RFC proposes compressing the state response message during the state syncing process to reduce the amount of data transferred.
## Motivation

State syncing can require downloading several gigabytes of data, particularly for blockchains with large state sizes, such as Astar, which has a state size exceeding 5 GiB (https://github.com/AstarNetwork/Astar/issues/1110). This presents a significant challenge for nodes with slower network connections. Additionally, the current state sync implementation lacks a persistence feature (https://github.com/paritytech/polkadot-sdk/issues/4), meaning any network disruption forces the node to re-download the entire state, making the process even more difficult.
## Stakeholders

This RFC benefits all projects built on the Substrate framework by improving the efficiency of state syncing.

- Node operators.
- Substrate users.
## Explanation

> **Review comment:** I think this requires some implementation details, like what compression algorithm is going to be used.
>
> **Reply:** As per the code changes linked in the Testing section below, we'll use `zstd`.
The largest portion of the state response message consists of either `CompactProof` or `Vec<KeyValueStateEntry>`, depending on whether a proof is requested ([source](https://github.com/paritytech/polkadot-sdk/blob/0cd577ba1c4995500eb3ed10330d93402177a53b/substrate/client/network/sync/src/state_request_handler.rs#L216-L241)):
- `CompactProof`: when a proof is requested, compression yields a lower ratio but remains beneficial, as shown by the warp sync numbers in the Performance section below.
- `Vec<KeyValueStateEntry>`: without a proof, the message is highly compressible because the entries are generated by iterating the storage sequentially from an empty storage key, so many entries in the message share the same storage prefix, making the payload ideal for compression.
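In both cases the response is ultimately an opaque byte payload, so compression can be applied uniformly on the wire. Below is a minimal sketch of the sender and receiver sides, assuming the `zstd` crate mentioned in the review discussion above; the function names are illustrative, not the actual patch:

```rust
use std::io;

/// Compress an already-encoded state response payload before sending it.
/// Level 0 selects zstd's default compression level (currently 3), which
/// keeps the CPU overhead low.
fn compress_state_response(encoded: &[u8]) -> io::Result<Vec<u8>> {
    zstd::stream::encode_all(encoded, 0)
}

/// Decompress a received payload back into the encoded state response.
fn decompress_state_response(compressed: &[u8]) -> io::Result<Vec<u8>> {
    zstd::stream::decode_all(compressed)
}
```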
## Drawbacks

None identified.
## Testing, Security, and Privacy

The [code changes](https://github.com/liuchengxu/polkadot-sdk/commit/2556fefacd2e817111d838af5f46d3dfa495852d) required for this RFC are straightforward: compress the state response on the sender side and decompress it on the receiver side. Existing sync tests should ensure functionality remains intact.
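A roundtrip check in the same spirit could complement the existing sync tests. This is a sketch assuming the `zstd` crate, not a test from the actual changeset:

```rust
/// Sketch of a roundtrip test: whatever bytes the sender compresses,
/// the receiver must decompress back to the identical payload.
#[test]
fn state_response_compression_roundtrip() {
    let payload = vec![0u8; 4096]; // stand-in for an encoded state response
    let compressed = zstd::stream::encode_all(&payload[..], 0).unwrap();
    assert!(compressed.len() < payload.len()); // trivially compressible input
    let roundtripped = zstd::stream::decode_all(&compressed[..]).unwrap();
    assert_eq!(payload, roundtripped);
}
```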
## Performance, Ergonomics, and Compatibility

### Performance

This RFC optimizes network bandwidth usage during state syncing, particularly for blockchains with gigabyte-sized states, while introducing negligible CPU overhead for compression and decompression. For example, compressing the state response during a recent Polkadot warp sync (around height #22076653) reduced the data transferred from 530,310,121 bytes to 352,583,455 bytes, a 33% reduction that saves approximately 169 MiB.
Performance data is based on [this patch](https://github.com/liuchengxu/polkadot-sdk/commit/da93360c9a59c29409061789c598d8f4e55d7856), with logs available [here](https://github.com/liuchengxu/polkadot-sdk/commit/9d98cefd5fac0a001d5910f7870ead05ab99eeba).
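A quick sanity check of the arithmetic behind those figures (not a benchmark):

```rust
fn main() {
    let before = 530_310_121u64; // bytes transferred without compression
    let after = 352_583_455u64; // bytes transferred with compression
    let saved = before - after;
    let saved_mib = saved as f64 / (1024.0 * 1024.0);
    let reduction_pct = 100.0 * saved as f64 / before as f64;
    // Prints: saved 169.5 MiB, a 33.5% reduction
    println!("saved {saved_mib:.1} MiB, a {reduction_pct:.1}% reduction");
}
```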
### Ergonomics

None.
### Compatibility | ||
No compatibility issues identified.

> **Review comment:** It looks to me that this needs some network protocol versioning. For example, what happens if the receiver is not expecting compressed data? Or if the receiver is expecting compressed data but gets uncompressed data?
>
> **Reply:** This is a valid concern, and I agree that network versioning would be the ideal solution. However, Substrate doesn't currently support network protocol versioning, which might deserve another RFC. In the meantime, we can introduce a backward-compatible workaround. For example, we could define a new … The main consequence of this approach is that NewNodes won't be able to sync state from OldNodes, but they can still sync with other NewNodes. This workaround isn't perfect, but it ensures some level of compatibility until a more comprehensive network versioning solution can be implemented in Substrate.
>
> **Review comment:** I added some support for req/response protocol versioning in paritytech/polkadot-sdk#2771 and used it for the implementation of #47, although it's not optimal: it first tries a request on the new version and, if that's not supported, retries on the old protocol (making two round trips if the remote does not support the new version). It was implemented this way mostly due to libp2p not having the most flexible interfaces. In any case, the RFC should clearly state the interactions between new/old nodes. CC @lexnv
>
> **Reply:** @bkchr Any more comments apart from the interactions between the old and new nodes? If the workaround I proposed in my last comment is fine, I can update the RFC.
>
> **Review comment:** This optimization makes sense from my perspective. We are trading off bandwidth for CPU cycles in the hope that this yields an overall faster warp sync; it would be interesting to see the comparison with times as well. Indeed, we'd need to think about versioning too. We'd like to deprecate the older request-response protocols, so it's worth adding as few breaking changes as possible and having a deprecation plan in mind. Offhand, it looks like an approach similar to paritytech/polkadot-sdk#2771 should solve this issue. Then, if we are to introduce a new request-response version, should we also consider other compression algorithms? That complicates things, but it would be interesting to see a comparison between different compression algorithms on Polkadot-specific data (i.e., maybe we end up saving 20% bandwidth instead of 33% but compress faster).
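A minimal sketch of the try-new-then-fall-back flow described above. Everything here is hypothetical: the protocol names `/state/2` and `/state/1`, the `send_request` helper, and the error type stand in for Substrate's real networking APIs:

```rust
/// Error type for the stubbed networking layer below.
#[derive(Debug)]
enum RequestError {
    /// The remote peer does not speak the requested protocol.
    UnsupportedProtocol,
}

/// Stub standing in for the network's request-response call.
/// Here we pretend the remote peer only supports the legacy protocol.
fn send_request(protocol: &str, _request: &[u8]) -> Result<Vec<u8>, RequestError> {
    if protocol == "/state/2" {
        Err(RequestError::UnsupportedProtocol)
    } else {
        Ok(b"uncompressed state response".to_vec())
    }
}

/// Try the new (compressed) protocol first; on failure fall back to the
/// legacy one, costing a second round trip as noted in the discussion.
/// Returns the response body and whether it is compressed.
fn request_state(request: &[u8]) -> Result<(Vec<u8>, bool), RequestError> {
    match send_request("/state/2", request) {
        Ok(body) => Ok((body, true)),
        Err(RequestError::UnsupportedProtocol) => {
            send_request("/state/1", request).map(|body| (body, false))
        }
    }
}

fn main() {
    let (body, compressed) = request_state(b"state request").unwrap();
    println!("received {} bytes (compressed: {})", body.len(), compressed);
}
```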
## Prior Art and References

None.
## Unresolved Questions

None.
## Future Directions and Related Material

None.
> **Review comment:** Is the motivation to send less data or to improve state sync performance? Based on what you're saying, I think that implementing a persistence feature would be enough to solve this issue on slower network connections, as it allows you to request subsequent portions of the state incrementally. Perhaps a better way to solve it is a combination of a persistence layer to store the state you are retrieving, the ability to stop/restart your node, and perhaps changing the request to ask for a certain amount of state. What I'm suggesting is to add a new field with a limit to the current state request; both are sketched below.
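For reference, the current state request has roughly the following shape; this is a reconstruction based on Substrate's `api.v1.StateRequest` protobuf schema, expressed here as Rust structs, and `max_keys` in the second struct is the reviewer's hypothetical addition:

```rust
/// Rough shape of today's state request (a reconstruction; the real
/// message is the protobuf `api.v1.StateRequest`, not a Rust struct).
#[allow(dead_code)]
struct StateRequest {
    /// Block header hash to sync the state of.
    block: Vec<u8>,
    /// Storage key(s) to resume iteration from.
    start: Vec<Vec<u8>>,
    /// If true, the response contains plain key-value pairs without a proof.
    no_proof: bool,
}

/// The reviewer's suggestion: the same request plus an explicit cap on
/// the amount of state returned per request (`max_keys` is hypothetical).
#[allow(dead_code)]
struct BoundedStateRequest {
    block: Vec<u8>,
    start: Vec<Vec<u8>>,
    no_proof: bool,
    /// Maximum number of key-value entries to return in one response.
    max_keys: u32,
}
```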
> **Review comment (continued):** I know this could be split out as a separate proposal, but I think it could be a better way to solve what you are describing without relying on compression algorithms. In the worst case, you could use a combination of both, asking for a partial state and compressing the response. The trade-off might be that the number of requests could increase. WDYT?
> **Reply:** Thanks for your feedback. There is already a persistent state sync tracking issue, paritytech/polkadot-sdk#4, which I'm actively working on. I initiated this RFC because it is low-hanging fruit: reducing bandwidth leads to immediate improvements in state sync performance, while persistent state sync requires much more effort and complexity and may take too long to wrap up. As you can see from the patch, a few changed lines significantly reduce bandwidth consumption. The proposed changes in the RFC are minimal yet impactful in terms of bandwidth savings. In contrast, your suggestion to add a `max_keys` field, while valid, does not reduce bandwidth at all. It may even increase request frequency, which could negate some of the performance gains we seek.