This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Finalize target block after warp sync #2696

Merged: 7 commits merged into master from davxy-update-finalized-during-history-download on Jun 7, 2023

@davxy commented Jun 6, 2023

TL;DR

This PR addresses the following issues:

  1. Level-monitor restore procedure after warp sync when the node is restarted during block-history download
  2. Finalization of the target block after warp sync
  3. Proper handling of finalized blocks during block-history download
  4. Prevention of finalized-block insertion into the level monitor (a block is inserted only if its number > finalized)

Fixes: #2659

May help with: paritytech/substrate#13202

Depends on paritytech/substrate#14308


Warping without the fix

The finalized block number is not updated until the block-history download has finished.

This causes problems if we stop and restart the node while block history is still being downloaded.

Issues:

  1. The level monitor panics during the restore procedure: we try to compute a route from the last finalized block (which is still 0) to each of the leaves. Because the blocks in the gap have not been downloaded yet, this triggers a panic.
  2. Even if we don't stop the node during block-history download, the download/import of the gap can be memory-expensive, since each imported block (actually its hash) is inserted into the monitor (number > finalized = 0).
    The monitor ends up containing the full block history from the gap start to the leaves.
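
The first issue can be illustrated with a minimal sketch. It is not the Cumulus level-monitor code: the `Header` type, the `route_to_ancestor` function and the use of plain `u64` values as hashes are all assumptions made for illustration. It only shows why walking parent links from a leaf down to a finalized block at 0 fails when the gap has not been downloaded, and why it succeeds once the finalized block is the warp sync target.

```rust
use std::collections::HashMap;

/// Toy header (hypothetical): block number plus parent hash.
/// Hashes are plain u64 values here to keep the sketch small.
#[derive(Clone, Copy)]
struct Header {
    number: u64,
    parent: u64,
}

/// Walks parent links from `leaf` down to the block at `target_number`.
/// Returns `None` as soon as a header on the way is missing from the store.
fn route_to_ancestor(
    headers: &HashMap<u64, Header>,
    leaf: u64,
    target_number: u64,
) -> Option<Vec<u64>> {
    let mut route = vec![leaf];
    let mut current = *headers.get(&leaf)?;
    while current.number > target_number {
        let parent_hash = current.parent;
        current = *headers.get(&parent_hash)?;
        route.push(parent_hash);
    }
    Some(route)
}

fn main() {
    // After warp sync only genesis (hash 0) and the recent blocks are known.
    let mut headers = HashMap::new();
    headers.insert(0, Header { number: 0, parent: 0 });
    // Warp target: its parent (hash 99) lies in the not-yet-downloaded gap.
    headers.insert(100, Header { number: 100, parent: 99 });
    headers.insert(101, Header { number: 101, parent: 100 });

    // Finalized still at 0: the route crosses the gap and cannot be built.
    assert!(route_to_ancestor(&headers, 101, 0).is_none());
    // Finalized at the warp target: the route is short and fully available.
    assert_eq!(route_to_ancestor(&headers, 101, 100), Some(vec![101, 100]));
}
```

In this sketch the missing header is reported as `None`; code that treats such a lookup as infallible would panic instead.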

At the end of the block-history download the finalized block number is finally set to the proper value, and the monitor is cleaned up as a result.
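
The memory behaviour described above can be sketched with a toy monitor. This is an assumption-laden model, not the real level monitor: `LevelMonitor`, its fields and its methods are hypothetical names. It tracks only blocks whose number is above the last finalized number and drops everything at or below the finalized number when finality advances.

```rust
use std::collections::{BTreeMap, HashSet};

/// Toy stand-in (hypothetical) for the level monitor:
/// block hashes grouped by block number.
struct LevelMonitor {
    finalized: u64,
    levels: BTreeMap<u64, HashSet<u64>>,
}

impl LevelMonitor {
    fn new(finalized: u64) -> Self {
        Self { finalized, levels: BTreeMap::new() }
    }

    /// Tracks a block only if it is above the last finalized number.
    /// Returns whether the block was newly inserted.
    fn insert(&mut self, number: u64, hash: u64) -> bool {
        if number <= self.finalized {
            return false;
        }
        self.levels.entry(number).or_default().insert(hash)
    }

    /// Advances finality and drops every level at or below it.
    fn finalize(&mut self, number: u64) {
        self.finalized = number;
        self.levels = self.levels.split_off(&(number + 1));
    }

    /// Total number of block hashes currently tracked.
    fn tracked(&self) -> usize {
        self.levels.values().map(|set| set.len()).sum()
    }
}

fn main() {
    let gap = 1..=1_000u64;

    // Without the fix: finalized stays at 0 during history download,
    // so every gap block ends up in the monitor.
    let mut before = LevelMonitor::new(0);
    for n in gap.clone() {
        before.insert(n, n);
    }
    assert_eq!(before.tracked(), 1_000);
    // Only when finality is finally updated is the monitor cleaned up.
    before.finalize(1_000);
    assert_eq!(before.tracked(), 0);

    // With the fix: finalized is already at the warp target,
    // so gap blocks are never inserted.
    let mut after = LevelMonitor::new(1_000);
    for n in gap {
        after.insert(n, n);
    }
    assert_eq!(after.tracked(), 0);
}
```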

We can do a lot better and mimic what the relay chain actually does (see "Warping with the fix" below).

Log before the fix (statemine):

⏩ Warping, Downloading state, 2.34 Mib (9 peers), best: #0 (0x4823771a), finalized #0 (0x4823771a),343.0kiB/s ⬆ 22.5kiB/s
⏩ Warping, Downloading state, 7.18 Mib (10 peers), best: #0 (0x4823771a), finalized #0 (0x4823771a),625.9kiB/s ⬆ 2.7kiB/s
⏩ Warping, Downloading state, 11.87 Mib (10 peers), best: #0 (0x4823771a), finalized #0 (0x4823771a),842.7kiB/s ⬆ 10.1kiB/s
⏩ Warping, Downloading state, 16.73 Mib (10 peers), best: #0 (0x4823771a), finalized #0 (0x4823771a),399.1kiB/s ⬆ 1.3kiB/s
⏩ Warping, Downloading state, 23.15 Mib (10 peers), best: #0 (0x4823771a), finalized #0 (0x4823771a),576.1kiB/s ⬆ 0.2kiB/s
⏩ Warping, Downloading state, 31.52 Mib (10 peers), best: #0 (0x4823771a), finalized #0 (0x4823771a),788.7kiB/s ⬆ 0.7kiB/s

⏩ Warping, Importing state, 34.97 Mib (10 peers), best: #0 (0x4823771a), finalized #0 (0x4823771a),337.7kiB/s ⬆ 1.3kiB/s
⏩ Warping, Importing state, 34.97 Mib (10 peers), best: #0 (0x4823771a), finalized #0 (0x4823771a),0.2kiB/s ⬆ 0.2kiB/s
⏩ Warping, Importing state, 34.97 Mib (10 peers), best: #0 (0x4823771a), finalized #0 (0x4823771a),0.7kiB/s ⬆ 0.7kiB/s
⏩ Warping, Importing state, 34.97 Mib (10 peers), best: #0 (0x4823771a), finalized #0 (0x4823771a),1.3kiB/s ⬆ 1.1kiB/s

Warp sync is complete (34 MiB), restarting block sync.
⏩ Block history, #3009 (2 peers), best: #4642329 (0x54672813), finalized #0 (0x4823771a),385.4kiB/s ⬆ 2.6kiB/s
⏩ Block history, #3265 (2 peers), best: #4642329 (0x54672813), finalized #0 (0x4823771a),193.8kiB/s ⬆ 0.6kiB/s
⏩ Block history, #7425 (2 peers), best: #4642329 (0x54672813), finalized #0 (0x4823771a),741.0kiB/s ⬆ 1.1kiB/s
...
⏩ Block history, #4639041 (23 peers), best: #4643003 (0x3d91982f), finalized #0 (0x4823771a),5.7MiB/s ⬆ 1.3kiB/s
⏩ Block history, #4639681 (23 peers), best: #4643003 (0x3d91982f), finalized #0 (0x4823771a),1.6MiB/s ⬆ 0.3kiB/s
⏩ Block history, #4639681 (23 peers), best: #4643004 (0x4b70…d135), finalized #0 (0x4823771a),157.3kiB/s ⬆ 2.1kiB/s
⏩ Block history, #4639873 (23 peers), best: #4643004 (0x4b70…d135), finalized #0 (0x4823771a),665.1kiB/s ⬆ 0.2kiB/s
⏩ Block history, #4642305 (22 peers), best: #4643005 (0x2c42…bca4), finalized #0 (0x4823771a),1.5MiB/s ⬆ 10.6kiB/s

Block history download is complete.

💤 Idle (23 peers), best: #4643064 (0xdf892480), finalized #4643014 (0xba26234a),313.4kiB/s ⬆ 5.0kiB/s
💤 Idle (23 peers), best: #4643064 (0xdf892480), finalized #4643062 (0xccf9323a),64 B/s ⬆ 64 B/s
💤 Idle (22 peers), best: #4643065 (0x498a6f9c), finalized #4643063 (0x13868998),47.5kiB/s ⬆ 27.6kiB/s
...

Warping with the fix

Once we have discovered the latest target block (according to the relay chain), we should import it as final (i.e. with the BlockImportParams::finalized flag set to true).

The adopted strategy mimics what GRANDPA already does here, and thus here.

That is, the finalized flag is set to true if the block also contains the state.
AFAIK in practice this is true only for the warp sync target block.
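
The rule above can be sketched as follows. This is a simplified model and not the actual Substrate type: `ImportParams` and `finalize_if_stateful` are hypothetical names, and the real `BlockImportParams` carries many more fields. The sketch only captures the decision "mark as final if the block comes with state".

```rust
/// Hypothetical, simplified stand-in for block import parameters.
#[derive(Debug, Default)]
struct ImportParams {
    number: u64,
    /// True when the block is imported together with its state.
    with_state: bool,
    /// True when the block should be marked as final on import.
    finalized: bool,
}

/// Sets the finalized flag when the block carries state.
/// A flag that is already set is left untouched.
fn finalize_if_stateful(mut params: ImportParams) -> ImportParams {
    if params.with_state {
        params.finalized = true;
    }
    params
}

fn main() {
    // Warp sync target block: imported with state, so it is marked final.
    let target = finalize_if_stateful(ImportParams {
        number: 4_641_676,
        with_state: true,
        ..Default::default()
    });
    assert!(target.finalized);

    // Block from the history gap: no state attached, flag stays false.
    let gap_block = finalize_if_stateful(ImportParams {
        number: 2_112,
        ..Default::default()
    });
    assert!(!gap_block.finalized);
    println!("{:?}\n{:?}", target, gap_block);
}
```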

Log after the fix (statemine):

Warping, Downloading state, 7.18 Mib (8 peers), best: #0 (0x4823771a), finalized #0 (0x4823771a),962.2kiB/s ⬆ 22.7kiB/s
⏩ Warping, Downloading state, 9.59 Mib (8 peers), best: #0 (0x4823771a), finalized #0 (0x4823771a),320.5kiB/s ⬆ 0.3kiB/s
⏩ Warping, Downloading state, 23.15 Mib (8 peers), best: #0 (0x4823771a), finalized #0 (0x4823771a),1.4MiB/s ⬆ 1.8kiB/s
⏩ Warping, Downloading state, 31.52 Mib (8 peers), best: #0 (0x4823771a), finalized #0 (0x4823771a),821.4kiB/s ⬆ 1.4kiB/s
⏩ Warping, Downloading state, 31.52 Mib (8 peers), best: #0 (0x4823771a), finalized #0 (0x4823771a),185.7kiB/s ⬆ 0.1kiB/s

⏩ Warping, Importing state, 34.97 Mib (8 peers), best: #0 (0x4823771a), finalized #0 (0x4823771a),128.3kiB/s ⬆ 0.2kiB/s
⏩ Warping, Importing state, 34.97 Mib (8 peers), best: #0 (0x4823771a), finalized #0 (0x4823771a),1.1kiB/s ⬆ 1.1kiB/s
⏩ Warping, Importing state, 34.97 Mib (8 peers), best: #0 (0x4823771a), finalized #0 (0x4823771a),0.2kiB/s ⬆ 0.2kiB/s
⏩ Warping, Importing state, 34.97 Mib (8 peers), best: #0 (0x4823771a), finalized #0 (0x4823771a),0.2kiB/s ⬆ 0.2kiB/s
⏩ Warping, Importing state, 34.97 Mib (8 peers), best: #0 (0x4823771a), finalized #0 (0x4823771a),1.0kiB/s ⬆ 1.1kiB/s

Warp sync is complete (34 MiB), restarting block sync.
⏩ Block history, #2112 (7 peers), best: #4641676 (0x2e24…a3db), finalized #4641676 (0x2e24…a3db),669.7kiB/s ⬆ 1.2kiB/s
⏩ Block history, #2112 (1 peers), best: #4641678 (0xfad23364), finalized #4641678 (0xfad23364),53.7kiB/s ⬆ 2.2kiB/s    
⏩ Block history, #2176 (2 peers), best: #4641683 (0x46928354), finalized #4641681 (0x1a3d52fa),95.0kiB/s ⬆ 13.9kiB/s    
...
⏩ Block history, #4635712 (17 peers), best: #4642215 (0x60de…a865), finalized #4642214 (0x3d8e…eebe),4.5MiB/s ⬆ 0.7kiB/s
⏩ Block history, #4638848 (17 peers), best: #4642216 (0x770b69bc), finalized #4642214 (0x3d8e…eebe),7.4MiB/s ⬆ 1.5kiB/s
⏩ Block history, #4641675 (17 peers), best: #4642216 (0x770b69bc), finalized #4642214 (0x3d8e…eebe),6.2MiB/s ⬆ 22.7kiB/s

Block history download is complete.

💤 Idle (17 peers), best: #4642216 (0x770b69bc), finalized #4642215 (0x60de…a865),3.3kiB/s ⬆ 1.9kiB/s
💤 Idle (17 peers), best: #4642217 (0xf9959fec), finalized #4642215 (0x60de…a865),1.6kiB/s ⬆ 1.5kiB/s
💤 Idle (17 peers), best: #4642218 (0x80448b93), finalized #4642216 (0x770b69bc),56 B/s ⬆ 51 B/s
...

As can be seen, the finalized block number starts being updated as soon as the state is imported, as expected.


Additional notes

The Cumulus node caches the parachain blocks that are signaled as final here
and triggers a Finalizer::finalize_block() for each block found in the cache.

This procedure marks the block as final BUT doesn't check if the block that we are finalizing is below the current "best" block.

As a consequence, it may happen that the finalized block number is greater than the best block number.
We don't want this, as it also breaks some general assumptions made throughout the code.

To handle cases like this correctly, a more defensive check has been introduced in Substrate by paritytech/substrate#14308.
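
The invariant at stake (finalized <= best) can be sketched as below. The exact check added by paritytech/substrate#14308 is not reproduced here; `ChainInfo` and `finalize` are hypothetical names, and the policy of moving "best" up together with "finalized" is an assumption chosen purely for illustration.

```rust
/// Hypothetical summary of the chain state: best and finalized block numbers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ChainInfo {
    best: u64,
    finalized: u64,
}

/// Finalizes block `number` while keeping `finalized <= best`.
/// Assumed policy: if finalized would overtake best, best is moved up too.
fn finalize(info: ChainInfo, number: u64) -> ChainInfo {
    // Finality never moves backwards.
    let finalized = info.finalized.max(number);
    // Defensive step: best can never be below finalized.
    let best = info.best.max(finalized);
    ChainInfo { best, finalized }
}

fn main() {
    let info = ChainInfo { best: 10, finalized: 5 };

    // Normal case: the finalized block is below best, best is unchanged.
    assert_eq!(finalize(info, 8), ChainInfo { best: 10, finalized: 8 });

    // Problematic case: a cached block above best is finalized.
    // Without a guard this would leave finalized (12) > best (10).
    assert_eq!(finalize(info, 12), ChainInfo { best: 12, finalized: 12 });
}
```

Other policies are possible, for example rejecting the finalization request; the sketch only shows that the invariant is cheap to enforce at the point where finality is recorded.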

@davxy self-assigned this Jun 6, 2023
@davxy added labels Jun 6, 2023:
  B0-silent: Changes should not be mentioned in any release notes
  A0-please_review: Pull request needs code review.
  C1-low: PR touches the given topic and has a low impact on builders.
  D3-trivial 🧸: PR contains trivial changes in a runtime directory that do not require an audit
@davxy requested review from bkchr, andresilva, koute, michalkucharczyk, skunert and a team June 6, 2023 13:55
Review thread on client/consensus/common/src/lib.rs (resolved)
@davxy merged commit 31b1469 into master Jun 7, 2023
@davxy deleted the davxy-update-finalized-during-history-download branch June 7, 2023 11:34
Development

Successfully merging this pull request may close these issues.

Level Monitor Panics when restarting with warp sync