Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v2.0: replay: do not start leader for a block we already have shreds for (backport of #2416) #2484

Merged
merged 2 commits into from
Aug 15, 2024

Conversation

mergify[bot]
Copy link

@mergify mergify bot commented Aug 8, 2024

Problem

See solana-labs#32679

In certain scenarios where the first leader block is not produced, however the second (or later) leader block is produced we can end up reproducing this block after resetting to a previous block.

Summary of Changes

When poh_recorder checks for leader slot, additionally check blockstore to see if shreds have already been inserted.


This is an automatic backport of pull request #2416 done by [Mergify](https://mergify.com).

…2416)

* replay: do not start leader for a block we already have shreds for

* pr feedback: comment, move existing check to blockstore fn

* move blockstore read after tick height check

* pr feedback: resuse blockstore fn in next_leader_slot

(cherry picked from commit 15dbe7f)

# Conflicts:
#	poh/src/poh_recorder.rs
@mergify mergify bot requested a review from a team as a code owner August 8, 2024 02:47
@mergify mergify bot added the conflicts label Aug 8, 2024
Copy link
Author

mergify bot commented Aug 8, 2024

Cherry-pick of 15dbe7f has failed:

On branch mergify/bp/v2.0/pr-2416
Your branch is up to date with 'origin/v2.0'.

You are currently cherry-picking commit 15dbe7fb0f.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   core/src/replay_stage.rs
	modified:   ledger/src/blockstore.rs
	modified:   ledger/src/leader_schedule_cache.rs

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   poh/src/poh_recorder.rs

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

@@ -569,6 +569,15 @@ impl PohRecorder {
self.leader_first_tick_height_including_grace_ticks
{
if self.reached_leader_tick(my_pubkey, leader_first_tick_height_including_grace_ticks) {
if self.blockstore.has_existing_shreds_for_slot(next_poh_slot) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only functional change.
This fixes a long standing bug where in certain rare scenarios we will attempt to produce a block that we have already produced.

This does not transmit a duplicate block, as a failsafe in broadcast_stage will prevent us from sending the shreds out, however it results in wasted effort in banking_stage as the block is reproduced.

Copy link

@bw-solana bw-solana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -3964,6 +3964,13 @@ impl Blockstore {
Ok(duplicate_slots_iterator.map(|(slot, _)| slot))
}

pub fn has_existing_shreds_for_slot(&self, slot: Slot) -> bool {
match self.meta(slot).unwrap() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why unwrap here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think is just replicating the current unwrap behavior. Looks like this fails if we:

  1. hit a rocksdb error
  2. fail deserialization

Neither of these should happen in "normal" operation. Expect would be better.

@AshwinSekar AshwinSekar merged commit fd3b4f3 into v2.0 Aug 15, 2024
39 checks passed
@AshwinSekar AshwinSekar deleted the mergify/bp/v2.0/pr-2416 branch August 15, 2024 17:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants