Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: bitcoin + stacks blockchain sync on startup #1087

Draft
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

cylewitruk
Copy link
Member

Description

Closes: #685

Changes

Will update, draft for now

Testing Information

Checklist:

  • I have performed a self-review of my code
  • My changes generate no new warnings
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

@cylewitruk cylewitruk added the sbtc signer binary The sBTC Bootstrap Signer. label Dec 10, 2024
@cylewitruk cylewitruk added this to the sBTC 0.9, mainnet release milestone Dec 10, 2024
@cylewitruk cylewitruk self-assigned this Dec 10, 2024
@cylewitruk cylewitruk requested review from matteojug and djordon and removed request for matteojug December 10, 2024 10:41
/// When using the postgres storage, we need to make sure that this
/// function is called after the `Self::write_bitcoin_block` function
/// because of the foreign key constraints.
pub async fn extract_sbtc_transactions(
Copy link
Member Author

@cylewitruk cylewitruk Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: these are just moved to top-level fn's in the file so they can be used elsewhere without needing to instantiate a block observer instance. Maybe they should even be in a separate file.

@cylewitruk cylewitruk requested a review from matteojug December 10, 2024 11:05
@cylewitruk cylewitruk marked this pull request as ready for review December 10, 2024 15:16
let stacks_client = self.context.get_stacks_client();
let tenure_info = stacks_client.get_tenure_info().await?;

// While we do do this at startup to do the "heavy lifting" of syncing
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// While we do do this at startup to do the "heavy lifting" of syncing
// While we do this at startup to do the "heavy lifting" of syncing

Comment on lines +112 to +117
// Pause until the Stacks node is fully-synced, otherwise we will not be
// able to properly back-fill blocks and the sBTC signer will generally just
// not work.
tracing::info!("waiting for stacks node to report full-sync");
wait_for_stacks_node_to_report_full_sync(&context).await?;
tracing::info!("stacks node reports that is is up-to-date");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sanity check: iiuc we will start getting events from stacks core before we have the relevant stacks node/tx in the db. Is it ok? eg, in the past we did have a fkey constraint on rotate key events (we no longer do), something like that would make this stall. Are we sure we don't have any other constraint like that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is something that worried me a little too. I didn't see any FKs when I glanced through but it's possible there's something I missed or an edge-case somewhere.

It didn't really feel in-scope but one thing I think we should possibly do is change the event observers to only persist the events to the db to be processed in another task (which maybe waits for an "all-green" signal).

That would both help make sure the stacks node always gets a timely response so we don't block it, and lets us control when the events get processed.

Comment on lines +60 to +69
/// Get the Nakamoto activation height.
pub fn nakamoto_activation_height(&self) -> u64 {
self.nakamoto_activation_height.load(Ordering::SeqCst)
}

/// Set the Nakamoto activation height.
pub fn set_nakamoto_activation_height(&self, height: u64) {
self.nakamoto_activation_height
.store(height, Ordering::SeqCst);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: given we may want to use something that is not nakamoto activation height for this (eg, to avoid fetching blocks we know we don't care about), should we rename this to something more general?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, maybe just sbtc_activation_height or sbtc_bitcoin_activation_height?

Copy link
Collaborator

@matteojug matteojug Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as it has bitcoin and height in it, it works for me 😄

// way the Stacks event observer works, i.e. the Stacks node will not
// progress if it cannot send events to the observer and thus become out-
// of-sync.
let signer_api_handle = tokio::spawn(run_api(context.clone()));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we use run_checked also here? otherwise if this crashes it would not signal the termination to the others.

Comment on lines +23 to +24
// If the nakamoto start height is provided in the config, use that value.
// Otherwise, get the value from the Stacks node's PoX-info endpoint.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I guess this is a TODO now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, missed update

Comment on lines +129 to +131
if term.shutdown_signalled() {
return Ok(());
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qq: why not return Err(Error::SignerShutdown)? also for stacks one

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover from an earlier refactor, will fix


// If the block height is less than the Nakamoto activation height, we
// have reached the end of the sync and can break out of the loop.
if block_height < nakamoto_activation_height {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: in stacks we bail a bit earlier (tenure.anchor_block_height <= until_bitcoin_height), I guess it doesn't matter -- personally I would leave this as is, just flagging it for a double check

Comment on lines +247 to +250
let first_block = tenure
.blocks()
.first()
.ok_or(Error::EmptyTenure(next_tenure_tip))?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in fetch_unknown_ancestors we use the last block, so I guess one of the two is wrong? the wrong one will still work since we are advancing, just very slowly (going through all the blocks instead of jumping right at the end)

Comment on lines +249 to +250
.first()
.ok_or(Error::EmptyTenure(next_tenure_tip))?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can move the first.ok_or up to replace is_empty()

Comment on lines +263 to +268
// If the block height of the anchor block is equal to the nakamoto
// activation height then we're done.
if bitcoin_block.block_height == nakamoto_activation_height {
tracing::info!("nakamoto activation height reached, stacks block-sync completed");
return Ok(());
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// If the block height of the anchor block is equal to the nakamoto
// activation height then we're done.
if bitcoin_block.block_height == nakamoto_activation_height {
tracing::info!("nakamoto activation height reached, stacks block-sync completed");
return Ok(());
}
// If the block height of the anchor block is equal to the nakamoto
// activation height then we're done.
if bitcoin_block.block_height <= nakamoto_activation_height {
tracing::info!("nakamoto activation height reached, stacks block-sync completed");
return Ok(());
}

since we can have gaps? also to be consistent with fetch_unknown_ancestors

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good point 👍

let mut interval = tokio::time::interval(Duration::from_secs(5));
loop {
match stacks_client.get_node_info().await {
Ok(node_info) if node_info.is_fully_synced => break,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that in the nodes stalled on devnet this returns true, can we rely on this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that's bad.

Copy link
Member Author

@cylewitruk cylewitruk Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most likely not -- all of my assumptions about this flag have been thrown out the window 😬 We'll need to ask what it actually means... it could be that it's in sync with bitcoin, or it could mean that it's in sync with other nodes when it comes to receiving (but not processing) stacks blocks... or something completely different

Copy link
Collaborator

@djordon djordon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I think this solves one issue but opens another. I'm not entirely sure that I got this right but we may have an issue in the following scenario:

  1. The signer is up and running, it has complete blockchain data in its database.
  2. The signer binary stops for some reason, it's down for some number of hours.
  3. The signer binary resumes, starts backfilling bitcoin blockchain data, but then stops again before the sync completes.
  4. Some time later (I think any amount of time later) it resumes yet again.

After (4), I don't think that the code here will fetch all blockchain data. I think this is because the backfilling is done from the wrong direction (from the chain tip) but it needs to be done from the last known block or our "genesis" block.

Comment on lines +89 to +95
sync_bitcoin_blocks(
ctx,
&tenure.anchor_block_hash,
tenure.anchor_block_height,
nakamoto_activation_height,
)
.await?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm I'm not sure if we should tie the stacks block with the bitcoin block. It will probably work out, but I would think that we should always get the canonical bitcoin blockchain, and then get the canonical stacks blockchain. It would be better to do it in a way where they are completely decoupled.

Comment on lines +231 to +233
// Retrieve the anchor Bitcoin block from our local storage. We should
// have already processed this block and thus it should exist.
let bitcoin_block = storage.get_bitcoin_block(&tenure.anchor_block_hash).await?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should depend on the anchor block being in the database. I know it sounds counterintuitive, but we should always start at the stacks chain tip and figure out which stacks blocks we don't know about and get them. If those blocks are not anchored to the canonical bitcoin blockchain then that's fine, Stacks will sort it out, those blocks will be abandoned and all will be fine. All we need to do is ensure that we have the canonical bitcoin blockchain in our database.

let mut interval = tokio::time::interval(Duration::from_secs(5));
loop {
match stacks_client.get_node_info().await {
Ok(node_info) if node_info.is_fully_synced => break,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that's bad.

Comment on lines +310 to +314
/// Waits for the Stacks node to report that it is fully-synced. It does this
/// by polling the node's `info` endpoint every 5 seconds until the response's
/// `is_fully_synced` field is `true`.
#[tracing::instrument(skip_all)]
pub async fn wait_for_stacks_node_to_report_full_sync(ctx: &impl Context) -> Result<(), Error> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to wait for the stacks node to report being full sync? What are the risks of having the sBTC signer starting where, say, the bitcoin node is synced by the stacks node is not?

I mean, are these risks that are unique to booting the sBTC signer, or are these ever-present issues?

Copy link
Member Author

@cylewitruk cylewitruk Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general it's not a risk per-se, but if the Stacks node isn't synced then we know that the signer isn't going to behave too well.

So this is more of a cheap insurance that the Stacks node is functioning and up-to-date before starting up and joining p2p.

I guess my counter-question would be why not ensure things are in a good state before starting? For a relatively short pause in operation, this shouldn't result in much of a pause (if any). If we join p2p and start signing sweeps but can't handle validating (or coordinating/submitting) mints, or whatever the case may be, wouldn't it be better to not do either? 🤷🏻‍♂️

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I gave the wrong impression here. I'm not saying we shouldn't wait, but I was thinking: shouldn't we be waiting more often? That's what I meant by

I mean, are these risks that are unique to booting the sBTC signer, or are these ever-present issues?

I was also thinking about whether we should be waiting differently. For example, the signer should probably be updating their database with bitcoin data, so that the backfill is faster when their Stacks node comes back online. And maybe they should vote on deposits. But yeah there are definitely things that they shouldn't do, like signing rounds and DKG (because of the key rotation contract call logic).

@cylewitruk cylewitruk marked this pull request as draft December 12, 2024 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sbtc signer binary The sBTC Bootstrap Signer.
Projects
Status: In Review
Development

Successfully merging this pull request may close these issues.

[Feature]: Add a bitcoin block start height to the signers
4 participants