-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: bitcoin + stacks blockchain sync on startup #1087
base: main
Are you sure you want to change the base?
Conversation
/// When using the postgres storage, we need to make sure that this | ||
/// function is called after the `Self::write_bitcoin_block` function | ||
/// because of the foreign key constraints. | ||
pub async fn extract_sbtc_transactions( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: these are just moved to top-level fn's in the file so they can be used elsewhere without needing to instantiate a block observer instance. Maybe they should even be in a separate file.
let stacks_client = self.context.get_stacks_client(); | ||
let tenure_info = stacks_client.get_tenure_info().await?; | ||
|
||
// While we do do this at startup to do the "heavy lifting" of syncing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// While we do do this at startup to do the "heavy lifting" of syncing | |
// While we do this at startup to do the "heavy lifting" of syncing |
// Pause until the Stacks node is fully-synced, otherwise we will not be | ||
// able to properly back-fill blocks and the sBTC signer will generally just | ||
// not work. | ||
tracing::info!("waiting for stacks node to report full-sync"); | ||
wait_for_stacks_node_to_report_full_sync(&context).await?; | ||
tracing::info!("stacks node reports that is is up-to-date"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sanity check: iiuc we will start getting events from stacks core before we have the relevant stacks node/tx in the db. Is it ok? eg, in the past we did have a fkey constraint on rotate key events (we no longer do), something like that would make this stall. Are we sure we don't have any other constraint like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is something that worried me a little too. I didn't see any FKs when I glanced through but it's possible there's something I missed or an edge-case somewhere.
It didn't really feel in-scope but one thing I think we should possibly do is change the event observers to only persist the events to the db to be processed in another task (which maybe waits for an "all-green" signal).
That would both help make sure the stacks node always gets a timely response so we don't block it, and lets us control when the events get processed.
/// Get the Nakamoto activation height. | ||
pub fn nakamoto_activation_height(&self) -> u64 { | ||
self.nakamoto_activation_height.load(Ordering::SeqCst) | ||
} | ||
|
||
/// Set the Nakamoto activation height. | ||
pub fn set_nakamoto_activation_height(&self, height: u64) { | ||
self.nakamoto_activation_height | ||
.store(height, Ordering::SeqCst); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: given we may want to use something that is not nakamoto activation height for this (eg, to avoid fetching blocks we know we don't care about), should we rename this to something more general?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, maybe just sbtc_activation_height
or sbtc_bitcoin_activation_height
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as it has bitcoin
and height
in it, it works for me 😄
// way the Stacks event observer works, i.e. the Stacks node will not | ||
// progress if it cannot send events to the observer and thus become out- | ||
// of-sync. | ||
let signer_api_handle = tokio::spawn(run_api(context.clone())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we use run_checked
also here? otherwise if this crashes it would not signal the termination to the others.
// If the nakamoto start height is provided in the config, use that value. | ||
// Otherwise, get the value from the Stacks node's PoX-info endpoint. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I guess this is a TODO now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, missed update
if term.shutdown_signalled() { | ||
return Ok(()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
qq: why not return Err(Error::SignerShutdown)
? also for stacks one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftover from an earlier refactor, will fix
|
||
// If the block height is less than the Nakamoto activation height, we | ||
// have reached the end of the sync and can break out of the loop. | ||
if block_height < nakamoto_activation_height { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: in stacks we bail a bit earlier (tenure.anchor_block_height <= until_bitcoin_height
), I guess it doesn't matter -- personally I would leave this as is, just flagging it for a double check
let first_block = tenure | ||
.blocks() | ||
.first() | ||
.ok_or(Error::EmptyTenure(next_tenure_tip))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in fetch_unknown_ancestors
we use the last block, so I guess one of the two is wrong? the wrong one will still work since we are advancing, just very slowly (going through all the blocks instead of jumping right at the end)
.first() | ||
.ok_or(Error::EmptyTenure(next_tenure_tip))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we can move the first.ok_or
up to replace is_empty()
// If the block height of the anchor block is equal to the nakamoto | ||
// activation height then we're done. | ||
if bitcoin_block.block_height == nakamoto_activation_height { | ||
tracing::info!("nakamoto activation height reached, stacks block-sync completed"); | ||
return Ok(()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// If the block height of the anchor block is equal to the nakamoto | |
// activation height then we're done. | |
if bitcoin_block.block_height == nakamoto_activation_height { | |
tracing::info!("nakamoto activation height reached, stacks block-sync completed"); | |
return Ok(()); | |
} | |
// If the block height of the anchor block is equal to the nakamoto | |
// activation height then we're done. | |
if bitcoin_block.block_height <= nakamoto_activation_height { | |
tracing::info!("nakamoto activation height reached, stacks block-sync completed"); | |
return Ok(()); | |
} |
since we can have gaps? also to be consistent with fetch_unknown_ancestors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah good point 👍
let mut interval = tokio::time::interval(Duration::from_secs(5)); | ||
loop { | ||
match stacks_client.get_node_info().await { | ||
Ok(node_info) if node_info.is_fully_synced => break, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that in the nodes stalled on devnet this returns true
, can we rely on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh that's bad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most likely not -- all of my assumptions about this flag have been thrown out the window 😬 We'll need to ask what it actually means... it could be that it's in sync with bitcoin, or it could mean that it's in sync with other nodes when it comes to receiving (but not processing) stacks blocks... or something completely different
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, I think this solves one issue but opens another. I'm not entirely sure that I got this right but we may have an issue in the following scenario:
- The signer is up and running, it has complete blockchain data in its database.
- The signer binary stops for some reason, it's down for some number of hours.
- The signer binary resumes, starts backfilling bitcoin blockchain data, but then stops again before the sync completes.
- Some time later (I think any amount of time later) it resumes yet again.
After (4), I don't think that the code here will fetch all blockchain data. I think this is because the backfilling is done from the wrong direction (from the chain tip) but it needs to be done from the last known block or our "genesis" block.
sync_bitcoin_blocks( | ||
ctx, | ||
&tenure.anchor_block_hash, | ||
tenure.anchor_block_height, | ||
nakamoto_activation_height, | ||
) | ||
.await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm I'm not sure if we should tie the stacks block with the bitcoin block. It will probably work out, but I would think that we should always get the canonical bitcoin blockchain, and then get the canonical stacks blockchain. It would be better to do it in a way where they are completely decoupled.
// Retrieve the anchor Bitcoin block from our local storage. We should | ||
// have already processed this block and thus it should exist. | ||
let bitcoin_block = storage.get_bitcoin_block(&tenure.anchor_block_hash).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should depend on the anchor block being in the database. I know it sounds counterintuitive, but we should always start at the stacks chain tip and figure out which stacks blocks we don't know about and get them. If those blocks are not anchored to the canonical bitcoin blockchain then that's fine, Stacks will sort it out, those blocks will be abandoned and all will be fine. All we need to do is ensure that we have the canonical bitcoin blockchain in our database.
let mut interval = tokio::time::interval(Duration::from_secs(5)); | ||
loop { | ||
match stacks_client.get_node_info().await { | ||
Ok(node_info) if node_info.is_fully_synced => break, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh that's bad.
/// Waits for the Stacks node to report that it is fully-synced. It does this | ||
/// by polling the node's `info` endpoint every 5 seconds until the response's | ||
/// `is_fully_synced` field is `true`. | ||
#[tracing::instrument(skip_all)] | ||
pub async fn wait_for_stacks_node_to_report_full_sync(ctx: &impl Context) -> Result<(), Error> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to wait for the stacks node to report being full sync? What are the risks of having the sBTC signer starting where, say, the bitcoin node is synced by the stacks node is not?
I mean, are these risks that are unique to booting the sBTC signer, or are these ever-present issues?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general it's not a risk per-se, but if the Stacks node isn't synced then we know that the signer isn't going to behave too well.
So this is more of a cheap insurance that the Stacks node is functioning and up-to-date before starting up and joining p2p.
I guess my counter-question would be why not ensure things are in a good state before starting? For a relatively short pause in operation, this shouldn't result in much of a pause (if any). If we join p2p and start signing sweeps but can't handle validating (or coordinating/submitting) mints, or whatever the case may be, wouldn't it be better to not do either? 🤷🏻♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I gave the wrong impression here. I'm not saying we shouldn't wait, but I was thinking: shouldn't we be waiting more often? That's what I meant by
I mean, are these risks that are unique to booting the sBTC signer, or are these ever-present issues?
I was also thinking about whether we should be waiting differently. For example, the signer should probably be updating their database with bitcoin data, so that the backfill is faster when their Stacks node comes back online. And maybe they should vote on deposits. But yeah there are definitely things that they shouldn't do, like signing rounds and DKG (because of the key rotation contract call logic).
Description
Closes: #685
Changes
Will update, draft for now
Testing Information
Checklist: