-
Notifications
You must be signed in to change notification settings - Fork 296
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
multi: Rework utxoset/view to use outpoints. #2540
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Round 1.
I still have a lot of review and testing to do, but I gave this a very quick first pass to find anything that stood out before really digging into the details and doing in-depth testing that I wanted to go ahead and provide feedback for first.
This is fantastic overall and great to see it ported over. I really appreciate the time put into creating the PR description with all of the relevant details and references, important notes, testing implications, etc. That really helps not only the initial review, but for historical purposes as well. Also, from having done the PR this one is porting over, I know how much effort it takes to make this type of changes that have far reaching implications and thus also make a big impact, so I thank you for taking this on!
cb72606
to
2f4d524
Compare
I made an update to store I am going to start taking a look at implementing a migration that recreates the spend journal data in the new format from scratch (in a separate PR) since I'd prefer not to have to keep the legacy handling code around for long. Edit: If feasible, I may include that spend journal migration in this PR. Having this migration would allow us to only keep ticket minimal outputs in the submission output, but that would need to modify the utxo migration, and ideally we don’t want two utxo migrations back to back. |
87a1d67
to
335e895
Compare
Found one more issue related to processing deep reorgs with the legacy stxo handling (diff). This is actually a bug in the btcd code as well - it is fetching a utxo using the tx hash rather than the hash of the input transaction (though it doesn't really matter, since that code path would never be hit at this point since its only run when encountering legacy stxo entires). I should be able to get the stxo migration in place relatively quickly, which will avoid this case altogether (though unfortunately, since it needs to read all historical blocks/txns, it will likely take in the range of 10-20 mins to complete). |
335e895
to
1be7501
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work on this PR. Only have a few minor points right now.
1be7501
to
d83712d
Compare
Rebased to Also, as a heads up, I have the stxo migration ready and will be pushing it to this PR shortly. Sorry in advance for the churn, but I think it is worth bringing into this PR directly because it avoids having another utxo migration and completely removes the legacy handling cases. I'll call out the changes/diff when I push that up. |
Agreed. I was going to suggest this myself if it was done prior to this one being finalized, so glad to see you're of the same mindset. |
d138150
to
4da0eda
Compare
The spend journal migration has now been added (diff). The updates include the following:
|
I also tested a deep reorg (invalidated/reconsidered a block thousands of blocks deep into the migrated data) to ensure that the migrated spent journal entries could be resurrected properly. |
4da0eda
to
4246273
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Round 2.
I've done quite a bit of testing at this point including very large reorgs and the functionality in terms of correctness is looking great thus far.
I still need to dig into one or two more areas for a final round of review, but here is some more feedback in the mean time.
} | ||
|
||
// Upgrade the database post block index load as needed. | ||
return upgradeDBPostBlockIndexLoad(ctx, b) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this a bit offline, but I'll comment here for future reference as well.
This will end up causing issues with future multi-version upgrades because it means that the upgrade process will not be able to complete uninterrupted and the need for it to run the normal code which relies on upgrades having happened.
For example, consider an upgrade from say version 7 to a future version 10. It will complete the version 7 to 8 upgrade, but then not be able to get to version 9 because that can only happen once the normal upgrade process completes and the block index is fully loaded. However, the code will otherwise depend on the database version having been upgraded to version 10 at that point, but it won't be.
We decided to go ahead and leave this in for now as the plan is to address it later by a more robust general reindexing solution.
9f0470d
to
376805d
Compare
}, | ||
{ | ||
name: "outputs 0 and 2 not coinbase", | ||
serialized: hexToBytes("df3982a7310132000496b538e853519c726a2c91e61ec11" + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have manually verified all of these serializations in TestUtxoSerialization
are accurate.
5b1fc34
to
c547579
Compare
9cb1e3c
to
b606b4c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only outstanding issue as far as I can tell is the migration code issue surrounding side chain blocks. Once that is in, I'll give it another round of review and testing and assuming that goes well, this will be ready.
b606b4c
to
0e1b11d
Compare
This should be resolved now, did further testing around side chain blocks for the migration as well as invalidating/reconsidering the main chain block at the same height as a side chain block. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All updates reviewed and test.
Migration stats:
SSD:
2021-01-10 15:26:08.780 [INF] CHAN: Done migrating utxoset. Total entries: 660256 in 14.265s
2021-01-10 15:26:08.780 [INF] CHAN: Removing old utxoset entries...
2021-01-10 15:26:11.109 [INF] CHAN: Deleted 660256 keys (660256 total) from old utxoset
2021-01-10 15:26:11.109 [INF] CHAN: Done removing old utxoset entries in 2.328s
...
2021-01-10 15:41:54.722 [INF] CHAN: Done migrating spend journal. Total entries: 496670 in 15m37.916s
2021-01-10 15:41:54.722 [INF] CHAN: Removing temp data...
...
2021-01-10 15:42:39.690 [INF] CHAN: Done removing temp data in 44.968s
...
2021-01-10 15:42:43.155 [INF] CHAN: Done removing old spend journal in 3.464s
2021-01-10 15:42:43.155 [INF] CHAN: Done upgrading database in 16m48.644s.
2021-01-10 15:42:43.158 [INF] CHAN: Blockchain database version info: chain: 9, compression: 1, block index: 3
...
HDD:
2021-01-10 16:16:48.859 [INF] CHAN: Done migrating utxoset. Total entries: 659062 in 14.846s
2021-01-10 16:16:48.885 [INF] CHAN: Removing old utxoset entries...
2021-01-10 16:16:51.157 [INF] CHAN: Deleted 659062 keys (659062 total) from old utxoset
2021-01-10 16:16:51.180 [INF] CHAN: Done removing old utxoset entries in 2.295s
...
2021-01-10 16:57:41.320 [INF] CHAN: Done migrating spend journal. Total entries: 518326 in 40m29.163s
2021-01-10 16:57:41.320 [INF] CHAN: Removing temp data...
...
2021-01-10 16:59:50.088 [INF] CHAN: Done removing temp data in 2m8.767s
...
2021-01-10 17:00:43.481 [INF] CHAN: Done removing old spend journal in 53.393s
2021-01-10 17:00:47.981 [INF] CHAN: Done upgrading database in 44m13.969s.
2021-01-10 17:00:48.520 [INF] CHAN: Blockchain database version info: chain: 9, compression: 1, block index: 3
Memory usage peaked around 800MB.
0e1b11d
to
3c99478
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good other than a typo.
This modifies the utxoset in the database and related UtxoViewpoint to store and work with unspent transaction outputs on a per-output basis instead of at a transaction level. The primary motivation is to simplify the code, pave the way for a utxo cache, and generally focus on optimizing runtime performance. The tradeoff is that this approach does somewhat increase the size of the serialized utxoset since it means that the transaction hash is duplicated for each output as a part of the key and some additional details are duplicated in each output. The details duplicated in each output include flags encoded into a single byte that specify whether the containing transaction is a coinbase, whether the containing transaction has an expiry, and the transaction type. Additionally, the containing block height and index are stored in each output. However, in practice, the size difference isn't all that large, disk space is relatively cheap, certainly cheaper than memory, and it is much more important to provide more efficient runtime operation since that is the ultimate purpose of the daemon. While performing this conversion, it also simplifies the code to remove the transaction version information from the utxoset as well as the spend journal. The logic for only serializing it under certain circumstances is complicated, and it was only used for the gettxout RPC, where it has already been removed. The utxo set and spend journal in the database are automatically migrated to the new format with this commit and it is possible to interrupt and resume the migration process. Finally, it also updates all references and tests that previously dealt with transaction hashes to use outpoints instead. An overview of the changes are as follows: - Remove transaction version from both spent and unspent output entries - Update utxo serialization format to exclude the version - Update spend journal serialization format to exclude the version - Convert UtxoEntry to represent a specific utxo instead of a transaction with all remaining utxos - Optimize for memory usage with an eye toward a utxo cache - Combine fields such as whether the containing transaction is a coinbase, whether the containing transaction has an expiry, and the transaction type into a single byte - Align entry fields to eliminate extra padding since ultimately there will be a lot of these in memory - Introduce a free list for serializing an outpoint to the database key format to significantly reduce pressure on the GC - Update entries to be keyed by a <hash><tree><output index> outpoint rather than just a tx hash - Update all related functions that previously dealt with transaction hashes to accept outpoints instead - Update all callers accordingly - Only add individually requested outputs from the mempool when constructing a mempool view - Modify the spend journal to always store the encoded flags with every spent txout - Combine fields such as whether the containing transaction is a coinbase, whether the containing transaction has an expiry, and the transaction type into a single byte - Use 4 bits instead of 3 for the transaction type to be consistent with utxos. The extra bit was already unused so this doesn't take any additional space - Remove the fully spent flag - Introduce ticketMinOuts in place of stakeExtra - Renamed stakeExtra as ticketMinOuts and updated all comments to make the purpose of the field clearer - Only store ticketMinOuts for ticket submission outputs - Add TicketMinimalOutputs function on UtxoEntry in place of ConvertUtxosToMinimalOutputs - Always decompress data loaded from the database now that a utxo entry only consists of a specific output - Introduce upgrade code to migrate the utxo set and spend journal to the new format - Update current database version to 9 - Update current utxo set version to 3 - Update current spend journal version to 3 - Introduce the ability to run upgrades after the block index has been loaded - Update all tests to expect the correct encodings, remove tests that no longer apply, and add new ones for the new expected behavior - Convert old tests for the legacy utxo format deserialization code to test the new function that is used during upgrade - Introduce a few new functions on UtxoViewpoint - AddTxOut for adding an individual txout versus all of them - addTxOut to handle the common code between the new AddTxOut and existing AddTxOuts - RemoveEntry for removing an individual txout - Remove the ErrDiscordantTxTree error - Since utxos are now retrieved using an outpoint, which includes the tree, it is no longer possible to hit this error path
3c99478
to
397a9e9
Compare
NOTE: This contains a database migration, so if you plan to test it, please make a copy of your data directory first.
This modifies the utxoset in the database and related
UtxoViewpoint
to store and work with unspent transaction outputs on a per-output basis instead of at a transaction level. This is a port of btcsuite/btcd#1045 with some significant differences due to the stake tree and transactions in Decred.The primary motivation is to simplify the code, pave the way for a utxo cache, and generally focus on optimizing runtime performance.
The tradeoff is that this approach does somewhat increase the size of the serialized utxoset since it means that the transaction hash is duplicated for each output as a part of the key and some additional details are duplicated in each output. The details duplicated in each output include flags encoded into a single byte that specify whether the containing transaction is a coinbase, whether the containing transaction has an expiry, and the transaction type. Additionally, the containing block height and index are stored in each output.
However, in practice, the size difference isn't all that large, disk space is relatively cheap, certainly cheaper than memory, and it is much more important to provide more efficient runtime operation since that is the ultimate purpose of the daemon.
While performing this conversion, it also simplifies the code to remove the transaction version information from the utxoset as well as the spend journal. The logic for only serializing it under certain circumstances is complicated, and it was only used for the
gettxout
RPC, where it has already been removed.The utxo set and spend journal in the database are automatically migrated to the new format with this commit and it is possible to interrupt and resume the migration process.
Finally, it also updates all references and tests that previously dealt with transaction hashes to use outpoints instead.
An overview of the changes are as follows:
UtxoEntry
to represent a specific utxo instead of a transaction with all remaining utxos<hash><tree><output index>
outpoint rather than just a tx hashticketMinOuts
in place ofstakeExtra
stakeExtra
asticketMinOuts
and updated all comments to make the purpose of the field clearerticketMinOuts
for ticket submission outputsTicketMinimalOutputs
function onUtxoEntry
in place ofConvertUtxosToMinimalOutputs
UtxoViewpoint
AddTxOut
for adding an individual txout versus all of themaddTxOut
to handle the common code between the newAddTxOut
and existingAddTxOuts
RemoveEntry
for removing an individual txoutErrDiscordantTxTree
errorA couple other notes on the implementation:
Part of #1145.