Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WalletDB rescan deadlock fix #868

Merged
merged 2 commits into from
Oct 27, 2023
Merged

Conversation

nodech
Copy link
Contributor

@nodech nodech commented Oct 24, 2023

Calling wdb.rescan from http, can sometimes hang the whole node when wallet is run as a plugin to the node. This happens because the chain.locker and wdb.txLock sequence is swapped in wdb.rescan. It can happen when a node is adding/removing/reorging blocks and we request rescan. Here how it can happen and lock descriptions:

  • chain.add -> chain.locker.lock -> wdb.addBlock -> wdb.txLock.lock
  • wdb.rescan -> wdb.txLock.lock -> chain.scan -> chain.locker.lock

Here we can see that there are partial sequences that lead to deadlock:

  chain.add -> chain.locker.lock -> wdb.addBlock    -> wdb.txLock.lock
               wdb.rescan        -> wdb.txLock.lock -> chain.scan     -> chain.locker.lock
  OR
  wdb.rescan -> wdb.txLock.lock -> chain.scan        -> chain.locker.lock
                chain.add       -> chain.locker.lock -> wdb.addBlock     -> wdb.txLock.lock
  • wdb.addBlock can't go forward because wdb.rescan has locked txLock
  • chain.add can't finish because it's waiting for wdb.addBlock.
  • wdb.rescan -> chain.scan can't go forward because chain.locker is locked.

This issue is side effect of OOM fix for the plugins in bcoin-org/bcoin#932. Previously, chain would not wait for the wallet to finish addBlock, instead could move forward and unlock chain.locker when chain was done processing that block. Now it waits for the wallet to also finish the process. That, of course, caused issue when chain was moving forward much faster than wallet, wallet would have backlogged list of addBlocks with all relevant block/tx informations eventually causing OOM.

Note that this wont happen if the wallet is running separately as a service. Separate wallet service will experience a backlog instead, because the chain won't wait for HTTP socket events to finish processing. (And maybe it should, but that's a separate issue and continuation of the bcoin-org/bcoin#932)

Related

Changes

  • Using wdb.scan directly is deprecated, it no longer sets rescanning to true (it's @private so no problems there)
  • wdb.rescan and wdb._rescan (lock and w/o lock) are the proper places to set the rescanning to true.
  • initial sync (syncNode) will set rescanning right away, to avoid extra connect block events.
  • Ahead sync from walletDB (when wallet was disconnected) will now use ._rescan to ensure .rescanning is set.

@nodech
Copy link
Contributor Author

nodech commented Oct 24, 2023

Diff is messed up because I moved old wallet-rescan-test to wallet-namestate-rescan-test and used wallet-rescan-test for this one. For better diffing experience, go to the commits themselves.

@nodech nodech added this to the hsd 7.0.0 milestone Oct 24, 2023
@nodech nodech self-assigned this Oct 24, 2023
@coveralls
Copy link

Coverage Status

coverage: 68.553% (+0.004%) from 68.549% when pulling 29625eb on nodech:wallet-deadlock-fix into bb7da60 on handshake-org:master.

@nodech nodech added intermediate review difficulty - intermediate wallet part of the codebase patch Release version labels Oct 24, 2023
@nodech nodech requested a review from rithvikvibhu October 24, 2023 15:06
@nodech nodech merged commit bc3c172 into handshake-org:master Oct 27, 2023
@nodech nodech deleted the wallet-deadlock-fix branch October 27, 2023 08:50
@nodech nodech mentioned this pull request Nov 7, 2023
4 tasks
@nodech nodech mentioned this pull request Nov 29, 2024
@nodech nodech mentioned this pull request Dec 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
intermediate review difficulty - intermediate patch Release version wallet part of the codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

should wdb.rescanning be set true earlier?
3 participants