Nexus currently uses pogreb to cache static node queries in order to reduce the load on oasis-node and improve performance during reindexing. We currently maintain a separate cache for each runtime, e.g. `/rpc-cache/indexer/consensus`.
Pogreb has a known limitation where it rebuilds the entire index during the recovery process, which can take hours to days for large databases. The recovery process is triggered if the `lock` file is found in the cache directory when the database is opened, which can often happen if Nexus is interrupted while rebuilding the index.
The consensus cache is by far the largest and thus the most problematic. The following are the cache sizes for each runtime on mainnet Nexus and on testnet Nexus.
The Nexus logs show that testnet consensus cache initialization takes ~6 days for 4.5 TB. (I wasn't able to find logs for mainnet consensus, but it's slower.)
We should find a way to eliminate these long initialization times. One option would be to shard the consensus cache into multiple pogreb DBs, for instance by upgrade (mainnet/cobalt/damask/eden). We could also explore alternative local KV stores instead of pogreb.
Side note: The logs indicated that the `preBackup` call might be taking a long time as well, especially the `deleteFiles` call. It might be good to add some log messages/timings and test it out on the production deploy to see if there's some easy optimization there. When I checked on 12/1 there were ~1800 segment files and 200 backups (`*.bac.bac`) in the consensus mainnet database; maybe having a large number of files in the cache directory slows the `deleteFiles` calls down.