A pruning event deletes all tree nodes invalidated (orphaned) at an arbitrary prune version less than the tree's current version. Given the earliest version of the tree on disk, m, and a prune version, n, v2 calculates the set difference all_nodes(m -> n) - orphaned_at(n) and, importantly, writes the set to disk in a new shard. For a B+Tree, inserting is markedly faster than deleting. After completion the old shard(s) are dropped, and the complete version of the tree at version n is in shard n.
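The set-difference step can be sketched as follows. This is a minimal illustration, not the real iavl code: the `prune` function, the shard dicts, and the `(version, sequence)` keys are all hypothetical stand-ins.

```python
# Hypothetical sketch of the v2 prune step. Shards are modeled as dicts of
# nodes keyed by (version, sequence); none of these names are the real API.

def prune(shards, orphans, m, n):
    """Copy every node in shards m..n that is not orphaned at version n
    into a fresh shard keyed by n, then drop the old shards."""
    surviving = {
        key: node
        for v in range(m, n + 1)
        for key, node in shards[v].items()
        if key not in orphans  # the set difference: all_nodes(m -> n) - orphaned_at(n)
    }
    for v in range(m, n + 1):
        del shards[v]  # dropping whole shards is cheap compared to per-node deletes
    shards[n] = surviving
    return shards

# Toy data: two shards, with root@1 orphaned when version 2 replaced it.
shards = {1: {(1, 0): "root@1", (1, 1): "leaf-a"},
          2: {(2, 0): "root@2", (2, 1): "leaf-b"}}
shards = prune(shards, orphans={(1, 0)}, m=1, n=2)
print(sorted(shards[2]))  # → [(1, 1), (2, 0), (2, 1)]
```

Note that the survivors are only ever inserted into the new shard; nothing is deleted node-by-node, which is the point of the design.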
There are several opportunities for performance improvements in the current implementation.
Rate Limiting
The current rate-limiting mechanism is a global pruning limit. This works, but we can probably do better.
Right now pruning begins at prune_height + keep_versions even though a new shard is created at prune_height + 1. Pruning could instead start at prune_height and be rate limited (slowed way down) to reduce I/O pressure. This is probably as simple as using smaller batches with a configurable pause between them to hold a target insert rate.
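A throttle like that could be as simple as the sketch below. The function name and the `batch_size` / `target_rows_per_sec` knobs are hypothetical, not existing iavl configuration:

```python
import time

def rate_limited_insert(rows, insert_batch, batch_size=1000, target_rows_per_sec=10_000):
    """Insert rows in small batches, sleeping between batches so the
    sustained insert rate stays near target_rows_per_sec."""
    pause = batch_size / target_rows_per_sec  # seconds per batch at the target rate
    for i in range(0, len(rows), batch_size):
        insert_batch(rows[i:i + batch_size])
        time.sleep(pause)  # back off so pruning yields I/O to foreground writes
```

Smaller batches plus a pause trades total prune duration for lower, smoother I/O pressure, which matters because pruning runs concurrently with normal tree writes.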
SQLite JOIN
The current set difference calculation is CPU and memory intensive. It naively loads all orphans into a map and filters branch and leaf nodes against that map. This design keeps leaf and branch orphan writes fast: there is no index on the orphan tables. If there were, a SQLite JOIN statement could be used to populate the new shard, which would probably be vastly more efficient.
Pruning could build the orphan indexes on (version, sequence) and then execute the JOIN statements. Of course this still costs CPU and I/O, so it's not clear where the inflection point (in tree size) is for it to be more efficient, but it probably is in some cases.
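The index-then-anti-join idea looks roughly like this. The schema here (`tree`, `orphan`, `new_shard` tables and their columns) is invented for illustration and does not match iavl's actual tables:

```python
import sqlite3

# Hypothetical schema standing in for iavl's node and orphan tables.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tree (version INTEGER, sequence INTEGER, bytes BLOB);
    CREATE TABLE orphan (version INTEGER, sequence INTEGER);
    CREATE TABLE new_shard (version INTEGER, sequence INTEGER, bytes BLOB);
""")
con.executemany("INSERT INTO tree VALUES (?, ?, ?)",
                [(1, 0, b"root@1"), (1, 1, b"leaf-a"), (2, 0, b"root@2")])
con.execute("INSERT INTO orphan VALUES (1, 0)")  # root@1 replaced at version 2

# Build the index only at prune time, so normal orphan writes stay index-free.
con.execute("CREATE INDEX orphan_idx ON orphan (version, sequence)")

# Anti-join: copy every tree node with no matching orphan row into the new shard.
con.execute("""
    INSERT INTO new_shard
    SELECT t.version, t.sequence, t.bytes
    FROM tree t
    LEFT JOIN orphan o ON o.version = t.version AND o.sequence = t.sequence
    WHERE o.version IS NULL
""")
print(con.execute("SELECT version, sequence FROM new_shard ORDER BY 1, 2").fetchall())
# → [(1, 1), (2, 0)]
```

Building the index lazily preserves the cheap orphan-write path during normal operation, while letting SQLite do the set difference without materializing all orphans in application memory.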