PooledSpanDictionary improvements #450

Scooletz · 2024-12-17T09:12:19Z

PooledSpanDictionary is the main component responsible for storing Paprika's data in-memory. It does by bucketing keys heavily using 16 pages as the initial fan out and then storing data as single linked lists. Currently, the addresses that are stored are 4 bytes long and point to a given page in the list to a given position (encoded as an int). PooledSpanDictionary is not thread safe as well, which requires providing Child commits when running ComputeMerkleBehavior. The following items could improve overall performance of the dictionary.

migrate address to a pointer
make PooledSpanDictionary concurrent for upserts/deletes

The 1st would introduce some overhead as currently address is 4 bytes long, when after migration would be 8 bytes long on x64. This would drastically reduce the lookup time though as the next item in a linked list could be accessed directly by following the pointer, rather then finding the page in the list and then performing the calculation. It should greatly reduce the read overhead over the PooledSpanDictionary.

PooledSpanDictionary after #443 no longer requires visitor pattern to access the accounts and storage trees that were changed lately. This means, that as long as atomicity is preserved (both for allocation and for updates) we could make this dictionary much more like the ConcurrentDictionary. In regards to updating or finding the given key, a simple volatile at the top and walking down the linked list should be sufficient when we know that no colliding updates can happen. If this is not the case, an Interlocked lease could be used. It's worth to mention that the ComputeMerkleBehavior is designed in a way that there will be no colliding updates by multiple threads at the given time. This would mean, that updates in-situ, deletes and gets could be addressed using the technique described above. In regards to allocating new chunks, this could be done in a fully locked way, where the current working page is checked under a lock whether it can providing the required space, if not another would be added. As 1st point would ensure that we use direct pointers, no access to the list would be needed beside this allocation. This could allow to pass the same dictionary to all the work items and ditch the idea of the child commit (or just wrap the same dictionary)

Both should be measured extensively using benchmarks.

The text was updated successfully, but these errors were encountered:

Scooletz added the 🐌 performance Perofrmance related issue label Dec 17, 2024

Scooletz linked a pull request Dec 19, 2024 that will close this issue

Pooled span dicitionary direct pointer #455

Draft

Scooletz self-assigned this Dec 19, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PooledSpanDictionary improvements #450

PooledSpanDictionary improvements #450

Scooletz commented Dec 17, 2024

PooledSpanDictionary improvements #450

PooledSpanDictionary improvements #450

Comments

Scooletz commented Dec 17, 2024