Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PooledSpanDictionary improvements #450

Open
Scooletz opened this issue Dec 17, 2024 · 0 comments · May be fixed by #455
Open

PooledSpanDictionary improvements #450

Scooletz opened this issue Dec 17, 2024 · 0 comments · May be fixed by #455
Assignees
Labels
🐌 performance Perofrmance related issue

Comments

@Scooletz
Copy link
Contributor

PooledSpanDictionary is the main component responsible for storing Paprika's data in-memory. It does by bucketing keys heavily using 16 pages as the initial fan out and then storing data as single linked lists. Currently, the addresses that are stored are 4 bytes long and point to a given page in the list to a given position (encoded as an int). PooledSpanDictionary is not thread safe as well, which requires providing Child commits when running ComputeMerkleBehavior. The following items could improve overall performance of the dictionary.

  1. migrate address to a pointer
  2. make PooledSpanDictionary concurrent for upserts/deletes

The 1st would introduce some overhead as currently address is 4 bytes long, when after migration would be 8 bytes long on x64. This would drastically reduce the lookup time though as the next item in a linked list could be accessed directly by following the pointer, rather then finding the page in the list and then performing the calculation. It should greatly reduce the read overhead over the PooledSpanDictionary.

PooledSpanDictionary after #443 no longer requires visitor pattern to access the accounts and storage trees that were changed lately. This means, that as long as atomicity is preserved (both for allocation and for updates) we could make this dictionary much more like the ConcurrentDictionary. In regards to updating or finding the given key, a simple volatile at the top and walking down the linked list should be sufficient when we know that no colliding updates can happen. If this is not the case, an Interlocked lease could be used. It's worth to mention that the ComputeMerkleBehavior is designed in a way that there will be no colliding updates by multiple threads at the given time. This would mean, that updates in-situ, deletes and gets could be addressed using the technique described above. In regards to allocating new chunks, this could be done in a fully locked way, where the current working page is checked under a lock whether it can providing the required space, if not another would be added. As 1st point would ensure that we use direct pointers, no access to the list would be needed beside this allocation. This could allow to pass the same dictionary to all the work items and ditch the idea of the child commit (or just wrap the same dictionary)

Both should be measured extensively using benchmarks.

@Scooletz Scooletz added the 🐌 performance Perofrmance related issue label Dec 17, 2024
@Scooletz Scooletz linked a pull request Dec 19, 2024 that will close this issue
@Scooletz Scooletz self-assigned this Dec 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐌 performance Perofrmance related issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant