You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
PooledSpanDictionary is the main component responsible for storing Paprika's data in-memory. It does by bucketing keys heavily using 16 pages as the initial fan out and then storing data as single linked lists. Currently, the addresses that are stored are 4 bytes long and point to a given page in the list to a given position (encoded as an int). PooledSpanDictionary is not thread safe as well, which requires providing Child commits when running ComputeMerkleBehavior. The following items could improve overall performance of the dictionary.
migrate address to a pointer
make PooledSpanDictionary concurrent for upserts/deletes
The 1st would introduce some overhead as currently address is 4 bytes long, when after migration would be 8 bytes long on x64. This would drastically reduce the lookup time though as the next item in a linked list could be accessed directly by following the pointer, rather then finding the page in the list and then performing the calculation. It should greatly reduce the read overhead over the PooledSpanDictionary.
PooledSpanDictionary after #443 no longer requires visitor pattern to access the accounts and storage trees that were changed lately. This means, that as long as atomicity is preserved (both for allocation and for updates) we could make this dictionary much more like the ConcurrentDictionary. In regards to updating or finding the given key, a simple volatile at the top and walking down the linked list should be sufficient when we know that no colliding updates can happen. If this is not the case, an Interlocked lease could be used. It's worth to mention that the ComputeMerkleBehavior is designed in a way that there will be no colliding updates by multiple threads at the given time. This would mean, that updates in-situ, deletes and gets could be addressed using the technique described above. In regards to allocating new chunks, this could be done in a fully locked way, where the current working page is checked under a lock whether it can providing the required space, if not another would be added. As 1st point would ensure that we use direct pointers, no access to the list would be needed beside this allocation. This could allow to pass the same dictionary to all the work items and ditch the idea of the child commit (or just wrap the same dictionary)
Both should be measured extensively using benchmarks.
The text was updated successfully, but these errors were encountered:
PooledSpanDictionary
is the main component responsible for storing Paprika's data in-memory. It does by bucketing keys heavily using 16 pages as the initial fan out and then storing data as single linked lists. Currently, the addresses that are stored are 4 bytes long and point to a given page in the list to a given position (encoded as anint
).PooledSpanDictionary
is not thread safe as well, which requires providingChild
commits when runningComputeMerkleBehavior
. The following items could improve overall performance of the dictionary.PooledSpanDictionary
concurrent for upserts/deletesThe 1st would introduce some overhead as currently address is 4 bytes long, when after migration would be 8 bytes long on x64. This would drastically reduce the lookup time though as the next item in a linked list could be accessed directly by following the pointer, rather then finding the page in the list and then performing the calculation. It should greatly reduce the read overhead over the
PooledSpanDictionary
.PooledSpanDictionary
after #443 no longer requires visitor pattern to access the accounts and storage trees that were changed lately. This means, that as long as atomicity is preserved (both for allocation and for updates) we could make this dictionary much more like theConcurrentDictionary
. In regards to updating or finding the given key, a simple volatile at the top and walking down the linked list should be sufficient when we know that no colliding updates can happen. If this is not the case, anInterlocked
lease could be used. It's worth to mention that theComputeMerkleBehavior
is designed in a way that there will be no colliding updates by multiple threads at the given time. This would mean, that updates in-situ, deletes and gets could be addressed using the technique described above. In regards to allocating new chunks, this could be done in a fully locked way, where the current working page is checked under a lock whether it can providing the required space, if not another would be added. As 1st point would ensure that we use direct pointers, no access to the list would be needed beside this allocation. This could allow to pass the same dictionary to all the work items and ditch the idea of the child commit (or just wrap the same dictionary)Both should be measured extensively using benchmarks.
The text was updated successfully, but these errors were encountered: