Transaction locking fails prematurely due to cost tracker constraints #34825
Comments
I think this solution is better than what I proposed in #34807, and this issue can replace that one if we agree this is the better approach.
Agree, this is the better approach.
I like this approach, but we should be a bit careful with the locks on shared resources: cost_tracker, account_locks
In pursuit of 1, any transaction cost calculation should be done outside of the locks - I'm fairly certain this is already the case.
Even though I don't think we take these two locks together today, I'm not confident we can keep a stable global order for them. I'd rather avoid taking multiple locks.
After going back and forth over several attempts, I find I largely agree with @apfitzge's points above: don't grab two locks at a time, and work in batches instead of tx-by-tx. I'm leaning toward the original solution: reserve CU for the batch -> acquire locks for the batch -> remove CU for transactions that failed to get account locks, again as a batch. (The first two steps are as-is; this just adds the third step immediately after locking.) What are your opinions?
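To make the three steps concrete, here is a rough sketch of reserve -> lock -> release-in-batch, where each shared resource is locked on its own and never while the other is held. `CostTracker`, `Tx`, and `select_batch` are hypothetical stand-ins made up for illustration, not the actual banking stage types, and account locking is reduced to a set of write-locked account names.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Hypothetical stand-in for the shared qos cost tracker.
struct CostTracker {
    limit: u64,
    used: u64,
}

impl CostTracker {
    /// Reserve `cost` if it still fits under the limit.
    fn try_reserve(&mut self, cost: u64) -> bool {
        match self.used.checked_add(cost) {
            Some(total) if total <= self.limit => {
                self.used = total;
                true
            }
            _ => false,
        }
    }

    /// Give previously reserved cost back to the tracker.
    fn release(&mut self, cost: u64) {
        self.used = self.used.saturating_sub(cost);
    }
}

/// Hypothetical, simplified transaction: a cost and the accounts it write-locks.
struct Tx {
    cost: u64,
    accounts: Vec<&'static str>,
}

/// Returns, per transaction, whether it ended up with both cost and locks.
fn select_batch(
    batch: &[Tx],
    cost_tracker: &Mutex<CostTracker>,
    account_locks: &Mutex<HashSet<&'static str>>,
) -> Vec<bool> {
    // Step 1: reserve cost for the whole batch under the cost tracker lock only.
    let mut reserved = Vec::with_capacity(batch.len());
    {
        let mut tracker = cost_tracker.lock().unwrap();
        for tx in batch {
            reserved.push(tracker.try_reserve(tx.cost));
        }
    } // cost tracker lock is dropped here

    // Step 2: take account locks for the batch under the account lock only.
    let mut selected = Vec::with_capacity(batch.len());
    let mut to_release = 0u64;
    {
        let mut locks = account_locks.lock().unwrap();
        for (tx, &has_cost) in batch.iter().zip(&reserved) {
            let conflict = tx.accounts.iter().any(|a| locks.contains(a));
            if has_cost && !conflict {
                locks.extend(tx.accounts.iter().copied());
                selected.push(true);
            } else {
                // Cost was reserved but the locks failed: remember it for release.
                if has_cost {
                    to_release += tx.cost;
                }
                selected.push(false);
            }
        }
    } // account lock is dropped here

    // Step 3: return the unused cost in a single batch update.
    cost_tracker.lock().unwrap().release(to_release);
    selected
}
```

In this sketch the released cost is only visible to later reservations, so it helps other threads or later batches rather than transactions already skipped in the current batch.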
Yeah, taking both locks is probably too risky. How about we first reserve as much cost as we need in qos, and then keep track of how much of that cost we've used so far as we run the transaction account locking loop? After locking, we release any unused reserved cost back to the qos cost tracker for other workers.
This still seems to suffer from the issue, though, within the same thread. Let's say we're close to our limits and we receive a batch [Tx1, Tx2]. If we (reserve qos, take locks, free qos), then neither tx will get executed in this case.
It seems we may also want different solutions for the old vs. new scheduler. So it'd make sense for the new scheduler to just lock and then reserve in qos. If we're hitting block limits this still seems better, because qos getting maxed out on block limits means all other threads are blocked. wdyt?
Yeah, it does not help this particular batch, but the released cost_tracker space helps other threads book their txs.
Yeah, I initially considered this (without thinking about the new scheduler's behavior). I felt it was better not to mess with accounts_lock too much, but that might not be a good argument anyway.
Problem
When selecting transactions from a batch for execution in banking stage, there are two preliminary steps: qos cost reservation and global lock acquisition. In the first step, we iterate over a batch of transactions and reserve qos cost capacity until the capacity is depleted. Any transactions towards the end of the batch that cannot fit are marked for retry and skipped for the current batch. Then, in the next step, we iterate over the batch again and attempt to acquire locks for any un-skipped transactions. If any lock acquisition fails, the cost reserved for that transaction is left over: it is capacity that could have been used for the previously skipped transactions.
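A minimal sketch of the two-pass behaviour described above may help show where capacity gets stranded. Everything in it is an assumption made for illustration: `Tx`, `Outcome`, and `select_two_pass` are hypothetical names, the cost tracker is reduced to a single remaining-capacity number, and account locks are reduced to a set of write-locked account names.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified transaction: a cost and the accounts it write-locks.
#[derive(Debug, Clone)]
struct Tx {
    cost: u64,
    accounts: Vec<&'static str>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    Selected,
    RetryCostLimit,
    RetryLockConflict,
}

/// Runs cost reservation over the whole batch, then lock acquisition over the
/// whole batch. Returns each transaction's outcome plus the cost that was
/// reserved for transactions that later failed to lock.
fn select_two_pass(
    batch: &[Tx],
    remaining_capacity: u64,
    locked: &mut HashSet<&'static str>,
) -> (Vec<Outcome>, u64) {
    let mut remaining = remaining_capacity;
    let mut outcomes = Vec::with_capacity(batch.len());

    // Pass 1: reserve cost; anything that does not fit is marked for retry.
    for tx in batch {
        if tx.cost <= remaining {
            remaining -= tx.cost;
            outcomes.push(Outcome::Selected);
        } else {
            outcomes.push(Outcome::RetryCostLimit);
        }
    }

    // Pass 2: try to lock accounts for the transactions that got cost.
    let mut stranded = 0;
    for (tx, outcome) in batch.iter().zip(outcomes.iter_mut()) {
        if *outcome != Outcome::Selected {
            continue;
        }
        if tx.accounts.iter().any(|a| locked.contains(a)) {
            // Cost stays reserved although this transaction will not run.
            *outcome = Outcome::RetryLockConflict;
            stranded += tx.cost;
        } else {
            locked.extend(tx.accounts.iter().copied());
        }
    }

    (outcomes, stranded)
}
```

With 100 units of capacity, account `a` already locked, and a batch of two 60-unit transactions touching `a` and `b` respectively, the first transaction reserves cost and then fails to lock, while the second is skipped for cost even though the first one's 60 units end up unused.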
Credit: @crispheaney
Proposed Solution
Combine the loops for qos cost reservation and global lock acquisition and use the following process:
cc @taozhu-chicago @apfitzge