Skip to content
This repository has been archived by the owner on Jan 13, 2025. It is now read-only.

Unreserve transaction costs from QoS service for unexecuted transactions #34807

Closed
jstarry opened this issue Jan 17, 2024 · 6 comments
Closed

Comments

@jstarry
Copy link
Contributor

jstarry commented Jan 17, 2024

Problem

In banking stage, the qos service cost tracker reserves cost for all transactions in a batch until all executable transactions are loaded, executed, and committed. This means that any transactions that aren't executed due to "account in use" errors will not have their costs removed from the cost tracker until after processing the full batch. This means that for a short duration, the cost tracker's block cost is over reserved and could prevent other threads from committing more transactions near the end of the block.

let (
(transaction_qos_cost_results, cost_model_throttled_transactions_count),
cost_model_us,
) = measure_us!(self.qos_service.select_and_accumulate_transaction_costs(
bank,
txs,
pre_results
));
// Only lock accounts for those transactions are selected for the block;
// Once accounts are locked, other threads cannot encode transactions that will modify the
// same account state
let (batch, lock_us) = measure_us!(bank.prepare_sanitized_batch_with_results(
txs,
transaction_qos_cost_results.iter().map(|r| match r {
Ok(_cost) => Ok(()),
Err(err) => Err(err.clone()),
})
));
// retryable_txs includes AccountInUse, WouldExceedMaxBlockCostLimit
// WouldExceedMaxAccountCostLimit, WouldExceedMaxVoteCostLimit
// and WouldExceedMaxAccountDataCostLimit
let mut execute_and_commit_transactions_output =
self.execute_and_commit_transactions_locked(bank, &batch);

credit: @crispheaney

Proposed Solution

Immediately free up transaction cost from the cost tracker in banking stage if transaction locks fail

cc @taozhu-chicago

@crispheaney
Copy link

Also, if im reading it correctly, the banking stage thread first filters out tx based on max block cu limits and then tries to acquire the write lock for tx afterwards

Might it be better if it filters out tx that are already locked before checking the max block cu limits?

I have a hunch that that during volatile periods, tx that can be executed are getting dropped prematurely because tx with hot accounts, that cant be executed because of write locks, are spending all the compute units in the cost tracker.

@tao-stones
Copy link
Contributor

Also, if im reading it correctly, the banking stage thread first filters out tx based on max block cu limits and then tries to acquire the write lock for tx afterwards

Might it be better if it filters out tx that are already locked before checking the max block cu limits?

I have a hunch that that during volatile periods, tx that can be executed are getting dropped prematurely because tx with hot accounts, that cant be executed because of write locks, are spending all the compute units in the cost tracker.

It probably is better to first check if transaction can fit into block (then reserve its space) before locking accounts it needs.

@crispheaney
Copy link

The scenario i'm imagining is that there are 10 tx with hot accounts at front of buffer with max cu request of 1.4m, they max out the cost tracker cus and all the tx after the first 10 hit the WouldExceedBlockMaxLimit. And if those 10 accounts cant be executed because of write lock contention, then you probably prematurely dropped tx's the could have actually be included.

I just started looking into this because there are a lot of dropped tx during volatile times: https://github.com/Block-Logic/dropped_solana_txs

But my understanding of scheduler is limited so this could easily be wrong

@tao-stones
Copy link
Contributor

Right, so the 11th TX will be tried after first batch of 10 TXs failed to lock accounts, then free up block space.

This issue is to free up space quicker -- as soon as those TXs failed to grab lock, space reserved for them will be freed for queued txs

@jstarry
Copy link
Contributor Author

jstarry commented Jan 18, 2024

Might it be better if it filters out tx that are already locked before checking the max block cu limits?

I'll create a new issue for this, thanks for raising!

@jstarry
Copy link
Contributor Author

jstarry commented Jan 19, 2024

Superseded by #34825

@jstarry jstarry closed this as not planned Won't fix, can't repro, duplicate, stale Jan 19, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants