Track transaction performance through various stage using random mask #34789
Conversation
Force-pushed from baed896 to 45efd53
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff            @@
##           master   #34789     +/-   ##
=========================================
- Coverage    81.8%    81.7%    -0.1%
=========================================
  Files         834      835       +1
  Lines      224815   224966     +151
=========================================
+ Hits       183919   184006      +87
- Misses      40896    40960      +64
Force-pushed from 4d4ba9a to 0991665
can you run some benchmarks with and without this tracking. it seems extremely likely to add detrimental overhead
@@ -206,6 +206,26 @@ impl Consumer {
        .slot_metrics_tracker
        .increment_retryable_packets_count(retryable_transaction_indexes.len() as u64);

    // Now we track the performance for the interested transactions which is not in the retryable_transaction_indexes
    // We assume the retryable_transaction_indexes is already sorted.
    for (index, packet) in packets_to_process.iter().enumerate() {
is there somewhere that we can put this where we aren't adding an iteration?
I do not see an easy way to do that -- I would like to avoid smudging SanitizedTransaction. But I think I can make this more efficient -- I do not need the binary search on the retryable indexes; I can do it in one simple loop.
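For reference, a minimal self-contained sketch of that "one simple loop" idea: walk the sorted retryable index list with a cursor instead of binary-searching per packet. `Packet` and its `is_tracked` field are simplified stand-ins for the real banking-stage types and the sampled-for-tracking bit this PR adds, not the PR's actual code.

```rust
// Sketch: skip retryable packets with a single forward pass over a sorted index list.
struct Packet {
    is_tracked: bool, // stand-in for the sampled-for-tracking bit in the packet Meta
}

fn report_tracked(packets: &[Packet], retryable_indexes: &[usize]) {
    // `retryable_indexes` is assumed sorted ascending, as the PR comment states.
    let mut retryable = retryable_indexes.iter().peekable();
    for (index, packet) in packets.iter().enumerate() {
        // Advance the cursor past indexes we have already walked over.
        while retryable.next_if(|&&r| r < index).is_some() {}
        if retryable.peek() == Some(&&index) {
            continue; // packet will be retried; do not report it as processed
        }
        if packet.is_tracked {
            // report banking-stage processing time for this sampled packet
            println!("tracked packet at index {index} finished processing");
        }
    }
}

fn main() {
    let packets = vec![
        Packet { is_tracked: true },
        Packet { is_tracked: false },
        Packet { is_tracked: true },
    ];
    report_tracked(&packets, &[1]); // index 1 is retryable, so only 0 and 2 report
}
```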
are the actual iterations not like one stack frame deeper in most cases?
This is done many levels down using SanitizedTransaction, which does not have information about the start_time.
This probably needs to be done elsewhere; this is never called with the new scheduler (for non-votes), so we'd never get any metrics if that's enabled. By the time this change gets in, it will be the default.
@apfitzge what is the new scheduler's name?
Talked to @apfitzge offline; will address both issues raised by Trent and Andrew.
I do not see any material differences with/without the change using bench-tps:

Rpc-client with fix:
lijun_solana_com@lijun-dev8:~/solana$ ./cargo run --release --bin solana-bench-tps -- -u http://35.233.177.221:8899 --identity ~/.config/solana/id.json --tx_count 1000 --thread-batch-sleep-ms 0 -t 20 --duration 60 -n 35.233.177.221:8001 --read-client-keys ~/gce-keypairs.yaml --use-rpc-client
[2024-01-30T09:42:05.249223482Z INFO solana_bench_tps::bench] Average TPS: 7171.7437
[2024-01-30T09:43:53.539368674Z INFO solana_bench_tps::bench] Average TPS: 8209.673
[2024-01-30T18:05:42.712776385Z INFO solana_bench_tps::bench] Average TPS: 6204.4365
[2024-01-30T18:30:48.060532674Z INFO solana_bench_tps::bench] Average TPS: 4780.393
[2024-01-31T00:34:06.877848441Z INFO solana_bench_tps::bench] Average TPS: 6120.311

Master without change:
[2024-01-31T02:12:00.852692886Z INFO solana_bench_tps::bench] Average TPS: 8523.614
[2024-01-31T02:21:19.220559104Z INFO solana_bench_tps::bench] Average TPS: 5904.669
[2024-01-31T19:02:38.746203621Z INFO solana_bench_tps::bench] Average TPS: 4400.843
[2024-01-31T19:04:17.896044976Z INFO solana_bench_tps::bench] Average TPS: 6338.43
[2024-01-31T19:06:34.395401402Z INFO solana_bench_tps::bench] Average TPS: 6257.952
Force-pushed from 148feaf to 2207cb2
i don't think benchtps is how we want to be testing this. it's too far away and is biased by everything after the scheduler, which we don't care about. i think we should be using the benchmarks in perf, sigverify and banking instead. i'm especially concerned about whether we're negatively impacting our capacity in front of sigverify load shedding
Force-pushed from 2207cb2 to a8868ab
debug!(
    "Banking stage processing took {duration:?} for transaction {:?}",
    packet.transaction().get_signatures().first()
);
This is really misleading. It didn't take that amount of time to process this transaction. It took that time to process the batch of transactions.
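For illustration, a sketch (std types only) of the attribution the comment is asking for: time the batch and say so in the log, rather than implying the duration belongs to one transaction. The surrounding batch-execution call is a placeholder, not the PR's exact code.

```rust
use std::time::Instant;

// Minimal sketch: the duration covers the whole batch, so the log says "batch".
fn process_batch(batch: &[&str]) {
    let start = Instant::now();
    for _tx in batch {
        // the real batch execution/commit call would run here
    }
    let duration = start.elapsed();
    println!(
        "Banking stage took {duration:?} for a batch of {} transactions (first signature: {:?})",
        batch.len(),
        batch.first()
    );
}

fn main() {
    process_batch(&["sig1", "sig2", "sig3"]);
}
```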
@@ -244,6 +244,9 @@ pub(crate) struct ProcessPacketsTimings {
    // Time spent running the cost model in processing transactions before executing
    // transactions
    pub cost_model_us: u64,

    // banking stage processing time histogram for sampled packets
    pub process_sampled_packets_us_hist: histogram::Histogram,
This will almost certainly have between 0-2 counts per block on mnb, meaning it will probably be so noisy as to be useless.
I think we care much more about the time from sigverify to the banking stage picking up the packet from the channel, i.e. "time-to-scheduler"
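A self-contained sketch of that "time-to-scheduler" measurement, using std channels: sigverify stamps a sampled packet when it sends it, and the receiving side measures the gap on pickup. The real code would use the packet Meta flag added in this PR and crossbeam channels; everything named here is a simplified stand-in.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

struct StampedPacket {
    payload: Vec<u8>,
    sent_at: Instant, // set by "sigverify" for sampled packets only
}

fn main() {
    let (tx, rx) = mpsc::channel::<StampedPacket>();

    // "sigverify" side: stamp, then send after a simulated backlog.
    thread::spawn(move || {
        let packet = StampedPacket { payload: vec![0u8; 32], sent_at: Instant::now() };
        thread::sleep(Duration::from_millis(5));
        tx.send(packet).unwrap();
    });

    // "scheduler" side: measure how long the packet sat between stamp and pickup.
    let packet = rx.recv().unwrap();
    let time_to_scheduler = packet.sent_at.elapsed();
    println!("time-to-scheduler: {time_to_scheduler:?} for {} bytes", packet.payload.len());
}
```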
The goal of the sampling mechanism was to sample the system without tracking everything. Even a couple of data points per slot, collected over a long time, can still provide insight into where time is spent across the various stages. Time-to-scheduler is a good stat to have; I will defer it to future PRs to keep this change set small.
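To make that concrete, a sketch of how a few sampled points per slot accumulate into a useful distribution via the new histogram field. It assumes the `histogram` crate's 0.6.x API (the crate the `process_sampled_packets_us_hist` field above is declared with); the sample values are made up.

```rust
use histogram::Histogram;

fn main() {
    let mut process_sampled_packets_us_hist = Histogram::new();

    // Pretend these are the few sampled packets observed across many slots.
    for sampled_us in [850u64, 1_200, 930, 15_000, 1_100] {
        let _ = process_sampled_packets_us_hist.increment(sampled_us);
    }

    // Report the distribution rather than the individual (noisy) points.
    println!(
        "sampled banking-stage time: p50={:?}us p90={:?}us n={}",
        process_sampled_packets_us_hist.percentile(50.0),
        process_sampled_packets_us_hist.percentile(90.0),
        process_sampled_packets_us_hist.entries()
    );
}
```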
trying txn mask matching output txn to figure out why txn is not exactly matched
Use 62 and 61 portion
track fetch performance using random txn mask
track sigverify performance using random txn mask
track banking stage performance using random txn mask
adding missing cargo lock file
add debug messages
Revert "add debug messages" (This reverts commit 96aead5.)
fixed some clippy issues
check-crate issue
Fix a clippy issue
Fix a clippy issue
debug why txns in banking stage shows fewer performance tracking points
debug why txns in banking stage shows fewer performance tracking points
debug why txns in banking stage shows fewer performance tracking points
debug why txns in banking stage shows fewer performance tracking points
get higher PPS for testing purpose
more debug messages on why txn is skipped
display if tracer packet in log
add debug before calling processing_function
debug at the initial of banking stage
track if a txn is forwarded
dependency order
missing cargo file
clean up debug messages
Do not use TRACER_PACKET, use its own bit
rename some functions
addressed some comments from Trent
Update core/src/banking_stage/immutable_deserialized_packet.rs (Co-authored-by: Trent Nelson <[email protected]>)
addressed some comments from Trent
Do not use binary_search, do simple compare in one loop
Force-pushed from 697e141 to d566aa0
This repository is no longer in use. Please re-open this pull request in the agave repo: https://github.com/anza-xyz/agave
Problem
Enable the system to track transaction processing performance through the various stages, based on probability (sampling).
Summary of Changes
Based on the design in https://docs.google.com/document/d/1ig1rC0dk-ddi33JIqG9EZ4ZSq9xAJpT9fQTPaZFi_vw/edit.
We use a randomly generated 12-bit integer as a mask to match against the transaction's signature. If it matches, we mark the packet for performance tracking in the packet Meta's flags; downstream stages check that flag cheaply instead of repeating the mask comparison. For these matched packets we report processing time for the fetch, sigverify, and banking stages.
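A minimal, self-contained sketch of that matching scheme follows. The exact signature bytes read and the Meta flag name are placeholders, not the PR's actual choices; it uses the `rand` crate for the random mask.

```rust
use rand::Rng;

const MASK_BITS: u32 = 12;

// Draw the process-wide random 12-bit tracking mask once.
fn random_track_mask() -> u16 {
    rand::thread_rng().gen_range(0..(1u16 << MASK_BITS))
}

// Extract 12 bits from the tail of a 64-byte signature (placeholder choice of bytes).
fn signature_bits(signature: &[u8; 64]) -> u16 {
    let lo = signature[63] as u16;
    let hi = (signature[62] as u16) << 8;
    (hi | lo) & ((1u16 << MASK_BITS) - 1)
}

fn main() {
    let track_mask = random_track_mask();
    let mut signature = [0u8; 64];
    rand::thread_rng().fill(&mut signature[..]);

    // With 12 bits, roughly 1 in 4096 transactions is sampled. The resulting bit
    // would be stored in the packet Meta so later stages can check it cheaply.
    let perf_track = signature_bits(&signature) == track_mask;
    println!("mask={track_mask:#05x} sampled={perf_track}");
}
```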
Fixes #