-
Notifications
You must be signed in to change notification settings - Fork 4.5k
Track transaction performance through various stage using random mask #34789
Changes from all commits
c6cc9d9
792095c
80a510e
d2a5bdf
921c9f5
15eb1cb
f252ff1
4d1e1cc
b12a19e
caf6a3c
ee7a5e2
d566aa0
6a4f974
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -208,6 +208,32 @@ impl Consumer { | |
.slot_metrics_tracker | ||
.increment_retryable_packets_count(retryable_transaction_indexes.len() as u64); | ||
|
||
// Now we track the performance for the interested transactions which is not in the retryable_transaction_indexes | ||
// We assume the retryable_transaction_indexes is already sorted. | ||
let mut retryable_idx = 0; | ||
for (index, packet) in packets_to_process.iter().enumerate() { | ||
if packet.original_packet().meta().is_perf_track_packet() { | ||
if let Some(start_time) = packet.start_time() { | ||
if retryable_idx >= retryable_transaction_indexes.len() | ||
|| retryable_transaction_indexes[retryable_idx] != index | ||
{ | ||
let duration = Instant::now().duration_since(*start_time); | ||
|
||
debug!( | ||
"Banking stage processing took {duration:?} for transaction {:?}", | ||
packet.transaction().get_signatures().first() | ||
); | ||
Comment on lines
+222
to
+225
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is really misleading. It didn't take that amount of time to process this transaction. It took that time to process the batch of transactions. |
||
payload | ||
.slot_metrics_tracker | ||
.increment_process_sampled_packets_us(duration.as_micros() as u64); | ||
} else { | ||
// This packet is retried, advance the retry index to the next, as the next packet's index will | ||
// certainly be > than this. | ||
retryable_idx += 1; | ||
} | ||
} | ||
} | ||
} | ||
Some(retryable_transaction_indexes) | ||
} | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -244,6 +244,9 @@ pub(crate) struct ProcessPacketsTimings { | |
// Time spent running the cost model in processing transactions before executing | ||
// transactions | ||
pub cost_model_us: u64, | ||
|
||
// banking stage processing time histogram for sampled packets | ||
pub process_sampled_packets_us_hist: histogram::Histogram, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This will almost certainly have between 0-2 counts per block on mnb, meaning it will probably be so noisy as to be useless. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we care much more about time from sigverify to banking stage picking up the packet from channel, i.e. "time-to-scheduler" There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The sampling mechanism goal was to sample the system without tracking everything. Even a couple of data points per slot over long time time can still provide insight into where the time is spent over various stage. Time to scheduler is good stats to have. I will defer to future PRs to reduce this change set. |
||
} | ||
|
||
impl ProcessPacketsTimings { | ||
|
@@ -264,6 +267,28 @@ impl ProcessPacketsTimings { | |
i64 | ||
), | ||
("cost_model_us", self.cost_model_us, i64), | ||
( | ||
"process_sampled_packets_us_90pct", | ||
self.process_sampled_packets_us_hist | ||
.percentile(90.0) | ||
.unwrap_or(0), | ||
i64 | ||
), | ||
( | ||
"process_sampled_packets_us_min", | ||
self.process_sampled_packets_us_hist.minimum().unwrap_or(0), | ||
i64 | ||
), | ||
( | ||
"process_sampled_packets_us_max", | ||
self.process_sampled_packets_us_hist.maximum().unwrap_or(0), | ||
i64 | ||
), | ||
( | ||
"process_sampled_packets_us_mean", | ||
self.process_sampled_packets_us_hist.mean().unwrap_or(0), | ||
i64 | ||
), | ||
); | ||
} | ||
} |
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there somewhere that we can put this where we aren't adding an iteration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see an easy way to do that -- I would like avoid smudge on the SanitizedTransaction. But I think I can make this more efficient -- I do not need the binary search on the retryable index, I could do it in one simple loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are the actual iterations not like one stack frame deeper in most cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done many levels down using SanitizedTransaction which does not have information about the
start_time
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably needs to be done elsewhere; this is never called with the new scheduler (for non-votes), so we'd never get any metrics if that's enabled. By the time this change gets in, it will be the default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@apfitzge what is the new scheduler's name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talked to @apfitzge offline, will address both issue raised by Trent and Andrew.