transaction performance tracking -- streamer stage #257
Conversation
Codecov Report

Attention: Patch coverage is
Additional details and impacted files:

@@           Coverage Diff           @@
##           master    #257    +/-  ##
=========================================
- Coverage    81.9%    81.9%   -0.1%
  Files         837      838      +1
  Lines      226792   226910    +118
=========================================
+ Hits       185795   185878     +83
- Misses      40997    41032     +35
Force-pushed from 56835a9 to bb20eeb (compare).
* transaction performance tracking -- streamer stage (cherry picked from commit 2b03910)

  # Conflicts:
  #   Cargo.toml
  #   streamer/src/nonblocking/quic.rs
  #   streamer/src/quic.rs
Backports to the stable branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc. that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule.
Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc. that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.
* transaction performance tracking -- streamer stage (cherry picked from commit 2b03910)

  # Conflicts:
  #   Cargo.toml
On one of your PRs doing similar things, I had a comment offering an alternative approach, but you must not have seen it. I did end up building this branch, and I instrumented the TVU path. I think this approach is better because we get every packet, and it easily scales to doing the same counting down the processing path, i.e. you can see that I measure the age at two spots in my branch. What do you think about this approach @lijunwangs? Here is the branch that does this for the TVU path; I was reporting metrics internally for a node:
I thought about that approach as well -- it is simpler. But I had two concerns: 1. it changes the public Meta struct declared in the SDK, raising compatibility issues; 2. there is extra storage overhead from keeping this information in every packet we handle through the stages, while we are only interested in certain samples of the packets. Besides, this change does not preclude adopting your proposal if we reach consensus on it in the future. The changes here are limited to the streamer itself. If we were to switch to using Meta to store the information, we could simply change the storage from PacketAccumulator to it. That is what I meant by being able to iterate on it. For now we really need to capture some latency information in the streamer layer; this fills a critical gap.
Reviewing this first PR before reviewing the backports. Left some nit comments. I have a PR up to address these: #648.
@@ -718,6 +732,32 @@ async fn packet_batch_sender(
        }
    }

    fn track_streamer_fetch_packet_performance(
        packet_perf_measure: &mut [([u8; 64], Instant)],
        stats: &Arc<StreamStats>,
nit: Passing &Arc<T> is often an anti-pattern. For this case, I believe we should use &T instead.
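To illustrate the reviewer's nit, here is a minimal sketch (with hypothetical types, not the actual streamer code): a callee that only reads shared state can take `&T`, and deref coercion lets callers holding an `Arc<T>` pass `&arc` unchanged.

```rust
use std::sync::Arc;

// Hypothetical stand-in for the real StreamStats struct.
struct StreamStats {
    packets_seen: u64,
}

// Takes &StreamStats rather than &Arc<StreamStats>: the callee only reads
// the stats, so it does not need to know they are reference-counted.
fn report(stats: &StreamStats) -> u64 {
    stats.packets_seen
}

fn main() {
    let stats = Arc::new(StreamStats { packets_seen: 42 });
    // Deref coercion converts &Arc<StreamStats> to &StreamStats at the call site.
    assert_eq!(report(&stats), 42);
}
```

Taking `&T` also keeps the function callable from code that owns a plain `StreamStats` or a `Box<StreamStats>`, not just an `Arc`.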
    let mut process_sampled_packets_us_hist =
        stats.process_sampled_packets_us_hist.lock().unwrap();

    for (signature, start_time) in packet_perf_measure.iter() {
        let duration = Instant::now().duration_since(*start_time);
We could hoist the Instant::now() call to a local variable above the for-loop and skip having to do the syscall each iteration. This would change the values of duration, but I don't think by much. Also, it may be better to have all measurements compared against a single "now" value. Wdyt?
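A small sketch of the hoisting the reviewer suggests (simplified, not the actual patch): read the clock once before the loop and compare every start time against that single reference point.

```rust
use std::time::{Duration, Instant};

// Hoisted variant: one Instant::now() call for the whole batch, so every
// duration is measured against the same "now" value.
fn durations_since(starts: &[Instant]) -> Vec<Duration> {
    let now = Instant::now(); // single clock read, hoisted out of the loop
    starts.iter().map(|s| now.duration_since(*s)).collect()
}

fn main() {
    // Instant is Copy, so all three entries share the same start time.
    let starts = vec![Instant::now(); 3];
    let ds = durations_since(&starts);
    assert_eq!(ds.len(), 3);
    // Against a single "now", identical start times yield identical durations.
    assert!(ds.iter().all(|d| *d == ds[0]));
}
```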
    let mut measure = Measure::start("track_perf");
    let mut process_sampled_packets_us_hist =
        stats.process_sampled_packets_us_hist.lock().unwrap();
nit: Measurement is unbalanced here. We measure the lock time, but we do not measure the unlock time. IMO either both or neither should be measured.
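A sketch of the balanced version (hypothetical helper, not the actual patch): time the whole critical section, dropping the guard explicitly before stopping the clock so the unlock falls inside the measured window.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

// Measure lock acquisition, the critical section, AND the unlock by
// dropping the guard before reading the elapsed time.
fn locked_increment(counter: &Mutex<u64>) -> Duration {
    let start = Instant::now();
    let mut guard = counter.lock().unwrap(); // lock time measured
    *guard += 1;                             // critical section measured
    drop(guard);                             // unlock time measured too
    start.elapsed()
}

fn main() {
    let counter = Mutex::new(0u64);
    let _elapsed = locked_increment(&counter);
    assert_eq!(*counter.lock().unwrap(), 1);
}
```

Without the explicit `drop`, the guard would be released after `start.elapsed()` is computed, leaving the unlock outside the measurement, which is the imbalance the comment points out.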
Problem
Enable the system to track transaction processing performance through the various stages, sampled probabilistically. This is a resubmit of PR solana-labs#34789 against Agave, covering only the first part (streamer) to reduce the PR size.
Summary of Changes
Based on the design doc: https://docs.google.com/document/d/1ig1rC0dk-ddi33JIqG9EZ4ZSq9xAJpT9fQTPaZFi_vw/edit.
We use a randomly generated 12-bit integer as a mask to match against the transaction's signature. If it matches, we mark the packet for performance tracking in the Meta's mask field; this flag is used efficiently downstream without repeating the mask matching. For these matched packets we report processing time for the fetch, sigverify, and banking stages.
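The sampling rule described above can be sketched roughly as follows. This is an illustrative guess at the mechanism, not the PR's actual code: the byte positions used, the endianness, and the function name `should_track` are all assumptions; only the idea of comparing 12 signature bits against a random 12-bit mask comes from the description.

```rust
// 12-bit field: matching a fixed random mask samples roughly 1 in 4096 packets.
const MASK_BITS: u16 = 0x0FFF;

// Hypothetical sampling check: compare the low 12 bits derived from the
// first two signature bytes against the randomly chosen mask.
fn should_track(signature: &[u8; 64], mask: u16) -> bool {
    let bits = u16::from_le_bytes([signature[0], signature[1]]) & MASK_BITS;
    bits == mask
}

fn main() {
    let mut sig = [0u8; 64];
    sig[0] = 0xAB;
    sig[1] = 0x03; // low 12 bits = 0x3AB
    assert!(should_track(&sig, 0x03AB));
    assert!(!should_track(&sig, 0x0000));
}
```

Because the check is a single mask-and-compare on bytes already in hand, it is cheap enough to run on every packet in the streamer, while downstream stages only consult the flag recorded in Meta.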