-
Notifications
You must be signed in to change notification settings - Fork 291
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TransactionView: ReceiveAndBuffer #3820
TransactionView: ReceiveAndBuffer #3820
Conversation
2876a3f
to
c868ff5
Compare
type Transaction = RuntimeTransaction<ResolvedTransactionView<Bytes>>; | ||
type Container = TransactionViewStateContainer; | ||
|
||
fn receive_and_buffer_packets( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOTE: this does not run the bank.check_transactions
because I prioritized safety around not leaking Bytes
in the container, which is much trickier if we do batching.
We should probably follow this up with a call to clean immediately after, so that we check age on transactions we actually inserted, which is likely better anyway. However, I'd like to profile with and without that change first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also should make sure we are checking the status cache to make sure we're not scheduling transactions that have already been processed right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bank::check_transactions
checks the age and status cache. so it'll clean out both types before we get to the processing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sorry my comment wasn't clear. I bring it up because it's kind of a bug to not check status cache before scheduling right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do check before scheduling. I'm just not checking before buffering in this PR.
I think with some of the other clean ups we did during review it may be easier to add in. Let me give it a shot this afternoon, and I'll comment here on what I got.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, taking a look at this. I think we can do it relatively easily during buffering; but it may require some refactoring to do it in the optimal point.
We'd really want to check age/status immediately after we calculate the message hash; but the type we have at that point RuntimeTransaction<SanitizedTransactionView>
doesn't implement SVMMessage
(addresses not resolved).
That way we don't need to reach into accounts-db (slower) before we discard too old or already processed txs.
I think it's probably better to refactor the internals of check_transactions
so we can call it as long as we have the signature/message hash (for SVMMessage we can have fn that just grabs those and calls the actual impl).
^ Given those additional changes, think will leave it as a follow-up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works for me
if container.is_empty() | ||
&& matches!( | ||
decision, | ||
BufferedPacketsDecision::Forward | BufferedPacketsDecision::ForwardAndHold |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed Hold
branch here. Sometimes after a leader slot, we do not have a bank ready in Consume
for our next leader slot. We should still avoid the recv_timeout
in that case. Also added a check that container is not empty - if container has items, then we should try to process them instead of sleeping for packets
f87bd2f
to
2d9cc88
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good from SVM point of view.
2d9cc88
to
e310dd7
Compare
4e927a9
to
a8c7a41
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No major qualms with what you have. Left a few comments to consider
core/src/banking_stage/transaction_scheduler/receive_and_buffer.rs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still reviewing but posting some feedback along the way
core/src/banking_stage/transaction_scheduler/transaction_state.rs
Outdated
Show resolved
Hide resolved
core/src/banking_stage/transaction_scheduler/transaction_state_container.rs
Outdated
Show resolved
Hide resolved
core/src/banking_stage/transaction_scheduler/transaction_state_container.rs
Outdated
Show resolved
Hide resolved
core/src/banking_stage/transaction_scheduler/transaction_state_container.rs
Outdated
Show resolved
Hide resolved
core/src/banking_stage/transaction_scheduler/transaction_state_container.rs
Show resolved
Hide resolved
core/src/banking_stage/transaction_scheduler/transaction_state_container.rs
Outdated
Show resolved
Hide resolved
core/src/banking_stage/transaction_scheduler/prio_graph_scheduler.rs
Outdated
Show resolved
Hide resolved
6ccd99a
to
19fb288
Compare
fn receive_and_buffer_packets( | ||
&mut self, | ||
container: &mut Self::Container, | ||
timing_metrics: &mut SchedulerTimingMetrics, | ||
count_metrics: &mut SchedulerCountMetrics, | ||
decision: &BufferedPacketsDecision, | ||
) -> bool; | ||
) -> Result<usize, ()>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change was to improve the testing which was seeing some inconsistent failures, due to timing out in this call.
Those tests now loop until this call returns Ok(0) - or an error.
Natural question - Why did this only pop up for TransactionView
?
The receive_and_buffer timeout for view is done differently than for sdk types.
The sdk types have a receive timeout, i.e. a time limit to receive packets, and then will deserialize all packets it received regardless of the time that takes.
The view type has an overall timeout so that it does not take longer than specified time for all operations - receive AND parsing.
Since receiving takes almost no time/work, this consistently would receive all the batches sent in the sdk
variant. and then it would take however long to deserialize them - whether or not OS scheduler decides to rug the thread doesn't matter. If the view variant of the test gets rugged during parsing, then it can time out and not receive the next message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have different timeout implementations between the two transaction structs? It feels more correct to have the timeout apply to both receive and buffer but it will likely mean that we buffer fewer transactions before processing right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I plan on also changing the sdk
approach to be timed out on receive AND buffer, since I agree this definitely seems more correct...but only after forwarding and ImmutableDeserializedPacket
is removed; all the current receiving code needs to be reworked but doesn't make sense to rework it until IDP is dead imo.
But i've held off removing forwarding on getting this merged first, since when i posted there was some other stuff that needed to get merged for forwarding removal; in hindsight it may have been the wrong order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok thanks for the context. So is it ok that we start processing transactions without buffering so many? Seems like the main loop in the scheduler controller is fairly lightweight besides doing work for the pre_graph_filter
inside process_transactions
. If we end up running more shorter scheduler loops we will still be putting later received transactions into the top of the priograph if they pay high pri fees right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The vast majority of packets we receive come in slightly before we become leader.
I had a heatmap of this at one time, and things may have changed. (I should probably grab some trace data and re-generate it!)
There used to be a small steady-state of packets coming in in the minute before the first leader slot. Then around 5-10 slots before becoming leader theres a huge spike of traffic, but it fell off very quickly with much fewer packets arriving during our leader slots.
All to say, it should be fine to run shorter calls to schedule if our loop is fast enough at returning to it. We essentially always want to keep some amount of transactions queued to each worker thread once we are leader, so the calls between schedule should just be enough that the scheduler is able to keep up with how fast the workers process and queue more up as they complete the work.
2b8807d
to
d579b84
Compare
05779cd
to
f2c9d44
Compare
@tao-stones @jstarry if you are back from vacation(s) would really appreciate review on this. |
f2c9d44
to
cee2ff5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
4c4bae1
to
7fe132f
Compare
@pgarg66 adding you to review for @anza-xyz/svm |
Problem
Summary of Changes
TransactionViewReceiveAndBuffer
to allow banking stage to ingestTransactionView
-based transactionsFixes #