tpu-client-next: add mechanism to implement custom scheduler #4436
}
}
}
Broadcaster::send_to_workers(&mut workers, fanout_leaders, transaction_batch).await?;
Handling errors in async code is a bit tricky.
Here you are saying that if send_to_workers() fails, you are just going to return the error to the caller.
Just want to make sure that this is the correct behavior.
Considering that under normal flow there are these calls, that are executed when the worker scheduler is terminating:
workers.shutdown().await;
endpoint.close(0u32.into(), b"Closing connection");
leader_updater.stop().await;
Are you sure they should not be executed if send_to_workers() fails for any reason?
If they should be executed, you may want to store the error, terminate the loop, run the shutdown code, and only then return the error to the caller.
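The suggestion above can be sketched roughly as follows. This is a minimal, synchronous illustration rather than the code from the PR: the Workers and SendError types, the run_scheduler function, and the "batch 0 fails" rule are all hypothetical stand-ins, and the .await points of the real async code are omitted so the sketch has no dependencies.

```rust
// Hypothetical stand-ins for the real worker cache and error type.
#[derive(Debug, PartialEq)]
struct SendError(String);

#[derive(Default)]
struct Workers {
    sent: Vec<u32>,
    shut_down: bool,
}

impl Workers {
    // Assumption for the sketch: a batch equal to 0 simulates a critical send failure.
    fn send_to_workers(&mut self, batch: u32) -> Result<(), SendError> {
        if batch == 0 {
            return Err(SendError("worker channel closed".to_string()));
        }
        self.sent.push(batch);
        Ok(())
    }

    // Stands in for the shutdown calls that run when the scheduler terminates.
    fn shutdown(&mut self) {
        self.shut_down = true;
    }
}

// Store the error, leave the loop, run the shutdown code, then return the error.
fn run_scheduler(workers: &mut Workers, batches: &[u32]) -> Result<(), SendError> {
    let mut send_to_workers_err = None;
    for &batch in batches {
        if let Err(err) = workers.send_to_workers(batch) {
            // Do not use `?` here: that would skip the cleanup below.
            send_to_workers_err = Some(err);
            break;
        }
    }

    // Cleanup runs on both the success path and the error path.
    workers.shutdown();

    match send_to_workers_err {
        Some(err) => Err(err),
        None => Ok(()),
    }
}
```

Calling run_scheduler(&mut workers, &[1, 2, 0, 3]) in this sketch delivers batches 1 and 2, stops at the failing batch, still shuts the workers down, and only then returns the stored error.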
Good catch! Thanks
Maybe the best strategy would be to break the loop in this case. Add this to the new trait documentation.
@ilya-bobyr what about this part? I added a last_error variable, which holds the error if one happens during send_to_workers. I also added a clarification to the trait documentation that send_to_workers errors are critical, meaning that they stop the scheduler.
Looks good.
I would probably call it send_to_workers_err or something like that.
I understand that "last" in the name comes from the fact that you are calling the send function in a loop, so it refers to the error observed on the last loop iteration.
But at first it was not clear to me what the name means.
Problem
The current scheduler implementation works for SendTransactionService or similar applications where the client checks whether the transaction has been added to the block and retries when necessary. But for some other applications, like the transaction-bench client, we want to have a custom strategy. For example, a combination of try_send and send to some future leaders.
This PR introduces a trait that implements most of the logic except for one method, send_to_workers, which should be implemented by the user.
For the sake of backport simplicity, this PR should be added after #4454.
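As an illustration of the idea described above (shared logic in a trait, with send_to_workers as the only method left to the user), here is a small dependency-free sketch. It is not the trait from this PR: the Scheduler trait, the WorkerPool type, the Leader and Batch aliases, and the two strategies are all hypothetical, and the real code is async.

```rust
use std::collections::HashMap;

// Hypothetical simplifications of the real leader address and transaction batch types.
type Leader = &'static str;
type Batch = Vec<u8>;

// Hypothetical worker pool that only records which batches went to which leader.
#[derive(Default)]
struct WorkerPool {
    delivered: HashMap<Leader, Vec<Batch>>,
}

trait Scheduler {
    // The only method an implementor has to provide: the sending strategy.
    fn send_to_workers(pool: &mut WorkerPool, leaders: &[Leader], batch: Batch) -> Result<(), String>;

    // Shared logic provided by the trait. An error from send_to_workers is
    // treated as critical and stops the loop (this sketch has no cleanup to run).
    fn run(pool: &mut WorkerPool, leaders: &[Leader], batches: Vec<Batch>) -> Result<(), String> {
        for batch in batches {
            Self::send_to_workers(pool, leaders, batch)?;
        }
        Ok(())
    }
}

// Strategy 1 (hypothetical): send each batch to the first leader only.
struct SendToFirst;

impl Scheduler for SendToFirst {
    fn send_to_workers(pool: &mut WorkerPool, leaders: &[Leader], batch: Batch) -> Result<(), String> {
        let leader = leaders.first().ok_or_else(|| "no leaders".to_string())?;
        pool.delivered.entry(*leader).or_default().push(batch);
        Ok(())
    }
}

// Strategy 2 (hypothetical): send a copy of each batch to every leader in the fanout.
struct SendToAll;

impl Scheduler for SendToAll {
    fn send_to_workers(pool: &mut WorkerPool, leaders: &[Leader], batch: Batch) -> Result<(), String> {
        for leader in leaders {
            pool.delivered.entry(*leader).or_default().push(batch.clone());
        }
        Ok(())
    }
}
```

With this shape, switching strategies means choosing a different implementor, for example SendToFirst::run(...) versus SendToAll::run(...), while the loop itself stays in one place.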
Summary of Changes