Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tpu-client-next: add mechanism to implement custom scheduler #4436

Merged
merged 6 commits into from
Jan 30, 2025

Conversation

KirillLykov
Copy link

@KirillLykov KirillLykov commented Jan 13, 2025

Problem

The current scheduler implementation works for SendTransactionService or similar applications where the client checks if the transaction has been added to the block and retries when necessary. But for some other applications, like transaction-bench client, we want to have a custom strategy. For example, a combination of try_send and send to some future leaders.

This PR introduces trait that implements most of the logic except one method send_to_workers, which should be implemented by user.

For the sake of backport simplicity, should be added after #4454

Summary of Changes

@alessandrod alessandrod changed the title add mechanism to implement custom scheduler tpu-client-next: add mechanism to implement custom scheduler Jan 14, 2025
@KirillLykov KirillLykov force-pushed the klykov/add-extention-mechanism branch from 2f4a82b to 727996d Compare January 27, 2025 20:56
}
}
}
Broadcaster::send_to_workers(&mut workers, fanout_leaders, transaction_batch).await?;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handing errors in async code is a bit tricky.

Here you are saying that if send_to_workers() fails, you are just going to return the error to the caller.

Just want to make sure that this is correct behavior.
Considering that under normal flow there are these calls, that are executed when the worker scheduler is terminating:

        workers.shutdown().await;
        endpoint.close(0u32.into(), b"Closing connection");
        leader_updater.stop().await;

Are you sure they should not be executed if send_to_workers() fails for any reason?

If not, you may want to store an error, terminate the loop, run the shutdown code and only then return the error to the caller.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Thanks

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the best strategy would be to break the loop in this case. Add this to the new trait documentation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ilya-bobyr what about this part? I added a last_error which holds the value of the error if it happen during send_to_workers. Added also to the trait documentation clarification that send_to_workers errors are critical meaning that they stop the scheduler.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.
I would probably call it send_to_workers_err or something like that.
I understand that last in the name comes from the fact that you are calling the send function in the loop, and you are talking about the fact that it is the error observed on the last loop iteration.
But at first it was not clear to me, what the name means.

tpu-client-next/src/quic_networking.rs Outdated Show resolved Hide resolved
tpu-client-next/src/workers_cache.rs Outdated Show resolved Hide resolved
@KirillLykov KirillLykov force-pushed the klykov/add-extention-mechanism branch from 8e83062 to 1c17270 Compare January 28, 2025 17:51
@KirillLykov KirillLykov force-pushed the klykov/add-extention-mechanism branch from fd0b83b to 7488ce8 Compare January 29, 2025 08:12
ilya-bobyr
ilya-bobyr previously approved these changes Jan 29, 2025
@KirillLykov KirillLykov force-pushed the klykov/add-extention-mechanism branch from 7488ce8 to bd4b8fb Compare January 29, 2025 14:47
@KirillLykov KirillLykov added the automerge automerge Merge this Pull Request automatically once CI passes label Jan 29, 2025
@mergify mergify bot merged commit b63e819 into anza-xyz:master Jan 30, 2025
48 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge automerge Merge this Pull Request automatically once CI passes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants