feat(pubsub): Implement Ping/Pong Mechanism to Improve Connection Reliability #3845

Open
wants to merge 6 commits into
base: master

0xIchigo

Problem

Both the blocking and nonblocking PubsubClient implementations currently lack a mechanism to detect when the WebSocket connection becomes unresponsive or stale. Without periodic health checks, a client may not realize that the server has stopped responding, which can lead to missed messages and unreliable subscriptions. This affects applications that rely on WebSockets for real-time data streams.

Summary of Changes

This PR implements a ping/pong mechanism in both the blocking and nonblocking PubsubClient implementations to improve connection reliability.

Changes

  • Introduced a periodic Ping message sent to the server using tokio::time::interval
  • Added handling for incoming Pong messages and reset the unmatched_pings counter upon receiving any message
  • Configured the client to close the connection gracefully if the unmatched_pings counter exceeds DEFAULT_MAX_FAILED_PINGS
  • Ensured the changes did not significantly alter the existing code structure. This meant hardcoding DEFAULT_PING_DURATION_SECONDS and DEFAULT_MAX_FAILED_PINGS, since making them configurable would have required changing the existing subscription parameters, which is a breaking change. These values could be made configurable in the future; the priority here is introducing reliable health checks for WebSocket connections
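
For reviewers who want the counter behavior in isolation, here is a minimal standard-library sketch of the logic described in the list above. It is not the code in this PR. `PingTracker`, `PingAction`, `on_tick`, `on_message`, and `ping_interval` are hypothetical names, and the two constant values are assumed for illustration only. The check-then-send ordering is also an assumption about what "exceeds" means here. In the real client the tick would come from tokio::time::interval and the ping would be written to the WebSocket.

```rust
use std::time::Duration;

// Assumed values for illustration only; the PR hardcodes constants with
// these names, but their actual values are not shown in this conversation.
pub const DEFAULT_PING_DURATION_SECONDS: u64 = 10;
pub const DEFAULT_MAX_FAILED_PINGS: usize = 3;

/// What the client loop should do when the ping interval fires.
#[derive(Debug, PartialEq, Eq)]
pub enum PingAction {
    /// Send a Ping frame to the server.
    SendPing,
    /// Too many pings went unanswered; close the connection.
    Close,
}

/// Hypothetical helper that counts pings sent since the last incoming message.
#[derive(Debug, Default)]
pub struct PingTracker {
    unmatched_pings: usize,
}

impl PingTracker {
    pub fn new() -> Self {
        Self::default()
    }

    /// Call on every tick of the ping interval.
    /// Closes once the counter exceeds the limit; otherwise records one more
    /// outstanding ping and asks the caller to send it.
    pub fn on_tick(&mut self) -> PingAction {
        if self.unmatched_pings > DEFAULT_MAX_FAILED_PINGS {
            return PingAction::Close;
        }
        self.unmatched_pings += 1;
        PingAction::SendPing
    }

    /// Call when any message arrives (a Pong or a regular frame).
    /// Any traffic proves the connection is alive, so the counter resets.
    pub fn on_message(&mut self) {
        self.unmatched_pings = 0;
    }

    /// Number of pings sent since the last incoming message.
    pub fn unmatched_pings(&self) -> usize {
        self.unmatched_pings
    }
}

/// Period between pings, derived from the assumed constant above.
pub fn ping_interval() -> Duration {
    Duration::from_secs(DEFAULT_PING_DURATION_SECONDS)
}
```

With these assumed values, a silent server receives four pings and the fifth tick returns `PingAction::Close`. Any incoming message before that point resets the counter to zero.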

@mergify mergify bot requested a review from a team November 30, 2024 07:39

@CriesofCarrots CriesofCarrots left a comment

Thanks @0xIchigo, this seems pretty close. I have a couple small comments, and we'll have to see how CI fares.
(edit) For starters, CI is complaining about trailing whitespace that needs to be removed.

While you're making updates, can you please rebase on master to remove all the merge commits and force-push?

pubsub-client/src/pubsub_client.rs (resolved)
pubsub-client/src/nonblocking/pubsub_client.rs (outdated, resolved)
pubsub-client/src/pubsub_client.rs (outdated, resolved)
@CriesofCarrots CriesofCarrots added the CI Pull Request is ready to enter CI label Jan 7, 2025
@anza-team anza-team removed the CI Pull Request is ready to enter CI label Jan 7, 2025
@0xIchigo 0xIchigo force-pushed the feat/pubsub-ping-pong-checks branch from 021171c to 6b00e49 Compare January 8, 2025 21:29

@CriesofCarrots CriesofCarrots left a comment

I'm not going to run CI on this yet, as I can see it will not compile. Please fix compilation errors, thanks.

pubsub-client/src/nonblocking/pubsub_client.rs (outdated, resolved)
pubsub-client/src/pubsub_client.rs (outdated, resolved)
pubsub-client/src/pubsub_client.rs (resolved)
@0xIchigo 0xIchigo force-pushed the feat/pubsub-ping-pong-checks branch from 6fd9c76 to f05a77d Compare January 9, 2025 17:13
@CriesofCarrots CriesofCarrots added the CI Pull Request is ready to enter CI label Jan 9, 2025
@anza-team anza-team removed the CI Pull Request is ready to enter CI label Jan 9, 2025
@CriesofCarrots

Code does not compile; see CI.
Please run cargo build (and fmt and clippy) locally before pinging for review.

@0xIchigo 0xIchigo force-pushed the feat/pubsub-ping-pong-checks branch from 910db8e to f68a307 Compare January 10, 2025 23:13
@0xIchigo
Author

My apologies @CriesofCarrots, I was able to fix all my build issues and run the code locally. All the compile errors should be fixed now!

@0xIchigo 0xIchigo force-pushed the feat/pubsub-ping-pong-checks branch from f68a307 to 7daa436 Compare January 13, 2025 15:54
@CriesofCarrots CriesofCarrots added the CI Pull Request is ready to enter CI label Jan 13, 2025
@anza-team anza-team removed the CI Pull Request is ready to enter CI label Jan 13, 2025
@0xIchigo 0xIchigo force-pushed the feat/pubsub-ping-pong-checks branch from 7daa436 to ae146f6 Compare January 13, 2025 21:31
@0xIchigo
Author

Sorry @CriesofCarrots—I rebased the PR to ensure it was in sync with the repo but noticed that another workflow approval is required for the tests to run. Is there anything else that needs to be changed to merge this PR, or are all the health checks good?

@CriesofCarrots

CriesofCarrots commented Jan 14, 2025

> Sorry @CriesofCarrots—I rebased the PR to ensure it was in sync with the repo but noticed that another workflow approval is required for the tests to run. Is there anything else that needs to be changed to merge this PR, or are all the health checks good?

Yes @0xIchigo, cargo clippy is making this complaint:

error: you seem to be trying to use `match` for destructuring a single pattern. Consider using `if let`
   --> pubsub-client/src/pubsub_client.rs:843:13
    |
843 | /             match maybe_tls_stream {
844 | |                 MaybeTlsStream::Plain(tcp_stream) => {
845 | |                     if let Err(e) = tcp_stream.set_read_timeout(Some(Duration::from_millis(500))) {
846 | |                         info!("Failed to set read timeout on TcpStream: {:?}", e);
...   |
850 | |                 _ => {}
851 | |             }
    | |_____________^
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#single_match
    = note: `-D clippy::single-match` implied by `-D warnings`
    = help: to override `-D warnings` add `#[allow(clippy::single_match)]`
help: try
    |
843 ~             if let MaybeTlsStream::Plain(tcp_stream) = maybe_tls_stream {
844 +                 if let Err(e) = tcp_stream.set_read_timeout(Some(Duration::from_millis(500))) {
845 +                     info!("Failed to set read timeout on TcpStream: {:?}", e);
846 +                 }
847 +             }
    |
error: could not compile `solana-pubsub-client` (lib) due to 1 previous error
warning: build failed, waiting for other jobs to finish...
error: could not compile `solana-pubsub-client` (lib test) due to 1 previous error
🚨 Error: The command exited with status 101

@0xIchigo 0xIchigo force-pushed the feat/pubsub-ping-pong-checks branch 2 times, most recently from 34394b7 to d940b27 Compare January 15, 2025 06:17
@0xIchigo 0xIchigo force-pushed the feat/pubsub-ping-pong-checks branch from d940b27 to 7d79751 Compare January 15, 2025 16:44
@CriesofCarrots CriesofCarrots added the CI Pull Request is ready to enter CI label Jan 15, 2025
@anza-team anza-team removed the CI Pull Request is ready to enter CI label Jan 15, 2025
@CriesofCarrots

@0xIchigo, our CI test suite shows an issue with this code: the subscription tests are hanging. Can you please take a look?

3 participants