Add timeout to channel builder #710

Open
wants to merge 3 commits into
base: development
@@ -17,6 +17,7 @@ a| `CloudSSLCertificateNotValidated`
a| `CloudTokenCredentialInvalid`
a| `ConnectionFailed`
a| `ConnectionIsClosed`
+a| `ConnectionTimedOut`
a| `DatabaseDoesNotExist`
a| `InvalidResponseField`
a| `MissingPort`
4 changes: 4 additions & 0 deletions rust/src/common/error.rs
@@ -74,6 +74,8 @@ error_messages! { ConnectionError
23: "Invalid URL '{address}': missing port.",
AddressTranslationMismatch { unknown: HashSet<Address>, unmapped: HashSet<Address> } =
24: "Address translation map does not match the server's advertised address list. User-provided servers not in the advertised list: {unknown:?}. Advertised servers not mapped by user: {unmapped:?}.",
+ConnectionTimedOut =
+25: "Connection to the server timed out."
}

error_messages! { InternalError
@@ -196,6 +198,8 @@ impl From<Status> for Error {
Self::Connection(ConnectionError::ServerConnectionFailedStatusError { error: status.message().to_owned() })
} else if status.code() == Code::Unimplemented {
Self::Connection(ConnectionError::RPCMethodUnavailable { message: status.message().to_owned() })
+} else if status.code() == Code::Cancelled && status.message() == "Timeout expired" {
+Self::Connection(ConnectionError::ConnectionTimedOut)
} else {
Self::from_message(status.message())
}
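
The new branch recognises a timeout by comparing the status message text. A minimal sketch of that classification follows, using local stand-in types: `Code`, `Status`, `ErrorKind`, and `classify` are hypothetical names for illustration, not tonic's or the driver's actual types.

```rust
/// Stand-in for the few gRPC status codes used below (hypothetical, not tonic's `Code`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Code {
    Unimplemented,
    Cancelled,
    Unknown,
}

/// Stand-in for a gRPC status: a code plus a human-readable message.
#[derive(Clone, Debug)]
pub struct Status {
    pub code: Code,
    pub message: String,
}

/// Stand-in for the driver's error categories (hypothetical).
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ErrorKind {
    RpcMethodUnavailable { message: String },
    ConnectionTimedOut,
    Other { message: String },
}

/// The message text the diff compares against.
const TIMEOUT_MESSAGE: &str = "Timeout expired";

/// Maps a status to an error category, mirroring the shape of the if/else chain in the diff.
pub fn classify(status: &Status) -> ErrorKind {
    match status.code {
        Code::Unimplemented => ErrorKind::RpcMethodUnavailable { message: status.message.clone() },
        // Only a cancellation carrying the exact timeout text counts as a timeout.
        Code::Cancelled if status.message == TIMEOUT_MESSAGE => ErrorKind::ConnectionTimedOut,
        // Everything else falls through to the generic message-based error.
        _ => ErrorKind::Other { message: status.message.clone() },
    }
}
```

Because the match depends on an exact string, a cancellation with any other wording falls through to the generic branch; a unit test pinning the expected text would catch a change in that wording.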
11 changes: 8 additions & 3 deletions rust/src/connection/network/channel.rs
@@ -17,7 +17,10 @@
* under the License.
*/

-use std::sync::{Arc, RwLock};
+use std::{
+    sync::{Arc, RwLock},
+    time::Duration,
+};

use tonic::{
body::BoxBody,
@@ -49,8 +52,10 @@ impl GRPCChannel for PlainTextChannel {}

impl GRPCChannel for CallCredChannel {}

+const TIMEOUT: Duration = Duration::from_secs(60);
Member Author

Chosen arbitrarily. Anything less than a second or so is too strict for slow connections, but going significantly over a minute would probably be too long.

Member

Let's verify that this timeout doesn't drop requests for complicated, long-running queries. For example, write a simple read query, add a 2-minute sleep on the server side, and let it run. This should not fail. If the sleep is instead placed in the server's connection handling, the request should fail after 1 minute.

If this works, everything should be fine.
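
The check proposed above can be modelled without a server. The sketch below is a scaled-down simulation using only the standard library: a thread stands in for the sleeping server, and `recv_timeout` stands in for a per-request deadline. The names `simulate_request` and `Outcome` are made up for illustration, and this says nothing about how tonic schedules its own timeout; it only shows what a deadline applied to every request would mean for a slow query.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Outcome of one simulated request.
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Responded,
    TimedOut,
}

/// Simulates a request whose "server" needs `server_delay` to answer,
/// while the "client" waits at most `request_timeout`.
pub fn simulate_request(server_delay: Duration, request_timeout: Duration) -> Outcome {
    let (tx, rx) = mpsc::channel();

    // The "server": sleep to imitate a long-running query, then respond.
    thread::spawn(move || {
        thread::sleep(server_delay);
        // The client may have given up already; ignore the send error in that case.
        let _ = tx.send(());
    });

    // The "client": wait for the response, but no longer than the deadline.
    match rx.recv_timeout(request_timeout) {
        Ok(()) => Outcome::Responded,
        // Either the deadline passed or the server thread disappeared.
        Err(_) => Outcome::TimedOut,
    }
}
```

With a 60-second deadline in place of `request_timeout` and a 2-minute sleep in place of `server_delay`, this model reports `TimedOut`, which is the behaviour the test was meant to rule out.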

Member Author

The query does time out if the server does not respond within this time (tested by adding a sleep(60000) in TransactionService::respond() on the server side).

The default transaction timeout is 5 minutes, so I'd extend the channel timeout to 10 minutes or even an hour.

Ideally we'd have a connection builder that would let the user specify the desired request timeout, but that would take a lot longer.

Member

I'm afraid this update will break some analytics query use cases. I'd suggest giving a query at least 2 hours to execute and, ideally, documenting this behavior.

Ideally, this timeout would later be made configurable. Let's create a task for that if this PR is going to close the original bug.
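
One possible shape for the configurable timeout discussed in this thread is sketched below. `ChannelOptions` and all of its methods are hypothetical and not part of the driver's API; the two-hour default is taken from the suggestion above.

```rust
use std::time::Duration;

/// Default request timeout: two hours, the minimum suggested in the review.
pub const DEFAULT_REQUEST_TIMEOUT: Duration = Duration::from_secs(2 * 60 * 60);

/// Hypothetical options type holding a user-configurable request timeout.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ChannelOptions {
    /// `None` means requests never time out on the client side.
    request_timeout: Option<Duration>,
}

impl Default for ChannelOptions {
    fn default() -> Self {
        Self { request_timeout: Some(DEFAULT_REQUEST_TIMEOUT) }
    }
}

impl ChannelOptions {
    /// Starts from the default timeout.
    pub fn new() -> Self {
        Self::default()
    }

    /// Overrides the timeout applied to each request.
    pub fn request_timeout(mut self, timeout: Duration) -> Self {
        self.request_timeout = Some(timeout);
        self
    }

    /// Disables the client-side timeout entirely.
    pub fn no_request_timeout(mut self) -> Self {
        self.request_timeout = None;
        self
    }

    /// The timeout a channel builder would apply, if any.
    pub fn effective_timeout(&self) -> Option<Duration> {
        self.request_timeout
    }
}
```

A caller running long analytics queries could then write `ChannelOptions::new().request_timeout(Duration::from_secs(6 * 60 * 60))`, and the channel-opening functions would pass `effective_timeout()` to the builder only when it is `Some`.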


pub(super) fn open_plaintext_channel(address: Address) -> PlainTextChannel {
-PlainTextChannel::new(Channel::builder(address.into_uri()).connect_lazy(), PlainTextFacade)
+PlainTextChannel::new(Channel::builder(address.into_uri()).timeout(TIMEOUT).connect_lazy(), PlainTextFacade)
}

#[derive(Clone, Debug)]
@@ -66,7 +71,7 @@ pub(super) fn open_callcred_channel(
address: Address,
credential: Credential,
) -> Result<(CallCredChannel, Arc<CallCredentials>)> {
-let mut builder = Channel::builder(address.into_uri());
+let mut builder = Channel::builder(address.into_uri()).timeout(TIMEOUT);
if credential.is_tls_enabled() {
builder = builder.tls_config(credential.tls_config().clone().unwrap())?;
}