-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
retry_on_status
setting does not work as expected with requests that should not be retried immediately
#197
Comments
I transferred this issue from elasticsearch-py as retries are implemented in the transport. Implementation itself is simple enough. Without considering tests for now, this diff is enough: diff --git a/elastic_transport/_async_transport.py b/elastic_transport/_async_transport.py
index 1e66f74..20e6d6a 100644
--- a/elastic_transport/_async_transport.py
+++ b/elastic_transport/_async_transport.py
@@ -47,6 +47,7 @@ from ._transport import (
NOT_DEAD_NODE_HTTP_STATUSES,
Transport,
TransportApiResponse,
+ backoff_time,
validate_sniffing_options,
)
from .client_utils import resolve_default
@@ -366,6 +367,7 @@ class AsyncTransport(Transport):
if not retry or attempt >= max_retries:
return TransportApiResponse(resp.meta, body)
else:
+ await asyncio.sleep(backoff_time(attempt))
_logger.warning(
"Retrying request after non-successful status %d (attempt %d of %d)",
resp.meta.status,
diff --git a/elastic_transport/_transport.py b/elastic_transport/_transport.py
index bf1de58..9621806 100644
--- a/elastic_transport/_transport.py
+++ b/elastic_transport/_transport.py
@@ -86,6 +86,10 @@ class TransportApiResponse(NamedTuple):
meta: ApiResponseMeta
body: Any
+def backoff_time(attempts: int, base: int = 1, cap: int = 60):
+ # Full Jitter from https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/
+ return random.uniform(0, min(cap, base * 2 ** (attempts - 1)))
+
class Transport:
"""
@@ -444,6 +448,7 @@ class Transport:
if not retry or attempt >= max_retries:
return TransportApiResponse(resp.meta, body)
else:
+ time.sleep(backoff_time(attempt))
_logger.warning(
"Retrying request after non-successful status %d (attempt %d of %d)",
resp.meta.status, However, I have two concerns:
|
For me it might be hard to say - if we're concerned about it, might be worth to make it configurable and make default setting emulate previous behaviour?
I think so. Especially since number of retries is configurable. |
Thanks Artem! I also got feedback from @miguelgrinberg:
I'm planning to make
I'll wait for an user to complain before having retries that sleep and other that don't. It is possible today to configure the status code that are retried and the ones that are not. |
Steps to reproduce:
Happens for me on clusters of 8.12 and 8.13 with clients of same versions, I assume it happens in earlier clusters as well.
bulk
requests that will eventually lead to this cluster returning 429 back.search
on this cluster with Elasticsearch client that is initialised with defaultretry_on_status
- presumably, it should retry 429s.What happens is - client in 3. actually retries 429, but fails almost immediately as it does not try to sleep with any sort of backoff strategy, so it retries a couple times in a row within a second and give up.
Expected behaviour
Client allows to retry with a backoff strategy. Additionally, if the client is closed, would be great if the pending retry immediately wakes up and exits without retrying (see elastic/elasticsearch-py#2484)
The text was updated successfully, but these errors were encountered: