Consider retrying more types of helm install failures #34

Open
craigwalton-dsit opened this issue Dec 20, 2024 · 0 comments
Labels
enhancement New feature or request

craigwalton-dsit commented Dec 20, 2024

Currently, we retry on "Operation cannot be fulfilled on resourcequotas "resource-quota": the object has been modified; please apply your changes to the latest version and try again", because we know it is a common issue on UK AISI's cluster that can reliably be solved by retrying.

Should we retry other types of failure? For example, "quota exceeded" (#30) or "context deadline exceeded" errors.

I'm against a blanket retry policy, as it can mean the user has to wait longer (especially if we add backoff) to discover an issue such as quota exceeded, a reference to an image which doesn't exist, or a crashing container.

I'm also against retrying errors which we don't understand well, because it can mask underlying issues.

But if we understand certain errors well, believe they can be overcome by retrying, and can reliably recognise them (e.g. with regex), then I'm all for it.
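As a minimal sketch of that allow-list idea (not the project's actual implementation), the Python below retries only when the error output matches a known pattern and fails immediately otherwise. The names `RETRYABLE_PATTERNS`, `is_retryable` and `install_with_retry` are hypothetical, as is the assumption that the install step can be wrapped in a callable returning an exit code and its error output.

```python
import re
import time
from typing import Callable, Tuple

# Hypothetical allow-list: one pattern per error we understand well and
# believe is transient. Anything not matched here fails immediately.
RETRYABLE_PATTERNS = (
    # The resourcequotas conflict quoted in this issue.
    re.compile(
        r'Operation cannot be fulfilled on resourcequotas "[^"]+": '
        r"the object has been modified"
    ),
)


def is_retryable(error_output: str) -> bool:
    """Return True only if the output matches a known transient error."""
    return any(p.search(error_output) for p in RETRYABLE_PATTERNS)


def install_with_retry(
    install: Callable[[], Tuple[int, str]],
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> int:
    """Call install() until it succeeds; return the number of attempts used.

    `install` is a stand-in for whatever runs `helm install`; it is assumed
    to return (exit_code, error_output).
    """
    for attempt in range(1, max_attempts + 1):
        exit_code, error_output = install()
        if exit_code == 0:
            return attempt
        # Surface unrecognised errors straight away, and give up once the
        # attempt budget is spent.
        if not is_retryable(error_output) or attempt == max_attempts:
            raise RuntimeError(
                f"install failed after {attempt} attempt(s): {error_output}"
            )
        time.sleep(delay_seconds)
    raise ValueError("max_attempts must be at least 1")
```

With this shape, an error like "quota exceeded" or "context deadline exceeded" is reported on the first attempt, and would only be added to the list once it is understood well enough to be recognised reliably.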

craigwalton-dsit added the enhancement (New feature or request) label Dec 20, 2024