Currently, we retry "Operation cannot be fulfilled on resourcequotas "resource-quota": the object has been modified; please apply your changes to the latest version and try again" because we know it is a common issue on UK AISI's cluster which can reliably be solved by just retrying.
Should we retry other types of failure? E.g. on "quota exceeded" #30 or "context deadline exceeded" errors.
I'm against a blanket retry policy, because it can mean the user has to wait longer (especially if we add backoff) to discover a real problem: quota exceeded, a reference to an image which doesn't exist, or a crashing container.
I'm also against retrying errors which we don't understand well, because that can mask underlying issues.
But if we understand certain errors well, believe they can be overcome by retrying, and can reliably recognise them (e.g. with regex), then I'm all for it.
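To make the allowlist idea concrete, here is a minimal sketch of what I have in mind. It is not the current implementation: `RETRYABLE_PATTERNS`, `is_retryable` and `run_with_retry` are hypothetical names, `RuntimeError` stands in for whatever exception the real install step raises, and the only pattern listed is the resourcequotas conflict quoted above. Anything that doesn't match a known pattern is raised straight away.

```python
import re
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Allowlist of error messages we understand and believe are transient.
# Only the resourcequotas conflict is listed; "quota exceeded" and
# "context deadline exceeded" are deliberately absent.
RETRYABLE_PATTERNS = (
    re.compile(
        r"Operation cannot be fulfilled on resourcequotas .*: "
        r"the object has been modified"
    ),
)


def is_retryable(message: str) -> bool:
    """Return True only if the message matches a known transient error."""
    return any(pattern.search(message) for pattern in RETRYABLE_PATTERNS)


def run_with_retry(
    operation: Callable[[], T],
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> T:
    """Call operation, retrying only on allowlisted errors.

    RuntimeError is a stand-in (an assumption) for the real exception type.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RuntimeError as error:
            # Unknown errors, and the final failed attempt, surface immediately.
            if attempt == max_attempts or not is_retryable(str(error)):
                raise
            time.sleep(delay_seconds)
    raise AssertionError("unreachable")
```

With this shape, adding a new retryable error means adding one reviewed pattern to the tuple, so each addition is a deliberate decision rather than a side effect of a blanket policy.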