This repository has been archived by the owner on Feb 8, 2024. It is now read-only.
Add very basic version of job unstuck-ing for non-txn jobs that hang … #57
…in 'running'
This is a v1 implementation, so that we have something in place rather than forgetting about it. The query finds jobs that have been `running` for over 2 minutes and puts them back to `available`. It does not touch `attempt`, because the action that made them `running` already added an `attempt`.

Porting all of our retry-time calculation logic into SQL didn't seem worth it. Neither did `SELECT`ing all the rows, doing the exact retry-time calculation per row in Rust, and updating them individually. Jobs should only get stuck if pods crash, so this isn't the common path for retries.
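The reset rule described above can be modeled in memory as a rough sketch. This is not the PR's SQL query. It is a Rust illustration that assumes a hypothetical `Job` struct with a `started_at` timestamp and a hypothetical `unstick_jobs` function. It applies the same rule: any `running` job older than 2 minutes goes back to `available`, and `attempt` is left as-is.

```rust
use std::time::{Duration, SystemTime};

/// Job states relevant to this sketch (hypothetical; the real enum may differ).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JobStatus {
    Available,
    Running,
    Completed,
}

/// Hypothetical job record; field names are illustrative only.
#[derive(Debug, Clone)]
struct Job {
    id: u64,
    status: JobStatus,
    attempt: u32,
    /// When the job last moved to `Running` (hypothetical field).
    started_at: Option<SystemTime>,
}

/// Threshold from the PR description: `running` for over 2 minutes means stuck.
const STUCK_AFTER: Duration = Duration::from_secs(2 * 60);

/// Moves jobs stuck in `Running` back to `Available` and returns how many were reset.
/// `attempt` is deliberately not changed: it was already incremented when the
/// job was moved to `Running`.
fn unstick_jobs(jobs: &mut [Job], now: SystemTime) -> usize {
    let mut reset = 0;
    for job in jobs.iter_mut() {
        if job.status != JobStatus::Running {
            continue;
        }
        let stuck = match job.started_at {
            // duration_since fails if started_at is in the future; treat that as not stuck.
            Some(t) => now.duration_since(t).map(|d| d > STUCK_AFTER).unwrap_or(false),
            None => false,
        };
        if stuck {
            job.status = JobStatus::Available;
            reset += 1;
        }
    }
    reset
}

fn main() {
    let now = SystemTime::now();
    let mut jobs = vec![
        // Running for 5 minutes: considered stuck.
        Job { id: 1, status: JobStatus::Running, attempt: 1, started_at: Some(now - Duration::from_secs(300)) },
        // Running for 30 seconds: left alone.
        Job { id: 2, status: JobStatus::Running, attempt: 1, started_at: Some(now - Duration::from_secs(30)) },
        // Not running: left alone.
        Job { id: 3, status: JobStatus::Completed, attempt: 2, started_at: None },
    ];
    let reset = unstick_jobs(&mut jobs, now);
    println!("reset {reset} job(s)");
    for job in &jobs {
        println!("job {} -> {:?} (attempt {})", job.id, job.status, job.attempt);
    }
}
```

Running this resets only job 1. Job 1's `attempt` stays at 1. In the database this would be a single bulk `UPDATE` rather than a per-row loop. That is the trade-off the description makes: a flat reset is cheap, and it skips the exact retry-time calculation.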