Job runner: add retry functionality to all SQL operations #293

Open
jechols opened this issue Feb 14, 2024 · 1 comment
Comments

@jechols
Member

jechols commented Feb 14, 2024

When the DB is down for even a few seconds, it can completely hose the job runner, causing lost job logs and preventing jobs from moving from "in process" to "success". This is rare, but it does happen, and manually fixing an orphaned job is very annoying.

The fix is probably to replicate the retry logic built into Go's database/sql package (the internal DB.retry method), but with a delay between retries and a higher retry limit than Go's current maxBadConnRetries value.

This probably should only affect background jobs - the HTTP server is relatively safe even if there's an outage. Worst case with HTTP requests is somebody gets an immediate notification that they need to try again. When it happens in the job runner, though, there's no real-time way to deal with it, so it just ends up losing logs or getting a job stuck.

@jechols
Member Author

jechols commented Feb 14, 2024

This seems like it should almost never happen, but we're finding that the more things we put behind HAProxy, the more likely it is for a configuration to stop all services, even if only for a second or two. A local database would solve this, but that takes away the value of having redundancy at the HAProxy level. Losing one DB head in our current setup doesn't stop the app from continuing - we have to lose all three heads.
