Make the backfiller aware of revs and fill gaps #454
Conversation
Hrn... jobs don't recover from error states. Need to figure that one out.
return false, nil
case StateInProgress, StateEnqueued:
Why are we buffering ops on enqueued backfill jobs here? They won't have been checked out yet if they're still enqueued, so any events they get will be included in the checkout, right?
ah, that's true
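Since the conclusion above is that only in-progress jobs need buffering, here is a minimal sketch of what that per-state decision could look like. This is not the PR's actual code: the BufferOp helper, the bufferedOp struct, the rev field, and the StateComplete constant are assumptions layered on top of the StateInProgress/StateEnqueued names that do appear in the diff.

```go
// Sketch only, not the PR's code: placeholder types standing in for the
// backfill package's Gormjob so the state handling is runnable on its own.
package backfill

import (
	"fmt"
	"sync"

	"github.com/ipfs/go-cid"
	typegen "github.com/whyrusleeping/cbor-gen"
)

const (
	StateEnqueued   = "enqueued"
	StateInProgress = "in_progress"
	StateComplete   = "complete"
)

type bufferedOp struct {
	kind string
	path string
	rec  typegen.CBORMarshaler
	cid  *cid.Cid
}

type Gormjob struct {
	lk          sync.Mutex
	state       string
	rev         string // last event rev processed for this repo
	bufferedOps []*bufferedOp
}

// BufferOp decides, per job state, whether an incoming repo op has to be held
// until the backfill checkout finishes. It returns true when the op was buffered.
func (j *Gormjob) BufferOp(kind, path string, rec typegen.CBORMarshaler, c *cid.Cid) (bool, error) {
	j.lk.Lock()
	defer j.lk.Unlock()

	switch j.state {
	case StateComplete:
		// Backfill already finished; the caller can apply the op directly.
		return false, nil
	case StateEnqueued:
		// Not checked out yet, so the op will be captured by the checkout itself.
		return false, nil
	case StateInProgress:
		// Checkout is underway; hold the op for FlushBufferedOps.
		j.bufferedOps = append(j.bufferedOps, &bufferedOp{kind: kind, path: path, rec: rec, cid: c})
		return true, nil
	default:
		return false, fmt.Errorf("job in unhandled state: %q", j.state)
	}
}
```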
- func (j *Gormjob) FlushBufferedOps(ctx context.Context, fn func(kind, path string, rec *typegen.CBORMarshaler, cid *cid.Cid) error) error {
+ var ErrEventGap = fmt.Errorf("buffered event revs did not line up")
+ func (j *Gormjob) FlushBufferedOps(ctx context.Context, fn func(kind, path string, rec typegen.CBORMarshaler, cid *cid.Cid) error) error {
How does this work with deletes? IIRC the rec being a pointer worked with deletes because it'd be nil when deleting a record; does it just end up being an empty typegen.CBORMarshaler in this case?
typegen.CBORMarshaler is an interface, so it would just be nil
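Because typegen.CBORMarshaler is an interface, a delete can still be represented by passing nil for rec: the interface value itself is nil, much like the old nil *typegen.CBORMarshaler. A rough sketch reusing the placeholder types from the earlier snippet (the flush behavior and field names are assumptions, not the PR's implementation; context comes from the standard library):

```go
// Sketch only: drains buffered ops after the checkout completes. For delete
// ops, op.rec was stored as nil, so fn receives rec == nil and can treat it
// as a deletion, just as it did when the parameter was *typegen.CBORMarshaler.
func (j *Gormjob) FlushBufferedOps(ctx context.Context, fn func(kind, path string, rec typegen.CBORMarshaler, c *cid.Cid) error) error {
	j.lk.Lock()
	defer j.lk.Unlock()

	for _, op := range j.bufferedOps {
		if err := fn(op.kind, op.path, op.rec, op.cid); err != nil {
			return err
		}
	}
	j.bufferedOps = nil
	return nil
}
```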
Yeah, the way I typically handle this is by running a query to reset the job state on jobs that failed for a retryable reason, but it's not always clear which jobs are safe to retry etc.
need to figure something out there, maybe just keep a counter of how many times it's failed and do automatic retries up to a point?
would we do retries when we see more events from the repo, or somehow enqueue it again for a retry when it fails? would probably also want some kind of backoff in there.
i think we should automatically re-enqueue it after some duration
okay, we probably want two new columns then for the jobs, something like a
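One way the two columns could look, sketched with GORM since the job store is GORM-backed. The struct name, the retries / retry_after column names, and the backoff policy are all placeholders, not necessarily what the author had in mind; this assumes gorm.io/gorm and time are imported into the sketch package above.

```go
// Sketch only: hypothetical columns for bounded, delayed retries.
type GormDBJob struct {
	gorm.Model
	Repo       string `gorm:"unique"`
	State      string `gorm:"index"`
	Rev        string
	Retries    int        // how many times this job has failed so far
	RetryAfter *time.Time // earliest time the job may be picked up again
}

// markRetryable re-enqueues a failed job with a crude exponential backoff,
// giving up after maxRetries attempts.
func markRetryable(db *gorm.DB, repo string, attempt, maxRetries int) error {
	if attempt >= maxRetries {
		// Leave the job in its error state for manual inspection.
		return nil
	}
	next := time.Now().Add(time.Duration(1<<attempt) * time.Minute)
	return db.Model(&GormDBJob{}).
		Where("repo = ?", repo).
		Updates(map[string]interface{}{
			"state":       StateEnqueued,
			"retries":     attempt + 1,
			"retry_after": next,
		}).Error
}
```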
oh the other thing i haven't implemented yet is the gap fill handling logic, it spits out an error when we detect a gap and resets the job state, but when we actually go to process that we aren't filtering to just the gap records
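For that missing piece, a rough sketch of the filtering re-processing could do: once a gap job is re-enqueued, only records strictly newer than the rev the job already processed belong to the gap. This assumes revs are TID-like strings that order correctly under plain string comparison and reuses the placeholder Gormjob above; it is not the PR's eventual logic.

```go
// Sketch only: when handling a re-enqueued gap job, skip anything at or before
// the rev this job already saw, so only the gap records get applied.
func (j *Gormjob) shouldProcessGapRecord(recRev string) bool {
	j.lk.Lock()
	defer j.lk.Unlock()
	return recRev > j.rev
}
```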
cc8fe9a to be7d652