pipeline panic #208
Hello @derekperkins, I couldn't manage to reproduce the problem. It's very likely there is a nil item in the list of commands.

```go
func (dp *DMapPipeline) execOnPartition(ctx context.Context, partID uint64) error {
	rc, err := dp.dm.clusterClient.clientByPartID(partID)
	if err != nil {
		return err
	}
	// There is no need to protect dp.commands map and its content.
	// It's already filled before running Exec, and it's now a read-only
	// data structure.
	commands := dp.commands[partID]
	pipe := rc.Pipeline()
	for _, cmd := range commands {
		pipe.Do(ctx, cmd.Args()...) // --> panics here
	}
```

Could you share a code snippet to reproduce the problem?
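For what it's worth, here is a minimal, self-contained sketch of that hypothesis (my own illustration against a go-redis v9 style `Cmder` slice, not code from this repository): a single nil entry in the slice is enough to panic on the `cmd.Args()` call, without any server involved.

```go
package main

import (
	"context"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	// Stand-in for dp.commands[partID] with one nil item, as hypothesized above.
	commands := []redis.Cmder{
		redis.NewStatusCmd(ctx, "set", "key", "value"),
		nil,
	}

	pipe := rdb.Pipeline()
	for _, cmd := range commands {
		// Calling Args() on the nil Cmder panics with a nil pointer
		// dereference before Do is ever reached.
		pipe.Do(ctx, cmd.Args()...)
	}
}
```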
Here is my hypothesis:
Yes, we're using Discard and reusing pipelines, but we're not calling Exec and Discard simultaneously of our own accord. This is simplified, but basically this is what we're doing:

```go
defer pipeline.Discard()
pipeline.Exec(ctx)
for _, futureGet := range futureGets {
	res, _ := futureGet.Result() // simplified: error handling omitted
	res.Scan(&dst)               // dst stands in for the real destination value
}
```

I'm wondering if there's a race condition when the context is canceled...
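To make the suspicion concrete, here is a self-contained toy model (my own illustration, deliberately racy, not Olric code) of the interleaving I have in mind: an Exec-like call that returns early on context cancellation while a worker goroutine is still ranging over the shared command slice, and a Discard-like call that then mutates that slice under the worker's feet.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// toyPipeline stands in for DMapPipeline; commands stands in for dp.commands[partID].
type toyPipeline struct {
	commands []func()
}

func (p *toyPipeline) exec(ctx context.Context) error {
	done := make(chan struct{})
	go func() { // stands in for the execOnPartition goroutine
		defer close(done)
		for _, cmd := range p.commands {
			time.Sleep(time.Millisecond)
			cmd() // panics once discard has nilled the entry out
		}
	}()
	select {
	case <-ctx.Done():
		return ctx.Err() // returns while the goroutine above is still running
	case <-done:
		return nil
	}
}

func (p *toyPipeline) discard() {
	for i := range p.commands {
		p.commands[i] = nil // the reused pipeline now holds nil items
	}
}

func main() {
	p := &toyPipeline{commands: make([]func(), 100)}
	for i := range p.commands {
		p.commands[i] = func() {}
	}

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Millisecond)
	defer cancel()

	if err := p.exec(ctx); err != nil {
		fmt.Println("exec returned early:", err)
	}
	p.discard() // the worker goroutine is still running and soon calls a nil func

	time.Sleep(200 * time.Millisecond) // give the panic time to surface
}
```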
I believe the possible root cause is this.
Same issue as #256, linking for reference. I believe it is indeed context cancellation (lines 522 to 525 in 6ca0e20).

So the context can be cancelled, which leads to Exec exiting while execOnPartition is still running. I don't think this is great behavior; it would be better to preserve the synchronous behavior of Exec, either by waiting here:

```go
err := sem.Acquire(ctx, 1)
if err != nil {
	// Wait for the in-flight goroutines to complete before returning
	_ = errGr.Wait()
	return err
}
```

or alternatively by handling cancellation in execOnPartition.
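For the second option, a rough sketch of what handling cancellation inside execOnPartition could look like (my own suggestion, not a tested patch; the trailing pipe.Exec call is an assumption about the part of the function not quoted above):

```go
func (dp *DMapPipeline) execOnPartition(ctx context.Context, partID uint64) error {
	rc, err := dp.dm.clusterClient.clientByPartID(partID)
	if err != nil {
		return err
	}

	commands := dp.commands[partID]
	pipe := rc.Pipeline()
	for _, cmd := range commands {
		// Stop flushing as soon as the caller's context is cancelled instead
		// of racing with a caller that has already moved on to Discard.
		if err := ctx.Err(); err != nil {
			return err
		}
		// Defensive guard for the nil-item hypothesis discussed above.
		if cmd == nil {
			continue
		}
		pipe.Do(ctx, cmd.Args()...)
	}

	// Assumed remainder: flush the queued commands to the partition owner.
	_, err = pipe.Exec(ctx)
	return err
}
```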
This has panicked a couple of times in the last hour or so running in prod.