Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pipeline panic #208

Open
derekperkins opened this issue Jan 12, 2023 · 5 comments
Open

pipeline panic #208

derekperkins opened this issue Jan 12, 2023 · 5 comments
Assignees
Labels
bug Something isn't working dmap

Comments

@derekperkins
Copy link
Contributor

This has panicked a couple times in the last hour or so running in prod

 panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x1ac3437]

goroutine 2571 [running]:
github.com/buraksezer/olric.(*DMapPipeline).execOnPartition(0xc000621800, {0x29e4ba8, 0xc001650540}, 0xc000c998e0?)
	/Users/derek/go/pkg/mod/github.com/buraksezer/[email protected]/pipeline.go:483 +0x317
github.com/buraksezer/olric.(*DMapPipeline).Exec.func1()
	/Users/derek/go/pkg/mod/github.com/buraksezer/[email protected]/pipeline.go:532 +0x6e
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/Users/derek/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75 +0x64
@buraksezer buraksezer self-assigned this Jan 12, 2023
@buraksezer buraksezer added bug Something isn't working dmap labels Jan 12, 2023
@buraksezer
Copy link
Owner

Hello @derekperkins,

I couldn't manage to reproduce the problem. It's very likely there is a nil item in the list of commands. pipe cannot be nil. I reviewed the code. I couldn't find any reason for nil commands. There can be a subtle bug in the command pool logic. Do you use Discard?

func (dp *DMapPipeline) execOnPartition(ctx context.Context, partID uint64) error {
	rc, err := dp.dm.clusterClient.clientByPartID(partID)
	if err != nil {
		return err
	}
	// There is no need to protect dp.commands map and its content.
	// It's already filled before running Exec, and it's now a read-only
	// data structure
	commands := dp.commands[partID]
	pipe := rc.Pipeline()

	for _, cmd := range commands {
		pipe.Do(ctx, cmd.Args()...) --> panics
	}

Could you share a code snippet to reproduce the problem?

@buraksezer
Copy link
Owner

Here is my hypothesis:

execOnPartition assumes that dp.commands is read-only during the execution. Neither dp.commands nor its content is protected by a mutex during the execution. If your code calls Exec and Discard simultaneously, execOnPartition may encounter nil items in dp.commands.

Discard calls dp.closedCancel() at line 548, if Exec is already passed the context check at line 503, they can start running parallel.

@derekperkins
Copy link
Contributor Author

Yes, we're using Discard and reusing pipelines, but we're not calling Exec and Discard simultaneously on our own accord. This is simplified, but basically this is what we're doing.

defer pipeline.Discard()

pipeline.Exec(ctx)

for _, futureGet := range futureGets {
    futureGet.Result().Scan()
}

I'm wondering if there's a race condition when context is canceled...

@buraksezer
Copy link
Owner

I'm wondering if there's a race condition when context is canceled...

I believe the possible root cause is this.

@david-sharechat
Copy link

david-sharechat commented Oct 23, 2024

Same issue as #256 - linking for ref.

I believe it is indeed context cancellation.
Exec is synchronous except for

olric/pipeline.go

Lines 522 to 525 in 6ca0e20

err := sem.Acquire(ctx, 1)
if err != nil {
return err
}

So context can be cancelled leading to Exec exiting while execOnPartition is still running.

I don't think this is a great behavior, think it'd be better to maintain sync aspect of Exec by either waiting here:

err := sem.Acquire(ctx, 1)
if err != nil {
    // Wait for goroutines to complete
    _ = errGr.Wait()
	return err
}

or alternatively handling cancellation in execOnPartition

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working dmap
Projects
None yet
Development

No branches or pull requests

3 participants