-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
allow initiating peer to close stream gracefully #5
base: master
Are you sure you want to change the base?
Conversation
I have a usecase where I would like to remove a worker without killing it. Currently, trying to disconnect from the head node will cause the message handler loop to throw a fatal exception, so this adds a check that the connection is still open when trying to read new messages.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #5 +/- ##
===========================================
- Coverage 86.20% 25.73% -60.48%
===========================================
Files 10 10
Lines 1965 1916 -49
===========================================
- Hits 1694 493 -1201
- Misses 271 1423 +1152 ☔ View full report in Codecov by Sentry. |
I'm surprised at the seemingly-random CI failures 🤔 Does this also happen with Distributed? |
Hmm, I had hard-to-reproduce failures with these tests before, but I thought I fixed everything. When I use persistent workers interactively at least I don't encounter any of these issues, so it seems to have something to do with the worker process being spawned from another Julia process like this. Can you think of any way to test this more directly? I am almost tempted to go ahead with this without any test as the change itself is really quite simple but that of course makes it hard to ensure this functionality does not regress |
I'm a bit hesitant to merge it directly given how common the failures are 😅 I'll try running the tests locally and see if I can reproduce them or come up with a simpler test. Could just be a bug in the tests rather than in the actual change. |
I think this is a race condition caused by improper behaviour in This happens in CI because the workers take an extra 1-2s to precompile, and that's more than the (feel free to delete/squash my debugging commits BTW, though I would recommend keeping in the calls I added to |
|
||
@testset "PersistentWorkers.jl" begin | ||
cookie = randstring(16) | ||
port = rand(9128:9999) # TODO: make sure port is available? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could open a temporary server with Sockets.listenany()
and get the port from that.
I have a use case where I would like to remove a worker without killing it. Currently, trying to disconnect from the head node will cause the message handler loop to throw a fatal exception, so this adds a check that the connection is still open when trying to read new messages.
The test case is taken from https://github.com/simeonschaub/PersistentWorkers.jl, which I'd eventually like to register once this is merged.