Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix crash on double pthread_cancel(3) on exit #1800

Merged

Conversation

klemensn
Copy link

Process teardown stops all threads, thus calling pthread_cancel(3) in an
atexit(3) handler immediately before main() exits seems redundant.

On OpenBSD 7.4-current, failure to listen on the RTSP socket triggers

$ nc -4l 5000 &
$ nc -6l 5000 &
$ shairport-sync -c/dev/null
warning: could not establish a service on port 5000 -- program terminating. Is another instance of Shairport Sync running on this device?
Segmentation fault (core dumped)
Program terminated with signal SIGSEGV, Segmentation fault.
#0  pthread_cancel (thread=0x6207604e640) at /usr/src/lib/librthread/rthread.c:433
433             if (tib->tib_canceled == 0 && tid != 0 &&
[Current thread is 1 (process 290061)]
#0  pthread_cancel (thread=0x6207604e640) at /usr/src/lib/librthread/rthread.c:433
#1  0x0000061e5377df14 in exit_rtsp_listener ()
#2  0x000006212fdb4140 in _libc___cxa_finalize (dso=0x0) at /usr/src/lib/libc/stdlib/atexit.c:177
#3  0x000006212fde9c45 in _libc_exit (status=0) at /usr/src/lib/libc/stdlib/exit.c:54
#4  0x0000061e5377ba88 in _start ()

Fix this by omitting the explicit pthread_cancel(3), at which point
exit_rtsp_listener() becomes redundant, so remove it entirely.

@klemensn
Copy link
Author

klemensn commented Jan 31, 2024

After this, rtsp_listen_loop() might as well run in the main thread instead which does noting but start and wait for it,
thus getting rid of rtsp_listener_thread entirely.

   pthread_create(&rtsp_listener_thread, NULL, &rtsp_listen_loop, NULL);
   pthread_join(rtsp_listener_thread, NULL);
   return 0;
}

I refrained from this intrusive change, given that it seems to break from the overall coding style/thread model.
Thoughts?

@mikebrady
Copy link
Owner

Great work -- let me check this out, please.

After this, rtsp_listen_loop() might as well run in the main thread instead which does nothing but start and wait for it,
thus getting rid of rtsp_listener_thread entirely.

This is a throwback to an early idea that Shairport Sync, once initialised, might do more than just listen on a single RTSP thread. For instance, it might listen on two RTSP threads, or might so something completely different as well. As it stands, it doesn't do anything, but it's probably no harm to leave it as it is for the present. Whaddyathink?

@klemensn
Copy link
Author

This is a throwback to an early idea that Shairport Sync, once initialised, might do more than just listen on a single RTSP thread. For instance, it might listen on two RTSP threads, or might so something completely different as well. As it stands, it doesn't do anything, but it's probably no harm to leave it as it is for the present. Whaddyathink?

No harm being consistent with how all the other tasks are handled in their own thread.
Merging the RTSP thread into the main one is a tiny code and overhead reduction, but would obviously cross any such plans or ideas you just layed out.

It certainly works now, other than cosmetics I'd have no reason to touch it now.

@mikebrady
Copy link
Owner

Great work -- let me check this out, please.

Thanks again for looking at this, but I would not quite agree with your analysis.

The reason for the crash is that the rtsp_listener_thread exits immediately if the port is already in use, and so the subsequent pthread_cancel in the atexit routine exit_rtsp_listener is being called on an already-disappeared thread.

However, I think it's important to keep the atexit behaviour to ensure the rtsp_listener_thread is cancelled before other supporting services are removed.

This means that the rtsp_listener_thread should be prevented from ever exiting "normally", as it does now if the port is not available. This can be achieved by changing the line:

    warn("could not establish a service on port %d -- program terminating. Is another instance of "
         "Shairport Sync running on this device?",
         config.port);

to

    die("could not establish a service on port %d -- program terminating. Is another instance of "
         "Shairport Sync running on this device?",
         config.port);

causing an immediate abnormal exit.

What would you think?

@klemensn
Copy link
Author

klemensn commented Feb 1, 2024

Great work -- let me check this out, please.

Thanks again for looking at this, but I would not quite agree with your analysis.

The reason for the crash is that the rtsp_listener_thread exits immediately if the port is already in use, and so the subsequent pthread_cancel in the atexit routine exit_rtsp_listener is being called on an already-disappeared thread.

However, I think it's important to keep the atexit behaviour to ensure the rtsp_listener_thread is cancelled before other supporting services are removed.

What difference does it make (in this case)?

RTSP pthread_cancel() -> main pthread_join() -> main return -> process gone
vs.
RTSP die() -> pthread_setcancelstate(), exit() -> process gone

This means that the rtsp_listener_thread should be prevented from ever exiting "normally", as it does now if the port is not available. This can be achieved by changing the line:

    warn("could not establish a service on port %d -- program terminating. Is another instance of "
         "Shairport Sync running on this device?",
         config.port);

to

    die("could not establish a service on port %d -- program terminating. Is another instance of "
         "Shairport Sync running on this device?",
         config.port);

causing an immediate abnormal exit.

What would you think?

Works as well, but also aptly returns EXIT_FAILURE via die() rather than 0 in main().

Let's do that, thanks.

On OpenBSD 7.4-current, failure to listen on the RTSP socket(s) results
the `rtsp_listener_thread` being pthread_cancel(3)'ed twice, once through
`rtsp_listen_loop()` and again via atexit(3) handler `exit_rtsp_listener()`:

```
$ nc -4l 5000 &
$ nc -6l 5000 &
$ shairport-sync -c/dev/null
warning: could not establish a service on port 5000 -- program terminating. Is another instance of Shairport Sync running on this device?
Segmentation fault (core dumped)
```
```
Program terminated with signal SIGSEGV, Segmentation fault.
433             if (tib->tib_canceled == 0 && tid != 0 &&
[Current thread is 1 (process 290061)]
```

`die()` -> `exit(EXIT_FAILURE)` normally in this case, thus forgoing the
first cancel and relying on the atexit handler alone.

With Mike Brady.
@klemensn klemensn force-pushed the rtsp-listener-exit-segfault branch from 79e6ee0 to 783d600 Compare February 1, 2024 17:39
@mikebrady mikebrady merged commit 00a4758 into mikebrady:development Feb 6, 2024
9 checks passed
@mikebrady
Copy link
Owner

Many thanks again for this.

@klemensn klemensn deleted the rtsp-listener-exit-segfault branch February 7, 2024 08:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants