Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock in Pa_StopStream() #15

Open
darksylinc opened this issue Mar 9, 2024 · 1 comment
Open

Deadlock in Pa_StopStream() #15

darksylinc opened this issue Mar 9, 2024 · 1 comment

Comments

@darksylinc
Copy link
Contributor

darksylinc commented Mar 9, 2024

Describe the bug

I got deadlocked here:

syscall 0x00000077b6eb9f90
pthread_join 0x00000077b6f23444
PaUnixThread_Terminate pa_unix_util.c:441
StopStream pa_opensles.c:1231
Pa_StopStream pa_front.c:1469
AudioMixerPortAudio::pause() AudioMixer.cpp:119

Portaudio's thread is stuck here:

syscall 0x00000077b6eb9f90
__futex_wait_ex(volatile void *, bool, int, bool, const timespec *) 0x00000077b6ebe860
sem_wait 0x00000077b6ecc038
StreamProcessingCallback pa_opensles.c:988
__pthread_start(void *) 0x00000077b6f22b18
__start_thread 0x00000077b6ebf360

Android's internal audio threads are stuck here (i.e. the one responsible for calling NotifyBufferFreeCallback and unstuck everything):

syscall 0x00000077b6eb9f90
android::ClientProxy::obtainBuffer(android::Proxy::Buffer *, const timespec *, timespec *) 0x00000077be369d78
android::AudioTrack::obtainBuffer(android::AudioTrack::Buffer *, const timespec *, timespec *, unsigned long *) 0x00000077be35ee30
android::AudioTrack::processAudioBuffer() 0x00000077be35e37c
android::AudioTrack::AudioTrackThread::threadLoop() 0x00000077be35d744
android::Thread::_threadLoop(void *) 0x00000077acbe2658
android::AndroidRuntime::javaThreadShell(void *) 0x00000077af0a2a3c
thread_data_t::trampoline(const thread_data_t *) 0x00000077acbe1dec
__pthread_start(void *) 0x00000077b6f22b18
__start_thread 0x00000077b6ebf360

To Reproduce

It is unclear the exact conditions but I got a repeatable way to repro on my phone (Android 12 MIUI Global 14.0.1 Xiaomi POCO F2 Pro):

Initialization code:

const double c_sampleRate = 48000.0;
static int patestCallback( const void *inputBuffer, void *outputBuffer, unsigned long framesPerBuffer,
						   const PaStreamCallbackTimeInfo *timeInfo, PaStreamCallbackFlags statusFlags,
						   void *userData )
{
	return paContinue;
}

	// Open an audio I/O stream.
	const PaError errorCode = Pa_OpenDefaultStream( &mStream, 0,   // no input channels
													2,             // stereo output
													paInt16,       //
													c_sampleRate,  //
													0,             // frames per buffer
													patestCallback, this );
 Pa_StartStream( mStream );
  1. Run the app with Android Studio's debugger attached.
  2. Start playback with initialization code provided ago.
  3. Put a breakpoint somwhere inside StreamProcessingCallback.
  4. Wait a few seconds (vary the frequency in which you do this).
  5. Optionally, disable the breakpoint (follow this step randomly)
  6. Continue
  7. Repeat step 3 until audio disappears. When this happens sem_wait will no longer wake up.
  8. Calling Pa_StopStream() afterwards will deadlock because it wants the worker thread to end, but the sem_wait will never wake up.

I suspect running out of data is what's causing this (could possibly be an Android OS bug) as when doing steps 3 - 6 I get the following:

2024-03-08 22:24:25.843 28227-28569 AudioTrack              com.artofthestate.airwar             W  restartIfDisabled(105): releaseBuffer() track 0xb4000076a9f24000 disabled due to previous underrun, restarting

Googling online I can't find much but I did find this StackOverflow question:

(...) For some reason AudioTrack doesn't seem to like sitting around with an empty buffer. (...)

Workaround

I was able to "fix" the problem with the following snippet:

struct timespec abstime;
// Calculate the time needed to play all buffers, then add 500ms for spare room.
const int64_t millisecondsNeeded =
	numberOfBuffers * ( 10000 * (int64_t)stream->framesPerHostCallback ) /
		( (int64_t)stream->streamRepresentation.streamInfo.sampleRate ) +
	500;

abstime.tv_nsec = timeSpec.tv_nsec + millisecondsNeeded * 1000000;
const int64_t remainder = abstime.tv_nsec % 1000000000;
abstime.tv_sec = timeSpec.tv_sec + ( abstime.tv_nsec - remainder ) / 1000000000;
abstime.tv_nsec = remainder;

int semaphoreResult = -1;
do
{
	semaphoreResult = sem_timedwait( &stream->outputSem, &abstime );
} while( semaphoreResult == -1 && errno == EINTR );  // Restart if interrupted by handler

if( semaphoreResult == -1 && errno == ETIMEDOUT)
{
	// Android may get into a deadlock if underruns occur in which stream->outputSem
	// will never wake up. Thus we use a timeout to restart the stream.
	// See https://github.com/croissanne/portaudio_opensles/issues/15
	// SLuint32 currState = 7777;
	//(*stream->playerItf)->GetPlayState( stream->playerItf, &currState );
	( *stream->playerItf )->SetPlayState( stream->playerItf, SL_PLAYSTATE_STOPPED );
	( *stream->outputBufferQueueItf )->Clear( stream->outputBufferQueueItf );
	( *stream->playerItf )->SetPlayState( stream->playerItf, SL_PLAYSTATE_PLAYING );
}

In case you're wondering, uncommenting "currState" (and commenting out my workaround) returns that the state is always SL_PLAYSTATE_PLAYING.

My workaround has the following issue though:

Once the workaround hits at least once, the app will start spamming the following on logcat (level = debug):

PlayerBase::stop() from IPlayer
2024-03-08 22:40:29.067 29971-30200 AudioTrack              com.artofthestate.airwar             D  stop(sessionID=2057)
2024-03-08 22:40:29.067 29971-30200 AudioTrack              com.artofthestate.airwar             D  stop(106): prior state:STATE_ACTIVE
2024-03-08 22:40:29.067 29971-30200 AudioTrack              com.artofthestate.airwar             D  stop(106): called with 376512 frames delivered
2024-03-08 22:40:29.072 29971-30190 thestate.airwa          com.artofthestate.airwar             D  PlayerBase::stop() from IPlayer
2024-03-08 22:40:29.072 29971-30190 AudioTrack              com.artofthestate.airwar             D  stop(sessionID=2057)
2024-03-08 22:40:29.091 29971-30190 AudioTrack              com.artofthestate.airwar             D  stop(106): prior state:STATE_ACTIVE
2024-03-08 22:40:29.091 29971-30190 AudioTrack              com.artofthestate.airwar             D  stop(106): called with 0 frames delivered
2024-03-08 22:40:30.141 29971-30193 BpBinder                com.artofthestate.airwar             W  Slow Binder: BpBinder transact took 1046 ms, interface=android.media.IAudioPolicyService, code=45 oneway=false
2024-03-08 22:40:30.143 29971-30190 thestate.airwa          com.artofthestate.airwar             D  PlayerBase::stop() from IPlayer
2024-03-08 22:40:30.143 29971-30190 AudioTrack              com.artofthestate.airwar             D  stop(sessionID=2057)
2024-03-08 22:40:30.143 29971-30190 AudioTrack              com.artofthestate.airwar             D  stop(106): prior state:STATE_ACTIVE
2024-03-08 22:40:30.143 29971-30190 AudioTrack              com.artofthestate.airwar             D  stop(106): called with 768 frames delivered

But here's the fun part:

I can repro this behavior on Xiaomi POCO F2 Pro (stock ROM) and on a Samsung A54 5G (stock ROM Android 14).

But one of my Lenovo tablets is the only device to perform this log spam naturally (without any workarounds). And unsurprisingly, I can't repro the bug there! I already caught Lenovo hacking the GPU driver to workaround a Vulkan driver bug, so I wouldn't be surprised they're doing something similar to my workaround to fix an Android bug.

In a Redmi Note 4X (custom ROM Android 10, Havoc OS) I also can't reproduce, BUT when I force a buffer underrun I get the following:

2024-03-08 22:50:36.049  9039-9926  AudioTrack              com.artofthestate.airwar             W  restartIfDisabled(21): releaseBuffer() track 0x7481520000 disabled due to previous underrun, restarting
2024-03-08 22:50:36.052  9039-9926  <no-tag>                com.artofthestate.airwar             D  PlayerBase::stop() from IPlayer
2024-03-08 22:50:36.052  9039-9926  AudioTrack              com.artofthestate.airwar             D  stop(21): called with 818688 frames delivered
2024-03-08 22:50:36.056  9039-9926  <no-tag>                com.artofthestate.airwar             D  PlayerBase::stop() from IPlayer
2024-03-08 22:50:36.056  9039-9926  AudioTrack              com.artofthestate.airwar             D  stop(21): called with 768 frames delivered

There is no log spam; those are all the messages that appear after the underrun (until I force another underrun).

My workaround appears to work & fix the issue, but I would love some feedback.

Expected behavior

Pa_StopStream should not deadlock.
Audio should not (randomly) disappear if a buffer underrun occurs.

Actual behavior

Deadlock.
Audio randomly disappears if a buffer underrun occurs (it doesn't happen on all buffer underruns).

How frequent is this?

I am able to repro this very easily with the debugger attached; but on a normal run it is hard to trigger on fast phones (I think I got this bug twice in 1 year; but the debugger was not attached so I can't be certain).

But we're experiencing high ANRs (3%), particularly from slow phones (our game is very demanding, specially on loading screens) and I suspect this may be (in part?) the cause of them.

Update: Yes, this is the culprit of some of those ANRs. Inspecting all threads, they're stuck in:

android::ClientProxy::obtainBuffer (Android's internal audio thread) and sem_wait (pa_opensles.c:988).

Desktop (please complete the following information):

  • OS: Android
  • OS Version 12 and 14
@darksylinc
Copy link
Contributor Author

I've updated the workaround code in the ticket; since it now accounts for some edge cases and properly estimates a reasonable amount of timeout.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant