Custom Audio with Specific Format #543

Open
sreedy-riis opened this issue Nov 15, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@sreedy-riis

Hello, I am using a library that provides audio data that I would like to send through the livestream. The data format is ENCODING_PCM_16BIT, with a sample rate of 16000 Hz, a buffer size of 1280 bytes, and a mono channel. I am unable to get this data in any other format.

I have tried implementing a solution with MixerAudioBufferCallback but am currently unable to get the data to come through clearly. In the microphone's onBufferRequest I have seen that the sample rate is 48000 with a buffer size of 960. I have also tried resampling from 16000 to 48000, to no avail. Are these two audio sources incompatible for combining, or is there a different way for me to implement this?

Thanks

@sreedy-riis sreedy-riis added the bug Something isn't working label Nov 15, 2024
@davidliu
Contributor

By default, onBufferRequest also delivers PCM_16BIT mono-channel audio, but at 48kHz. If you can resample to 48kHz, then it should just be a matter of giving it chunks of 960 bytes (i.e. 480 audio samples). Since the buffer sizes don't match, you would slice your buffer into chunks of that size.

For your purposes, your 1280-byte buffer at 16kHz upsamples to a 3840-byte buffer at 48kHz, which is exactly 4 times 960 bytes, so you can simply use ByteBuffer.slice(startIndex, length) to hand over a quarter of the buffer each time.

It'd look something like this:


val MAX_READS_PER_BUFFER = 4

class AudioCapturer : MixerAudioBufferCallback() {
    var resampledAudioData: ByteBuffer? = null
    var timesRead = 0

    lateinit var outputByteBuffer: ByteBuffer
    override fun onBufferRequest(originalBuffer: ByteBuffer, audioFormat: Int, channelCount: Int, sampleRate: Int, bytesRead: Int, captureTimeNs: Long): BufferResponse? {

        if (!::outputByteBuffer.isInitialized || outputByteBuffer.capacity() != originalBuffer.capacity()) {
            // Allocate the output buffer to match the requested chunk size
            outputByteBuffer = ByteBuffer.allocateDirect(originalBuffer.capacity())
        }

        if (resampledAudioData == null) {
            // No audio data, grab more
            val audio = readAudio()
            resampledAudioData = resampleAudioData(sampleRate, audio)
        }

        // Grab the next chunk
        resampledAudioData!!.position(timesRead * originalBuffer.capacity())
        val copyBuffer = resampledAudioData!!.slice()
        copyBuffer.limit(originalBuffer.capacity())

        outputByteBuffer.position(0)
        outputByteBuffer.put(copyBuffer)
        
        timesRead++
        if (timesRead >= MAX_READS_PER_BUFFER) {
            // We're done with this audio data, prepare for next
            timesRead = 0
            resampledAudioData = null
        }

        return BufferResponse(outputByteBuffer)
    }

    fun readAudio(): ByteBuffer {
        TODO()
    }

    fun resampleAudioData(sampleRate: Int, audioData: ByteBuffer): ByteBuffer {
        TODO()
    }
}
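The two TODO stubs above are left to the reader. As an illustration only, a naive linear-interpolation upsampler from 16kHz to 48kHz mono PCM_16BIT might look like the sketch below (`upsample16kTo48k` is a hypothetical helper, not part of the LiveKit API; a real implementation should use a properly filtered resampler to avoid imaging artifacts):

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical sketch of the resampleAudioData TODO: naive linear interpolation
// from 16kHz to 48kHz mono PCM_16BIT. Prefer a filtered resampler in production.
fun upsample16kTo48k(input: ByteBuffer): ByteBuffer {
    input.order(ByteOrder.LITTLE_ENDIAN) // Android PCM_16BIT is little-endian
    val inSamples = input.remaining() / 2
    val factor = 3 // 48000 / 16000
    val output = ByteBuffer.allocateDirect(inSamples * factor * 2).order(ByteOrder.LITTLE_ENDIAN)
    for (i in 0 until inSamples) {
        val current = input.getShort(input.position() + i * 2).toInt()
        val next = if (i + 1 < inSamples) input.getShort(input.position() + (i + 1) * 2).toInt() else current
        for (step in 0 until factor) {
            // Linearly interpolate between the current and next source samples
            val interpolated = current + (next - current) * step / factor
            output.putShort(interpolated.toShort())
        }
    }
    output.flip()
    return output
}
```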

For a more robust solution (e.g. to handle cases where your audio data isn't a clean multiple of the chunk size), you could copy it into a separate, larger buffer that holds the read audio data, and grab more audio to tack onto the end whenever you run low (using ByteBuffer.compact to move the unread audio data to the front).
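That accumulator approach might be sketched as follows (`ChunkedAudioSource` and `readMoreAudio` are hypothetical names for illustration, not part of the LiveKit API):

```kotlin
import java.nio.ByteBuffer

// Hypothetical sketch of the accumulator approach: keep a larger staging buffer
// in write mode, top it up from the resampled source whenever a full chunk isn't
// available, and hand out fixed-size chunks. compact() moves unread bytes to the
// front so leftover audio is never lost.
class ChunkedAudioSource(
    private val chunkSize: Int,                  // e.g. 960 bytes per onBufferRequest
    private val readMoreAudio: () -> ByteBuffer, // your resampled 48kHz source
    capacity: Int = chunkSize * 16,
) {
    private val staging: ByteBuffer = ByteBuffer.allocateDirect(capacity) // write mode

    fun nextChunk(dest: ByteBuffer) {
        while (staging.position() < chunkSize) {
            staging.put(readMoreAudio()) // top up until a full chunk is buffered
        }
        staging.flip()                   // switch to read mode
        val slice = staging.slice()
        slice.limit(chunkSize)
        dest.clear()
        dest.put(slice)
        dest.flip()
        staging.position(chunkSize)      // mark the chunk as consumed
        staging.compact()                // move leftovers to the front, back to write mode
    }
}
```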

@sreedy-riis
Author

Thank you for the response, I will give that a try!

@tombang

tombang commented Nov 21, 2024

Dear @davidliu

My microphone only outputs audio at 16kHz, but after uploading it to my cloud ASR model via LiveKit, I found that the audio input to the ASR is at 48kHz. Why is this? I want to keep the audio input to the ASR model at 16kHz. Is there any way to do this?

@davidliu
Contributor

The audio is delivered at 48kHz over the wire; this part can't be changed. You'll need to resample to 16kHz on the receiving end.
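If it helps, a minimal receive-side downsampler might look like the sketch below (`downsample48kTo16k` is a hypothetical helper; averaging each group of 3 samples is only a crude stand-in for a proper low-pass filter before decimation):

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical sketch: decimate 48kHz mono PCM_16BIT down to 16kHz by averaging
// each group of 3 samples. A production resampler should apply a real low-pass
// filter first to avoid aliasing.
fun downsample48kTo16k(input: ByteBuffer): ByteBuffer {
    input.order(ByteOrder.LITTLE_ENDIAN) // Android PCM_16BIT is little-endian
    val factor = 3 // 48000 / 16000
    val inSamples = input.remaining() / 2
    val outSamples = inSamples / factor
    val output = ByteBuffer.allocateDirect(outSamples * 2).order(ByteOrder.LITTLE_ENDIAN)
    for (i in 0 until outSamples) {
        var sum = 0
        for (j in 0 until factor) {
            sum += input.getShort(input.position() + (i * factor + j) * 2).toInt()
        }
        output.putShort((sum / factor).toShort()) // averaging acts as a weak low-pass
    }
    output.flip()
    return output
}
```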

@xiaoxiper

xiaoxiper commented Dec 2, 2024

The audio delivers at 48khz over the wire, this part can't be changed. You'll need to resample to 16khz on the receiving end.

Could you explain this in more detail? I'm facing the same issue. I'd like to know if it's possible to recompile the WebRTC library to use a 16kHz sample rate. I'm using LiveKit agents on the server side and don't need to support background music.

Thank you in advance for your help.