Custom Audio with Specific Format #543

Open
sreedy-riis opened this issue Nov 15, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@sreedy-riis

Hello, I am using a library that provides audio data that I would like to send through the livestream. The data format is ENCODING_PCM_16BIT, with a sample rate of 16000 Hz, a buffer size of 1280 bytes, and a mono channel. I am unable to get this data in any other format.

I have tried implementing a solution with MixerAudioBufferCallback but am currently unable to get the data to come through clearly. In the microphone's onBufferRequest I have seen that the sample rate is 48000 with a buffer size of 960. I have also tried resampling from 16000 to 48000, to no avail. Are these two audio sources incompatible for combining, or is there a different way for me to implement this?

Thanks

@sreedy-riis sreedy-riis added the bug Something isn't working label Nov 15, 2024
@davidliu
Contributor

By default, onBufferRequest also delivers PCM_16BIT mono-channel audio, but at 48kHz. If you can resample to 48kHz, then it should just be a matter of giving it chunks of 960 bytes (i.e. 480 audio samples). Since the buffer sizes don't match, you would slice your buffer into chunks of that size.

For your purposes, your 1280-byte buffer at 16kHz upsamples to a 3840-byte buffer at 48kHz, which is exactly 4 times 960 bytes, so you can simply use ByteBuffer.slice(startIndex, length) to hand over a quarter of the buffer each time.

It'd look something like this:


val MAX_READS_PER_BUFFER = 4

class AudioCapturer : MixerAudioBufferCallback() {
    var resampledAudioData: ByteBuffer? = null
    var timesRead = 0

    lateinit var outputByteBuffer: ByteBuffer
    override fun onBufferRequest(originalBuffer: ByteBuffer, audioFormat: Int, channelCount: Int, sampleRate: Int, bytesRead: Int, captureTimeNs: Long): BufferResponse? {

        if (!::outputByteBuffer.isInitialized || outputByteBuffer.capacity() != originalBuffer.capacity()) {
            // Allocate the output buffer to match the requested chunk size
            outputByteBuffer = ByteBuffer.allocateDirect(originalBuffer.capacity())
        }

        if (resampledAudioData == null) {
            // No audio data, grab more
            val audio = readAudio()
            resampledAudioData = resampleAudioData(sampleRate, audio)
        }

        // Grab the next chunk
        resampledAudioData!!.position(timesRead * originalBuffer.capacity())
        val copyBuffer = resampledAudioData!!.slice()
        copyBuffer.limit(originalBuffer.capacity())

        outputByteBuffer.position(0)
        outputByteBuffer.put(copyBuffer)
        
        timesRead++
        if (timesRead >= MAX_READS_PER_BUFFER) {
            // We're done with this audio data, prepare for next
            timesRead = 0
            resampledAudioData = null
        }

        return BufferResponse(outputByteBuffer)
    }

    fun readAudio(): ByteBuffer {
        TODO()
    }

    fun resampleAudioData(sampleRate: Int, audioData: ByteBuffer): ByteBuffer {
        TODO()
    }
}
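The two TODO stubs above are left to the reader. As an illustration only, a naive linear-interpolation upsampler from 16kHz to 48kHz mono PCM_16BIT might look like the sketch below (`upsample16kTo48k` is a hypothetical helper, not part of the LiveKit API; a real implementation should use a properly filtered resampler to avoid imaging artifacts):

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical sketch of the resampleAudioData TODO: naive linear interpolation
// from 16kHz to 48kHz mono PCM_16BIT. Prefer a filtered resampler in production.
fun upsample16kTo48k(input: ByteBuffer): ByteBuffer {
    input.order(ByteOrder.LITTLE_ENDIAN) // Android PCM_16BIT is little-endian
    val inSamples = input.remaining() / 2
    val factor = 3 // 48000 / 16000
    val output = ByteBuffer.allocateDirect(inSamples * factor * 2).order(ByteOrder.LITTLE_ENDIAN)
    for (i in 0 until inSamples) {
        val current = input.getShort(input.position() + i * 2).toInt()
        val next = if (i + 1 < inSamples) input.getShort(input.position() + (i + 1) * 2).toInt() else current
        for (step in 0 until factor) {
            // Linearly interpolate between the current and next source samples
            val interpolated = current + (next - current) * step / factor
            output.putShort(interpolated.toShort())
        }
    }
    output.flip()
    return output
}
```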

For a more robust solution (e.g. to handle cases where your audio data isn't a clean multiple of the chunk size), you could copy it into a separate, larger buffer that holds the read audio data, and grab more audio to tack onto the end whenever you run low (using ByteBuffer.compact to move the unread audio data to the front).
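That accumulator approach might be sketched as follows (`ChunkedAudioSource` and `readMoreAudio` are hypothetical names for illustration, not part of the LiveKit API):

```kotlin
import java.nio.ByteBuffer

// Hypothetical sketch of the accumulator approach: keep a larger staging buffer
// in write mode, top it up from the resampled source whenever a full chunk isn't
// available, and hand out fixed-size chunks. compact() moves unread bytes to the
// front so leftover audio is never lost.
class ChunkedAudioSource(
    private val chunkSize: Int,                  // e.g. 960 bytes per onBufferRequest
    private val readMoreAudio: () -> ByteBuffer, // your resampled 48kHz source
    capacity: Int = chunkSize * 16,
) {
    private val staging: ByteBuffer = ByteBuffer.allocateDirect(capacity) // write mode

    fun nextChunk(dest: ByteBuffer) {
        while (staging.position() < chunkSize) {
            staging.put(readMoreAudio()) // top up until a full chunk is buffered
        }
        staging.flip()                   // switch to read mode
        val slice = staging.slice()
        slice.limit(chunkSize)
        dest.clear()
        dest.put(slice)
        dest.flip()
        staging.position(chunkSize)      // mark the chunk as consumed
        staging.compact()                // move leftovers to the front, back to write mode
    }
}
```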

@sreedy-riis
Author

Thank you for the response, I will give that a try!

@tombang

tombang commented Nov 21, 2024

Dear @davidliu

My microphone only outputs audio at 16kHz, but after uploading it to my cloud ASR model via LiveKit, I found that the audio input to the ASR is at 48kHz. Why is this? I want to keep the audio input to the ASR model at 16kHz. Is there any way to do this?

@davidliu
Contributor

The audio is delivered at 48kHz over the wire; this part can't be changed. You'll need to resample to 16kHz on the receiving end.
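If it helps, a minimal receive-side downsampler might look like the sketch below (`downsample48kTo16k` is a hypothetical helper; averaging each group of 3 samples is only a crude stand-in for a proper low-pass filter before decimation):

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical sketch: decimate 48kHz mono PCM_16BIT down to 16kHz by averaging
// each group of 3 samples. A production resampler should apply a real low-pass
// filter first to avoid aliasing.
fun downsample48kTo16k(input: ByteBuffer): ByteBuffer {
    input.order(ByteOrder.LITTLE_ENDIAN) // Android PCM_16BIT is little-endian
    val factor = 3 // 48000 / 16000
    val inSamples = input.remaining() / 2
    val outSamples = inSamples / factor
    val output = ByteBuffer.allocateDirect(outSamples * 2).order(ByteOrder.LITTLE_ENDIAN)
    for (i in 0 until outSamples) {
        var sum = 0
        for (j in 0 until factor) {
            sum += input.getShort(input.position() + (i * factor + j) * 2).toInt()
        }
        output.putShort((sum / factor).toShort()) // averaging acts as a weak low-pass
    }
    output.flip()
    return output
}
```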

@xiaoxiper

xiaoxiper commented Dec 2, 2024

The audio delivers at 48khz over the wire, this part can't be changed. You'll need to resample to 16khz on the receiving end.

Could you explain this in more detail? I'm facing the same issue. I'd like to know if it's possible to recompile the WebRTC library to use a 16kHz sample rate. I'm using LiveKit agents on the server side and don't need to support background music.

Thank you in advance for your help.