-
Hi,
I noticed that the bit depth of the audio output is 32. In my experience the quality difference between a 32-bit and a 16-bit sample is very small; it has much less impact than the difference between a 44 kHz and a 24 kHz sample rate. But if it were possible to generate output with 16-bit depth, that should increase the output rate by ~2x, right? I suspect this can't be done because of how the model is built and trained?
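A point worth separating out: the 32-bit depth is just the float32 format the waveform comes out in, and it can be quantized to 16-bit PCM at write time without touching the model or its speed. A minimal sketch, assuming the model hands back a float32 waveform in [-1, 1] at 24 kHz (the array below is only a stand-in for real model output):

```python
import numpy as np
import soundfile as sf

sample_rate = 24000  # assumed model output rate
# Stand-in for the model's float32 waveform output.
audio = np.random.uniform(-1, 1, sample_rate).astype(np.float32)

audio = np.clip(audio, -1.0, 1.0)
# Quantize to 16-bit PCM at write time (about half the file size).
sf.write("out_16bit.wav", audio, sample_rate, subtype="PCM_16")
# Or keep the original 32-bit float representation.
sf.write("out_32bit.wav", audio, sample_rate, subtype="FLOAT")
```

This only changes the stored bit depth, not how fast the model generates audio.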
-
You can perform inference in fp16 for these models, and it does speed things up on server-class hardware; on consumer hardware there is no change. The model does not actually output 32-bit audio, since it never recorded anything: it is just mimicking the dataset, and its actual output quality is probably worse than both 16-bit and 32-bit real recordings.
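For reference, a minimal sketch of what fp16 inference looks like with PyTorch autocast; the model class, loading, and input handling depend on the specific repo, so the function below is illustrative and not this project's actual API:

```python
import torch

def synthesize_fp16(model: torch.nn.Module, inputs: torch.Tensor) -> torch.Tensor:
    """Run a (placeholder) model in mixed precision on the GPU."""
    model = model.to("cuda").eval()
    with torch.inference_mode(), torch.autocast(device_type="cuda", dtype=torch.float16):
        out = model(inputs.to("cuda"))
    # Autocast keeps the weights in fp32 and runs eligible ops in fp16,
    # which mainly pays off on server-class GPUs with fast fp16 paths;
    # on consumer cards the gain can be negligible, as noted above.
    return out.float().cpu()
```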