-
Hi,
I noticed that the bit depth of the audio output is 32. In my experience the quality difference between a 32-bit and a 16-bit sample is very small; it has much less impact than the difference between a 44 kHz and a 24 kHz sample rate. But if it were possible to generate output with 16-bit depth, that should increase the output rate by ~2x, right? I suspect this can't be done because of how the model is built and trained?
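A point worth separating out: the 32-bit depth is just the float32 format the waveform comes out in, and it can be quantized to 16-bit PCM at write time without touching the model or its speed. A minimal sketch, assuming the model hands back a float32 waveform in [-1, 1] at 24 kHz (the array below is only a stand-in for real model output):

```python
import numpy as np
import soundfile as sf

sample_rate = 24000  # assumed model output rate
# Stand-in for the model's float32 waveform output.
audio = np.random.uniform(-1, 1, sample_rate).astype(np.float32)

audio = np.clip(audio, -1.0, 1.0)
# Quantize to 16-bit PCM at write time (about half the file size).
sf.write("out_16bit.wav", audio, sample_rate, subtype="PCM_16")
# Or keep the original 32-bit float representation.
sf.write("out_32bit.wav", audio, sample_rate, subtype="FLOAT")
```

This only changes the stored bit depth, not how fast the model generates audio.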
-
You can perform inference in fp16 for these models, and it does speed things up on server-class hardware; on consumer hardware there is no change. The model does not actually output 32-bit audio, since it never recorded anything: it is just mimicking the dataset, and its actual output quality is probably worse than both 16-bit and 32-bit real recordings.
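For reference, a minimal sketch of what fp16 inference looks like with PyTorch autocast; the model class, loading, and input handling depend on the specific repo, so the function below is illustrative and not this project's actual API:

```python
import torch

def synthesize_fp16(model: torch.nn.Module, inputs: torch.Tensor) -> torch.Tensor:
    """Run a (placeholder) model in mixed precision on the GPU."""
    model = model.to("cuda").eval()
    with torch.inference_mode(), torch.autocast(device_type="cuda", dtype=torch.float16):
        out = model(inputs.to("cuda"))
    # Autocast keeps the weights in fp32 and runs eligible ops in fp16,
    # which mainly pays off on server-class GPUs with fast fp16 paths;
    # on consumer cards the gain can be negligible, as noted above.
    return out.float().cpu()
```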