WavTokenizer-Medium was released on 2024.09.09 #23
Comments
Oh!!!
We train WavTokenizer-Medium on training data from different domains, one version per domain. For example, the music-audio version is trained solely on AudioSet (~1500 hours) and music data, so it does not support speech. WavTokenizer-Large, by contrast, will use a single unified model to support speech, music, and audio simultaneously.
Thanks for your work! Could you also update the paper with the Medium results? Compared to SpeechTokenizer, the
In out-of-domain scenarios, the WavTokenizer-Medium-Speech version improves on the WavTokenizer-Small version (LJSpeech), with a 0.6 increase in UTMOS, a 0.8 increase in PESQ, and a 0.06 increase in STOI. Experiments with WavTokenizer-Medium on various languages have also shown promising generalization, which suggests it could be deployed effectively across diverse linguistic contexts. Let's look forward to WavTokenizer-Large.
https://huggingface.co/collections/novateur/wavtokenizer-medium-large-66de94b6fd7d68a2933e4fc0