WavTokenizer-medium was released on 2024.09.09 #23

Open
jishengpeng opened this issue Sep 9, 2024 · 4 comments
Labels
good question, news

Comments

@jishengpeng
Owner

https://huggingface.co/collections/novateur/wavtokenizer-medium-large-66de94b6fd7d68a2933e4fc0

@jishengpeng pinned this issue Sep 9, 2024
@jishengpeng added the good first issue, invalid, and news labels and removed the good first issue and invalid labels Sep 9, 2024
@zsLin177

Oh!!!
By the way, what's the difference between speech and music-audio? Does music-audio support speech? Also, how do the models listed at WavTokenizer available models correspond to this?

@jishengpeng
Owner Author

> Oh!!! By the way, what's the difference between speech and music-audio? Does music-audio support speech? Also, how do the models listed at WavTokenizer available models correspond to this?

We train the WavTokenizer-Medium versions on training data from different domains. For example, the music-audio version is trained solely on AudioSet (~1500 hours) and music data, so it does not support speech. In contrast, WavTokenizer-Large will use a single unified model to support speech, music, and audio simultaneously.

@didadida-r

!! Thanks for your work. Could you also update the paper with the medium results? Compared to SpeechTokenizer, the out-of-domain results of the small version are not that good.

@jishengpeng
Owner Author

> !! Thanks for your work. Could you also update the paper with the medium results? Compared to SpeechTokenizer, the out-of-domain results of the small version are not that good.

In out-of-domain scenarios, the WavTokenizer-Medium-Speech version improves on the WavTokenizer-Small version (LJSpeech), with a 0.6 increase in UTMOS, a 0.8 increase in PESQ, and a 0.06 increase in STOI. Furthermore, experiments using WavTokenizer-Medium on various languages have shown promising generalization, suggesting it could be deployed effectively across diverse languages. Let's look forward to WavTokenizer-Large.

@jishengpeng added the good question label Sep 10, 2024