WavTokenizer-medium is released on 2024.09.09 #23

jishengpeng · 2024-09-09T10:17:15Z

https://huggingface.co/collections/novateur/wavtokenizer-medium-large-66de94b6fd7d68a2933e4fc0

zsLin177 · 2024-09-10T05:33:51Z

Oh!!!
By the way, what's the difference between speech and music-audio? Does music-audio support speech? Also, how do the models listed at WavTokenizer available models correspond to this?

jishengpeng · 2024-09-10T06:14:42Z

> Oh!!! By the way, what's the difference between speech and music-audio? Does music-audio support speech? Also, how do the models listed at WavTokenizer available models correspond to this?

We train the WavTokenizer-Medium models on data from different domains. For example, the music-audio version is trained solely on AudioSet (~1500 hours) and music data, so it does not support speech. By contrast, WavTokenizer-Large will use a single unified model to support speech, music, and audio simultaneously.

didadida-r · 2024-09-10T06:49:05Z

Thanks for your work! Could you also update the paper with the medium results? Compared to SpeechTokenizer, the out-of-domain results of the small version are not that good.

jishengpeng · 2024-09-10T09:37:04Z

> Thanks for your work! Could you also update the paper with the medium results? Compared to SpeechTokenizer, the out-of-domain results of the small version are not that good.

In out-of-domain scenarios, the WavTokenizer-Medium-Speech version improves on the WavTokenizer-Small version (LJSpeech), with a 0.6 increase in UTMOS, a 0.8 increase in PESQ, and a 0.06 increase in STOI. In addition, experiments with WavTokenizer-Medium on various languages have shown promising generalization, which suggests it could be deployed effectively across diverse languages. Let's look forward to WavTokenizer-Large.

jishengpeng pinned this issue Sep 9, 2024

