An unofficial open-source PyTorch implementation of VALL-E (Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers).

Buy Me A Coffee

Inference: In-Context Learning via Prompting

see LibriTTS/Inference

Demo

Broader Impacts

Since VALL-E could synthesize speech that maintains speaker identity, it may carry potential risks in misuse of the model, such as spoofing voice identification or impersonating a specific speaker.

To avoid abuse, well-trained models and services will not be provided.

Progress

  • Text and Audio Tokenizer
  • Dataset module and loaders
  • VALL-F: seq-to-seq + PrefixLanguageModel
    • AR Decoder
    • NonAR Decoder
  • VALL-E: PrefixLanguageModel
    • AR Decoder
    • NonAR Decoder
  • update README.zh-CN
  • Training
  • Inference: In-Context Learning via Prompting
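The list above names a PrefixLanguageModel without spelling out what that means for attention. As a rough, generic illustration (not taken from this repository's code), a prefix LM lets every position see the whole prefix while the remaining positions are decoded causally. The function name `prefix_lm_mask` and the idea that the prefix is the text prompt are assumptions made for this sketch; the masking used in valle may differ.

```python
from typing import List


def prefix_lm_mask(prefix_len: int, total_len: int) -> List[List[bool]]:
    """Return mask[i][j] == True when position i may attend to position j.

    Hypothetical helper for illustration only; not part of valle.
    """
    if not 0 <= prefix_len <= total_len:
        raise ValueError("prefix_len must be between 0 and total_len")
    mask = []
    for i in range(total_len):
        row = []
        for j in range(total_len):
            if j < prefix_len:
                # Every position may look at the whole prefix.
                row.append(True)
            else:
                # Past the prefix, attention is causal: only j <= i is visible.
                row.append(j <= i)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # 2 prefix positions followed by 2 causally decoded positions.
    for row in prefix_lm_mask(2, 4):
        print("".join("x" if allowed else "." for allowed in row))
```

Setting `prefix_len` to 0 reduces this to an ordinary causal mask, which is one way to see the prefix LM as a causal decoder with a fully visible prompt.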

Installation

# PyTorch
pip install torch==1.13.1 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116

# phonemizer
apt-get install espeak-ng
## OSX: brew install espeak
pip install phonemizer

# lhotse
# https://github.com/lhotse-speech/lhotse/pull/956
# https://github.com/lhotse-speech/lhotse/pull/960
pip uninstall lhotse
pip uninstall lhotse
pip install git+https://github.com/lhotse-speech/lhotse

# k2 icefall
# pip install k2
git clone https://github.com/k2-fsa/k2.git
cd k2
export K2_MAKE_ARGS="-j12"
export K2_CMAKE_ARGS="-DK2_WITH_CUDA=OFF"
python setup.py install
cd -

git clone https://github.com/k2-fsa/icefall
cd icefall
pip install -r requirements.txt
export PYTHONPATH=`pwd`/../icefall:$PYTHONPATH
echo "export PYTHONPATH=`pwd`/../icefall:\$PYTHONPATH" >> ~/.zshrc
echo "export PYTHONPATH=`pwd`/../icefall:\$PYTHONPATH" >> ~/.bashrc
cd -

# valle
git clone https://github.com/lifeiteng/valle.git
cd valle
pip install -e .
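
After running the steps above, it can help to confirm that each package is visible to the Python interpreter you plan to train with. The following is a minimal sketch that uses only the standard library; the module names in `REQUIRED_MODULES` are assumed from the install commands (for example, that the editable install exposes a top-level `valle` module), and `check_modules` is a hypothetical helper, not something the repository ships.

```python
import importlib.util
from typing import Dict, Iterable

# Top-level module names assumed from the install commands above.
REQUIRED_MODULES = (
    "torch", "torchaudio", "phonemizer", "lhotse", "k2", "icefall", "valle",
)


def check_modules(names: Iterable[str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Map each module name to True if the interpreter can locate it."""
    status = {}
    for name in names:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or half-initialized modules.
            status[name] = False
    return status


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(f"{name:<12} {'found' if found else 'MISSING'}")
```

This only checks that the modules can be located, not that they import cleanly or that CUDA is usable. If `icefall` is reported as missing, the `PYTHONPATH` export from the steps above has probably not been picked up by the current shell.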

Training

Troubleshooting

Contributing

  • Multi-GPU Training
  • Parallelize bin/tokenizer.py on multi-GPUs
  • Provide GPU resources (MyEmail: [email protected])
  • Buy Me A Coffee

Citation

To cite this repository:

@misc{valle,
  author={Feiteng Li},
  title={VALL-E: A neural codec language model},
  year={2023},
  url={http://github.com/lifeiteng/valle}
}
@article{VALL-E,
  title     = {Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers},
  author    = {Chengyi Wang and Sanyuan Chen and Yu Wu and
               Ziqiang Zhang and Long Zhou and Shujie Liu and
               Zhuo Chen and Yanqing Liu and Huaming Wang and
               Jinyu Li and Lei He and Sheng Zhao and Furu Wei},
  year      = {2023},
  eprint    = {2301.02111},
  archivePrefix = {arXiv},
  volume    = {abs/2301.02111},
  url       = {http://arxiv.org/abs/2301.02111},
}