Conversation Textualization via Lip-reading, supporting multi-party conversations
- Set up the conda environment.
conda create -y -n hearbridge-vsr python=3.8
conda activate hearbridge-vsr
- Install requirements.
python3 -m pip install -r requirements.txt
conda install -c conda-forge ffmpeg
- Download and extract the pre-trained models to:
models/visual/model.pth
models/spm
models/mediapipe/short_range.tflite
models/mediapipe/face_landmarker.tflite
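Before running the demo, it can help to confirm that all of the model files above are actually in place. A minimal sketch (the path list is taken from the layout above; the helper name is hypothetical):

```python
from pathlib import Path

# Expected model locations, relative to the repository root (from the list above).
EXPECTED_MODELS = [
    "models/visual/model.pth",
    "models/spm",
    "models/mediapipe/short_range.tflite",
    "models/mediapipe/face_landmarker.tflite",
]

def missing_models(root="."):
    """Return the expected model paths that are not present under `root`."""
    return [p for p in EXPECTED_MODELS if not (Path(root) / p).exists()]
```

Running `missing_models()` from the repository root should return an empty list once everything is downloaded and extracted.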
- Run the demo.
python3 demo.py
- For debugging, add the debug flag.
- For showing the camera, add the window flag.
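The exact flag spelling is defined in demo.py; assuming they are exposed argparse-style as `--debug` and `--window` (an assumption, not confirmed by this README), the wiring would look roughly like:

```python
import argparse

def build_parser():
    # Hypothetical sketch of how demo.py might expose the two flags above;
    # flag names and help texts are assumptions.
    parser = argparse.ArgumentParser(description="Lip-reading conversation demo")
    parser.add_argument("--debug", action="store_true",
                        help="print intermediate debugging output")
    parser.add_argument("--window", action="store_true",
                        help="show the camera feed in a window")
    return parser
```

With such a parser, `python3 demo.py --debug --window` would enable both options at once.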
@inproceedings{ma2023auto,
author={Ma, Pingchuan and Haliassos, Alexandros and Fernandez-Lopez, Adriana and Chen, Honglie and Petridis, Stavros and Pantic, Maja},
booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title={Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels},
year={2023},
pages={1-5},
doi={10.1109/ICASSP49357.2023.10096889}
}