# HearBridge Visual Speech Recognition

Conversation textualization via lip-reading, with support for multi-party conversations.

## Prerequisites

1. Set up the conda environment:

   ```shell
   conda create -y -n hearbridge-vsr python=3.8
   conda activate hearbridge-vsr
   ```

2. Install the requirements:

   ```shell
   python3 -m pip install -r requirements.txt
   conda install -c conda-forge ffmpeg
   ```

3. Download and extract the pre-trained models to:

   - `models/visual/model.pth`
   - `models/spm`
   - `models/mediapipe/short_range.tflite`
   - `models/mediapipe/face_landmarker.tflite`
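Before running the demo, it can help to confirm that all of the pre-trained model files listed above are in place. A minimal sketch (the paths come from this README; the helper function itself is not part of the project):

```python
from pathlib import Path

# Pre-trained model files this README expects (paths copied from the list above).
REQUIRED_MODELS = [
    "models/visual/model.pth",
    "models/spm",
    "models/mediapipe/short_range.tflite",
    "models/mediapipe/face_landmarker.tflite",
]


def missing_models(root="."):
    """Return the expected model paths that do not exist under root."""
    root = Path(root)
    return [p for p in REQUIRED_MODELS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_models()
    if missing:
        print("Missing model files:")
        for p in missing:
            print(" -", p)
    else:
        print("All pre-trained models found.")
```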

## Run

```shell
python3 demo.py
```

- For debugging, add the debug flag.
- To show the camera feed, add the window flag.
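The two options above could be wired up with `argparse` roughly as follows. This is a hypothetical sketch of how `demo.py` might parse its flags (the exact flag names and the script's real argument handling are assumptions, not taken from the source):

```python
import argparse


def parse_args(argv=None):
    """Parse the demo's command-line flags (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="HearBridge VSR demo")
    # Assumed flag names; check demo.py for the actual spelling.
    parser.add_argument("--debug", action="store_true",
                        help="enable verbose debug output")
    parser.add_argument("--window", action="store_true",
                        help="show the camera feed in a window")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"debug={args.debug}, window={args.window}")
```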

## Reference

```bibtex
@inproceedings{ma2023auto,
  author={Ma, Pingchuan and Haliassos, Alexandros and Fernandez-Lopez, Adriana and Chen, Honglie and Petridis, Stavros and Pantic, Maja},
  booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels},
  year={2023},
  pages={1-5},
  doi={10.1109/ICASSP49357.2023.10096889}
}
```