lipsync is a simple and updated Python library for lip synchronization, based on Wav2Lip. It synchronizes lips in videos and images based on provided audio, supports CPU/CUDA, and uses caching for faster processing.

mowshon/lipsync

lipsync

lipsync is a Python library that moves lips in a video (or image) to match a given audio file. It is based on Wav2Lip, but many unneeded files and libraries have been removed, and the code has been updated to work with the latest versions of Python.


Features

  • Video lip synchronization
    Synchronize lips in an existing video to match a new audio file.

  • Image lip animation
    Provide a single image and an audio file to generate a talking video.

  • Runs on CPU and CUDA
    You can choose whether to run on your CPU or a CUDA-enabled GPU for faster processing.

  • Caching
    If you use the same video multiple times with different audio files, lipsync can cache frames and reuse them. This makes future runs much faster.


Pre-Trained Weights

lipsync works with two different pre-trained models:

  1. Wav2Lip (Download wav2lip.pth)

    • More accurate lip synchronization
    • Lips in the result may appear somewhat blurred
  2. Wav2Lip + GAN (Download wav2lip_gan.pth)

    • Lips in the result are clearer
    • Synchronization may be slightly less accurate
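
The trade-off between the two checkpoints can be captured in a small helper. This is only a minimal sketch: `choose_checkpoint` and its `priority` argument are hypothetical names that are not part of lipsync, and the `weights` folder is an assumption about where the downloaded files were saved.

```python
from pathlib import Path

# File names come from the list above; the "sync"/"clarity" labels are this sketch's own.
CHECKPOINTS = {
    "sync": "wav2lip.pth",         # more accurate sync, lips may look somewhat blurred
    "clarity": "wav2lip_gan.pth",  # clearer lips, sync may be slightly less accurate
}


def choose_checkpoint(priority: str = "sync", weights_dir: str = "weights") -> Path:
    """Return the path of the checkpoint that matches the given priority."""
    if priority not in CHECKPOINTS:
        raise ValueError(
            f"priority must be one of {sorted(CHECKPOINTS)}, got {priority!r}"
        )
    return Path(weights_dir) / CHECKPOINTS[priority]
```

The returned path can be converted with `str(...)` and passed as `checkpoint_path`.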

Installation

pip install lipsync

Usage Example

Below is a simple example in Python. This assumes you have the model weights (either wav2lip.pth or wav2lip_gan.pth) in a weights/ folder.

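from lipsync import LipSync

lip = LipSync(
    model='wav2lip',                        # only 'wav2lip' is supported at the moment
    checkpoint_path='weights/wav2lip.pth',  # or 'weights/wav2lip_gan.pth'
    nosmooth=True,                          # True disables smoothing
    device='cuda',                          # 'cpu' or 'cuda'
    cache_dir='cache',                      # directory for cached frames
    img_size=96,
    save_cache=True,                        # save frames to cache_dir for faster re-runs
)

lip.sync(
    'source/person.mp4',  # source video
    'source/audio.wav',   # audio to match
    'result.mp4',         # output file
)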
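
Caching pays off when one video is paired with several audio files. The sketch below shows one possible way to loop over audio files. `sync_many` is a hypothetical helper, not part of lipsync; it assumes only that the callable it receives takes `(video, audio, output)` in that order, as `lip.sync` does in the example above. The `results` folder and the output naming scheme are also assumptions.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def sync_many(
    sync: Callable[[str, str, str], object],
    video: str,
    audio_files: Iterable[str],
    out_dir: str = "results",
) -> List[str]:
    """Call sync(video, audio, output) once per audio file; return the output paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    outputs = []
    for audio in audio_files:
        # Name each result after its audio file, e.g. results/hello.mp4
        target = str(out / (Path(audio).stem + ".mp4"))
        sync(video, audio, target)
        outputs.append(target)
    return outputs


# Hypothetical usage with the `lip` object created above (save_cache=True):
# sync_many(lip.sync, 'source/person.mp4', ['source/a.wav', 'source/b.wav'])
```

With `save_cache=True`, the first call should populate `cache_dir`, and later calls on the same video can reuse the cached frames.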

Important Parameters

  • model: Model to use; only 'wav2lip' is supported at the moment
  • checkpoint_path: Path to the model weights (e.g., wav2lip.pth, wav2lip_gan.pth)
  • nosmooth: Set True to disable smoothing
  • device: 'cpu' or 'cuda'
  • cache_dir: Directory where cached frames are stored
  • save_cache: Set True to save frames to cache_dir for faster re-runs
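
Since `device` accepts either 'cpu' or 'cuda', a script that must run on machines with and without a GPU can pick the value at runtime. The following is a sketch under the assumption that PyTorch is installed (Wav2Lip-based code typically depends on it). `pick_device` is a hypothetical helper, not a lipsync function.

```python
def pick_device(prefer_cuda: bool = True) -> str:
    """Return 'cuda' when PyTorch reports a usable GPU, otherwise 'cpu'."""
    if not prefer_cuda:
        return "cpu"
    try:
        import torch  # assumption: installed alongside lipsync
    except ImportError:
        # Without PyTorch there is no way to query CUDA, so fall back to CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

The result can be passed directly as `device=pick_device()`.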

Ethical Use

Please be mindful when using lipsync. This library can generate videos that look convincing, so it could be used to spread disinformation or harm someone’s reputation. We encourage using it only for entertainment or scientific purposes, and always with respect and consent from any people involved.

License and Citation

The software can only be used for personal, research, or other non-commercial purposes. Please cite the following paper if you use this code:

@inproceedings{10.1145/3394171.3413532,
    author = {Prajwal, K R and Mukhopadhyay, Rudrabha and Namboodiri, Vinay P. and Jawahar, C.V.},
    title = {A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild},
    year = {2020},
    isbn = {9781450379885},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3394171.3413532},
    doi = {10.1145/3394171.3413532},
    booktitle = {Proceedings of the 28th ACM International Conference on Multimedia},
    pages = {484–492},
    numpages = {9},
    keywords = {lip sync, talking face generation, video generation},
    location = {Seattle, WA, USA},
    series = {MM '20}
}
