
Releases: anothermartz/Easy-Wav2Lip

v8.3 - GUI for local install + Mac support

21 Apr 10:33
a98e30b

GUI

I made a simple GUI for local installs:
(screenshot: the Easy-Wav2Lip GUI in v8.3)
You can select files using the 3 dots to the right of the input boxes.

This is way better than modifying the config.ini each time!

It also includes a preview window that allows you to see each frame as it processes. If it looks wrong, you can press Q to abort so you can change the settings.

macOS support

I also added some lines that use "mps" instead of CUDA when it's available, which should give decent speed on Macs with Apple silicon (ARM) processors. I also enabled CPU inference for anyone who has neither and enjoys torture and waiting 🙂.
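
For the curious, the device-selection logic amounts to something like this (a minimal sketch, not the repo's exact code):

```python
import torch

# Fallback order described above (assumed, not the repo's exact code)
if torch.cuda.is_available():
    device = "cuda"   # Nvidia GPU
elif torch.backends.mps.is_available():
    device = "mps"    # Apple silicon GPU
else:
    device = "cpu"    # works, but slowly
```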

Easy-Wav2Lip.bat

There are only minor tweaks since the last release, but I've moved it to a new place so that I can always point to the latest version with the same link, and so all the code can be seen before downloading:

https://github.com/anothermartz/Easy-Wav2Lip/blob/Installers/Easy-Wav2Lip.bat
(direct download link)

Colab

https://colab.research.google.com/github/anothermartz/Easy-Wav2Lip/blob/v8.3/Easy_Wav2Lip_v8.3.ipynb


The colab remains practically the same. I removed the obsolete "Experimental" option, as all it did was take longer without improving quality, and I added a check to make sure a file is selected rather than a directory (issue #50).

Shelved

I've decided that I'm not going to do more with Wav2Lip for now. With a lot of work I may be able to improve performance and/or quality, but I think by the time I achieve that, there'll be an alternative to Wav2Lip that massively outperforms whatever I can do.

But if that alternative tool is finicky and difficult to get to grips with, I'll have a go at making it easier to use :)

In the meantime I'm going to work on a new project where I intend to drive the facial animation in one video using the face from another. The main intention is to add facial animation to an otherwise blank character in a video game, but it could be used to make a movie character say something funny too.

I've had this idea for a long time and an early test of it has been the origin of my "scary" avatar.

Easy-Wav2Lip.bat venv update

18 Mar 19:42
7e27d82

My baby's got the venvs!

I managed to get around what was causing issues the first time I tried using a venv (Python virtual environment) to set up Easy-Wav2Lip.

Advantages of using a venv:

  • Doesn't mess with your existing Python install, which may want other versions of the same modules used here
  • Allows installing multiple Python versions for when other projects require them
  • Allows easy, complete un/reinstallation of Easy-Wav2Lip if things go wrong (just delete the Easy-Wav2Lip-venv and Easy-Wav2Lip folders and it's like it never happened and you didn't just spend 3 hours trying to make a video of Ben Shapiro performing Rapper's Delight).

Disadvantage:

  • A bigger install size if other projects already use the same modules as this one (unlikely)

Also, I got rid of the VS BuildTools C++ module installation!

It was the biggest pain of installation: it wasn't obvious that you had to tick the C++ box, it took ages to install, and it turns out it was completely unnecessary as long as pip was updated, which I had already taken care of, so whoops!

If you don't need it for anything else, uninstall VS BuildTools to save yourself some storage space.

Python 3.10.11 and Git now install automatically, so the whole operation is completely automated!

From a fresh install, even without any prerequisites, you launch Easy-Wav2Lip.bat, confirm you want it to install in this location, and it'll do it all entirely on its own! (takes a while and downloads a bunch!)

Also, command-line arguments

Unrelated to Easy-Wav2Lip.bat, but I recently updated run.py to allow the following command-line arguments:
-video_file
-vocal_file
-output_file

Usage (all arguments are optional):
python run.py -video_file "filepath" -vocal_file "filepath" -output_file "filepath"

Things not specified here will still be taken from config.ini
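
For reference, here's a minimal sketch of how arguments like these can be wired up with argparse, falling back to config.ini (an assumed structure; the real run.py may differ, and the "OPTIONS" section name is just for illustration):

```python
import argparse
import configparser

# Hedged sketch: command-line overrides with config.ini fallback
parser = argparse.ArgumentParser()
parser.add_argument("-video_file", default=None)
parser.add_argument("-vocal_file", default=None)
parser.add_argument("-output_file", default=None)
args = parser.parse_args()

config = configparser.ConfigParser()
config.read("config.ini")
# anything not given on the command line is taken from config.ini
video_file = args.video_file or config.get("OPTIONS", "video_file", fallback="")
```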

If you want any more command-line options I could add those too, but I'm not expecting this usage to be very widespread.

v8.2 (fixed issue)

17 Mar 00:03
c90f4ec

I fixed issue #28, which broke the colab version completely and may have prevented new local installs from working. So now things should work as they did before 🎉

https://colab.research.google.com/github/anothermartz/Easy-Wav2Lip/blob/v8.2/Easy_Wav2Lip_v8.2.ipynb

For Easy-Wav2Lip.bat users:
It should prompt you to update to 8.2 (I've included Easy-Wav2Lip.bat here for new users, but there are no changes from the last version).

For manual install users, just pull from the v8.2 branch and run install.py.

How did it break?

A module called torchvision updated and, with it, changed how some of its code is called. Another module called basicsr was still using the old way and, as a result, broke when gfpgan tried to install. This kills the step 1. 🦀

To fix this I added the offending file from basicsr into my project and fixed the outdated line. First I install basicsr, then I replace the old file with my fixed file. Now when gfpgan installs, it doesn't get upset and burn the entire project install :)
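
For the curious, my understanding is that the fix amounts to updating an import along these lines (illustrative; the exact patched line may differ):

```python
# Old torchvision location (removed in newer versions), which basicsr still used:
#   from torchvision.transforms.functional_tensor import rgb_to_grayscale
# Updated location:
from torchvision.transforms.functional import rgb_to_grayscale
```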

200 Stars!?

Also, I've realised that this project is slowly gaining traction when I thought it wasn't really being used, so I'll try to work on a demonstration video, update the readme with pictures, and work on improving the usability and result quality.

v8.1 colab quick fix!

15 Mar 22:34

So something deeply embedded updated and broke something else deeply embedded. It may take me a while to figure this out, so in the meantime I've employed a quick and dirty fix just to keep things working, by adding a measly one line of code.

If you want to apply this to your own customised Easy-Wav2Lip colab, edit the code of Step 1, make a new line under #install prerequisites (line 45), and enter !pip install -r requirements.txt


Ironically, while the fix itself is quick, it makes step 1 incredibly slow, as I'm forcing specific versions of a bunch of packages which now have to be downloaded and installed.

I'll investigate this further to try a more elegant solution!

https://colab.research.google.com/github/anothermartz/Easy-Wav2Lip/blob/v8.1/Easy_Wav2Lip_v8.1_quickfix.ipynb

Just ignore the request to restart the runtime; it'll go on just fine without restarting.

v8.1

30 Oct 10:04
2744f6a

I decided to do a v8.1 release so that the small fixes I've made since releasing v8 are pushed to those using Easy-Wav2Lip.bat.

I've fixed some issues with the saving and deleting of the last detected face tracking data, and added a new variable/tick box to force its deletion, which is useful if you're using two different files with the same name.


There is also a new Easy-Wav2Lip_v8.1.bat found here; please use that, especially if you're new to this tool, as it has extra checks to make sure it installs properly. There are now more checks to make sure VS BuildTools is the 2022 version, that the system has an Nvidia card, and that new-enough drivers are installed, which should ensure that CUDA 12 can be installed.

v8

22 Oct 16:02
537b9af

Running locally is here!

I managed to convert all the code to be able to run locally!

Furthermore, I've made a super-easy-to-use Easy-Wav2Lip.bat file that not only installs, runs, and updates Easy-Wav2Lip, but even checks for and automatically downloads Python, Git, ffmpeg, and VS BuildTools if required! You can basically run this on a fresh install of Windows and have it working with minimal input.

*Easy-Wav2Lip.bat will only work on Windows 10 (64-bit) and Windows 11, on a non-ARM CPU with an Nvidia GPU and the latest drivers.

  • macOS and Linux should be able to run Easy-Wav2Lip too, but you will need to install it more manually by following the instructions here.
  • An AMD processor should work fine.
  • An AMD GPU probably won't; you would need to find a way to get CUDA working on it (12.2 ideally).
  • It may work on other versions of Python, but if you have any issues and are using a different version, that's likely why!

Google Colab

If it is not compatible with your setup, there's always the colab!

https://colab.research.google.com/github/anothermartz/Easy-Wav2Lip/blob/v8/Easy_Wav2Lip_v8.ipynb


The v8 colab is functionally the same as v7; it has just been adapted to use the new locally-oriented code, and I will be updating it alongside the local version going forward.

What’s next?

I have an idea to improve the fundamental quality of Wav2Lip: not the resolution, but solving how it struggles to lipsync over existing mouth movements, especially Wav2Lip_GAN.

This will be a big undertaking with lots of trial and error and I think I will document these experiments in video form as I imagine a lot of funny looking errors will occur!

Feel free to leave me some feedback or suggestions here on GitHub or Discord! 💖

v7

17 Oct 00:06

v7 is here!


What’s new:

[Faster] Processing runs a bit faster! 📈
[Faster] Particularly the ‘Improved’ quality option which has a knock-on effect for 'Enhanced'. ⏩
[New] ‘Experimental’ quality option that only upscales the mouth area. 🥼 Except it doesn’t work very well. 👎
[Changed] Ported code over to this repository instead of relying on another repository. 📦
[Removed] Removed redundant code and folders. 🗑️

Speed ups:

I figured out an optimization for my mask creating function - Why was I tracking where the mouth was on every frame when it was already looking at a cropped image of a face? The mouth is basically going to be in the same place on each frame within that crop! So now I only detect the mouth and create the mask on the first frame, from then on it just uses the same mask, saving time! 🚀

So much time, actually, that it's almost the same speed as "Fast" - I'll likely just drop that option in the next version!

If you find a clip where the mask isn’t following the subject’s mouth properly, you can revert this optimization by ticking the “mouth_tracking” box.
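
In pseudocode terms, the optimization looks roughly like this (a minimal sketch with hypothetical helper names, not the project's actual code):

```python
import numpy as np

def detect_mouth_mask(face_crop: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the real landmark-based mask detection."""
    mask = np.zeros(face_crop.shape[:2], dtype=np.float32)
    h, w = mask.shape
    mask[int(h * 0.55):, int(w * 0.25):int(w * 0.75)] = 1.0  # rough mouth region
    return mask

def blend_frames(face_crops, lipsynced_crops, mouth_tracking=False):
    mask = None
    for original, lipsynced in zip(face_crops, lipsynced_crops):
        # v7 optimization: build the mask once on the first frame and reuse it,
        # unless the mouth_tracking box is ticked to re-detect on every frame
        if mask is None or mouth_tracking:
            mask = detect_mouth_mask(original)
        m = mask[..., None]
        yield (lipsynced * m + original * (1.0 - m)).astype(original.dtype)
```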

I also improved the overall processing speed by writing directly to .mp4 instead of .avi first. Strangely, I remember changing it from .mp4 to .avi and noticing a speed increase, but checking again now I see that it's faster to write to .mp4, which is also more intuitive, so I don't know what happened before. Well done me for undoing what I previously made worse. 🥇
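
For illustration, writing straight to .mp4 with OpenCV looks something like this (an assumed approach; the project's exact writer settings may differ):

```python
import cv2

fps, width, height = 25, 1280, 720  # example values
fourcc = cv2.VideoWriter_fourcc(*"mp4v")  # write .mp4 directly, no .avi intermediate
out = cv2.VideoWriter("result.mp4", fourcc, fps, (width, height))
```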

Experimental quality:

This version was meant to introduce a new way of upscaling: applying gfpgan only to the mouth area instead of the whole face, saving time (suggested here: #8).

However, I discovered that some frames were not being upscaled due to gfpgan not recognizing the cropped image as a mouth. In order to rectify this, I had to increase the size of the crop to include more of the face, to the point where the difference between the larger crop and the full face was negligible. At this size, the increase in gfpgan processing speed was offset so much by the time it took to detect and crop the mouth that it resulted in an overall slower processing time than when I just upscaled the whole face! 😔

I also guessed that processing gfpgan outside the wav2lip bounding box would smooth out the harsh lines typically found on the chin, but unfortunately, that too was a false prediction. 😞

Still, I have left this failed method in as the “Experimental” quality - feel free to try it but personally I think it’s a bust! 💥

Other:

I finally merged the code into this repository and removed a bunch of code and folders that didn't need to be there, as well as unused imports.

Theoretically this should reduce setup and inference time, but practically speaking it doesn't make a noticeable difference. Still, it's better than having random unused code lying around. 🧹

I also made the video player scale to your video size up to 1280 pixels wide. 🎞️

What’s next?

I intend to make this code possible to run locally; supposedly someone on the Discord figured this out already, but I'd like to make it easy. It is, after all, Easy-Wav2Lip!

Feel free to leave me some feedback or suggestions here on GitHub or Discord! 💖

v6.5 (hotfixes)

07 Oct 11:11

(now defunct)

I noticed a couple of bugs when using v6, so I fixed them here; hopefully you didn't experience them in the first place:

  • Fixed "resizing video" repeating in the log for every frame
  • Fixed padding size not updating to new settings when you process the same video as last time
  • I think I fixed preview_video not working for high fps videos? This whole feature needs a rework anyway.

I also improved the masking feature to scale based on the actual size of the mouth, instead of it being increased by a set number of pixels.

This should make your settings a lot more consistent between different clips and resolutions, and when the subject moves closer to or further away from the camera.
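
Conceptually, the change is from fixed-pixel padding to padding proportional to the detected mouth; a tiny sketch with assumed names:

```python
def mask_padding(mouth_width: int, scale: float = 0.1) -> int:
    """Padding that scales with the detected mouth size (assumed behaviour),
    rather than a fixed pixel count that behaves differently per resolution."""
    return int(mouth_width * scale)

print(mask_padding(80))  # close-up face  -> larger padding
print(mask_padding(20))  # distant face   -> smaller padding
```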

I'm now working on separating the wav2lip stage and the gfpgan stage in order to satisfy the suggestion of using gfpgan only on the area of the mask (#8). In addition to improving inference speed, I'm hoping this will also greatly reduce the straight lines at the edges of the face caused by wav2lip, but we'll see!

v6

30 Sep 20:30
2e73bff

v6 colab:
(now defunct)

Whoops! I forgot to put v5 up as a release! I shared it with anyone who would've used it anyway. At least I hope I did.

v5 notes:

Waaaay way faster inference time, and quality improvements!

I utilised the optimisations and improvements of wav2lip from this project: https://github.com/devxpy/cog-Wav2Lip

In my short test clip, processing time improved by about 83% compared to v4!
The new tracking method also fixed many visual bugs too!

(video: EZ wav2lip v4 to v5 comparison)

I removed the upscaling methods other than gfpgan because I couldn't find a use case where they were better.
If you disagree, let me know and I'll add them back in.

v6 notes:

Changed the masking from covering the whole face to a feathered mask around the mouth. It should look a lot more natural and won't have the fake-looking eyes that upscaling the whole face gave in v5.

You can see what the mask is actually covering by using the debug_mask checkbox.

(image: mask example. Left: v5, middle: v6 mask_debug, right: v6.)
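
A minimal sketch of what a feathered mouth mask like this can look like (an assumed approach using OpenCV, not the project's exact implementation):

```python
import cv2
import numpy as np

h, w = 256, 256                        # size of the cropped face
mask = np.zeros((h, w), dtype=np.float32)
# solid ellipse over the mouth region (position/size are illustrative)
cv2.ellipse(mask, (w // 2, int(h * 0.7)), (60, 35), 0, 0, 360, 1.0, -1)
# feather the edges so the blend looks natural instead of a hard cutout
mask = cv2.GaussianBlur(mask, (0, 0), sigmaX=15)
```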

Added preview_settings, which processes only a single frame so you don't have to render the entire video to check your settings.

Reduced processing time when you process the same video multiple times by saving the face detection results.

Significantly reduced the overall processing time by pre-loading certain things in step 1.

These two factors make it much better to tweak the settings on the same clip, especially if you use preview_settings.

v4

09 May 22:17

(now defunct)

Much faster Step 1 because now I initialize the files needed for gfpgan and ESRGAN without processing wav2lip itself.

Now allows any filetypes supported by ffmpeg instead of just .mp4 and .wav.
  • This took a while because it broke batch processing, so I had to fix that. I also got 2 kittens irl 🐱🐱

Batch processing now supports processing multiple audio files for one image/video or multiple images/videos for one audio file.
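
A sketch of the pairing logic this implies (hypothetical names; the actual batch code may differ):

```python
def batch_pairs(videos, audios):
    """Pair every video with every audio; with a single item on either side this
    gives 'one video, many audios' or 'many videos, one audio'."""
    for video in videos:
        for audio in audios:
            yield video, audio

for v, a in batch_pairs(["face.mp4"], ["take1.wav", "take2.wav"]):
    print(v, a)  # one image/video lip-synced to each audio file
```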

Easier readability because I moved most of the explanations to the readme.

Added the possibility to specify an ESRGAN upscaler. I haven't really tested this though, so please leave a comment if you find something that works!