Feature request: Ensemble (multi-model combination) mode #12

Open
c469591 opened this issue Sep 1, 2023 · 20 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@c469591

c469591 commented Sep 1, 2023

Hello,
is it possible to add the functionality of combining multiple models,
similar to UVR's Ensemble Mode? And can we specify the way of combination,
like choosing Min Spec, Max Spec, Average in UVR? Thank you.
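
For readers unfamiliar with those options, here is a minimal sketch of what "Min Spec", "Max Spec" and "Average" could mean when combining the spectrograms that several models produce for the same stem. It is an illustration based only on the option names, not UVR's or audio-separator's actual code: the function name `combine_spectrograms`, the mode strings, and the list-of-lists spectrogram layout are all assumptions.

```python
from typing import Callable, Dict, List, Sequence

# Assumed layout: spectrogram[frame][frequency_bin] holds one complex STFT value.
Spectrogram = List[List[complex]]


def _pick_by_magnitude(bins: Sequence[complex], keep_largest: bool) -> complex:
    """Return the bin with the largest (or smallest) magnitude, keeping its phase."""
    return max(bins, key=abs) if keep_largest else min(bins, key=abs)


def _average(bins: Sequence[complex]) -> complex:
    """Return the complex mean of the bins."""
    return sum(bins) / len(bins)


# Hypothetical mode names, chosen to mirror the options mentioned above.
COMBINERS: Dict[str, Callable[[Sequence[complex]], complex]] = {
    "min_spec": lambda bins: _pick_by_magnitude(bins, keep_largest=False),
    "max_spec": lambda bins: _pick_by_magnitude(bins, keep_largest=True),
    "average": _average,
}


def combine_spectrograms(specs: Sequence[Spectrogram], mode: str) -> Spectrogram:
    """Combine same-shaped spectrograms bin by bin using the given mode."""
    if not specs:
        raise ValueError("need at least one spectrogram")
    if mode not in COMBINERS:
        raise ValueError(f"unknown mode: {mode!r}")
    shape = [len(frame) for frame in specs[0]]
    for spec in specs[1:]:
        if [len(frame) for frame in spec] != shape:
            raise ValueError("all spectrograms must have the same shape")
    combine = COMBINERS[mode]
    # zip(*specs) groups the same frame across models; zip(*frames) groups the
    # same frequency bin within that frame.
    return [
        [combine(bins) for bins in zip(*frames)]
        for frames in zip(*specs)
    ]
```

Under this reading, "min" keeps only what every model agrees is present (less bleed, but possibly thinner sound), "max" keeps anything any model found (fuller, but with more bleed), and "average" sits in between.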

@hijaek

hijaek commented Sep 11, 2023

I was looking for the same

@beveradb
Collaborator

beveradb commented Sep 12, 2023

It's certainly possible!

I'm personally not keen on diving back into the UVR code again any time soon to figure out how those features are implemented, but PRs are very much welcome on this repo and I'd happily pair with anyone interested to help them get up to speed with it :)

Most of the core logic in this project was cherry picked from https://github.com/Anjok07/ultimatevocalremovergui/blob/master/separate.py

@c469591
Author

c469591 commented Sep 12, 2023

Hi,
I noticed that currently only MDX models are supported. Is it possible to add support for VR models? The VR noise-reduction model is very useful. Thank you!

@beveradb
Collaborator

Anyone is welcome to submit pull requests to this repo :)

@c469591
Author

c469591 commented Sep 14, 2023

Thank you.

@beveradb beveradb added the enhancement New feature or request label Sep 22, 2023
@beveradb beveradb added the help wanted Extra attention is needed label Dec 21, 2023
@beveradb
Collaborator

beveradb commented Feb 5, 2024

Hey folks, FYI I've been working on adding support for VR models this week, and I released audio-separator version 0.14 earlier today with initial support for VR models!

Please give it a try and see if it works for you!

I'm still working on documentation, tests and some packaging issues but the package on PyPI should "just work".

There's a new CLI parameter audio-separator --list_models which just prints all the models which are supported out of the box, and the interface has changed slightly (you now specify model filename with extension too).

I will inevitably be working on "ensemble mode" and model chaining functionality later this month, as I've been contracted to add support for stem splitting (which kinda goes hand in hand with that).

That said, it's already pretty easy to use audio-separator with multiple models in a row as the output filenames are consistent so you can easily script it to process a file with one model after another!
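
As a rough illustration of that kind of scripting, the sketch below runs a list of models one after another, feeding each output into the next. Everything here is hypothetical: `run_chain` and the `separate` callable are placeholder names, and how `separate` actually invokes audio-separator (CLI or Python API) and chooses which output stem to pass on is left to the caller.

```python
from typing import Callable, List, Sequence

# Hypothetical signature: takes (input_path, model_filename) and returns the
# path of the output file that should be fed into the next model.
SeparateFn = Callable[[str, str], str]


def run_chain(
    input_path: str,
    model_filenames: Sequence[str],
    separate: SeparateFn,
) -> List[str]:
    """Run models in order, passing each output to the next model.

    Returns every intermediate output path, in processing order.
    """
    outputs: List[str] = []
    current = input_path
    for model_filename in model_filenames:
        current = separate(current, model_filename)
        outputs.append(current)
    return outputs


if __name__ == "__main__":
    # Stand-in for a real separation call, used only to show the data flow.
    def fake_separate(path: str, model_filename: str) -> str:
        return f"{path} -> {model_filename}"

    for step in run_chain("song.wav", ["first_model.onnx", "second_model.pth"], fake_separate):
        print(step)
```

Injecting `separate` keeps the chaining logic independent of any particular audio-separator version or flag names.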

@c469591
Author

c469591 commented Feb 5, 2024

Hello, I am very excited that the VR models can now be used from the CLI as well. Thank you so much.
I was wondering if it would be possible to add a merging feature in the future, similar to UVR's, which can combine the output files produced by different models. This could greatly enhance the sound quality of the extracted files.

@beveradb
Collaborator

beveradb commented Feb 5, 2024

Yes, I plan to implement that - hopefully later this month! 😄

@c469591
Author

c469591 commented Feb 5, 2024

Thank you! I'd like to ask a question that's been asked many times before: does MDX now support the 23c model?

@beveradb
Collaborator

beveradb commented Feb 5, 2024

> Thank you! I'd like to ask a question that's been asked many times before: does MDX now support the 23c model?

I'm afraid not quite yet (that's still on my TODO list), but it's not far away now; I intend to implement that later this week or next!

@beveradb
Collaborator

Hey @c469591, the latest version of audio-separator now supports MDX, VR and Demucs models.
I haven't yet finished implementation of the checkpoint models (MDX23C) but I plan to add that later this week.

I'm actually not very familiar with the ensemble mode in UVR; I'll try and dig into it and understand exactly what it's doing later this week too.
However, would you be able to explain what it does from your perspective, or provide any example audio files where it produces better results than a single model? Seeing great results from something motivates me to implement it!

Thank you!

@beveradb beveradb changed the title from "Function request, can the multi-model combination function be added?" to "Feature request: Ensemble (multi-model combination) mode" Feb 26, 2024
@c469591
Author

c469591 commented Feb 27, 2024

Hello,
Based on my years of experience using UVR, the ensemble mode roughly works like this; it consists of several steps.
The first step is to run each selected model individually.
For example, if I have chosen the 23c from MDX and the 5_HP-Karaoke-UVR from VR, UVR will first run 23c to generate separate accompaniment and vocal files, then it will run 5_HP-Karaoke-UVR to produce another set of accompaniment and vocal files.
Next, all the accompaniment tracks are merged into one file using a method that I'm not aware of; similarly for the vocal files.
I speculate that it might use some strategy like audio phase cancellation to nullify identical sounds across multiple tracks while merging different sounds together—though this is just an unfounded guess.
In the end, after merging, you get an accompaniment with harmonies because 5_HP-Karaoke-UVR includes harmonies.
Additionally, since 23c processes richer instrumental details in its accompaniments, you end up with a result that's generally better than what you'd get from any single model alone.
Of course, if one of the models didn't completely eliminate the vocals from its track, those remnants would also end up in the final merged accompaniment file.
That's my understanding of ensemble mode—I hope this helps you!
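
The steps described above can be sketched roughly as follows. This is not UVR's implementation, and the merge method is explicitly unknown in this thread, so a plain sample-wise average stands in as a placeholder. The names `ensemble`, `average_waveforms`, and the model-as-callable interface are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence

Waveform = List[float]
Stems = Dict[str, Waveform]  # e.g. {"vocals": [...], "instrumental": [...]}


def average_waveforms(waveforms: Sequence[Waveform]) -> Waveform:
    """Sample-wise mean, truncated to the shortest input (placeholder merge)."""
    length = min(len(w) for w in waveforms)
    return [sum(w[i] for w in waveforms) / len(waveforms) for i in range(length)]


def ensemble(
    mixture: Waveform,
    models: Sequence[Callable[[Waveform], Stems]],
    merge: Callable[[Sequence[Waveform]], Waveform] = average_waveforms,
) -> Stems:
    """Run every model on the mixture, then merge same-named stems."""
    if not models:
        raise ValueError("need at least one model")
    # Step 1: run each model individually and group its outputs by stem name.
    per_stem: Dict[str, List[Waveform]] = {}
    for model in models:
        for stem_name, waveform in model(mixture).items():
            per_stem.setdefault(stem_name, []).append(waveform)
    # Step 2: merge all accompaniment tracks into one, and likewise for vocals.
    return {name: merge(waveforms) for name, waveforms in per_stem.items()}
```

Because `merge` is a parameter, a different combination strategy can be swapped in without changing the two-step structure.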

@beveradb
Collaborator

Hey @c469591 , thanks for the write up above, that actually does help me understand the motivation a lot!

I haven't yet gotten around to working on Ensemble mode, but I wanted to give you a heads up that as of version 0.16.2 or higher, audio-separator does now support MDXC models and the VIP models from UVR.

What you've described does actually sound like something I'd like to be able to use myself. I value separation which retains harmonies / backing vocals for the karaoke tracks I make, and so far I've mostly been using UVR_MDXNET_KARA_2.onnx on its own, so I'm motivated to get it working so I too can have that kind of combination of 23c + 5_HP-Karaoke.

I just can't promise when I'll get around to it as my hobby time is limited!

@c469591
Author

c469591 commented Mar 15, 2024

hi
@beveradb
I am glad that my sharing has been helpful to you. I look forward to seeing you complete this feature soon. Thank you for your hard work and contribution!

@JackismyShephard

@beveradb are there any updates on ensemble mode?

@beveradb
Collaborator

Afraid not @JackismyShephard ; to be honest new feature development for audio-separator is something I'm unlikely to be independently motivated to do as my hobby time is limited and I've been pretty happy with my results from audio-separator as it is already for my use case ( https://create.karaokehunt.com )

That said, I would still like to give it a try, I just need a bit of extra help / motivation. If you'd be willing to help / interested in pairing on it some time feel free to email me with a good date/time for a zoom/meet and that'll probably be the thing to get it started 🙏

@JackismyShephard

@beveradb Completely understandable. The karaoke app looks interesting.

It might be interesting to work together on the ensemble mode or other features to add to this project, as I see it has a lot of potential. However, I am a bit busy with my own project (https://github.com/JackismyShephard/ultimate-rvc) as well as my day job, so not sure how much time I have left 😄

@beveradb
Collaborator

No worries, well feel free to email me at [email protected] if you ever have a little free time and wanna pair on getting ensemble mode working :)

@Bebra777228
Contributor

@beveradb, I found an interesting repository that has an ensemble mode, which is enabled by default. In the inference.py file, there is a class called EnsembleDemucsMDXMusicSeparationModel.

It would be great if you could take a look at it and share your thoughts. Later, I will try to integrate this into the audio-separator code, and if everything goes well, I will submit a PR. Either way, please take a look as well; you might figure out how to combine this with audio-separator faster than I can (if it's even possible to integrate at all 😄).

@beveradb
Collaborator

Hey, thanks for the heads up @Bebra777228 - that certainly looks promising, more appealing than diving into UVR spaghetti code again for sure 😅
I'm spread really thin right now so probably won't work on integrating that myself soon (even though adding Ensemble logic to audio-separator is definitely appealing to me, it's just low on my priority list for my limited spare time).
If you or someone else manages to get the ball rolling with a draft PR with an attempt at implementing it, that'll probably lead to me prioritizing getting it over the line sooner - especially if you manage to get it to work, at least partly!
