Releases: deiteris/voice-changer
b2332
b2329
Improvements
- Minor overall UI style improvements.
- Merge Lab now downloads the merged model instead of saving it into a slot.
- Merge Lab now saves the merged model in PyTorch format instead of safetensors for compatibility.
- When no index is present for the model, the Index slider is disabled.
- Added automatic dark theme. Light and dark themes are selected based on system or browser preferences.
Fixes
- Fixed an error when no model is selected and "SIO rec." is set to "start".
- Fixed an error when attempting to load an empty index.
- Server Audio no longer creates a separate thread, and the app no longer hangs due to a running server audio thread.
- Server Audio no longer spams error messages on invalid devices.
Misc
- Added icon to executable.
- Updated Python to 3.12 for Windows and macOS distributions.
- Updated PyTorch to 2.5.1 for all distributions.
- Updated onnxruntime to 1.20.1 for all distributions.
- Updated torch-directml to 0.2.5.dev240914 for the DirectML distribution.
- Updated ROCm to 6.2.3 for Linux ROCm distribution.
fixes-refactoring-b2320-d8f2474
Bump Python release version
b2309
This is a hotfix for b2307.
Fixes voice changer loading with CUDA version on Windows.
b2307
This is a minor maintenance release.
Changes
- For PyTorch models, PyTorch has been updated to v2.4.0 in the following versions:
  - CUDA version. Now also uses CUDA 12+ and cuDNN 9+, which may offer better performance. This has also reduced the prebuilt version size.
  - CPU version.
  - macOS version (ARM only).
- In the DirectML version, torch-directml has been updated to 0.2.4.dev240815. No performance change is expected since the new operators are not used by the voice changer.
- For ONNX models, ONNX Runtime has been updated to 1.19 in the following versions:
  - CUDA version. Now also uses CUDA 12+ and cuDNN 9+, which may offer better performance.
  - DirectML version. DirectML runtime and opset support have been updated, which may offer better performance.
  - CPU version.
  - macOS version. The CoreML runtime now supports more operators that are used by the voice changer, which may offer better performance.
b2300
This is a minor maintenance release.
Fixes FP16 support detection with older NVIDIA GPUs (Maxwell GPUs and earlier).
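A minimal sketch of what such a detection fix might look like, assuming the check is based on the CUDA compute capability reported by the driver (Maxwell GPUs report 5.x). The function name and the Pascal (6.x) cutoff are illustrative, not the project's actual code:

```python
# Hypothetical FP16 capability check based on CUDA compute capability.
# Maxwell (5.x) and earlier are assumed to lack usable FP16 support,
# so they should fall back to FP32 inference.

def supports_fp16(compute_capability: tuple) -> bool:
    """Return True if the GPU generation is assumed to support FP16 inference."""
    major, _minor = compute_capability
    return major >= 6  # Pascal and newer; Maxwell and earlier fall back to FP32

# With PyTorch, the capability tuple would come from
# torch.cuda.get_device_capability(device_index).
```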
b2298
Changes
- The "export to onnx" button has been replaced with the new "Convert to ONNX" setting in Advanced settings, and its behavior has changed. When this option is checked, uploaded or selected models (if not converted before) are automatically converted to ONNX in the same slot. When unchecked, the original PyTorch model is used. Note that this option does not replace the original model but adds an ONNX variant to it. If the uploaded model is already in ONNX format, it is loaded as is.
- The performance monitor now shows the model type, including the runtime in use (e.g., onnxRVC for ONNX RVC models and pyTorchRVCv2 for original RVC v2 models).
Improvements
- WASAPI no longer requires matching the sample rates of input and output devices; the audio is automatically resampled by the system audio mixer. However, matching sample rates is still recommended when possible.
Fixes
- ASIO channel selection now correctly selects an input/output channel.
- "Operation in progress" dialog will now appear when changing long-running options in Advanced settings.
- Fixed a potential bug where an FP16 ONNX model could fail to load if it had been generated previously and then removed.
- Model settings (Pitch, Index, etc.) no longer reset to the last saved settings after changing GPU.
Experimental
- When running inference on CPU, the contentvec embedder model is quantized to INT8 precision. This significantly reduces RAM usage and slightly reduces CPU usage. Currently, there is no option to opt out of this behavior.
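The mechanism behind this is PyTorch's dynamic quantization, which stores the weights of supported layers as INT8 and dequantizes them on the fly, cutting their memory footprint roughly 4x. Below is an illustrative sketch with a toy stand-in module, not the actual contentvec loading code:

```python
# Illustrative dynamic INT8 quantization of a CPU-only model.
# The Sequential below is a placeholder for the contentvec embedder.
import torch
import torch.nn as nn

embedder = nn.Sequential(          # stand-in for the real embedder
    nn.Linear(256, 768),
    nn.ReLU(),
    nn.Linear(768, 768),
)

# Quantize all nn.Linear layers to INT8; activations stay in float.
quantized = torch.ao.quantization.quantize_dynamic(
    embedder, {nn.Linear}, dtype=torch.qint8
)

features = quantized(torch.randn(1, 256))  # inference runs as usual, on CPU
```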
Miscellaneous
- Updated WebUI npm dependencies.
Known issues
- WDM-KS and true ASIO devices produce crackling audio. For the lowest delay, the current workaround is to use WASAPI or FlexASIO.
b2277
b2271
Important change
The voice changer now constantly utilizes the CPU/GPU while voice conversion is enabled. This addresses an issue where the voice changer could lag after a short period of silence or show inconsistent performance, as observed with NVIDIA GPUs in previous versions.
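A common way to implement this is a "keep warm" loop: when no real audio arrives, silent chunks are pushed through inference anyway so the device never drops to an idle power state. The sketch below is a hedged illustration with a hypothetical `infer` callable, not the project's actual implementation:

```python
# "Keep warm" loop sketch: run inference on real chunks when available,
# or on silence while idle, so the CPU/GPU stays at a steady load.
# All names here are illustrative.
import queue
import threading

CHUNK = 1024  # samples per chunk, illustrative


def keep_warm(audio_queue, infer, stop: threading.Event,
              timeout: float = 0.05) -> int:
    """Process real chunks from the queue, or silence while idle.

    Returns the number of chunks processed (handy for testing).
    """
    processed = 0
    silence = [0.0] * CHUNK
    while not stop.is_set():
        try:
            chunk = audio_queue.get(timeout=timeout)
        except queue.Empty:
            chunk = silence  # no input: keep the device busy anyway
        infer(chunk)
        processed += 1
    return processed
```

The trade-off is steady power draw in exchange for consistent latency, which matches the behavior change described above.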
New
- The "perf" metric now includes a graph. The graph shows data points over the last 5 seconds (if the chunk size is less than 100 ms) or more. The performance graph lets you see whether there are performance fluctuations and adjust the chunk size depending on usage over time.
- Most settings now include tooltips with explanations. Just hover over text with a dotted underline.
- In the DirectML version, when changing settings that may negatively impact performance, a notification will appear near GPU, suggesting to switch between CPU and GPU.
- Introduced Just-in-Time (JIT) compilation for PyTorch models. This slightly improves performance (faster response on the first start, slightly lower latency) and reduces memory usage in some cases. However, it currently increases model loading time. To opt out of JIT compilation, set the "Disable JIT compilation" option in Advanced settings to "on". Note that JIT is currently not available for DirectML devices, so this option has no effect for them.
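For readers unfamiliar with the mechanism: PyTorch's JIT (TorchScript) compiles a module into an optimized graph once, which is why loading takes longer while subsequent calls run faster. A minimal illustration with a toy module (not the voice changer's actual model):

```python
# TorchScript JIT compilation in miniature: compile once at load time,
# then run the optimized graph on every call. Toy module for illustration.
import torch
import torch.nn as nn


class Toy(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.linear = nn.Linear(16, 16)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.linear(x))


model = Toy().eval()
scripted = torch.jit.script(model)  # compilation cost is paid here

x = torch.randn(2, 16)
with torch.no_grad():
    y = scripted(x)  # later calls run the compiled graph
```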
Changes
- Increased the input volume slider range to 250%.
Experimental
- The NVIDIA version includes 2 bat files to lock or reset GPU core and memory clocks, addressing possible inconsistent performance. Note that not all NVIDIA GPUs support frequency locking; you can verify that the option took effect with GPU-Z. Both scripts must be run with administrator rights to take effect:
  - force_gpu_clocks.bat - queries the GPU core and memory clocks and locks them to the reported maximums.
  - reset_gpu_clocks.bat - resets the GPU core and memory frequencies.
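Clock locking on NVIDIA GPUs is typically done through `nvidia-smi` (`-lgc`/`-lmc` lock core/memory clocks, `-rgc`/`-rmc` reset them). The sketch below only constructs the equivalent commands; the exact contents of the shipped .bat files may differ, and running these commands requires administrator rights:

```python
# Sketch of the nvidia-smi invocations the two .bat files presumably
# wrap. Command construction only; nothing is executed here.

def lock_clocks_cmds(core_mhz: int, mem_mhz: int):
    """Commands that pin GPU core and memory clocks to fixed values."""
    return [
        ["nvidia-smi", "-lgc", f"{core_mhz},{core_mhz}"],  # lock core clock
        ["nvidia-smi", "-lmc", f"{mem_mhz},{mem_mhz}"],    # lock memory clock
    ]

def reset_clocks_cmds():
    """Commands that restore default GPU clock management."""
    return [["nvidia-smi", "-rgc"], ["nvidia-smi", "-rmc"]]

# The maximum supported clocks could be queried beforehand with:
# nvidia-smi --query-gpu=clocks.max.graphics,clocks.max.memory \
#            --format=csv,noheader,nounits
```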
b2245
Changes
- Changed the background color of the performance stats block.
- The buf metric has been moved to perf.
- perf now indicates performance by highlighting the inference time in green, yellow, or red when a certain condition is met. The three conditions are Stable, Potentially unstable / High usage, and Unstable:
  - Stable - your inference speed is sufficient for the selected chunk size. Usually, no action is required.
  - Potentially unstable / High usage - your inference speed is sufficient, but audio may be unstable when other processes run concurrently. Operation in this range also incurs high GPU usage. Increasing Chunk size or reducing Extra is recommended.
  - Unstable - your inference speed is insufficient for the selected chunk size. Increase Chunk size or reduce Extra.
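The three conditions boil down to comparing inference time against the chunk duration. The sketch below mirrors that logic; the 80% threshold for the middle band is an assumption for illustration, not the app's exact cutoff:

```python
# Map one inference-time measurement to a stability label, following
# the three conditions described above. The warn_ratio is illustrative.

def classify_perf(inference_ms: float, chunk_ms: float,
                  warn_ratio: float = 0.8) -> str:
    """Classify inference time relative to the audio chunk duration."""
    if inference_ms >= chunk_ms:
        return "unstable"              # cannot keep up with real time
    if inference_ms >= warn_ratio * chunk_ms:
        return "potentially unstable"  # keeping up, but little headroom
    return "stable"                    # comfortable margin
```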
Experimental
- In client audio mode, the additional audio buffering used to compensate for lag has been removed. This should reduce audio latency in this mode (up to 25% less latency) without causing issues during stable operation, but please report any issues.