Add rate boost support for SAPI5 voices #17610
Conversation
@gexgd0419 do you know if it would be possible to switch eSpeak-ng to use the dynamically linked Sonic as well? |
Sonic itself is not prepared to be exported as a DLL. There isn't any When importing functions from a DLL, typically We can also just use the original header file without Should we change the header file or not? |
Here eSpeak is changed to use To add @SaschaCowley Is this a better way? If so, I will change the development approach above. You can use the original header file without |
See test results for failed build of commit c7d0ad8f83 |
I really like your contributions @gexgd0419! |
@gexgd0419 my main concern was having 2 copies of Sonic in NVDA: a statically linked one for eSpeak, and a dynamically linked one for SAPI5 (and potentially other synths in future). It sounds like maintaining the dllexport and dllimport headers for eSpeak on our side will be unmaintainable, and dynamically linking eSpeak without them will come at a performance penalty, which is the opposite of what we're trying to achieve here. Apart from an increase in build size, is there any disadvantage to having 2 copies of Sonic? Is creating the dllexport and dllimport declarations upstream in Sonic something you'd be willing to either do, or open an issue for? @seanbudd what are your thoughts here? |
There is a PR in sonic waywardgeek/sonic#27 that does something similar, but it hasn't been merged for years. |
The "performance penalty" for not using If whole-program optimization is enabled, the linker will be able to figure out that the Sonic functions are actually imported from a DLL, and apply the optimization even without

    # Whole-program optimization causes eSpeak to distort and warble with its Klatt4 voice
    # Therefore specifically force it off
    "/GL-",

I'm not sure what exactly the reason would be, but turning on whole-program optimization can benefit a lot. For example, calls to statically linked Sonic functions can be inlined, and the eSpeak DLL size dropped to 624 KiB even with a statically linked Sonic inside. It's unfortunate that some bugs prevented us from enabling whole-program optimization.

Edit: The "disable whole-program optimization" part was actually introduced in cf0443b which was 10 years ago! The issue was caused by MSVC 2012, but now we are using MSVC 2022. It is possible that this issue got fixed later. |
@gexgd0419 You might have missed my question, or maybe you just didn't have time to answer it. |
@LeonarddeR From my manual tests, it seemed that this rate boost method doesn't affect the accuracy of indexing much. From the Sonic project page:
Here the stream mode is used to minimize latency. According to that, the maximum latency is 31ms, so theoretically the position of an index could be off by at most 31ms. But I couldn't find the difference in index accuracy during testing. (I added a beep sound when a bookmark is reached to test this) Also, in commit cf0443b the whole-program optimization of eSpeak was turned off due to a bug when compiling with MSVC 2012. Could you help me check whether the bug has been fixed? I tried but couldn't find the difference in audio output between whole-program optimization on and off, so maybe it has been fixed already. If this is the case, I can turn the whole-program optimization on, revert the "include |
This reverts commit 7b1c22f.
@SaschaCowley The dllimport part has been reverted. Now sonic will be compiled using its original header, which is fine if #17631 can be merged. |
See test results for failed build of commit cb12a2a861 |
Link to issue number:
Closes #17606
Summary of the issue:
SAPI5 voices do not support rate boosting, but some of the SAPI5 voices are not fast enough even at the highest rate for some experienced users.
Description of user facing changes
The "rate boost" option will be available to users when using SAPI5 voices, which supports rates ranging from 0.5x to 6x.
If rate boost is disabled, the behavior will be the same as before.
Description of development approach
The Sonic library, which is also used by eSpeak NG, is used to change the speed when rate boost is enabled.
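For orientation, here is a rough sketch of how Sonic's stream-mode C functions can be driven from Python with `ctypes`. It is not the code in this PR. The function names and parameter lists follow `sonic.h`; the helper `declare_prototypes`, the class `SpeedChanger`, and the DLL path in `load_sonic` are hypothetical names chosen for illustration. It assumes 16-bit PCM input.

```python
import ctypes
from ctypes import POINTER, c_float, c_int, c_short, c_void_p


def declare_prototypes(lib):
    """Attach argument/return types for the Sonic stream functions used below."""
    lib.sonicCreateStream.restype = c_void_p
    lib.sonicCreateStream.argtypes = [c_int, c_int]
    lib.sonicDestroyStream.restype = None
    lib.sonicDestroyStream.argtypes = [c_void_p]
    lib.sonicSetSpeed.restype = None
    lib.sonicSetSpeed.argtypes = [c_void_p, c_float]
    lib.sonicWriteShortToStream.restype = c_int
    lib.sonicWriteShortToStream.argtypes = [c_void_p, POINTER(c_short), c_int]
    lib.sonicReadShortFromStream.restype = c_int
    lib.sonicReadShortFromStream.argtypes = [c_void_p, POINTER(c_short), c_int]
    lib.sonicSamplesAvailable.restype = c_int
    lib.sonicSamplesAvailable.argtypes = [c_void_p]
    lib.sonicFlushStream.restype = c_int
    lib.sonicFlushStream.argtypes = [c_void_p]
    return lib


def load_sonic(path="sonic.dll"):
    """Load the DLL (the path is an assumption) and declare its prototypes."""
    return declare_prototypes(ctypes.cdll.LoadLibrary(path))


class SpeedChanger:
    """Hypothetical wrapper: feed 16-bit PCM in, get speed-adjusted PCM out."""

    def __init__(self, lib, sample_rate, channels, speed):
        self._lib = lib
        self._channels = channels
        self._stream = lib.sonicCreateStream(sample_rate, channels)
        if not self._stream:
            raise MemoryError("sonicCreateStream failed")
        lib.sonicSetSpeed(self._stream, speed)

    def process(self, pcm):
        """Write whole frames from `pcm` and return whatever output is ready."""
        frame_bytes = 2 * self._channels
        frames = len(pcm) // frame_bytes
        if frames == 0:
            return b""
        buf = (c_short * (frames * self._channels)).from_buffer_copy(
            pcm[: frames * frame_bytes]
        )
        if not self._lib.sonicWriteShortToStream(self._stream, buf, frames):
            raise MemoryError("sonicWriteShortToStream failed")
        return self._drain()

    def flush(self):
        """Force out any audio Sonic is still buffering (end of utterance)."""
        self._lib.sonicFlushStream(self._stream)
        return self._drain()

    def _drain(self):
        available = self._lib.sonicSamplesAvailable(self._stream)
        if available <= 0:
            return b""
        out = (c_short * (available * self._channels))()
        got = self._lib.sonicReadShortFromStream(self._stream, out, available)
        return bytes(out)[: got * 2 * self._channels]

    def close(self):
        if self._stream:
            self._lib.sonicDestroyStream(self._stream)
            self._stream = None
```

Passing the library object into the wrapper, rather than loading it inside the class, keeps the DLL location a separate concern and lets the wrapper be exercised against a stand-in object.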
When rate boost is enabled, to preserve quality, the SAPI5 voice is set to output at its original speed (1x), and then Sonic is used to change the speed of the original audio. When rate boost is disabled, Sonic is no longer used to change the speed, and the rate of the SAPI5 voice itself is set instead to preserve the previous behavior.
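The switch between the two modes can be sketched as below. This is an illustration under assumptions, not the PR's implementation: the names `RatePlan` and `plan_rate` are hypothetical, the rate setting is assumed to be a 0 to 100 percentage, and the linear mapping onto the 0.5x to 6x range is a guess at one possible mapping. The SAPI5 rate scale of -10 to 10 with 0 as normal speed is the range accepted by `ISpVoice::SetRate`.

```python
from dataclasses import dataclass
from typing import Optional

MIN_BOOST_SPEED = 0.5  # lower bound quoted in the PR description
MAX_BOOST_SPEED = 6.0  # upper bound quoted in the PR description


@dataclass
class RatePlan:
    sapi_rate: int                 # value for the SAPI5 voice, -10..10, 0 = normal
    sonic_speed: Optional[float]   # Sonic speed factor, or None when Sonic is bypassed


def plan_rate(percent: int, rate_boost: bool) -> RatePlan:
    """Decide who changes the speed: Sonic (boost on) or the voice (boost off)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if rate_boost:
        # Assumed linear mapping of 0..100 onto 0.5x..6x; the voice stays at 1x.
        speed = MIN_BOOST_SPEED + (MAX_BOOST_SPEED - MIN_BOOST_SPEED) * percent / 100
        return RatePlan(sapi_rate=0, sonic_speed=speed)
    # Boost off: the voice's own rate is used and Sonic is not involved.
    return RatePlan(sapi_rate=round((percent - 50) / 5), sonic_speed=None)
```

For example, `plan_rate(100, True)` leaves the voice at its normal rate and asks Sonic for 6x, while `plan_rate(100, False)` sets the voice rate to 10 and bypasses Sonic.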
As Sonic is used by eSpeak NG, it has already been included in the NVDA repo as a submodule (`/include/sonic/`). However, in eSpeak NG, it is compiled as a static library, which cannot be easily reused. So some build steps are changed to build Sonic as a DLL, `sonic.dll`, instead, which is installed in the `synthDrivers` folder. eSpeak-NG is also changed to dynamically link to `sonic.dll`. As importing functions from a DLL needs `__declspec(dllimport)`, the header file `sonic.h` is copied to `nvdaHelper/eSpeak` and then has `__declspec(dllimport)` added to its functions; this copy replaces the original `sonic.h` file when compiling eSpeak.

A new file `_sonic.py` is created inside `synthDrivers` to handle the interoperation with `sonic.dll`. There's `initialize()` to load the Sonic DLL, which is called in `speech.initialize()`, and there's a wrapper class `SonicStream` for the Sonic stream mode functions.

The SAPI5 synthesizer now passes the audio through a `SonicStream` first, before sending the audio to the `WavePlayer`. To speed up audio processing in Sonic, which uses 16-bit integer wave format internally, we explicitly choose a 16-bit wave format for the SAPI5 voice and the `WavePlayer` to avoid unnecessary format conversion.

This is the approach I chose currently. The implementation details are open for discussion. Other ways I've thought of:

`WasapiPlayer` before feeding the data to the device. Then add some functions such as `getRate` and `setRate` to the `WavePlayer`.

Testing strategy:
Seemed to work on my system.
Known issues with pull request:
None