Currently, we compute the cross-correlation between time-domain key waveforms to determine how similar two keys are.
Instead, we could compute the similarity metric over the Mel spectrograms of the signals. The Mel spectrogram seems to be the go-to audio representation in modern state-of-the-art speech recognition algorithms, so why not give it a try in keytap?
Here is a sample implementation that computes the log-scaled Mel spectrogram of an audio signal, which I recently wrote for the whisper.cpp project: https://github.com/ggerganov/whisper.cpp/blob/6d654d192a62e6cd9897d6ff683bdc97406827e9/main.cpp#L1962-L2063
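
To make the idea concrete, here is a rough, self-contained sketch of what the comparison could look like. It is not the whisper.cpp code linked above and not existing keytap code. The names `log_mel_spectrogram` and `spectrogram_similarity` are hypothetical, and the frame size, hop, and number of Mel bands are left as parameters because I have not tuned them for key sounds. For brevity it uses a naive O(N^2) DFT instead of an FFT, and it uses a normalized zero-lag correlation as a stand-in for the full cross-correlation, so treat it as a starting point only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Spectrogram = std::vector<std::vector<float>>; // [frame][mel band]

static float hz_to_mel(float hz)  { return 2595.0f * std::log10(1.0f + hz / 700.0f); }
static float mel_to_hz(float mel) { return 700.0f * (std::pow(10.0f, mel / 2595.0f) - 1.0f); }

// Log-scaled Mel spectrogram using a Hann window and a naive O(N^2) DFT.
// All parameters are assumptions to be tuned; an FFT should replace the DFT in practice.
Spectrogram log_mel_spectrogram(const std::vector<float>& samples, int sample_rate,
                                int frame_size, int hop, int n_mel) {
    const float pi = 3.14159265358979f;
    const int n_bins = frame_size / 2 + 1;

    // n_mel + 2 band edges, equally spaced on the Mel scale from 0 Hz to Nyquist.
    std::vector<float> edges(n_mel + 2);
    const float mel_max = hz_to_mel(0.5f * sample_rate);
    for (int i = 0; i < n_mel + 2; ++i) {
        edges[i] = mel_to_hz(mel_max * i / (n_mel + 1));
    }

    Spectrogram out;
    for (std::size_t start = 0; start + frame_size <= samples.size(); start += hop) {
        // Power spectrum of the windowed frame.
        std::vector<float> power(n_bins);
        for (int k = 0; k < n_bins; ++k) {
            float re = 0.0f, im = 0.0f;
            for (int n = 0; n < frame_size; ++n) {
                const float w = 0.5f - 0.5f * std::cos(2.0f * pi * n / frame_size);
                const float phase = 2.0f * pi * k * n / frame_size;
                re += w * samples[start + n] * std::cos(phase);
                im -= w * samples[start + n] * std::sin(phase);
            }
            power[k] = re * re + im * im;
        }
        // Triangular Mel filters, then log scaling with a small floor to avoid log(0).
        std::vector<float> bands(n_mel);
        for (int m = 0; m < n_mel; ++m) {
            float energy = 0.0f;
            for (int k = 0; k < n_bins; ++k) {
                const float hz = static_cast<float>(k) * sample_rate / frame_size;
                const float up = (hz - edges[m]) / (edges[m + 1] - edges[m]);
                const float down = (edges[m + 2] - hz) / (edges[m + 2] - edges[m + 1]);
                energy += std::max(0.0f, std::min(up, down)) * power[k];
            }
            bands[m] = std::log10(std::max(energy, 1e-10f));
        }
        out.push_back(bands);
    }
    return out;
}

// Normalized zero-lag correlation (Pearson coefficient) between two spectrograms
// of identical shape. The result lies in [-1, 1]; 0 is returned for constant input.
float spectrogram_similarity(const Spectrogram& a, const Spectrogram& b) {
    assert(!a.empty() && a.size() == b.size());
    double sum_a = 0.0, sum_b = 0.0;
    std::size_t count = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        assert(a[i].size() == b[i].size());
        for (std::size_t j = 0; j < a[i].size(); ++j) {
            sum_a += a[i][j];
            sum_b += b[i][j];
            ++count;
        }
    }
    const double mean_a = sum_a / count, mean_b = sum_b / count;
    double ab = 0.0, aa = 0.0, bb = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        for (std::size_t j = 0; j < a[i].size(); ++j) {
            const double da = a[i][j] - mean_a, db = b[i][j] - mean_b;
            ab += da * db;
            aa += da * da;
            bb += db * db;
        }
    }
    const double denom = std::sqrt(aa * bb);
    return denom > 0.0 ? static_cast<float>(ab / denom) : 0.0f;
}
```

In keytap this would mean calling `log_mel_spectrogram` once per detected key waveform and feeding the resulting pairs to `spectrogram_similarity` in place of the time-domain comparison. Whether a zero-lag score is enough, or whether we still need to search over time offsets as the current cross-correlation does, is something to check experimentally.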