[Idea] Compute key similarity over the log-scale Mel spectrogram #49

ggerganov · 2022-10-02T09:03:03Z

Currently, we compute the cross-correlation between the time-domain waveforms of two key presses to determine how similar the keys are.
Instead, we could compute the similarity metric over the Mel spectrograms of the signals. The Mel spectrogram seems to be the go-to audio representation in modern state-of-the-art speech recognition algorithms, so why not give it a try in keytap?
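To make the idea concrete, here is a rough sketch of what such a metric could look like. It assumes the log-Mel spectrograms of the two key presses have already been computed, aligned and cropped to the same shape. The `LogMel` struct and the `melSimilarity` function are hypothetical names, not existing keytap code, and the metric is a zero-lag normalized correlation rather than a full cross-correlation over time shifts.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container (not part of keytap): a log-Mel spectrogram stored
// row-major as n_frames x n_mel values.
struct LogMel {
    std::size_t n_frames;
    std::size_t n_mel;
    std::vector<float> data; // size must be n_frames * n_mel
};

// Zero-lag normalized correlation of two equally sized spectrograms.
// The result is in [-1, 1]. The mean is removed first, so a constant
// loudness offset (which is additive in log scale) does not change the score.
double melSimilarity(const LogMel & a, const LogMel & b) {
    assert(a.n_frames == b.n_frames && a.n_mel == b.n_mel);
    assert(a.data.size() == a.n_frames * a.n_mel);
    assert(b.data.size() == a.data.size());

    const std::size_t n = a.data.size();
    if (n == 0) return 0.0;

    double meanA = 0.0, meanB = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        meanA += a.data[i];
        meanB += b.data[i];
    }
    meanA /= n;
    meanB /= n;

    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double da = a.data[i] - meanA;
        const double db = b.data[i] - meanB;
        dot   += da * db;
        normA += da * da;
        normB += db * db;
    }

    // A completely flat spectrogram has no shape to compare against.
    const double denom = std::sqrt(normA * normB);
    return denom > 0.0 ? dot / denom : 0.0;
}
```

If the key presses are not perfectly aligned in time, the same function could be evaluated over a small range of frame offsets and the maximum taken, which would mirror the current time-domain cross-correlation more closely.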

Here is a sample implementation that computes the log-scaled Mel spectrogram of an audio signal, which I recently wrote for the whisper.cpp project:

https://github.com/ggerganov/whisper.cpp/blob/6d654d192a62e6cd9897d6ff683bdc97406827e9/main.cpp#L1962-L2063
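For readers who do not want to follow the link, the general technique can be sketched in a few dozen lines. This is not the whisper.cpp code, just a minimal independent illustration under several assumptions: a Hann window, a naive O(N^2) DFT instead of an FFT, the HTK-style Hz-to-Mel formula, triangular filters, and a 1e-10 floor before the logarithm. The function name `logMelSpectrogram` and all of its parameters are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// HTK-style conversion between Hz and Mel (one common convention; assumed here).
static double hzToMel(double hz)  { return 2595.0 * std::log10(1.0 + hz / 700.0); }
static double melToHz(double mel) { return 700.0 * (std::pow(10.0, mel / 2595.0) - 1.0); }

// Returns one row per frame, each holding nMel log10 Mel-band energies.
// Only complete frames are processed; trailing samples are ignored.
std::vector<std::vector<float>> logMelSpectrogram(
        const std::vector<float> & samples, int sampleRate,
        int frameSize, int hopSize, int nMel) {
    assert(sampleRate > 0 && frameSize > 1 && hopSize > 0 && nMel > 0);

    const double kPi = 3.14159265358979323846;
    const std::size_t frameLen = static_cast<std::size_t>(frameSize);
    const std::size_t hop = static_cast<std::size_t>(hopSize);
    const int nBins = frameSize / 2 + 1;

    // Hann window to reduce spectral leakage.
    std::vector<double> window(frameLen);
    for (int i = 0; i < frameSize; ++i) {
        window[i] = 0.5 * (1.0 - std::cos(2.0 * kPi * i / frameSize));
    }

    // Filter edges equally spaced on the Mel scale between 0 Hz and Nyquist.
    std::vector<double> edgesHz(nMel + 2);
    const double melMax = hzToMel(0.5 * sampleRate);
    for (int m = 0; m < nMel + 2; ++m) {
        edgesHz[m] = melToHz(melMax * m / (nMel + 1));
    }

    std::vector<std::vector<float>> result;
    std::vector<double> power(nBins);
    for (std::size_t start = 0; start + frameLen <= samples.size(); start += hop) {
        // Power spectrum via a naive DFT; a real implementation would use an FFT.
        for (int k = 0; k < nBins; ++k) {
            double re = 0.0, im = 0.0;
            for (int n = 0; n < frameSize; ++n) {
                const double x = samples[start + n] * window[n];
                const double phase = 2.0 * kPi * k * n / frameSize;
                re += x * std::cos(phase);
                im -= x * std::sin(phase);
            }
            power[k] = re * re + im * im;
        }

        // Apply triangular Mel filters and take the log of each band energy.
        std::vector<float> frame(nMel);
        for (int m = 0; m < nMel; ++m) {
            const double lo = edgesHz[m], mid = edgesHz[m + 1], hi = edgesHz[m + 2];
            double energy = 0.0;
            for (int k = 0; k < nBins; ++k) {
                const double hz = static_cast<double>(k) * sampleRate / frameSize;
                double w = 0.0;
                if (hz > lo && hz <= mid)     w = (hz - lo) / (mid - lo);
                else if (hz > mid && hz < hi) w = (hi - hz) / (hi - mid);
                energy += w * power[k];
            }
            // Floor avoids log10(0) for silent bands.
            frame[m] = static_cast<float>(std::log10(std::max(energy, 1e-10)));
        }
        result.push_back(frame);
    }
    return result;
}
```

A call such as `logMelSpectrogram(samples, 16000, 256, 128, 20)` would return one 20-value row per 256-sample frame. Those particular sizes are only placeholders; suitable values for short key-press recordings would have to be found experimentally.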

ggerganov added the enhancement (New feature or request) label on Oct 2, 2022
