
Smart Home Voice Command Recognition

A smart home controller simulator that receives voice commands from a microphone.
The model is trained to detect the Serbian words "vrata" (door), "svetlo" (light), "zvuk" (sound), "otvori" (open), "zatvori" (close), "uključi" (turn on) and "isključi" (turn off), which control the state of the door, lights and audio in a smart home system.

❓ How to Run

Online

Visit: https://smart-home-serbian-voice-controller.streamlit.app

Locally (faster)

  1. Clone the repository:

     git clone https://github.com/mradovic38/voice-command-recognition

  2. Install the required dependencies:

     pip install -r requirements.txt

  3. Enable caching: in run.py, pass the argument use_cache=True to the constructor of the GUI class instance to ensure better processing speed (see the sketch after this list).

  4. Run the following command to start the program:

     streamlit run run.py
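A minimal sketch of what run.py could look like; the GUI class and the use_cache argument are described above, but the import path and the run() call are assumptions rather than the repository's exact API:

    # run.py -- illustrative sketch, not the repository's actual code
    from gui import GUI  # assumed import path

    app = GUI(use_cache=True)  # cache intermediate results for faster processing
    app.run()                  # assumed entry point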

🎵 Audio Augmentation

Since the dataset is relatively small, audio augmentation techniques were used to expand the training dataset; in this case its size was doubled. Three different augmentations were performed at random:

  • Adding noise
  • Time stretching
  • Pitch shifting

The augmentations were performed using the AudioAugmentation class, and only on the training dataset, to ensure valid evaluation.
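The AudioAugmentation class itself is not reproduced here; the following is a minimal sketch of what the three transforms could look like with librosa (function names, parameter defaults, and the random selection are illustrative assumptions, not the repository's actual API):

    import random
    import numpy as np
    import librosa

    def add_noise(y, noise_factor=0.005):
        # Mix white noise into the waveform.
        return y + noise_factor * np.random.randn(len(y))

    def time_stretch(y, rate=1.1):
        # Speed up (rate > 1) or slow down (rate < 1) without changing pitch.
        return librosa.effects.time_stretch(y, rate=rate)

    def pitch_shift(y, sr, n_steps=2):
        # Shift pitch by n_steps semitones without changing duration.
        return librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)

    # Apply one randomly chosen augmentation to a training example.
    y, sr = librosa.load("command.wav", sr=16000)  # hypothetical file
    augment = random.choice([add_noise,
                             lambda s: time_stretch(s),
                             lambda s: pitch_shift(s, sr)])
    y_augmented = augment(y)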

🤖 Model

The Wav2Vec2 model for cross-lingual speech representations (Wav2Vec2-XLSR-53) was fine-tuned for this problem, since the smart home commands are in Serbian.
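A minimal sketch of loading the pretrained checkpoint with a CTC head, following the standard Hugging Face fine-tuning recipe; the vocab.json file (a character-to-id map covering the Serbian commands) and all parameter values are assumptions:

    from transformers import (Wav2Vec2CTCTokenizer, Wav2Vec2FeatureExtractor,
                              Wav2Vec2Processor, Wav2Vec2ForCTC)

    # Character-level tokenizer built from the command vocabulary (assumed file).
    tokenizer = Wav2Vec2CTCTokenizer("vocab.json", unk_token="[UNK]",
                                     pad_token="[PAD]", word_delimiter_token="|")
    feature_extractor = Wav2Vec2FeatureExtractor(feature_size=1, sampling_rate=16000,
                                                 padding_value=0.0, do_normalize=True,
                                                 return_attention_mask=True)
    processor = Wav2Vec2Processor(feature_extractor=feature_extractor,
                                  tokenizer=tokenizer)

    # Attach a randomly initialized CTC head sized to the new vocabulary.
    model = Wav2Vec2ForCTC.from_pretrained(
        "facebook/wav2vec2-large-xlsr-53",
        ctc_loss_reduction="mean",
        pad_token_id=processor.tokenizer.pad_token_id,
        vocab_size=len(processor.tokenizer),
    )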

To ensure proper evaluation, the training and validation sets contain recordings from different speakers. If a speaker's voice appeared in both sets, validation metrics would be inflated and would not reflect the model's performance on unseen speakers.
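One way to enforce such a speaker-disjoint split is scikit-learn's GroupShuffleSplit; the file names and speaker ids below are hypothetical, and this may differ from how the repository builds its splits:

    import numpy as np
    from sklearn.model_selection import GroupShuffleSplit

    files = np.array(["s1_vrata.wav", "s1_svetlo.wav", "s2_vrata.wav", "s3_zvuk.wav"])
    speakers = np.array(["s1", "s1", "s2", "s3"])  # speaker id per recording

    # groups= keeps every recording of a given speaker on the same side.
    splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=42)
    train_idx, val_idx = next(splitter.split(files, groups=speakers))
    assert set(speakers[train_idx]).isdisjoint(speakers[val_idx])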

The model was fine-tuned for 100 epochs with a batch size of 8, since the dataset is relatively small. Increasing dropout also yielded better performance here, again due to the dataset size.
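The corresponding training configuration might look like the sketch below; the epoch count and batch size come from above, while every other value is an illustrative assumption. The increased dropout would be set on the model config when loading it (e.g. the hidden_dropout, attention_dropout and final_dropout fields of Wav2Vec2Config):

    from transformers import TrainingArguments

    training_args = TrainingArguments(
        output_dir="wav2vec2-serbian-commands",  # assumed
        num_train_epochs=100,                    # from the README
        per_device_train_batch_size=8,           # from the README
        evaluation_strategy="epoch",
        save_strategy="epoch",
        learning_rate=3e-4,                      # assumed
        warmup_steps=100,                        # assumed
    )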

The fine-tuned model is available on Hugging Face 🤗 at the following link:
wav2vec2-large-xlsr-53-serbian-smart-home-commands


Figure 1: Training loss over time.



Figure 2: Validation loss over time.



Figure 3: Validation WER over time.

🔍 Out-of-Vocabulary Handling

Since the dataset contains only the command words, there is no built-in way to detect words outside the vocabulary. For this purpose the OOVHandler class is introduced. It computes the minimum distance between the input and each word in the dataset using dynamic time warping (DTW); if that distance exceeds a given threshold, the word is labeled as out of vocabulary (the check_if_oov() method returns false). DTW requires audio features, so Mel-frequency cepstral coefficient (MFCC) features were extracted, together with their delta and delta-delta features.
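A minimal sketch of this idea with librosa; the function names, the path-length normalization, and the threshold value are assumptions, not the repository's actual OOVHandler API:

    import numpy as np
    import librosa

    def extract_features(y, sr=16000, n_mfcc=13):
        # MFCCs stacked with their delta and delta-delta features.
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
        return np.vstack([mfcc,
                          librosa.feature.delta(mfcc),
                          librosa.feature.delta(mfcc, order=2)])

    def min_dtw_distance(query_feats, reference_feats_list):
        # Smallest DTW alignment cost against any in-vocabulary reference.
        best = np.inf
        for ref in reference_feats_list:
            D, wp = librosa.sequence.dtw(X=query_feats, Y=ref, metric="euclidean")
            best = min(best, D[-1, -1] / len(wp))  # normalize by path length
        return best

    def is_oov(query_feats, reference_feats_list, threshold=50.0):
        # threshold is a tunable assumption; a larger distance means no
        # vocabulary word is close enough, so the input is out of vocabulary.
        return min_dtw_distance(query_feats, reference_feats_list) > threshold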

Sometimes the model predicts a word that is very close to one of the vocabulary words (e.g. "uključi" is sometimes predicted as "uključii"). These near misses should be mapped to the corresponding exact words. The TranscriptionPostprocessor class performs the mapping when a predicted word is at least 70% similar to a vocabulary word.
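The TranscriptionPostprocessor itself is not reproduced here; as an illustration, the same 70% idea can be expressed with difflib's similarity ratio, which may differ from the metric the repository actually uses:

    from difflib import SequenceMatcher

    VOCABULARY = ["vrata", "svetlo", "zvuk", "otvori", "zatvori", "uključi", "isključi"]

    def postprocess(word, threshold=0.7):
        # Snap a near-miss prediction to the closest vocabulary word.
        best_word, best_ratio = word, 0.0
        for target in VOCABULARY:
            ratio = SequenceMatcher(None, word, target).ratio()
            if ratio > best_ratio:
                best_word, best_ratio = target, ratio
        return best_word if best_ratio >= threshold else word

    print(postprocess("uključii"))  # -> "uključi" (similarity ≈ 0.93)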

💻 GUI

The GUI was created using Streamlit. It captures a short audio recording of a command when the record button is clicked. If the user says one of the supported commands, the state of the images on the screen changes, simulating smart home voice control.
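A minimal sketch of that record-and-react loop, assuming sounddevice for microphone capture and a placeholder transcribe() standing in for the recognition pipeline; asset paths, state keys, and the two-second recording window are all assumptions:

    import streamlit as st
    import sounddevice as sd

    SAMPLE_RATE = 16000

    def transcribe(audio):
        # Placeholder for the fine-tuned model + OOV check + postprocessing.
        return "otvori"

    if "door" not in st.session_state:
        st.session_state.door = "closed"

    if st.button("Record"):
        audio = sd.rec(int(2 * SAMPLE_RATE), samplerate=SAMPLE_RATE, channels=1)
        sd.wait()  # block until the recording finishes
        command = transcribe(audio.flatten())
        if command == "otvori":
            st.session_state.door = "open"
        elif command == "zatvori":
            st.session_state.door = "closed"

    st.image(f"assets/door_{st.session_state.door}.png")  # hypothetical asset path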

📖 Resources