SABER is a PyTorch project, currently under research, that aims to provide easily reproducible baselines for automatic speech recognition using semi-supervised learning. Contributions are welcome.
SABER consists of the following components:
- Several state-of-the-art models, including a MixNet-based variant of NVIDIA's QuartzNet.
- Ranger (RAdam + Lookahead) optimizer, which removes the need for the long learning-rate warmup used in the SpecAugment paper.
- Mish activation function (sketched below).
- Data augmentations: SpecNoise, SpecAugment, SpecSparkle (a Cutout-inspired variant), and SpecBlur (a novel approach). Augmentation parameters increase linearly over training in a curriculum-based schedule (sketched below).
- Aggregation Cross-Entropy (ACE) loss in place of CTC loss for easier training (sketched below).
- Unsupervised Data Augmentation (UDA) as the means of semi-supervised learning (sketched below).
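Mish is defined as `x * tanh(softplus(x))`; a minimal PyTorch sketch (recent PyTorch versions also ship a built-in `torch.nn.Mish`):

```python
import torch
import torch.nn.functional as F

class Mish(torch.nn.Module):
    """Mish activation: x * tanh(softplus(x))."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.tanh(F.softplus(x))
```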
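The curriculum schedule is just a linear ramp of augmentation strength over training; a minimal sketch, where the ramp length and the way strength scales a SpecAugment mask width are assumptions, not the repo's actual parameters:

```python
def augmentation_strength(step: int, ramp_steps: int = 10_000) -> float:
    """Linearly ramp augmentation strength from 0 to 1 over ramp_steps updates (assumed length)."""
    return min(step / ramp_steps, 1.0)

# Example: scale a SpecAugment time-mask width by the current strength.
base_time_mask = 100
time_mask_width = int(base_time_mask * augmentation_strength(step=2_500))  # -> 25
```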
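ACE replaces CTC's per-frame alignment with a cross-entropy between aggregated class counts, following the Aggregation Cross-Entropy paper; a hedged sketch (the shapes and blank-count convention follow my reading of the paper, not necessarily this repo's code):

```python
import torch

def ace_loss(probs: torch.Tensor, label_counts: torch.Tensor) -> torch.Tensor:
    """Aggregation Cross-Entropy loss.

    probs:        (T, B, C) per-frame softmax outputs, class 0 = blank.
    label_counts: (B, C) occurrence count of each class in the reference;
                  by the paper's convention, label_counts[:, 0] = T - label length.
    """
    T = probs.size(0)
    agg = probs.sum(dim=0) / T      # aggregated per-class probability mass, (B, C)
    ref = label_counts.float() / T  # normalized reference counts, (B, C)
    return -(ref * torch.log(agg + 1e-10)).sum(dim=1).mean()
```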
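UDA's semi-supervised term is a consistency loss on unlabeled audio: predictions on an augmented copy are pushed toward predictions on the clean copy; a minimal sketch, where `model` and `augment` are assumed callables rather than names from this repo:

```python
import torch
import torch.nn.functional as F

def uda_consistency_loss(model, unlabeled, augment) -> torch.Tensor:
    # Target distribution from the clean input, held fixed (no gradient).
    with torch.no_grad():
        target = F.softmax(model(unlabeled), dim=-1)
    # Prediction on the augmented copy (e.g. SpecAugment applied to the features).
    pred_log = F.log_softmax(model(augment(unlabeled)), dim=-1)
    # KL(target || prediction), averaged over the batch.
    return F.kl_div(pred_log, target, reduction="batchmean")
```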
Requirements:
- aria2c (used by the download scripts)
- Python 3.x
- the Python libraries listed in requirements.txt (install with `pip install -r requirements.txt`)
Download the LibriSpeech and Common Voice datasets using the download scripts, changing the dir parameter as per your configuration:
```sh
sh download_scripts/download_librispeech.sh
sh download_scripts/extract_librispeech_tars.sh
sh download_scripts/download_common_voice.sh
sh download_scripts/extract_common_voice_tars.sh
```
Set up the SentencePiece vocabulary and build the LMDB datasets:
```sh
sh dataset_scripts/librispeech_all_lines.sh
sh dataset_scripts/librispeech_sentencepiece_model.sh
```
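For reference, the SentencePiece training step can also be driven from Python; the input file name and vocabulary size below are assumptions, the actual values live in the shell script above:

```python
import sentencepiece as spm

# Train a subword model on the corpus of transcript lines
# (file name and vocab_size are illustrative assumptions).
spm.SentencePieceTrainer.train(
    input="librispeech_all_lines.txt",
    model_prefix="librispeech_sp",   # writes librispeech_sp.model / .vocab
    vocab_size=1000,
)

# Encode a transcript into subword ids with the trained model.
sp = spm.SentencePieceProcessor(model_file="librispeech_sp.model")
print(sp.encode("hello world", out_type=int))
```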
```sh
OMP_NUM_THREADS="1" OPENBLAS_NUM_THREADS="1" python3 -W ignore -m dataset_scripts.create_librispeech_lmdb
OMP_NUM_THREADS="1" OPENBLAS_NUM_THREADS="1" python3 -W ignore -m dataset_scripts.create_commonvoice_lmdb
OMP_NUM_THREADS="1" OPENBLAS_NUM_THREADS="1" python3 -W ignore -m dataset_scripts.create_airtel_lmdb
OMP_NUM_THREADS="1" OPENBLAS_NUM_THREADS="1" python3 -W ignore -m dataset_scripts.create_airtelpayments_lmdb
```
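Each create_*_lmdb script ultimately writes (audio, transcript) samples into an LMDB key-value store; a minimal sketch of that pattern with the `lmdb` package, where the key scheme and serialization format are assumptions:

```python
import lmdb
import pickle

# Open (and create) the environment; map_size is the maximum DB size in bytes.
env = lmdb.open("librispeech_train.lmdb", map_size=1 << 40)

# Illustrative samples; the real scripts read these from the extracted dataset.
samples = [{"audio": "path/to/audio0.flac", "text": "hello world"}]

with env.begin(write=True) as txn:
    for idx, sample in enumerate(samples):
        # One pickled record per integer key (assumed scheme).
        txn.put(str(idx).encode(), pickle.dumps(sample))
    txn.put(b"num_samples", str(len(samples)).encode())
env.close()
```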
Modify utils/config.py as per your configuration and run:

```sh
OMP_NUM_THREADS="1" CUDA_VISIBLE_DEVICES="0,1,2" python3.6 train.py
```
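What to change in utils/config.py is repo-specific; purely as a hypothetical illustration (none of these field names are confirmed by the source), pointing the config at your setup would look something like:

```python
# utils/config.py -- hypothetical field names, for illustration only.
lmdb_root = "/data/lmdb"                      # where the LMDB datasets were written
sentencepiece_model = "librispeech_sp.model"  # subword model from the previous step
batch_size = 32                               # per-GPU batch size
learning_rate = 1e-3                          # Ranger base learning rate
```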
References:
- Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
- Jasper: An End-to-End Convolutional Neural Acoustic Model
- SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
- Improved Regularization of Convolutional Neural Networks with Cutout
- On the Variance of the Adaptive Learning Rate and Beyond
- Aggregation Cross-Entropy for Sequence Recognition
- MixMatch: A Holistic Approach to Semi-Supervised Learning
- MixConv: Mixed Depthwise Convolutional Kernels
- Unsupervised Data Augmentation for Consistency Training
- Cyclical Learning Rates for Training Neural Networks
- Cycle-consistency training for end-to-end speech recognition
- RandAugment: Practical automated data augmentation with a reduced search space
- Self-Attention Networks for Connectionist Temporal Classification in Speech Recognition