Makes the whole process of collecting, cleaning, and sorting datasets a lot easier.
Supported, but not enough datasets
git clone https://github.com/silenterus/deepspeech-cleaner
cd deepspeech-cleaner
pip install -r requirements.txt
Download/analyze/insert all available corpora for French
python3 deepspeech-cleaner.py download --lang fr
Insert corpora - in case you downloaded the files yourself
python3 deepspeech-cleaner.py insert /path/to/corpora/
Clean/sort/create all necessary files for training - includes lm.binary/trie if KenLM is installed
python3 deepspeech-cleaner.py create
Sort/create all necessary files for training - no cleaning and no lm.binary/trie creation
python3 deepspeech-cleaner.py create --noclean --notrie
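The download/insert and create steps above can also be chained from a small Python script. This is only a minimal sketch, assuming it is run from the root of the cloned repository; the helper names `build_pipeline` and `run_pipeline` are hypothetical, and it uses nothing beyond the subcommands and flags shown above.

```python
import subprocess
import sys

SCRIPT = "deepspeech-cleaner.py"  # assumed to be in the current working directory


def build_pipeline(lang="fr", corpora_path=None, clean=True, trie=True):
    """Return the commands for one download-or-insert step plus one create step."""
    commands = []
    if corpora_path is None:
        # Let the tool download all available corpora for the language.
        commands.append([sys.executable, SCRIPT, "download", "--lang", lang])
    else:
        # Corpora were downloaded by hand; insert them from a local path.
        commands.append([sys.executable, SCRIPT, "insert", corpora_path])
    create = [sys.executable, SCRIPT, "create"]
    if not clean:
        create.append("--noclean")
    if not trie:
        create.append("--notrie")
    commands.append(create)
    return commands


def run_pipeline(commands, runner=subprocess.run):
    """Run each command in order; check=True stops at the first failure."""
    for cmd in commands:
        runner(cmd, check=True)


# Example (not executed here):
# run_pipeline(build_pipeline(lang="fr"))
```

Keeping command construction separate from execution makes it easy to print the commands for a dry run before anything is downloaded.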
Start DeepSpeech training
bash languages/fr/training/standard/start_train.sh
python3 deepspeech-cleaner.py crawl
Test num2words and your replacement rules
python3 deepspeech-cleaner.py test 1 2 3 is not for me
python3 deepspeech-cleaner.py test /path/to/textfile.txt
Convert/trim/trim silence for all audio files in your database
python3 deepspeech-cleaner.py convert
All arguments are saved for each language separately
python3 deepspeech-cleaner.py autosave
Only files with a number attached will be used
<0: used before number translation
>=0: used after number translation
Replace a word/symbol with '�' and the whole sentence gets rejected
spaces at the start/end are important for whole words
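The ordering and rejection behaviour described above can be illustrated with a short sketch. It is not the tool's implementation: the on-disk rule format is not described here, so rules are modelled as hypothetical `(order, old, new)` tuples, `translate_numbers` is a tiny stand-in for the real number translation, and the rejection marker is simply the character as it appears in the text above.

```python
import re

REJECT = "\ufffd"  # rejection marker, as it appears in the text above

# Hypothetical stand-in for the real number translation step.
DIGITS = {"1": "one", "2": "two", "3": "three"}


def translate_numbers(text):
    """Spell out each known number token; unknown numbers stay unchanged."""
    return re.sub(r"\d+", lambda m: DIGITS.get(m.group(), m.group()), text)


def apply_rules(sentence, rules):
    """Apply (order, old, new) rules; return None if the sentence is rejected.

    Rules with order < 0 run before number translation,
    rules with order >= 0 run after it.
    """
    ordered = sorted(rules, key=lambda r: r[0])
    text = f" {sentence} "  # pad so whole-word rules like " me " match at the edges
    for _, old, new in (r for r in ordered if r[0] < 0):
        text = text.replace(old, new)
    text = translate_numbers(text)
    for _, old, new in (r for r in ordered if r[0] >= 0):
        text = text.replace(old, new)
    if REJECT in text:
        return None  # the whole sentence is dropped
    return " ".join(text.split())


rules = [(-1, "&", " and "), (0, " is not ", " isn't "), (1, "@", REJECT)]
print(apply_rules("1 2 3 is not for me", rules))  # one two three isn't for me
print(apply_rules("mail me @ home", rules))       # None
```

The surrounding spaces in a rule such as `" me "` keep it from matching inside longer words like "mean", which is why leading and trailing spaces matter.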
Change the string-based SQL queries in
languages/fr/sql_query/..
files are named like the tables in your "audio.db"
'!' at the end of a line functions as NOT
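As an illustration of the trailing `!` convention, the sketch below shows one possible reading of it. Everything here is an assumption: the actual format of the query files is not shown, so each line is treated as a plain SQL condition, lines are joined with `AND`, and the table and column names are made up for the example.

```python
import sqlite3


def build_where(lines):
    """Join condition lines with AND; a trailing '!' negates that line."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith("!"):
            parts.append(f"NOT ({line[:-1].strip()})")
        else:
            parts.append(f"({line})")
    return " AND ".join(parts) if parts else "1"


# Hypothetical table and columns; the real schema of audio.db is not shown here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audios (path TEXT, duration REAL, corpus TEXT)")
conn.executemany(
    "INSERT INTO audios VALUES (?, ?, ?)",
    [("a.wav", 2.5, "tatoeba"), ("b.wav", 40.0, "librivox"), ("c.wav", 3.0, "vox")],
)

where = build_where(["duration < 30", "corpus = 'vox' !"])
rows = conn.execute(f"SELECT path FROM audios WHERE {where} ORDER BY path").fetchall()
print(where)  # (duration < 30) AND NOT (corpus = 'vox')
print(rows)   # [('a.wav',)]
```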
python3 deepspeech-cleaner.py help
<---< samplerate [16000-48000]
>---> corpora [forscher-tuda-vox16-zamia-custom-tatoeba-librivox-cv]
>---> words per sec [2.07]
>---> letters per sec [13.35]
>---> train files [237463]
Test - WER: 0.098498, CER: 3.228931, loss: 23.721140
WER: 3.500000, CER: 37.000000, loss: 326.320953
src: “eine neue”
res: “einem neuen leben und neuen pflichten entgegen”
WER: 3.000000, CER: 6.000000, loss: 7.963222
src: “ausverkauft”
res: “aus der fast”
WER: 3.000000, CER: 5.000000, loss: 11.577581
src: “riesengebirge”
res: “riesen der berge”
WER: 3.000000, CER: 6.000000, loss: 11.873451
src: “beerdigung”
res: “wer die un”
WER: 3.000000, CER: 8.000000, loss: 17.944910
src: “besuchstermin”
res: “es wuchs der”
WER: 3.000000, CER: 6.000000, loss: 22.410923
src: “beerdigung”
res: “wer die un”
WER: 3.000000, CER: 4.000000, loss: 25.310646
src: “weitermachen”
res: “bei der machen”
WER: 3.000000, CER: 34.000000, loss: 237.857559
src: “misses dent”
res: “es ist mein wunsch vergessen vernachlässigt”
WER: 3.000000, CER: 74.000000, loss: 484.282074
src: “es endigte mit einem”
res: “es endigte mit einem lauten schall welcher in jedem einsamen zimmer in echo zu wecken schienen”
WER: 2.800000, CER: 69.000000, loss: 650.892578
src: “computer alarm in neun minuten”
res: “per definition handelt es sich bei diesen geräten im engeren sinn um personal computer”
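The per-sample WER values above can exceed 1.0 because WER is the word-level edit distance divided by the number of words in the reference (`src`), so a long hallucinated transcript for a short reference is penalised heavily. The sketch below uses a plain Levenshtein distance, not DeepSpeech's own evaluation code, and reproduces the first sample. In this block the CER column appears to be the unnormalised character edit distance, which the same function also reproduces.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (word lists or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(src, res):
    """Word edit distance divided by the number of reference words."""
    ref_words = src.split()
    return edit_distance(ref_words, res.split()) / len(ref_words)


src = "eine neue"
res = "einem neuen leben und neuen pflichten entgegen"
print(wer(src, res))            # 3.5  (7 word edits / 2 reference words)
print(edit_distance(src, res))  # 37   (character-level edits)
```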
<---> samplerate [16000-48000]
<---> corpora [librivox-tatoeba]
<---> words per sec [2.25]
<---> letters per sec [14.53]
<---> train files [18134]
I Test of Epoch 12 - WER: 0.137465, loss: 29.99004187996005, mean edit distance: 0.058884
I WER: 0.142857, loss: 4.163468, mean edit distance: 0.065217
I - src: "jak w ogóle we wszystkich naszych obliczeniach"
I - res: "a w ogóle we wszystkich naszych obliczeniach "
I WER: 0.142857, loss: 4.163468, mean edit distance: 0.065217
I - src: "jak w ogóle we wszystkich naszych obliczeniach"
I - res: "a w ogóle we wszystkich naszych obliczeniach "
I WER: 0.181818, loss: 6.447145, mean edit distance: 0.025641
I - src: "pomimoto w stosunku wokulskiego do panny izabeli pierwsze lody były przełamane"
I - res: "pomimo to w stosunku wokulskiego do panny izabeli pierwsze lody były przełamane "
I WER: 0.400000, loss: 6.677766, mean edit distance: 0.107143
I - src: "otarła oczy i ciągnęła dalej"
I - res: "otarołaoczy i ciągnęła dalej "
I WER: 0.400000, loss: 6.677766, mean edit distance: 0.107143
I - src: "otarła oczy i ciągnęła dalej"
I - res: "otarołaoczy i ciągnęła dalej "
I WER: 0.500000, loss: 1.875308, mean edit distance: 0.105263
I - src: "niedziela sprowadzą"
I - res: "niedziela prowadzą "
I WER: 0.500000, loss: 1.875308, mean edit distance: 0.105263
I - src: "niedziela sprowadzą"
I - res: "niedziela prowadzą "
I WER: 1.000000, loss: 3.942765, mean edit distance: 0.105263
I - src: "tu będzie licytacya"
I - res: "tubędzielicytacya"
I WER: 1.000000, loss: 3.942765, mean edit distance: 0.105263
I - src: "tu będzie licytacya"
I - res: "tubędzielicytacya"
I WER: 1.000000, loss: 6.762781, mean edit distance: 0.176471
I - src: "jakto z kucharzem"
I - res: "jak to skucharzem"
<---> corpora [librivox-vox-tatoeba]
<---> samplerate [16000-48000]
<---> words per sec [2.26]
<---> letters per sec [12.85]
<---> train files [97486]
I Test of Epoch 12 - WER: 0.139222, loss: 16.857607432188242, mean edit distance: 0.060826
I WER: 0.250000, loss: 0.047055, mean edit distance: 0.047619
I - src: "tengo que comprar uno"
I - res: "tengo que comprar un "
I WER: 0.500000, loss: 0.039710, mean edit distance: 0.083333
I WER: 0.500000, loss: 0.072996, mean edit distance: 0.111111
I WER: 0.500000, loss: 0.072996, mean edit distance: 0.111111
I WER: 0.500000, loss: 0.098463, mean edit distance: 0.071429
I - src: "cuándo termina"
I - res: "cuando termina"
I WER: 1.000000, loss: 0.027957, mean edit distance: 0.100000
I WER: 1.000000, loss: 0.089742, mean edit distance: 0.125000
I WER: 1.000000, loss: 0.092845, mean edit distance: 0.100000
I WER: 1.000000, loss: 0.092845, mean edit distance: 0.100000
I WER: 1.000000, loss: 0.099211, mean edit distance: 0.076923
<---> samplerate [16000-48000]
<---> corpora [librivox-tatoeba-vox16-accent]
<---> words per sec [2.37]
<---> letters per sec [14.41]
<---> train files [87938]
I Test of Epoch 11 - WER: 0.227659, loss: 38.279466658148145, mean edit distance: 0.123504
I WER: 0.333333, loss: 0.538573, mean edit distance: 0.166667
I WER: 0.333333, loss: 0.656955, mean edit distance: 0.166667
I WER: 0.333333, loss: 0.885854, mean edit distance: 0.062500
I - src: "nous avons gagné"
I - res: "nous avons gagne"
I WER: 0.333333, loss: 0.885854, mean edit distance: 0.062500
I - src: "nous avons gagné"
I - res: "nous avons gagne"
I WER: 0.500000, loss: 0.314220, mean edit distance: 0.333333
I WER: 1.000000, loss: 0.245572, mean edit distance: 1.000000
I WER: 1.000000, loss: 0.448257, mean edit distance: 1.000000
I WER: 1.000000, loss: 0.448257, mean edit distance: 1.000000
I WER: 1.000000, loss: 0.628055, mean edit distance: 0.333333
I WER: 1.000000, loss: 0.628055, mean edit distance: 0.333333
<---> corpora [librivox-vox-tatoeba]
<---> samplerate [16000-48000]
<---> words per sec [2.17]
<---> letters per sec [12.83]
<---> train files [58304]
I Test of Epoch 10 - WER: 0.184894, loss: 28.62499210021505, mean edit distance: 0.075463
I WER: 0.083333, loss: 1.599633, mean edit distance: 0.029851
I - src: "cosí riflettendo su le sue sciagure bruno celèsia si ridusse a casa"
I - res: "così riflettendo su le sue sciagure bruno celèsia si ridusse a casa "
I WER: 0.090909, loss: 1.664164, mean edit distance: 0.033333
I - src: "abbiamo forse fatto male no niente di male rispose il medico"
I - res: "abbiamo forse fatto male no niente di male rispose il medio "
I WER: 0.100000, loss: 1.168548, mean edit distance: 0.033898
I - src: "perchè vedete signora voi siete stata la pietra di paragone"
I - res: "perché vedete signora voi siete stata la pietra di paragone "
I WER: 0.100000, loss: 1.493682, mean edit distance: 0.016129
I - src: "state zitto avaraccio gridò carmaux che slegava il povero uomo"
I - res: "state zitto avaraccio gridò carmaux che slegava il povero uuomo"
I WER: 0.100000, loss: 1.706887, mean edit distance: 0.040816
I - src: "oh esclamò in quel momento toby che si era levato"
I - res: "o esclamò in quel momento toby che si era levato "
I WER: 0.142857, loss: 0.449785, mean edit distance: 0.046512
I - src: "giunsi al paese senza averne fissato alcuno"
I - res: "giunse al paese senza averne fissato alcuno "
I WER: 0.142857, loss: 1.841321, mean edit distance: 0.058824
I - src: "le ricerche durarono più d un mese"
I - res: "le ricerche durarono più di un mese "
I WER: 0.200000, loss: 0.612865, mean edit distance: 0.083333
I - src: "ah e quale filippo ferri"
I - res: "a e quale filippo ferri "
I WER: 0.200000, loss: 0.969935, mean edit distance: 0.086957
I - src: "entrai in un altra sala"
I - res: "entra in un altra sala "
I WER: 0.200000, loss: 0.969935, mean edit distance: 0.086957
I - src: "entrai in un altra sala"
I - res: "entra in un altra sala "
<---> corpora [librivox-vox-tatoeba]
<---> samplerate [16000-48000]
<---> words per sec [1.94]
<---> letters per sec [11.66]
<---> train files [22351]
I Test of Epoch 10 - WER: 0.299552, loss: 41.175528268814084, mean edit distance: 0.117625
I WER: 0.250000, loss: 1.425027, mean edit distance: 0.117647
I - src: "але як се зробити"
I - res: "але як це зробити "
I WER: 0.250000, loss: 1.425027, mean edit distance: 0.117647
I - src: "але як се зробити"
I - res: "але як це зробити "
I WER: 0.285714, loss: 2.314395, mean edit distance: 0.066667
I - src: "тож до тебе я зверну свою мову"
I - res: "то ж до тебе я зверну свою мову "
I WER: 0.285714, loss: 2.314395, mean edit distance: 0.066667
I - src: "тож до тебе я зверну свою мову"
I - res: "то ж до тебе я зверну свою мову "
I WER: 0.333333, loss: 2.467164, mean edit distance: 0.250000
I WER: 0.333333, loss: 2.467164, mean edit distance: 0.250000
I WER: 0.500000, loss: 2.119555, mean edit distance: 0.142857
I WER: 0.500000, loss: 2.119555, mean edit distance: 0.142857
I WER: 1.000000, loss: 0.684362, mean edit distance: 0.333333
I WER: 1.000000, loss: 0.684362, mean edit distance: 0.333333
<---> samplerate [16000-48000]
<---> words per sec [1.91]
<---> letters per sec [11.79]
<---> train files [20360]
I Test of Epoch 12 - WER: 0.369255, loss: 49.01442650910262, mean edit distance: 0.155081
I WER: 0.500000, loss: 0.076582, mean edit distance: 0.200000
I WER: 0.500000, loss: 0.076582, mean edit distance: 0.200000
I WER: 0.500000, loss: 0.199971, mean edit distance: 0.166667
I WER: 0.500000, loss: 0.199971, mean edit distance: 0.166667
I WER: 0.500000, loss: 0.276903, mean edit distance: 0.200000
I WER: 0.500000, loss: 0.276903, mean edit distance: 0.200000
I WER: 0.500000, loss: 0.312152, mean edit distance: 0.142857
I WER: 0.500000, loss: 0.312152, mean edit distance: 0.142857
I WER: 0.500000, loss: 0.868555, mean edit distance: 0.285714
I WER: 0.500000, loss: 0.868555, mean edit distance: 0.285714
<---> corpora [swc-vox-tatoeba]
<---> samplerate [16000-48000]
<---> words per sec [2.22]
<---> letters per sec [13.9]
<---> train files [30598]
I Test of Epoch 9 - WER: 0.396161, loss: 92.96824162893921, mean edit distance: 0.193605
I WER: 0.083333, loss: 3.263168, mean edit distance: 0.014706
I - src: "de buurtschap ligt ten zuiden van dasselaar en ten westen van norden"
I - res: "de buurtschap ligt ten zuiden van dasselaar en ten westen van noorden"
I WER: 0.125000, loss: 3.376268, mean edit distance: 0.026316
I - src: "het is een restant van de oude zeedijk"
I - res: "het is een restant van de oude zeedik"
I WER: 0.142857, loss: 2.820412, mean edit distance: 0.025000
I - src: "de herkomst van dit wapen is onduidelijk"
I - res: "de herkomst van dit wapen is onduidenlijk"
I WER: 0.142857, loss: 3.029150, mean edit distance: 0.028571
I - src: "het ligt iets ten noorden van gendt"
I - res: "het ligt iets ten noorden van gent"
I WER: 0.142857, loss: 3.029150, mean edit distance: 0.028571
I - src: "het ligt iets ten noorden van gendt"
I - res: "het ligt iets ten noorden van gent"
I WER: 0.142857, loss: 3.058265, mean edit distance: 0.025641
I - src: "bij het buurtje lag een wierde die in de negentiende eeuw geheel is afgegraven"
I - res: "bij het buurtje lag een wierde die in de negentien e eeuw geheel is afgegraven "
I WER: 0.222222, loss: 2.067109, mean edit distance: 0.023256
I - src: "het dorp ligt op de rechteroever van de lek"
I - res: "het dorp ligt op de rechter oever van de lek"
I WER: 0.285714, loss: 1.334180, mean edit distance: 0.025000
I - src: "het dorp ontstond in de negentiende eeuw"
I - res: "het dorp ontstond in de negentien e eeuw"
I WER: 0.333333, loss: 2.122649, mean edit distance: 0.017857
I - src: "in duizendzeshonderdeenenvijftig wordt een sluis gebouwd"
I - res: "in duizendzeshonderdeenenvijftig wordt een sluisgebouwd"
I WER: 0.333333, loss: 2.648912, mean edit distance: 0.026316
I - src: "hier wordt lesgegeven aan de onderbouw"
I - res: "hier wordt les gegeven aan de onderbouw"
<---> samplerate [16000-48000]
<---> corpora [tatoeba-vox16]
<---> words per sec [1.88]
<---> letters per sec [9.98]
I Test of Epoch 10 - WER: 0.507568, loss: 21.292116564373636, mean edit distance: 0.244271
I WER: 0.200000, loss: 1.065989, mean edit distance: 0.058824
I - src: "não foi tom não é"
I - res: "não foi tom não "
I WER: 0.250000, loss: 1.081908, mean edit distance: 0.200000
I - src: "tom não tem pai"
I WER: 0.250000, loss: 1.081908, mean edit distance: 0.200000
I - src: "tom não tem pai"
I WER: 0.333333, loss: 1.577532, mean edit distance: 0.083333
I WER: 0.333333, loss: 1.577532, mean edit distance: 0.083333
I WER: 0.500000, loss: 1.114254, mean edit distance: 0.083333
I WER: 0.500000, loss: 1.114254, mean edit distance: 0.083333
I WER: 0.500000, loss: 1.841137, mean edit distance: 0.333333
I WER: 0.500000, loss: 1.879081, mean edit distance: 0.100000
I WER: 0.500000, loss: 1.879081, mean edit distance: 0.100000