eng-tut

opus-2020-07-06.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): aze_Latn bak chv crh crh_Latn kaz_Cyrl kaz_Latn kir_Cyrl kjh kum mon nog ota_Arab ota_Latn sah tat tat_Arab tat_Latn tuk tuk_Latn tur tyv uig_Arab uig_Cyrl uzb_Cyrl uzb_Latn xal
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<< (id = valid target language ID); see the usage sketch after this list
  • download: opus-2020-07-06.zip
  • test set translations: opus-2020-07-06.test.txt
  • test set scores: opus-2020-07-06.eval.txt
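
As a usage sketch of the language-token convention above: the snippet below prepends >>tur<< (Turkish) to the English input before translating. It assumes the Hugging Face conversion of an OPUS-MT English-to-Turkic model; the model ID "Helsinki-NLP/opus-mt-en-trk" is an assumption, not confirmed by this card, so substitute the checkpoint you actually use.

```python
from transformers import MarianMTModel, MarianTokenizer

# Assumed model ID for a multilingual English->Turkic OPUS-MT conversion;
# replace with the checkpoint that matches the zip you downloaded.
model_name = "Helsinki-NLP/opus-mt-en-trk"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# The sentence-initial >>id<< token selects the target language
# (any valid ID from the target-language list, e.g. tur, aze_Latn, kaz_Cyrl).
batch = tokenizer([">>tur<< How are you today?"], return_tensors="pt")
output = model.generate(**batch)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```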

Benchmarks

| testset | BLEU | chr-F |
|---------|------|-------|
| Tatoeba-test.eng-aze.eng.aze | 25.6 | 0.559 |
| Tatoeba-test.eng-bak.eng.bak | 9.7 | 0.327 |
| Tatoeba-test.eng-chv.eng.chv | 3.4 | 0.281 |
| Tatoeba-test.eng-crh.eng.crh | 13.7 | 0.326 |
| Tatoeba-test.eng-kaz.eng.kaz | 10.3 | 0.351 |
| Tatoeba-test.eng-kir.eng.kir | 18.0 | 0.464 |
| Tatoeba-test.eng-kjh.eng.kjh | 1.7 | 0.030 |
| Tatoeba-test.eng-kum.eng.kum | 1.7 | 0.024 |
| Tatoeba-test.eng-mon.eng.mon | 9.8 | 0.358 |
| Tatoeba-test.eng.multi | 17.5 | 0.431 |
| Tatoeba-test.eng-nog.eng.nog | 1.1 | 0.057 |
| Tatoeba-test.eng-ota.eng.ota | 0.3 | 0.043 |
| Tatoeba-test.eng-sah.eng.sah | 0.5 | 0.040 |
| Tatoeba-test.eng-tat.eng.tat | 9.4 | 0.295 |
| Tatoeba-test.eng-tuk.eng.tuk | 6.1 | 0.315 |
| Tatoeba-test.eng-tur.eng.tur | 31.6 | 0.600 |
| Tatoeba-test.eng-tyv.eng.tyv | 6.9 | 0.201 |
| Tatoeba-test.eng-uig.eng.uig | 0.1 | 0.148 |
| Tatoeba-test.eng-uzb.eng.uzb | 2.8 | 0.261 |
| Tatoeba-test.eng-xal.eng.xal | 0.1 | 0.040 |
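
The BLEU and chr-F numbers above come from the released test-set translations and score files. As a minimal sketch of how such scores can be recomputed with sacrebleu, assuming parallel plain-text hypothesis and reference files with one segment per line (the file names below are hypothetical):

```python
import sacrebleu

# Hypothetical file names; the actual layout inside the release zip may differ.
with open("eng-tur.hyp", encoding="utf-8") as f:
    hyps = [line.strip() for line in f]
with open("eng-tur.ref", encoding="utf-8") as f:
    refs = [line.strip() for line in f]

bleu = sacrebleu.corpus_bleu(hyps, [refs])
chrf = sacrebleu.corpus_chrf(hyps, [refs])

# Note: recent sacrebleu releases report chrF on a 0-100 scale, while the
# tables here use 0-1; divide by 100 when comparing.
print(f"BLEU  = {bleu.score:.1f}")
print(f"chr-F = {chrf.score:.3f}")
```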

opus-2020-07-14.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): aze_Latn bak chv crh crh_Latn kaz_Cyrl kaz_Latn kir_Cyrl kjh kum mon nog ota_Arab ota_Latn sah tat tat_Arab tat_Latn tuk tuk_Latn tur tyv uig_Arab uig_Cyrl uzb_Cyrl uzb_Latn xal
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<< (id = valid target language ID)
  • download: opus-2020-07-14.zip
  • test set translations: opus-2020-07-14.test.txt
  • test set scores: opus-2020-07-14.eval.txt

Benchmarks

| testset | BLEU | chr-F |
|---------|------|-------|
| Tatoeba-test.eng-aze.eng.aze | 26.6 | 0.570 |
| Tatoeba-test.eng-bak.eng.bak | 6.7 | 0.293 |
| Tatoeba-test.eng-chv.eng.chv | 3.3 | 0.288 |
| Tatoeba-test.eng-crh.eng.crh | 7.9 | 0.364 |
| Tatoeba-test.eng-kaz.eng.kaz | 11.9 | 0.361 |
| Tatoeba-test.eng-kir.eng.kir | 22.4 | 0.468 |
| Tatoeba-test.eng-kjh.eng.kjh | 1.7 | 0.028 |
| Tatoeba-test.eng-kum.eng.kum | 2.0 | 0.076 |
| Tatoeba-test.eng-mon.eng.mon | 11.6 | 0.369 |
| Tatoeba-test.eng.multi | 18.2 | 0.439 |
| Tatoeba-test.eng-nog.eng.nog | 1.2 | 0.066 |
| Tatoeba-test.eng-ota.eng.ota | 0.2 | 0.039 |
| Tatoeba-test.eng-sah.eng.sah | 0.7 | 0.046 |
| Tatoeba-test.eng-tat.eng.tat | 10.2 | 0.302 |
| Tatoeba-test.eng-tuk.eng.tuk | 5.3 | 0.313 |
| Tatoeba-test.eng-tur.eng.tur | 32.9 | 0.611 |
| Tatoeba-test.eng-tyv.eng.tyv | 5.2 | 0.170 |
| Tatoeba-test.eng-uig.eng.uig | 0.1 | 0.151 |
| Tatoeba-test.eng-uzb.eng.uzb | 3.1 | 0.268 |
| Tatoeba-test.eng-xal.eng.xal | 0.1 | 0.049 |

opus-2020-07-20.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): aze_Latn bak chv crh crh_Latn kaz_Cyrl kaz_Latn kir_Cyrl kjh kum mon nog ota_Arab ota_Latn sah tat tat_Arab tat_Latn tuk tuk_Latn tur tyv uig_Arab uig_Cyrl uzb_Cyrl uzb_Latn xal
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<< (id = valid target language ID)
  • download: opus-2020-07-20.zip
  • test set translations: opus-2020-07-20.test.txt
  • test set scores: opus-2020-07-20.eval.txt

Benchmarks

| testset | BLEU | chr-F |
|---------|------|-------|
| Tatoeba-test.eng-aze.eng.aze | 26.5 | 0.569 |
| Tatoeba-test.eng-bak.eng.bak | 5.4 | 0.274 |
| Tatoeba-test.eng-chv.eng.chv | 3.3 | 0.280 |
| Tatoeba-test.eng-crh.eng.crh | 12.5 | 0.384 |
| Tatoeba-test.eng-kaz.eng.kaz | 10.9 | 0.359 |
| Tatoeba-test.eng-kir.eng.kir | 25.6 | 0.501 |
| Tatoeba-test.eng-kjh.eng.kjh | 2.4 | 0.046 |
| Tatoeba-test.eng-kum.eng.kum | 7.0 | 0.143 |
| Tatoeba-test.eng-mon.eng.mon | 10.1 | 0.359 |
| Tatoeba-test.eng.multi | 18.4 | 0.441 |
| Tatoeba-test.eng-nog.eng.nog | 1.3 | 0.066 |
| Tatoeba-test.eng-ota.eng.ota | 0.3 | 0.034 |
| Tatoeba-test.eng-sah.eng.sah | 0.8 | 0.054 |
| Tatoeba-test.eng-tat.eng.tat | 9.7 | 0.303 |
| Tatoeba-test.eng-tuk.eng.tuk | 5.8 | 0.313 |
| Tatoeba-test.eng-tur.eng.tur | 33.2 | 0.616 |
| Tatoeba-test.eng-tyv.eng.tyv | 6.9 | 0.189 |
| Tatoeba-test.eng-uig.eng.uig | 0.1 | 0.151 |
| Tatoeba-test.eng-uzb.eng.uzb | 3.1 | 0.283 |
| Tatoeba-test.eng-xal.eng.xal | 0.1 | 0.058 |

opus-2020-07-27.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): aze_Latn bak chv crh crh_Latn kaz_Cyrl kaz_Latn kir_Cyrl kjh kum mon nog ota_Arab ota_Latn sah tat tat_Arab tat_Latn tuk tuk_Latn tur tyv uig_Arab uig_Cyrl uzb_Cyrl uzb_Latn xal
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<< (id = valid target language ID)
  • download: opus-2020-07-27.zip
  • test set translations: opus-2020-07-27.test.txt
  • test set scores: opus-2020-07-27.eval.txt

Benchmarks

| testset | BLEU | chr-F |
|---------|------|-------|
| newsdev2016-entr-engtur.eng.tur | 9.6 | 0.427 |
| newstest2016-entr-engtur.eng.tur | 8.4 | 0.402 |
| newstest2017-entr-engtur.eng.tur | 8.6 | 0.402 |
| newstest2018-entr-engtur.eng.tur | 8.6 | 0.404 |
| Tatoeba-test.eng-aze.eng.aze | 27.5 | 0.575 |
| Tatoeba-test.eng-bak.eng.bak | 5.5 | 0.306 |
| Tatoeba-test.eng-chv.eng.chv | 3.3 | 0.284 |
| Tatoeba-test.eng-crh.eng.crh | 11.9 | 0.358 |
| Tatoeba-test.eng-kaz.eng.kaz | 12.0 | 0.366 |
| Tatoeba-test.eng-kir.eng.kir | 24.6 | 0.493 |
| Tatoeba-test.eng-kjh.eng.kjh | 2.2 | 0.049 |
| Tatoeba-test.eng-kum.eng.kum | 8.4 | 0.176 |
| Tatoeba-test.eng-mon.eng.mon | 9.8 | 0.359 |
| Tatoeba-test.eng.multi | 18.6 | 0.441 |
| Tatoeba-test.eng-nog.eng.nog | 1.6 | 0.079 |
| Tatoeba-test.eng-ota.eng.ota | 0.3 | 0.035 |
| Tatoeba-test.eng-sah.eng.sah | 0.8 | 0.061 |
| Tatoeba-test.eng-tat.eng.tat | 10.1 | 0.308 |
| Tatoeba-test.eng-tuk.eng.tuk | 5.7 | 0.310 |
| Tatoeba-test.eng-tur.eng.tur | 33.2 | 0.616 |
| Tatoeba-test.eng-tyv.eng.tyv | 6.6 | 0.184 |
| Tatoeba-test.eng-uig.eng.uig | 0.1 | 0.151 |
| Tatoeba-test.eng-uzb.eng.uzb | 3.9 | 0.286 |
| Tatoeba-test.eng-xal.eng.xal | 0.1 | 0.057 |

opus2m-2020-08-02.zip

  • dataset: opus2m
  • model: transformer
  • source language(s): eng
  • target language(s): aze_Latn bak chv crh crh_Latn kaz_Cyrl kaz_Latn kir_Cyrl kjh kum mon nog ota_Arab ota_Latn sah tat tat_Arab tat_Latn tuk tuk_Latn tur tyv uig_Arab uig_Cyrl uzb_Cyrl uzb_Latn xal
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<< (id = valid target language ID)
  • download: opus2m-2020-08-02.zip
  • test set translations: opus2m-2020-08-02.test.txt
  • test set scores: opus2m-2020-08-02.eval.txt

Benchmarks

| testset | BLEU | chr-F |
|---------|------|-------|
| newsdev2016-entr-engtur.eng.tur | 10.4 | 0.438 |
| newstest2016-entr-engtur.eng.tur | 9.1 | 0.414 |
| newstest2017-entr-engtur.eng.tur | 9.5 | 0.414 |
| newstest2018-entr-engtur.eng.tur | 9.5 | 0.415 |
| Tatoeba-test.eng-aze.eng.aze | 27.2 | 0.580 |
| Tatoeba-test.eng-bak.eng.bak | 5.8 | 0.298 |
| Tatoeba-test.eng-chv.eng.chv | 4.6 | 0.301 |
| Tatoeba-test.eng-crh.eng.crh | 6.5 | 0.342 |
| Tatoeba-test.eng-kaz.eng.kaz | 11.8 | 0.360 |
| Tatoeba-test.eng-kir.eng.kir | 24.6 | 0.499 |
| Tatoeba-test.eng-kjh.eng.kjh | 2.2 | 0.052 |
| Tatoeba-test.eng-kum.eng.kum | 8.0 | 0.229 |
| Tatoeba-test.eng-mon.eng.mon | 10.3 | 0.362 |
| Tatoeba-test.eng.multi | 19.5 | 0.451 |
| Tatoeba-test.eng-nog.eng.nog | 1.5 | 0.117 |
| Tatoeba-test.eng-ota.eng.ota | 0.2 | 0.035 |
| Tatoeba-test.eng-sah.eng.sah | 0.7 | 0.080 |
| Tatoeba-test.eng-tat.eng.tat | 10.8 | 0.320 |
| Tatoeba-test.eng-tuk.eng.tuk | 5.6 | 0.323 |
| Tatoeba-test.eng-tur.eng.tur | 34.2 | 0.623 |
| Tatoeba-test.eng-tyv.eng.tyv | 8.1 | 0.192 |
| Tatoeba-test.eng-uig.eng.uig | 0.1 | 0.158 |
| Tatoeba-test.eng-uzb.eng.uzb | 4.2 | 0.298 |
| Tatoeba-test.eng-xal.eng.xal | 0.1 | 0.061 |
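
To try one of these releases locally, the zip first has to be fetched and unpacked. Below is a minimal sketch for the opus2m build above, assuming the zips are served from the usual OPUS-MT model storage (the base URL is an assumption; prefer the download link in the list above if it differs):

```python
import urllib.request
import zipfile

# Assumed hosting location for OPUS-MT release zips; verify before use.
url = "https://object.pouta.csc.fi/OPUS-MT-models/eng-tut/opus2m-2020-08-02.zip"
urllib.request.urlretrieve(url, "opus2m-2020-08-02.zip")

# The archive contains the Marian model weights, vocabularies,
# SentencePiece models, and the test/eval files listed above.
with zipfile.ZipFile("opus2m-2020-08-02.zip") as zf:
    zf.extractall("opus2m-2020-08-02")
```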