- dataset: opus
- model: transformer
- source language(s): eng
- target language(s): akl_Latn ceb hil ilo pag pmn war
- pre-processing: normalization + SentencePiece (spm32k,spm32k)
- a sentence initial language token is required in the form of `>>id<<` (id = valid target language ID)
- download: opus-2020-06-28.zip
- test set translations: opus-2020-06-28.test.txt
- test set scores: opus-2020-06-28.eval.txt

testset | BLEU | chr-F |
---|---|---|
Tatoeba-test.eng-akl.eng.akl | 6.2 | 0.245 |
Tatoeba-test.eng-ceb.eng.ceb | 10.6 | 0.436 |
Tatoeba-test.eng-hil.eng.hil | 17.1 | 0.490 |
Tatoeba-test.eng-ilo.eng.ilo | 33.9 | 0.587 |
Tatoeba-test.eng.multi | 13.6 | 0.392 |
Tatoeba-test.eng-pag.eng.pag | 16.8 | 0.484 |
Tatoeba-test.eng-pmn.eng.pmn | 0.5 | 0.163 |
Tatoeba-test.eng-war.eng.war | 12.8 | 0.437 |
- dataset: opus
- model: transformer
- source language(s): eng
- target language(s): akl_Latn ceb hil ilo pag war
- pre-processing: normalization + SentencePiece (spm32k,spm32k)
- a sentence initial language token is required in the form of `>>id<<` (id = valid target language ID)
- download: opus-2020-07-27.zip
- test set translations: opus-2020-07-27.test.txt
- test set scores: opus-2020-07-27.eval.txt

testset | BLEU | chr-F |
---|---|---|
Tatoeba-test.eng-akl.eng.akl | 3.0 | 0.190 |
Tatoeba-test.eng-ceb.eng.ceb | 11.1 | 0.434 |
Tatoeba-test.eng-hil.eng.hil | 18.5 | 0.511 |
Tatoeba-test.eng-ilo.eng.ilo | 32.9 | 0.590 |
Tatoeba-test.eng.multi | 12.8 | 0.391 |
Tatoeba-test.eng-pag.eng.pag | 18.5 | 0.505 |
Tatoeba-test.eng-war.eng.war | 12.5 | 0.437 |
- dataset: opus2m
- model: transformer
- source language(s): eng
- target language(s): akl_Latn ceb hil ilo pag war
- pre-processing: normalization + SentencePiece (spm32k,spm32k)
- a sentence initial language token is required in the form of `>>id<<` (id = valid target language ID)
- download: opus2m-2020-08-01.zip
- test set translations: opus2m-2020-08-01.test.txt
- test set scores: opus2m-2020-08-01.eval.txt

testset | BLEU | chr-F |
---|---|---|
Tatoeba-test.eng-akl.eng.akl | 7.1 | 0.245 |
Tatoeba-test.eng-ceb.eng.ceb | 10.5 | 0.435 |
Tatoeba-test.eng-hil.eng.hil | 18.0 | 0.506 |
Tatoeba-test.eng-ilo.eng.ilo | 33.4 | 0.590 |
Tatoeba-test.eng.multi | 13.1 | 0.392 |
Tatoeba-test.eng-pag.eng.pag | 19.4 | 0.481 |
Tatoeba-test.eng-war.eng.war | 12.8 | 0.441 |
- dataset: opus1m+bt
- model: transformer-align
- source language(s): eng
- target language(s): akl ceb hil ilo pag war
- pre-processing: normalization + SentencePiece (spm32k,spm32k)
- a sentence initial language token is required in the form of `>>id<<` (id = valid target language ID)
- valid language labels: >>agk<< >>agz<< >>akl<< >>akl_Latn<< >>atd<< >>atl<< >>bcl<< >>bgs<< >>bkd<< >>blf<< >>bln<< >>bno<< >>bnq<< >>bto<< >>ceb<< >>cgc<< >>cps<< >>fbl<< >>fil<< >>gor<< >>hil<< >>ibg<< >>ibl<< >>ify<< >>ilk<< >>ilo<< >>kak<< >>krj<< >>kyj<< >>kyk<< >>lbl<< >>loc<< >>mba<< >>mbb<< >>mbd<< >>mbi<< >>mbs<< >>mbt<< >>mdh<< >>mkx<< >>mog<< >>mqk<< >>mrw<< >>msb<< >>msm<< >>mta<< >>obo<< >>pag<< >>pam<< >>rbl<< >>rth<< >>sgd<< >>snl<< >>sxn<< >>tbl<< >>tdn<< >>tne<< >>tnt<< >>tnw<< >>tom<< >>txs<< >>ubl<< >>war<<
- download: opus1m+bt-2021-04-10.zip
- test set translations: opus1m+bt-2021-04-10.test.txt
- test set scores: opus1m+bt-2021-04-10.eval.txt

testset | BLEU | chr-F | #sent | #words | BP |
---|---|---|---|---|---|
Tatoeba-test.eng-akl | 2.2 | 0.199 | 27 | 96 | 1.000 |
Tatoeba-test.eng-ceb | 10.8 | 0.429 | 378 | 2086 | 1.000 |
Tatoeba-test.eng-hil | 18.3 | 0.513 | 22 | 125 | 1.000 |
Tatoeba-test.eng-ilo | 33.4 | 0.586 | 1093 | 7241 | 1.000 |
Tatoeba-test.eng-multi | 19.4 | 0.490 | 3081 | 20897 | 1.000 |
Tatoeba-test.eng-pag | 16.3 | 0.504 | 49 | 320 | 1.000 |
Tatoeba-test.eng-war | 12.9 | 0.431 | 1512 | 11024 | 1.000 |
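
The BP column above reads 1.000 in every row. Assuming BP denotes the standard BLEU brevity penalty (the table itself does not define it), the sketch below shows how that value is computed from hypothesis and reference lengths. The function name `brevity_penalty` and its arguments are illustrative and not taken from the evaluation scripts.

```python
import math


def brevity_penalty(hyp_len: int, ref_len: int) -> float:
    """Standard BLEU brevity penalty for total hypothesis/reference lengths."""
    if hyp_len == 0:
        # An empty hypothesis receives the maximum penalty.
        return 0.0
    if hyp_len >= ref_len:
        # Output at least as long as the reference: no penalty.
        return 1.0
    # Output shorter than the reference: exponential penalty.
    return math.exp(1.0 - ref_len / hyp_len)


print(brevity_penalty(100, 100))
print(brevity_penalty(50, 100))
```

Under this definition, a BP of 1.000 indicates that the system output is, in total, no shorter than the reference translations.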
- dataset: opus4m+btTCv20210807
- model: transformer
- source language(s): eng
- target language(s): akl bcl ceb fil gor hil ibg ilo pag pam sxn war
- pre-processing: normalization + SentencePiece (spm32k,spm32k)
- a sentence initial language token is required in the form of `>>id<<` (id = valid target language ID)
- valid language labels: >>agk<< >>agz<< >>akl<< >>akl_Latn<< >>atd<< >>atl<< >>bcl<< >>bgs<< >>bkd<< >>blf<< >>bln<< >>bno<< >>bnq<< >>bto<< >>ceb<< >>cgc<< >>cps<< >>fbl<< >>fil<< >>gor<< >>hil<< >>ibg<< >>ibl<< >>ify<< >>ilk<< >>ilo<< >>kak<< >>krj<< >>kyj<< >>kyk<< >>lbl<< >>loc<< >>mba<< >>mbb<< >>mbd<< >>mbi<< >>mbs<< >>mbt<< >>mdh<< >>mkx<< >>mog<< >>mqk<< >>mrw<< >>msb<< >>msm<< >>mta<< >>obo<< >>pag<< >>pam<< >>rbl<< >>rth<< >>sgd<< >>snl<< >>sxn<< >>tbl<< >>tdn<< >>tne<< >>tnt<< >>tnw<< >>tom<< >>txs<< >>ubl<< >>war<<
- download: opus4m+btTCv20210807-2021-09-30.zip
- test set translations: opus4m+btTCv20210807-2021-09-30.test.txt
- test set scores: opus4m+btTCv20210807-2021-09-30.eval.txt

testset | BLEU | chr-F | #sent | #words | BP |
---|---|---|---|---|---|
Tatoeba-test-v2021-08-07.eng-multi | 12.4 | 0.355 | 4081 | 26965 | 1.000 |
Tatoeba-test-v2021-08-07.multi-multi | 12.4 | 0.355 | 4081 | 26965 | 1.000 |
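
Every release above requires a sentence-initial `>>id<<` token that selects the target language. The sketch below shows one way to prepend that token to an English source sentence before it is passed to the model. The helper `add_language_token` and the `VALID_LABELS` set are hypothetical and not part of the released packages. The set holds only a subset of the labels listed above, and the single space after the token is an assumption.

```python
# Hypothetical helper for illustration; not part of the released model packages.
# Subset of the valid language labels listed above.
VALID_LABELS = {"akl", "ceb", "hil", "ilo", "pag", "war"}


def add_language_token(sentence: str, target_id: str) -> str:
    """Prefix a source sentence with the >>id<< target-language token."""
    if target_id not in VALID_LABELS:
        raise ValueError(f"unknown target language ID: {target_id!r}")
    # Assumption: the token is separated from the text by a single space.
    return f">>{target_id}<< {sentence.strip()}"


print(add_language_token("How are you?", "ceb"))
```

Rejecting unknown IDs early avoids sending the model a token it was never trained on.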