eng-phi

opus-2020-06-28.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): akl_Latn ceb hil ilo pag pmn war
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form of >>id<< (id = valid target language ID)
  • download: opus-2020-06-28.zip
  • test set translations: opus-2020-06-28.test.txt
  • test set scores: opus-2020-06-28.eval.txt
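The models in this README choose the output language from a >>id<< token at the start of each input sentence. The minimal Python sketch below shows only the string format. It assumes plain, untokenized input and a single space after the token. The helper `add_target_token` is hypothetical and is not part of any OPUS-MT tool. Whether the token is added before or after SentencePiece encoding depends on the toolkit that runs the model. The accepted codes are the target languages listed for opus-2020-06-28.

```python
# Sketch only: build the sentence-initial target-language token.
# TARGET_LANGS is copied from the opus-2020-06-28 "target language(s)" bullet;
# add_target_token is a hypothetical helper, not an OPUS-MT API.

TARGET_LANGS = {"akl_Latn", "ceb", "hil", "ilo", "pag", "pmn", "war"}


def add_target_token(sentence: str, lang: str) -> str:
    """Prefix `sentence` with >>lang<< so the model knows which language to produce."""
    if lang not in TARGET_LANGS:
        raise ValueError(f"unsupported target language: {lang!r}")
    # Assumption: exactly one space separates the token from the sentence.
    return f">>{lang}<< {sentence.strip()}"


if __name__ == "__main__":
    print(add_target_token("Good morning.", "ceb"))  # >>ceb<< Good morning.
```

For a later release, replace `TARGET_LANGS` with that release's target list.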

Benchmarks

testset BLEU chr-F
Tatoeba-test.eng-akl.eng.akl 6.2 0.245
Tatoeba-test.eng-ceb.eng.ceb 10.6 0.436
Tatoeba-test.eng-hil.eng.hil 17.1 0.490
Tatoeba-test.eng-ilo.eng.ilo 33.9 0.587
Tatoeba-test.eng.multi 13.6 0.392
Tatoeba-test.eng-pag.eng.pag 16.8 0.484
Tatoeba-test.eng-pmn.eng.pmn 0.5 0.163
Tatoeba-test.eng-war.eng.war 12.8 0.437

opus-2020-07-27.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): akl_Latn ceb hil ilo pag war
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form of >>id<< (id = valid target language ID)
  • download: opus-2020-07-27.zip
  • test set translations: opus-2020-07-27.test.txt
  • test set scores: opus-2020-07-27.eval.txt

Benchmarks

testset BLEU chr-F
Tatoeba-test.eng-akl.eng.akl 3.0 0.190
Tatoeba-test.eng-ceb.eng.ceb 11.1 0.434
Tatoeba-test.eng-hil.eng.hil 18.5 0.511
Tatoeba-test.eng-ilo.eng.ilo 32.9 0.590
Tatoeba-test.eng.multi 12.8 0.391
Tatoeba-test.eng-pag.eng.pag 18.5 0.505
Tatoeba-test.eng-war.eng.war 12.5 0.437

opus2m-2020-08-01.zip

  • dataset: opus2m
  • model: transformer
  • source language(s): eng
  • target language(s): akl_Latn ceb hil ilo pag war
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form of >>id<< (id = valid target language ID)
  • download: opus2m-2020-08-01.zip
  • test set translations: opus2m-2020-08-01.test.txt
  • test set scores: opus2m-2020-08-01.eval.txt

Benchmarks

testset BLEU chr-F
Tatoeba-test.eng-akl.eng.akl 7.1 0.245
Tatoeba-test.eng-ceb.eng.ceb 10.5 0.435
Tatoeba-test.eng-hil.eng.hil 18.0 0.506
Tatoeba-test.eng-ilo.eng.ilo 33.4 0.590
Tatoeba-test.eng.multi 13.1 0.392
Tatoeba-test.eng-pag.eng.pag 19.4 0.481
Tatoeba-test.eng-war.eng.war 12.8 0.441

opus1m+bt-2021-04-10.zip

  • dataset: opus1m+bt
  • model: transformer-align
  • source language(s): eng
  • target language(s): akl ceb hil ilo pag war
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form of >>id<< (id = valid target language ID)
  • valid language labels: >>agk<< >>agz<< >>akl<< >>akl_Latn<< >>atd<< >>atl<< >>bcl<< >>bgs<< >>bkd<< >>blf<< >>bln<< >>bno<< >>bnq<< >>bto<< >>ceb<< >>cgc<< >>cps<< >>fbl<< >>fil<< >>gor<< >>hil<< >>ibg<< >>ibl<< >>ify<< >>ilk<< >>ilo<< >>kak<< >>krj<< >>kyj<< >>kyk<< >>lbl<< >>loc<< >>mba<< >>mbb<< >>mbd<< >>mbi<< >>mbs<< >>mbt<< >>mdh<< >>mkx<< >>mog<< >>mqk<< >>mrw<< >>msb<< >>msm<< >>mta<< >>obo<< >>pag<< >>pam<< >>rbl<< >>rth<< >>sgd<< >>snl<< >>sxn<< >>tbl<< >>tdn<< >>tne<< >>tnt<< >>tnw<< >>tom<< >>txs<< >>ubl<< >>war<<
  • download: opus1m+bt-2021-04-10.zip
  • test set translations: opus1m+bt-2021-04-10.test.txt
  • test set scores: opus1m+bt-2021-04-10.eval.txt
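You can turn the list of valid language labels into a lookup set and check a requested target code against it before translating. Below is a Python sketch. The label string is copied verbatim from the opus1m+bt-2021-04-10 bullet, and the function `label_for` is hypothetical. The model accepts 63 labels, far more than the six target languages benchmarked for it. An accepted label therefore says nothing about translation quality for that language.

```python
import re

# "valid language labels" bullet of opus1m+bt-2021-04-10, copied verbatim.
LABELS_TEXT = (
    ">>agk<< >>agz<< >>akl<< >>akl_Latn<< >>atd<< >>atl<< >>bcl<< >>bgs<< >>bkd<< >>blf<< "
    ">>bln<< >>bno<< >>bnq<< >>bto<< >>ceb<< >>cgc<< >>cps<< >>fbl<< >>fil<< >>gor<< "
    ">>hil<< >>ibg<< >>ibl<< >>ify<< >>ilk<< >>ilo<< >>kak<< >>krj<< >>kyj<< >>kyk<< "
    ">>lbl<< >>loc<< >>mba<< >>mbb<< >>mbd<< >>mbi<< >>mbs<< >>mbt<< >>mdh<< >>mkx<< "
    ">>mog<< >>mqk<< >>mrw<< >>msb<< >>msm<< >>mta<< >>obo<< >>pag<< >>pam<< >>rbl<< "
    ">>rth<< >>sgd<< >>snl<< >>sxn<< >>tbl<< >>tdn<< >>tne<< >>tnt<< >>tnw<< >>tom<< "
    ">>txs<< >>ubl<< >>war<<"
)

# Each label has the form >>code<<; \w+ also matches script suffixes like akl_Latn.
VALID_LABELS = set(re.findall(r">>(\w+)<<", LABELS_TEXT))

# Target languages actually benchmarked for this release (from its bullet list).
BENCHMARKED = {"akl", "ceb", "hil", "ilo", "pag", "war"}


def label_for(code: str) -> str:
    """Return the >>code<< token, rejecting codes the model does not list."""
    if code not in VALID_LABELS:
        raise ValueError(f"not a valid language label: {code!r}")
    return f">>{code}<<"


if __name__ == "__main__":
    print(len(VALID_LABELS))                # 63
    print(sorted(VALID_LABELS - BENCHMARKED)[:3])  # ['agk', 'agz', 'akl_Latn']
```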

Benchmarks

testset BLEU chr-F #sent #words BP
Tatoeba-test.eng-akl 2.2 0.199 27 96 1.000
Tatoeba-test.eng-ceb 10.8 0.429 378 2086 1.000
Tatoeba-test.eng-hil 18.3 0.513 22 125 1.000
Tatoeba-test.eng-ilo 33.4 0.586 1093 7241 1.000
Tatoeba-test.eng-multi 19.4 0.490 3081 20897 1.000
Tatoeba-test.eng-pag 16.3 0.504 49 320 1.000
Tatoeba-test.eng-war 12.9 0.431 1512 11024 1.000
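As a consistency check on the opus1m+bt-2021-04-10 table, the per-pair #sent values add up to exactly 3081, the size of the eng-multi row. This fits the eng-multi set being the union of the per-pair sets, although the README does not state this. The #words totals differ slightly: the per-pair values sum to 20892, while the eng-multi row reports 20897. The README does not explain the gap. The Python snippet below reproduces the arithmetic using the table's own numbers.

```python
# Rows copied from the opus1m+bt-2021-04-10 benchmark table:
# (testset, BLEU, chr-F, #sent, #words, BP)
ROWS = [
    ("Tatoeba-test.eng-akl", 2.2, 0.199, 27, 96, 1.000),
    ("Tatoeba-test.eng-ceb", 10.8, 0.429, 378, 2086, 1.000),
    ("Tatoeba-test.eng-hil", 18.3, 0.513, 22, 125, 1.000),
    ("Tatoeba-test.eng-ilo", 33.4, 0.586, 1093, 7241, 1.000),
    ("Tatoeba-test.eng-multi", 19.4, 0.490, 3081, 20897, 1.000),
    ("Tatoeba-test.eng-pag", 16.3, 0.504, 49, 320, 1.000),
    ("Tatoeba-test.eng-war", 12.9, 0.431, 1512, 11024, 1.000),
]

# Split the per-language-pair rows from the combined "multi" row.
per_pair = [row for row in ROWS if not row[0].endswith("-multi")]
multi = next(row for row in ROWS if row[0].endswith("-multi"))

sent_total = sum(row[3] for row in per_pair)
words_total = sum(row[4] for row in per_pair)

print(sent_total, multi[3])   # 3081 3081
print(words_total, multi[4])  # 20892 20897
```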

opus4m+btTCv20210807-2021-09-30.zip

  • dataset: opus4m+btTCv20210807
  • model: transformer
  • source language(s): eng
  • target language(s): akl bcl ceb fil gor hil ibg ilo pag pam sxn war
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form of >>id<< (id = valid target language ID)
  • valid language labels: >>agk<< >>agz<< >>akl<< >>akl_Latn<< >>atd<< >>atl<< >>bcl<< >>bgs<< >>bkd<< >>blf<< >>bln<< >>bno<< >>bnq<< >>bto<< >>ceb<< >>cgc<< >>cps<< >>fbl<< >>fil<< >>gor<< >>hil<< >>ibg<< >>ibl<< >>ify<< >>ilk<< >>ilo<< >>kak<< >>krj<< >>kyj<< >>kyk<< >>lbl<< >>loc<< >>mba<< >>mbb<< >>mbd<< >>mbi<< >>mbs<< >>mbt<< >>mdh<< >>mkx<< >>mog<< >>mqk<< >>mrw<< >>msb<< >>msm<< >>mta<< >>obo<< >>pag<< >>pam<< >>rbl<< >>rth<< >>sgd<< >>snl<< >>sxn<< >>tbl<< >>tdn<< >>tne<< >>tnt<< >>tnw<< >>tom<< >>txs<< >>ubl<< >>war<<
  • download: opus4m+btTCv20210807-2021-09-30.zip
  • test set translations: opus4m+btTCv20210807-2021-09-30.test.txt
  • test set scores: opus4m+btTCv20210807-2021-09-30.eval.txt

Benchmarks

testset BLEU chr-F #sent #words BP
Tatoeba-test-v2021-08-07.eng-multi 12.4 0.355 4081 26965 1.000
Tatoeba-test-v2021-08-07.multi-multi 12.4 0.355 4081 26965 1.000
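The BP column reports BLEU's brevity penalty. This factor lowers the score when the system output is shorter than the reference. Every row in these tables shows 1.000, so no penalty was applied. The sketch below implements the standard definition from Papineni et al. (2002) as a small Python function. It assumes the lengths are corpus-level token totals and leaves out tokenization. The function name is illustrative.

```python
import math


def brevity_penalty(hyp_len: int, ref_len: int) -> float:
    """BLEU brevity penalty.

    hyp_len: total tokens in the system output (c)
    ref_len: total tokens in the reference (r)
    Returns 1.0 when c >= r, otherwise exp(1 - r / c).
    """
    if hyp_len >= ref_len:
        return 1.0          # output at least as long as the reference: no penalty
    if hyp_len == 0:
        return 0.0          # empty output: avoid division by zero
    return math.exp(1.0 - ref_len / hyp_len)


if __name__ == "__main__":
    print(brevity_penalty(120, 100))            # 1.0
    print(round(brevity_penalty(90, 100), 3))   # 0.895
```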