eng-pqw

opus-2020-07-27.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): akl_Latn ceb cha dtp hil iba ilo ind jav jav_Java mad max_Latn min mlg pag pau sun tmw_Latn war zlm_Latn zsm_Latn
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form of >>id<< (id = valid target language ID)
  • download: opus-2020-07-27.zip
  • test set translations: opus-2020-07-27.test.txt
  • test set scores: opus-2020-07-27.eval.txt
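
The language-token requirement above can be sketched as a small pre-processing step. This is a minimal illustration, not part of the released tooling: `TARGET_LANGS` and `add_language_token` are hypothetical names, and the ID set is copied from the target language list of this release.

```python
# Target language IDs copied from the opus-2020-07-27 entry above.
TARGET_LANGS = frozenset(
    "akl_Latn ceb cha dtp hil iba ilo ind jav jav_Java mad max_Latn min mlg "
    "pag pau sun tmw_Latn war zlm_Latn zsm_Latn".split()
)


def add_language_token(sentence: str, lang_id: str, valid_ids=TARGET_LANGS) -> str:
    """Prefix a source sentence with the >>id<< target language token.

    Hypothetical helper: raises ValueError for an ID that is not in valid_ids.
    """
    if lang_id not in valid_ids:
        raise ValueError(f"unknown target language ID: {lang_id!r}")
    return f">>{lang_id}<< {sentence.strip()}"


if __name__ == "__main__":
    # English source sentence to be translated into Cebuano.
    print(add_language_token("Where is the market?", "ceb"))
```

The token is added to the raw English sentence before normalization and SentencePiece segmentation are applied; checking the ID up front avoids sending the model a label it was never trained on.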

Benchmarks

testset BLEU chr-F
Tatoeba-test.eng-akl.eng.akl 3.4 0.135
Tatoeba-test.eng-ceb.eng.ceb 11.4 0.434
Tatoeba-test.eng-cha.eng.cha 1.6 0.187
Tatoeba-test.eng-dtp.eng.dtp 0.5 0.131
Tatoeba-test.eng-hil.eng.hil 17.3 0.518
Tatoeba-test.eng-iba.eng.iba 13.8 0.361
Tatoeba-test.eng-ilo.eng.ilo 33.3 0.588
Tatoeba-test.eng-jav.eng.jav 6.3 0.293
Tatoeba-test.eng-mad.eng.mad 1.3 0.145
Tatoeba-test.eng-mlg.eng.mlg 33.6 0.508
Tatoeba-test.eng-msa.eng.msa 30.9 0.558
Tatoeba-test.eng.multi 17.2 0.418
Tatoeba-test.eng-pag.eng.pag 16.5 0.485
Tatoeba-test.eng-pau.eng.pau 1.2 0.123
Tatoeba-test.eng-sun.eng.sun 35.1 0.447
Tatoeba-test.eng-war.eng.war 12.7 0.438
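
For comparing releases, the score rows above can be read into a dictionary. The sketch below assumes whitespace-separated rows of the form `testset BLEU chr-F`, as listed in this section; `parse_scores` is a hypothetical helper and the exact layout of the `.eval.txt` files is not assumed here.

```python
def parse_scores(text: str) -> dict:
    """Map each test set name to a (BLEU, chr-F) tuple.

    Hypothetical helper: expects rows like 'name 3.4 0.135' and skips the
    header row and any blank or malformed line.
    """
    scores = {}
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) < 3 or parts[0] == "testset":
            continue
        try:
            scores[parts[0]] = (float(parts[1]), float(parts[2]))
        except ValueError:
            continue  # not a score row
    return scores


# A few rows copied from the opus-2020-07-27 benchmark list above.
SAMPLE = """
testset BLEU chr-F
Tatoeba-test.eng-ilo.eng.ilo 33.3 0.588
Tatoeba-test.eng-mlg.eng.mlg 33.6 0.508
Tatoeba-test.eng.multi 17.2 0.418
"""

if __name__ == "__main__":
    for name, (bleu, chrf) in parse_scores(SAMPLE).items():
        print(f"{name}: BLEU={bleu} chr-F={chrf}")
```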

opus2m-2020-08-01.zip

  • dataset: opus2m
  • model: transformer
  • source language(s): eng
  • target language(s): akl_Latn ceb cha dtp hil iba ilo ind jav jav_Java mad max_Latn min mlg pag pau sun tmw_Latn war zlm_Latn zsm_Latn
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form of >>id<< (id = valid target language ID)
  • download: opus2m-2020-08-01.zip
  • test set translations: opus2m-2020-08-01.test.txt
  • test set scores: opus2m-2020-08-01.eval.txt

Benchmarks

testset BLEU chr-F
Tatoeba-test.eng-akl.eng.akl 3.0 0.143
Tatoeba-test.eng-ceb.eng.ceb 11.4 0.432
Tatoeba-test.eng-cha.eng.cha 1.4 0.189
Tatoeba-test.eng-dtp.eng.dtp 0.6 0.139
Tatoeba-test.eng-hil.eng.hil 17.7 0.525
Tatoeba-test.eng-iba.eng.iba 14.6 0.365
Tatoeba-test.eng-ilo.eng.ilo 34.0 0.590
Tatoeba-test.eng-jav.eng.jav 6.2 0.299
Tatoeba-test.eng-mad.eng.mad 2.6 0.154
Tatoeba-test.eng-mlg.eng.mlg 34.3 0.518
Tatoeba-test.eng-msa.eng.msa 31.1 0.561
Tatoeba-test.eng.multi 17.5 0.422
Tatoeba-test.eng-pag.eng.pag 19.8 0.507
Tatoeba-test.eng-pau.eng.pau 1.2 0.129
Tatoeba-test.eng-sun.eng.sun 30.3 0.418
Tatoeba-test.eng-war.eng.war 12.6 0.439

opus1m+bt-2021-04-10.zip

  • dataset: opus1m+bt
  • model: transformer-align
  • source language(s): eng
  • target language(s): akl ceb cha hil iba ilo ind jak jav mad max min mlg msa pag pau plt sun tmw war zlm zsm
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form of >>id<< (id = valid target language ID)
  • valid language labels: >>abl<< >>abs<< >>abx<< >>ace<< >>agk<< >>agz<< >>akb<< >>akl<< >>akl_Latn<< >>atd<< >>atl<< >>ban<< >>bbc<< >>bcl<< >>bdg<< >>bdl<< >>bdr<< >>beg<< >>bew<< >>bgs<< >>bjn<< >>bkd<< >>bkz<< >>blf<< >>bln<< >>bno<< >>bnq<< >>bsu<< >>btd<< >>bth<< >>btm<< >>bto<< >>bts<< >>btx<< >>btz<< >>buc<< >>bug<< >>ceb<< >>cgc<< >>cha<< >>cia<< >>cja<< >>cje<< >>cjm<< >>cps<< >>dbj<< >>drg<< >>dtr<< >>dun<< >>dup<< >>duq<< >>duw<< >>eno<< >>fbl<< >>fil<< >>gay<< >>goq<< >>gor<< >>hil<< >>hro<< >>huq<< >>iba<< >>ibg<< >>ibh<< >>ibl<< >>ify<< >>ilk<< >>ilo<< >>ind<< >>jak_Latn<< >>jav<< >>jav_Java<< >>jra<< >>kak<< >>kaw<< >>kge<< >>kjc<< >>kjk<< >>kqr<< >>krj<< >>ktq<< >>kvr<< >>kxd<< >>kyi<< >>kyj<< >>kyk<< >>kys<< >>lbl<< >>lbw<< >>lbx<< >>lce<< >>lcf<< >>ley<< >>liw<< >>ljp<< >>llk<< >>loc<< >>lra<< >>mad<< >>mak<< >>max_Latn<< >>mba<< >>mbb<< >>mbd<< >>mbi<< >>mbs<< >>mbt<< >>mdh<< >>mdr<< >>mfa<< >>mfb<< >>mhy<< >>min<< >>mkm<< >>mkx<< >>mlg<< >>mog<< >>mqk<< >>mqn<< >>mrw<< >>msa<< >>msa_Latn<< >>msb<< >>msm<< >>mta<< >>mtd<< >>mui<< >>mwv<< >>mxr<< >>myl<< >>mzq<< >>nia<< >>nij<< >>nrm<< >>obo<< >>otd<< >>pag<< >>pam<< >>pau<< >>pdo<< >>pel<< >>pku<< >>plt<< >>pse<< >>rad<< >>raz<< >>rbl<< >>ree<< >>rej<< >>rgs<< >>roc<< >>rog<< >>rth<< >>sas<< >>sda<< >>sdo<< >>sgd<< >>sjm<< >>skh<< >>slm<< >>sml<< >>smr<< >>smw<< >>sne<< >>snl<< >>snv<< >>ssb<< >>sse<< >>sun<< >>sxn<< >>sya<< >>tbl<< >>tdi<< >>tdn<< >>tdu<< >>tdx<< >>tjg<< >>tkg<< >>tlk<< >>tmw_Latn<< >>tne<< >>tnt<< >>tnw<< >>tom<< >>twy<< >>txs<< >>txy<< >>ubl<< >>ulu<< >>vkl<< >>vko<< >>war<< >>wow<< >>wru<< >>xkq<< >>xmv<< >>xmw<< >>xmz<< >>yka<< >>zbc<< >>zbe<< >>zbw<< >>zlm<< >>zlm_Latn<< >>zsm_Latn<<
  • download: opus1m+bt-2021-04-10.zip
  • test set translations: opus1m+bt-2021-04-10.test.txt
  • test set scores: opus1m+bt-2021-04-10.eval.txt

Benchmarks

testset BLEU chr-F #sent #words BP
Tatoeba-test.eng-akl 2.1 0.122 27 96 1.000
Tatoeba-test.eng-ceb 10.2 0.422 378 2086 1.000
Tatoeba-test.eng-cha 2.2 0.212 237 1080 1.000
Tatoeba-test.eng-hil 14.3 0.476 22 125 1.000
Tatoeba-test.eng-iba 14.5 0.395 30 284 0.853
Tatoeba-test.eng-ilo 32.3 0.580 1093 7241 1.000
Tatoeba-test.eng-ind 35.9 0.618 4289 28294 0.962
Tatoeba-test.eng-jav 5.7 0.287 259 1615 1.000
Tatoeba-test.eng-jav_Java 5.9 0.000 3 3 1.000
Tatoeba-test.eng-mad 2.0 0.158 7 39 1.000
Tatoeba-test.eng-max_Latn 4.8 0.262 127 917 1.000
Tatoeba-test.eng-min 6.3 0.263 19 147 1.000
Tatoeba-test.eng-mlg 34.5 0.505 51 242 1.000
Tatoeba-test.eng-msa 32.0 0.579 5000 33629 0.989
Tatoeba-test.eng-multi 25.8 0.530 8725 58062 1.000
Tatoeba-test.eng-pag 15.8 0.504 49 320 1.000
Tatoeba-test.eng-pau 1.4 0.130 34 148 1.000
Tatoeba-test.eng-sun 36.9 0.438 26 122 1.000
Tatoeba-test.eng-tmw_Latn 2.9 0.171 5 23 1.000
Tatoeba-test.eng-war 12.1 0.420 1512 11024 1.000
Tatoeba-test.eng-zlm_Latn 3.7 0.280 24 163 1.000
Tatoeba-test.eng-zsm_Latn 12.7 0.392 536 4085 1.000
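
The BP column in the table above is BLEU's brevity penalty. As a minimal sketch of the standard definition (1 when the system output is at least as long as the reference, otherwise exp(1 - r/c) for reference length r and output length c), with `brevity_penalty` as a hypothetical helper name:

```python
import math


def brevity_penalty(hyp_len: int, ref_len: int) -> float:
    """Standard BLEU brevity penalty for corpus-level token counts.

    hyp_len: total tokens in the system output (c)
    ref_len: total tokens in the reference (r)
    """
    if hyp_len <= 0:
        return 0.0  # empty output: treated as fully penalized in this sketch
    if hyp_len >= ref_len:
        return 1.0  # no penalty when the output is not shorter
    return math.exp(1.0 - ref_len / hyp_len)


if __name__ == "__main__":
    # Illustrative lengths only; these are not taken from the table.
    print(brevity_penalty(100, 100))
    print(brevity_penalty(90, 100))
```

A value below 1.000, as in the eng-iba and eng-ind rows, indicates output shorter than the reference. The table does not list system output lengths, so the reported values cannot be recomputed from it alone.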