- dataset: opus
- model: transformer
- source language(s): eng
- target language(s): kha khm khm_Latn mnw vie vie_Hani
- pre-processing: normalization + SentencePiece (spm32k,spm32k)
- a sentence-initial language token is required in the form of >>id<< (id = valid target language ID)
- download: opus-2020-06-28.zip
- test set translations: opus-2020-06-28.test.txt
- test set scores: opus-2020-06-28.eval.txt

| testset | BLEU | chr-F |
|---|---|---|
| Tatoeba-test.eng-kha.eng.kha | 0.4 | 0.054 |
| Tatoeba-test.eng-khm.eng.khm | 0.2 | 0.240 |
| Tatoeba-test.eng-mnw.eng.mnw | 0.9 | 0.003 |
| Tatoeba-test.eng.multi | 20.1 | 0.354 |
| Tatoeba-test.eng-vie.eng.vie | 33.6 | 0.512 |

- dataset: opus
- model: transformer
- source language(s): eng
- target language(s): kha khm khm_Latn mnw vie vie_Hani
- pre-processing: normalization + SentencePiece (spm32k,spm32k)
- a sentence-initial language token is required in the form of >>id<< (id = valid target language ID)
- download: opus-2020-07-27.zip
- test set translations: opus-2020-07-27.test.txt
- test set scores: opus-2020-07-27.eval.txt

| testset | BLEU | chr-F |
|---|---|---|
| Tatoeba-test.eng-kha.eng.kha | 0.1 | 0.015 |
| Tatoeba-test.eng-khm.eng.khm | 0.2 | 0.226 |
| Tatoeba-test.eng-mnw.eng.mnw | 0.7 | 0.003 |
| Tatoeba-test.eng.multi | 16.5 | 0.330 |
| Tatoeba-test.eng-vie.eng.vie | 33.7 | 0.513 |

- dataset: opus1m+bt
- model: transformer-align
- source language(s): eng
- target language(s): kha khm mnw ngt vie
- pre-processing: normalization + SentencePiece (spm32k,spm32k)
- a sentence-initial language token is required in the form of >>id<< (id = valid target language ID)
- valid language labels: >>aem<< >>alk<< >>aml<< >>bbh<< >>bdq<< >>bgk<< >>bgl<< >>blr<< >>brb<< >>bru<< >>brv<< >>btq<< >>caq<< >>cbn<< >>cma<< >>cmo<< >>cog<< >>crv<< >>crw<< >>cua<< >>cwg<< >>dnu<< >>hal<< >>hld<< >>hnu<< >>hre<< >>huo<< >>jah<< >>jeh<< >>jhi<< >>kdt<< >>kha<< >>khf<< >>khm<< >>khm_Latn<< >>kjg<< >>kjm<< >>knq<< >>kns<< >>kpm<< >>krr<< >>krv<< >>kta<< >>ktv<< >>kuf<< >>kxm<< >>kxy<< >>lbn<< >>lbo<< >>lcp<< >>lnh<< >>lwl<< >>lyg<< >>mef<< >>mhe<< >>mlf<< >>mml<< >>mng<< >>mnn<< >>mnq<< >>mnw<< >>moo<< >>mqt<< >>mra<< >>mtq<< >>mzt<< >>ncb<< >>ncq<< >>nev<< >>ngt<< >>ngt_Latn<< >>nik<< >>nuo<< >>nyl<< >>omx<< >>oog<< >>oyb<< >>pac<< >>pbv<< >>pcb<< >>pce<< >>phg<< >>pkt<< >>pll<< >>ply<< >>pnx<< >>prk<< >>prt<< >>pry<< >>puo<< >>qok<< >>rbb<< >>ren<< >>ril<< >>rka<< >>rmx<< >>sbo<< >>scb<< >>scq<< >>sct<< >>sea<< >>sed<< >>sii<< >>smu<< >>spu<< >>sqq<< >>ssm<< >>sss<< >>stg<< >>sti<< >>stt<< >>stu<< >>syo<< >>sza<< >>szc<< >>tdf<< >>tdr<< >>tea<< >>tef<< >>thm<< >>tkz<< >>tlq<< >>tmo<< >>tnz<< >>tou<< >>tpu<< >>tth<< >>tto<< >>tyh<< >>uuu<< >>vie<< >>vie_Hani<< >>vwa<< >>wbm<< >>xao<< >>xkk<< >>xnh<< >>yin<< >>zng<<
- download: opus1m+bt-2021-04-10.zip
- test set translations: opus1m+bt-2021-04-10.test.txt
- test set scores: opus1m+bt-2021-04-10.eval.txt

| testset | BLEU | chr-F | #sent | #words | BP |
|---|---|---|---|---|---|
| Tatoeba-test.eng-kha | 0.6 | 0.088 | 1314 | 9269 | 1.000 |
| Tatoeba-test.eng-khm | 0.0 | 0.013 | 752 | 1737 | 1.000 |
| Tatoeba-test.eng-khm_Latn | 0.8 | 0.065 | 11 | 91 | 1.000 |
| Tatoeba-test.eng-mnw | 0.6 | 0.001 | 9 | 44 | 1.000 |
| Tatoeba-test.eng-multi | 21.5 | 0.339 | 4592 | 35578 | 1.000 |
| Tatoeba-test.eng-ngt | 0.2 | 0.033 | 17 | 101 | 1.000 |
| Tatoeba-test.eng-vie | 34.0 | 0.514 | 2500 | 24426 | 0.972 |
| Tatoeba-test.eng-vie_Hani | 2.1 | 0.000 | 1 | 1 | 1.000 |
| tico19-test.eng-khm | 0.6 | 0.029 | 2100 | 20941 | 1.000 |

- dataset: opus4m+btTCv20210807
- model: transformer
- source language(s): eng
- target language(s): kha khm mnw ngt vie
- pre-processing: normalization + SentencePiece (spm32k,spm32k)
- a sentence-initial language token is required in the form of >>id<< (id = valid target language ID)
- valid language labels: >>aem<< >>alk<< >>aml<< >>bbh<< >>bdq<< >>bgk<< >>bgl<< >>blr<< >>brb<< >>bru<< >>brv<< >>btq<< >>caq<< >>cbn<< >>cma<< >>cmo<< >>cog<< >>crv<< >>crw<< >>cua<< >>cwg<< >>dnu<< >>hal<< >>hld<< >>hnu<< >>hre<< >>huo<< >>jah<< >>jeh<< >>jhi<< >>kdt<< >>kha<< >>khf<< >>khm<< >>khm_Latn<< >>kjg<< >>kjm<< >>knq<< >>kns<< >>kpm<< >>krr<< >>krv<< >>kta<< >>ktv<< >>kuf<< >>kxm<< >>kxy<< >>lbn<< >>lbo<< >>lcp<< >>lnh<< >>lwl<< >>lyg<< >>mef<< >>mhe<< >>mlf<< >>mml<< >>mng<< >>mnn<< >>mnq<< >>mnw<< >>moo<< >>mqt<< >>mra<< >>mtq<< >>mzt<< >>ncb<< >>ncq<< >>nev<< >>ngt<< >>ngt_Latn<< >>nik<< >>nuo<< >>nyl<< >>omx<< >>oog<< >>oyb<< >>pac<< >>pbv<< >>pcb<< >>pce<< >>phg<< >>pkt<< >>pll<< >>ply<< >>pnx<< >>prk<< >>prt<< >>pry<< >>puo<< >>qok<< >>rbb<< >>ren<< >>ril<< >>rka<< >>rmx<< >>sbo<< >>scb<< >>scq<< >>sct<< >>sea<< >>sed<< >>sii<< >>smu<< >>spu<< >>sqq<< >>ssm<< >>sss<< >>stg<< >>sti<< >>stt<< >>stu<< >>syo<< >>sza<< >>szc<< >>tdf<< >>tdr<< >>tea<< >>tef<< >>thm<< >>tkz<< >>tlq<< >>tmo<< >>tnz<< >>tou<< >>tpu<< >>tth<< >>tto<< >>tyh<< >>uuu<< >>vie<< >>vie_Hani<< >>vwa<< >>wbm<< >>xao<< >>xkk<< >>xnh<< >>yin<< >>zng<<
- download: opus4m+btTCv20210807-2021-09-30.zip
- test set translations: opus4m+btTCv20210807-2021-09-30.test.txt
- test set scores: opus4m+btTCv20210807-2021-09-30.eval.txt

| testset | BLEU | chr-F | #sent | #words | BP |
|---|---|---|---|---|---|
| Tatoeba-test-v2021-08-07.eng-multi | 20.9 | 0.347 | 4566 | 35533 | 1.000 |
| Tatoeba-test-v2021-08-07.multi-multi | 20.9 | 0.347 | 4566 | 35533 | 1.000 |
| tico19-test.eng-khm | 1.2 | 0.035 | 2100 | 20941 | 1.000 |
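
Every model listed above expects a sentence-initial language token on the English source text. The following is a minimal sketch of how that token might be prepended before the text is passed to the model. It is not part of the released tooling: the helper name `add_language_token` and the `VALID_LABELS` set (a small subset of the labels listed above) are hypothetical, and the single space between the token and the sentence is an assumption.

```python
# Hypothetical helper for illustration; not part of the released model package.
# VALID_LABELS is only a small subset of the valid language labels listed above.
VALID_LABELS = {
    "kha", "khm", "khm_Latn", "mnw", "ngt", "ngt_Latn", "vie", "vie_Hani",
}


def add_language_token(sentence: str, target_id: str) -> str:
    """Prefix a source sentence with the >>id<< target language token."""
    if target_id not in VALID_LABELS:
        raise ValueError(f"unknown target language ID: {target_id!r}")
    # Assumption: one space separates the token from the sentence text.
    return f">>{target_id}<< {sentence.strip()}"


print(add_language_token("How are you?", "vie"))  # prints: >>vie<< How are you?
```

Rejecting unknown IDs early is a design choice for this sketch, so that a mistyped label fails before any translation is attempted.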