Having troubles reproducing results for m2m100 1.2b #3

Open
dchaplinsky opened this issue Dec 4, 2023 · 2 comments

@dchaplinsky

Hello @jorgtied!

I'm trying to reproduce the reported results for the eng-ukr language pair with m2m100 on the flores200 dataset, but the score I get is much lower (21.0 instead of the reported 26.8).

My setup is: CTranslate2, this model and HF's evaluate (the code is available here). The dataset is the same (Flores200, devtest).

My main suspects are:

  • Lower quality of the quantised m2m100 model
  • Different settings for text generation (I'm using beams=5)
  • Different settings for the BLEU scorer (ngrams, etc).
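
To illustrate the third suspect, here is a minimal sketch of how much tokenization and the maximum n-gram order alone can move a BLEU score. It is a simplified, single-reference corpus BLEU written from scratch for illustration, not the implementation used by sacrebleu or HF's evaluate; the function names and the toy sentence are assumptions.

```python
import math
import re
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def punct_tokenize(text):
    """Split words and punctuation into separate tokens (illustrative only)."""
    return re.findall(r"\w+|[^\w\s]", text)


def corpus_bleu(hypotheses, references, max_order=4, tokenize=str.split):
    """Simplified corpus BLEU on a 0-100 scale: one reference, no smoothing."""
    matches = [0] * max_order
    totals = [0] * max_order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = tokenize(hyp), tokenize(ref)
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_order + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped matches: each hypothesis n-gram counts at most as often
            # as it appears in the reference.
            matches[n - 1] += sum((h_counts & r_counts).values())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_order
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


hyps = ["the cat sat on the mat ."]
refs = ["the cat sat on the mat."]

# Same strings, three different scorer settings, three different scores.
print(corpus_bleu(hyps, refs))                           # whitespace tokens, 4-grams
print(corpus_bleu(hyps, refs, max_order=2))              # whitespace tokens, 2-grams
print(corpus_bleu(hyps, refs, tokenize=punct_tokenize))  # punctuation split off
```

The only difference between hypothesis and reference is a space before the final period, yet the whitespace-tokenized score is far below the punctuation-aware one. A gap of a few BLEU points between two toolchains can therefore come from scorer settings alone.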

I've browsed the repos I found on the opus-mt leaderboard and other seemingly relevant repos from the Helsinki-NLP account. I also skimmed the main paper.

Could you please advise on the following things?

  • Where can I find the generation/evaluation settings/code for the leaderboard?
  • Is there a file with the individual metrics per sentence pair?
  • Anything else you might remember or find relevant.

Thanks in advance!

@jorgtied
Member

I used the native transformers library for decoding the testsets and beam size 1 (if I remember correctly). BLEU scores are computed with sacrebleu and default settings. There are no individual scores per sentence pair.
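
For anyone trying to match that setup, a rough sketch of what it could look like follows. This is a reconstruction under assumptions, not the leaderboard's actual evaluation code: the checkpoint name `facebook/m2m100_1.2B`, the language codes, the batch size, and the helper names `batched`, `translate` and `score` are all assumed for illustration. Only the beam size of 1 and sacrebleu with default settings come from the description above.

```python
def batched(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate(sentences, model_name="facebook/m2m100_1.2B",
              src="en", tgt="uk", num_beams=1, batch_size=8):
    """Translate with the native transformers library (assumed checkpoint)."""
    # Imported lazily so the helpers can be loaded without the heavy dependency.
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    tokenizer = M2M100Tokenizer.from_pretrained(model_name)
    model = M2M100ForConditionalGeneration.from_pretrained(model_name)
    tokenizer.src_lang = src

    outputs = []
    for batch in batched(sentences, batch_size):
        encoded = tokenizer(batch, return_tensors="pt", padding=True)
        generated = model.generate(
            **encoded,
            forced_bos_token_id=tokenizer.get_lang_id(tgt),  # force target language
            num_beams=num_beams,  # beam size 1, as recalled in the comment above
        )
        outputs.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return outputs


def score(hypotheses, references):
    """Corpus BLEU with sacrebleu defaults, plus the settings signature."""
    from sacrebleu.metrics import BLEU

    bleu = BLEU()  # default tokenizer, smoothing and n-gram order
    result = bleu.corpus_score(hypotheses, [references])
    return result.score, bleu.get_signature().format()
```

Reporting the sacrebleu signature next to the score makes it easier to check whether two runs used the same scorer settings.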

@dchaplinsky
Author

Thanks. Is there any source code left for the eval, so that I can dig into it myself rather than bothering you?
