How to convert models with two vocab files to PyTorch? #22
This model uses two separate vocabularies, so it does not currently convert properly to PyTorch/Hugging Face. Hopefully, support for this will be added to the conversion procedures soon.
Thanks @jorgtied!
The latest conversion scripts in the transformers library support the conversion of models with two vocabs. You may also check my recipes in https://github.com/Helsinki-NLP/Opus-MT/tree/master/hf
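For intuition about what handling two vocabs involves, here is a minimal sketch of one possible approach: unioning the source and target token-to-id mappings (as loaded from the model's two vocab.yml files) into a single joint vocabulary. The merge_vocabs function and the sample dicts are illustrative assumptions, not part of the transformers API, which may instead keep the vocabularies separate.

```python
# Hypothetical sketch: Marian vocab.yml files map token -> id. A split-vocab
# model ships two such mappings (source and target). One way to produce a
# joint vocabulary is to keep the source ids and append unseen target tokens
# at the end, in their original id order.
def merge_vocabs(src_vocab, trg_vocab):
    merged = dict(src_vocab)
    next_id = max(merged.values(), default=-1) + 1
    for token in sorted(trg_vocab, key=trg_vocab.get):  # preserve target order
        if token not in merged:
            merged[token] = next_id
            next_id += 1
    return merged

# Tiny illustrative vocabs (not real Opus-MT data):
src = {"<pad>": 0, "hello": 1, "world": 2}
trg = {"<pad>": 0, "안녕": 1, "world": 2}
joint = merge_vocabs(src, trg)
print(joint)  # {'<pad>': 0, 'hello': 1, 'world': 2, '안녕': 3}
```

Tokens shared between the two vocabs keep their source-side id; only genuinely new target tokens get fresh ids, so the merged table stays as small as possible.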
I removed that model because it was so poor (at least according to the scores). I should create new ones for this language pair.
Hi, I still get the same error. I used the script from transformers, and I also tried the convert_to_pytorch.py script you suggested; same error. Can you show me the command to convert such a two-vocab model to PyTorch? Thanks
More resources on these split-vocab models would be helpful. I'm also trying to compile these to CTranslate2 and having difficulties due to the split vocabs.
Hi,
I would like to get translation results from the eng-kor model with transformers.MarianMTModel and transformers.MarianTokenizer. I understand we need to first convert the model to PyTorch format with convert_marian_tatoeba_to_pytorch.py. The eng-kor model has two different vocab sets for the encoder and decoder. How can we use the transformers.models.marian.convert_marian_to_pytorch.convert function to do the conversion? Because there is no vocab.yml file in the zip file, I found that line 381 throws an IndexError: list index out of range error. Thanks
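That IndexError is consistent with a converter that expects exactly one joint vocab.yml and takes the first glob match. A minimal reproduction of that failure mode is sketched below; find_vocab_file here is a stand-in written for this thread, not the exact code from transformers.

```python
import tempfile
from pathlib import Path

# Stand-in for the converter's vocab lookup: glob for a joint vocab.yml and
# take the first match. A split-vocab model ships no file with that name, so
# the match list is empty and indexing it raises IndexError.
def find_vocab_file(model_dir):
    return list(Path(model_dir).glob("*vocab.yml"))[0]

with tempfile.TemporaryDirectory() as d:
    try:
        find_vocab_file(d)  # empty dir: no *vocab.yml present
    except IndexError as e:
        print(f"IndexError: {e}")  # IndexError: list index out of range
```

Placing a file matching *vocab.yml in the model directory makes the lookup succeed, which is why single-vocab models convert fine while split-vocab downloads hit this error.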