KeyError: 'lemma' #48

Bachstelze · 2022-05-26T17:28:37Z

Following the code from https://trankit.readthedocs.io/en/latest/training.html#training-a-lemmatizer, I get a KeyError: 'lemma':

Setting up training config...
Initialized lemmatizer trainer
Training dictionary-based lemmatizer

---------------------------------------------------------------------------

KeyError                                  Traceback (most recent call last)

<ipython-input-9-a90867cc5ef3> in <module>()
     11 
     12 # start training
---> 13 trainer.train()

/content/trankit/trankit/tpipeline.py in train(self)
    680             self._train_posdep()
    681         elif self._task == 'lemmatize':
--> 682             self._train_lemma()
    683         elif self._task == 'ner':
    684             self._train_ner()

/content/trankit/trankit/tpipeline.py in _train_lemma(self)
    581 
    582     def _train_lemma(self):
--> 583         self._lemma_model.train()
    584 
    585     def _train_ner(self):

/content/trankit/trankit/models/lemma_model.py in train(self)
    379             self.config.logger.info("Training dictionary-based lemmatizer")
    380             self.trainer.train_dict(
--> 381                 [[token[TEXT], token[UPOS], token[LEMMA]] for sentence in self.train_batch.doc for token in sentence if
    382                  not (
    383                          type(token[ID]) == tuple and len(token[ID]) == 2)])

/content/trankit/trankit/models/lemma_model.py in <listcomp>(.0)
    381                 [[token[TEXT], token[UPOS], token[LEMMA]] for sentence in self.train_batch.doc for token in sentence if
    382                  not (
--> 383                          type(token[ID]) == tuple and len(token[ID]) == 2)])
    384             dev_preds = self.trainer.predict_dict(
    385                 [[token[TEXT], token[UPOS]] for sentence in self.dev_batch.doc for token in sentence if

KeyError: 'lemma'

The most recent version of https://github.com/UniversalDependencies/UD_Thai-PUD is used as training and development data.
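A minimal sketch of how the list comprehension in lemma_model.py can fail, assuming (this is not confirmed anywhere in the thread) that trankit's CoNLL-U reader, like Stanza's, leaves any field whose value is "_" out of the token dict. The FIELDS list, the conllu_line_to_token helper and the sample sentence are hypothetical; only the 'lemma' key is taken from the error message.

```python
# Hypothetical reconstruction of the failure. FIELDS and the "drop '_' values"
# rule are assumptions modelled on Stanza-style CoNLL-U readers, not trankit code.
FIELDS = ["id", "text", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def conllu_line_to_token(line):
    """Turn one CoNLL-U word line into a dict, leaving out fields whose value is '_'."""
    values = line.rstrip("\n").split("\t")
    return {field: value for field, value in zip(FIELDS, values) if value != "_"}


# Made-up sentence whose LEMMA column is '_' for every word.
sentence = [
    conllu_line_to_token("1\tcats\t_\tNOUN\t_\t_\t2\tnsubj\t_\t_"),
    conllu_line_to_token("2\tsleep\t_\tVERB\t_\t_\t0\troot\t_\t_"),
]

# Same access pattern as the comprehension in the traceback: token['lemma'] is
# indexed directly, so a token without that key raises KeyError.
try:
    triples = [[tok["text"], tok["upos"], tok["lemma"]] for tok in sentence]
except KeyError as err:
    print("KeyError:", err)  # prints: KeyError: 'lemma'
```

Under that assumption, a treebank with no lemmas at all produces exactly this error on the first token the trainer reads.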


Bachstelze · 2022-05-26T19:30:11Z

There are no lemmas in the training data, so a lemmatizer can't be trained?! Can't I use the other parts of the pipeline?
When I run

from trankit import Pipeline
p = Pipeline(lang='customized', cache_dir='./save_dir')

the following error occurs:

BadZipFile: File is not a zip file

gcelano · 2024-08-26T15:28:44Z

I get the same error when trying to train the lemmatizer:

Setting up training config...
Initialized lemmatizer trainer
Training dictionary-based lemmatizer
Traceback (most recent call last):
  File "/home/celano/Documents/parser_Ancient_Greek_Latin/trankit-master-lemmatizer/custom_train00.py", line 15, in <module>
    trainer.train()
  File "/home/celano/Documents/parser_Ancient_Greek_Latin/trankit-master-lemmatizer/trankit/tpipeline.py", line 683, in train
    self._train_lemma()
  File "/home/celano/Documents/parser_Ancient_Greek_Latin/trankit-master-lemmatizer/trankit/tpipeline.py", line 584, in _train_lemma
    self._lemma_model.train()
  File "/home/celano/Documents/parser_Ancient_Greek_Latin/trankit-master-lemmatizer/trankit/models/lemma_model.py", line 381, in train
    [[token[TEXT], token[UPOS], token[LEMMA]] for sentence in self.train_batch.doc for token in sentence if
  File "/home/celano/Documents/parser_Ancient_Greek_Latin/trankit-master-lemmatizer/trankit/models/lemma_model.py", line 381, in <listcomp>
    [[token[TEXT], token[UPOS], token[LEMMA]] for sentence in self.train_batch.doc for token in sentence if
KeyError: 'lemma'

GioDH18 · 2025-01-05T13:49:17Z

I am also getting this error, even though the .conllu file I am loading has the lemmas in the second column, as I think should be expected. Has anyone found a solution to this error? Is it a problem with the training data or Trankit itself?
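One way to tell whether the training data is the problem is to list the word lines whose LEMMA column (column 3 in CoNLL-U) is "_". The find_missing_lemmas helper below is hypothetical and not part of trankit; it assumes, following GioDH18's follow-up, that "_" lemmas are what trigger the error. It skips multiword-token ranges (e.g. 3-4) and empty nodes (e.g. 5.1), and treats a "_" lemma as legitimate when the form itself is "_".

```python
def find_missing_lemmas(conllu_text):
    """Return (sentence_index, token_id, form) for every word line whose LEMMA is '_'."""
    problems = []
    sentence_index = 0
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if in_sentence:
                sentence_index += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# sent_id = ..."
        in_sentence = True
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # malformed line; a real validator should report it instead
        token_id, form, lemma = cols[0], cols[1], cols[2]
        if "-" in token_id or "." in token_id:
            continue  # multiword-token range or empty node, not an ordinary word
        if lemma == "_" and form != "_":
            problems.append((sentence_index, token_id, form))
    return problems


sample = (
    "# sent_id = 1\n"
    "1\tcats\tcat\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tsleep\t_\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# sent_id = 2\n"
    "1\tdogs\tdog\tNOUN\t_\t_\t0\troot\t_\t_\n"
    "\n"
)
print(find_missing_lemmas(sample))  # prints: [(0, '2', 'sleep')]
```

For a real file, pass the contents of open(path, encoding="utf-8").read(); an empty result suggests the "_" lemmas are not the cause.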

GioDH18 · 2025-01-05T20:14:58Z

Never mind, it appears that the lemmatization pipeline has trouble handling "_" in the LEMMA column of CoNLL-U files. I ended up just deleting those sentences from the data. I don't know if that is the same issue others have faced, but I hope this helps!
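The workaround of deleting the affected sentences could be sketched as follows. drop_sentences_without_lemmas is a hypothetical helper, not part of trankit; like the check above, it treats a "_" lemma as missing unless the form itself is "_", ignores multiword-token ranges and empty nodes, and writes each kept sentence followed by a blank line, as CoNLL-U requires.

```python
import re


def drop_sentences_without_lemmas(conllu_text):
    """Return CoNLL-U text keeping only sentences in which every word line has a lemma."""
    text = conllu_text.replace("\r\n", "\n").strip()
    kept = []
    # Sentences are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text):
        if not block.strip():
            continue
        complete = True
        for line in block.splitlines():
            if not line or line.startswith("#"):
                continue  # skip comments such as "# sent_id = ..."
            cols = line.split("\t")
            if len(cols) != 10 or "-" in cols[0] or "." in cols[0]:
                continue  # malformed line, multiword-token range or empty node
            if cols[2] == "_" and cols[1] != "_":
                complete = False
                break
        if complete:
            kept.append(block)
    # Every sentence, including the last one, must be followed by a blank line.
    return "\n\n".join(kept) + "\n\n" if kept else ""


sample = (
    "# sent_id = 1\n"
    "1\tcats\tcat\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tsleep\t_\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# sent_id = 2\n"
    "1\tdogs\tdog\tNOUN\t_\t_\t0\troot\t_\t_\n"
    "\n"
)
print(drop_sentences_without_lemmas(sample), end="")  # only sentence 2 is printed
```

Filter both the training and the development files and point the trainer at the filtered copies. If a treebank has no lemmas at all, as reported for UD_Thai-PUD at the start of this issue, the filter leaves nothing to train on.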
