POS Tagging is Broken for Sliced Pipelines #13225
-
Hey everyone, I'm trying to lemmatize a text that I cleaned earlier. Runtime was a problem, so I decided to cut some components out of the pipeline, since I only wanted lemmas. When I enabled only lemmatization I got some warnings, but I also wanted to filter on POS tags such as ['ADJ', 'NOUN', 'VERB', 'ADV']. To generate the .pos_ attribute, I enabled the pipeline components that the documentation said were needed for that.

Thanks in advance!

How to reproduce the behaviour
Here is the code sample that doesn't work:
The one that works:
Your Environment
Replies: 2 comments
-
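Hi!

Sorry that this has been confusing. What you need is to also ensure the tok2vec component is enabled:

If you look at the en_core_web_sm package that's installed in your venv, you can open the config.cfg and find something like this:

What this means is that the tagger model uses the tok2vec component in the pipeline: it "listens" to it to obtain word embeddings. The parser does, too. So you should make sure to enable them together.

You can read more about the shared tok2vec layer in our docs here: https://spacy.io/usage/embeddings-transformers#embedding-layers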
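As a rough illustration of enabling tok2vec together with the tagger, the sketch below loads en_core_web_sm with both components and then keeps lemmas by POS tag. It assumes a spaCy v3 release whose spacy.load accepts the enable argument, and that attribute_ruler is what maps the tagger's fine-grained tags to .pos_ in this package, with the lemmatizer relying on those tags. The names ENABLED, KEEP_POS, keep_lemmas and lemmatize are made up for this example and do not come from the thread.

```python
# Components enabled together. Assumption: in en_core_web_sm the tagger
# listens to tok2vec, attribute_ruler fills in .pos_ from the tagger's tags,
# and the rule-based lemmatizer depends on .pos_.
ENABLED = ["tok2vec", "tagger", "attribute_ruler", "lemmatizer"]
KEEP_POS = {"ADJ", "NOUN", "VERB", "ADV"}


def keep_lemmas(tokens, keep_pos=KEEP_POS):
    """Return the lemmas of tokens whose coarse POS tag is in keep_pos."""
    return [tok.lemma_ for tok in tokens if tok.pos_ in keep_pos]


def lemmatize(text, model="en_core_web_sm"):
    """Load the sliced pipeline and return the filtered lemmas for text."""
    import spacy  # imported lazily so keep_lemmas works without spaCy installed

    # enable=... runs only the listed components; the rest are disabled.
    nlp = spacy.load(model, enable=ENABLED)
    return keep_lemmas(nlp(text))
```

Calling lemmatize("some text") requires spaCy and the en_core_web_sm package to be installed. Components that are merely disabled are still loaded, so if load time matters, passing the unwanted component names via exclude instead skips loading them altogether.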
-
Hey, thanks for your answer. It wasn't clear from the docs which component depends on which. Perhaps a better dependency visualization or certain error messages would help? If the tagger needs tok2vec and the user didn't enable it, it should throw an error, no?

I ended up using

Finally, I strongly believe that showing in the docs how these components interact with or use each other would be really nice. Moreover, I believe the code should raise an error in my case instead of producing nonsensical results.
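A check along the lines requested here can be approximated in user code. The sketch below walks a pipeline config, of the shape exposed by nlp.config, and reports enabled components whose model contains a Tok2VecListener pointing at an upstream component that is not enabled. The function names and the sample dictionary are hypothetical; the sample only mimics the kind of listener entry discussed in this thread and is not copied from a real config.cfg.

```python
def find_listeners(node):
    """Yield the upstream name of every Tok2VecListener found under node."""
    if isinstance(node, dict):
        arch = node.get("@architectures", "")
        if isinstance(arch, str) and arch.startswith("spacy.Tok2VecListener"):
            yield node.get("upstream", "*")
        for value in node.values():
            yield from find_listeners(value)


def missing_upstreams(config, enabled):
    """Map each enabled component to the upstream components it lacks."""
    problems = {}
    for name, section in config.get("components", {}).items():
        if name not in enabled:
            continue
        for upstream in find_listeners(section):
            # Assumption: a wildcard listener resolves to a component
            # named "tok2vec".
            target = "tok2vec" if upstream == "*" else upstream
            if target not in enabled:
                problems.setdefault(name, []).append(target)
    return problems


# Hypothetical config fragment in the shape of nlp.config.
SAMPLE_CONFIG = {
    "components": {
        "tok2vec": {"factory": "tok2vec"},
        "tagger": {
            "factory": "tagger",
            "model": {
                "@architectures": "spacy.Tagger.v2",
                "tok2vec": {
                    "@architectures": "spacy.Tok2VecListener.v1",
                    "upstream": "tok2vec",
                },
            },
        },
        "lemmatizer": {"factory": "lemmatizer"},
    }
}

print(missing_upstreams(SAMPLE_CONFIG, ["tagger", "lemmatizer"]))
print(missing_upstreams(SAMPLE_CONFIG, ["tok2vec", "tagger", "lemmatizer"]))
```

With a loaded pipeline, nlp.config could be passed in place of SAMPLE_CONFIG, and a non-empty result could be turned into a raised error before any text is processed.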