How to mark identifiers like productid (AS12314_11) as an entity? #2553
-
If I have a query like
How will I be able to get XA12345 as an entity? Regards
Replies: 3 comments
-
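spaCy's entity recognizer can be updated with more examples – either to detect more examples of an existing category (like PRODUCT) or to add a new custom category. Here are some resources to get you started:

If your product names follow a consistent scheme (for example, two letters plus 5 numbers), a rule-based approach might actually be a better place to start. Instead of using a statistical model, you'd use the Matcher to find entities in your text, and then add them to the doc.ents property. More info: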
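As a rough illustration of the rule-based idea, here is a minimal sketch that uses only Python's standard `re` module rather than spaCy's Matcher. The pattern (two uppercase letters, five digits, and an optional `_<digits>` suffix) and the helper name `find_product_ids` are assumptions made for this example, not part of spaCy; adjust the pattern to whatever scheme your IDs actually follow.

```python
import re

# Assumed ID scheme: two uppercase letters, five digits,
# and an optional "_<digits>" suffix (e.g. XA12345 or AS12314_11).
PRODUCT_ID = re.compile(r"\b[A-Z]{2}\d{5}(?:_\d+)?\b")


def find_product_ids(text):
    """Return a list of (start, end, matched_text) for every ID-like substring."""
    return [(m.start(), m.end(), m.group()) for m in PRODUCT_ID.finditer(text)]


if __name__ == "__main__":
    query = "Is XA12345 in stock, or only AS12314_11?"
    for start, end, value in find_product_ids(query):
        # The character offsets point back into the original string.
        print(start, end, value)
```

The character offsets are the useful part: in spaCy they could be passed to `doc.char_span(start, end, label=...)` to build a span for `doc.ents`. Note that `char_span` returns `None` when the offsets do not line up with token boundaries, so the result should be checked.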
-
I have tried that, but it is not learning the IDs.

Training:
    import random
    LABEL = 'ID'

Testing:
    # test the trained model
    test_text = 'Do you like ewr214124243_1?'
-
@vkgpt11 Well, the entities are recognised in context, and you've given it only very few examples of completely different IDs, in the context of the original "horses" example. Also note that the code example is really just a simple example that shows how the training loop works.

If you really want to train a new category from scratch, you'll need a lot more examples, in real contexts, as similar as possible to the original data you want to use the model on later. Here are some more details on training data:

In your case, it might be better to start off with a rule-based approach first – especially if the IDs follow certain patterns that can be expressed via rules:
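One way to connect the two suggestions is to let a simple rule produce the character offsets for training examples, so that each ID is annotated inside a full sentence. The sketch below is only an illustration under several assumptions: the regular expression is a guess at the ID scheme based on the IDs mentioned in this thread, the sentences are made up, and the `(text, {"entities": [(start, end, label)]})` layout is assumed to match the format used by the "horses" training example.

```python
import re

LABEL = "ID"  # label name used in the training attempt above

# Hypothetical pattern: two or more letters, five or more digits,
# and an optional "_<digits>" suffix (covers ewr214124243_1 and AS12314_11).
ID_PATTERN = re.compile(r"\b[A-Za-z]{2,}\d{5,}(?:_\d+)?\b")


def annotate(text, label=LABEL):
    """Build one (text, {"entities": [(start, end, label)]}) tuple."""
    entities = [(m.start(), m.end(), label) for m in ID_PATTERN.finditer(text)]
    return (text, {"entities": entities})


# Made-up sentences for illustration; real training data should come from
# actual queries that resemble what the model will see later.
sentences = [
    "Do you like ewr214124243_1?",
    "Where can I order AS12314_11 today?",
    "Is XA12345 still available?",
]

TRAIN_DATA = [annotate(sentence) for sentence in sentences]

if __name__ == "__main__":
    for text, annotations in TRAIN_DATA:
        print(text, annotations)
```

If the rule already finds every ID reliably, a statistical model may not be needed at all; generating annotations this way mainly helps when the IDs vary too much for a single pattern and the model has to learn from context.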