Rule Based Tagger, how to assign semantic tags when no POS information and two entries exist for the same word #9

Open
apmoore1 opened this issue Nov 25, 2021 · 0 comments
Labels
enhancement New feature or request

Problem

The rule based tagger applies the rules defined in the USASRuleBasedTagger class. Suppose we look up a word/lemma/token in the lexicon and either the word's POS tag does not exist in the lookup or no POS tag information is given. If the word appears twice in the lexicon with different POS tags and different semantic tags, what should the tagger do?

At the moment the tagger assigns the semantic tags from the entry that appears last in the lexicon lookup / lexicon TSV file. This is not a deliberate choice: it only happens because of the way Python creates a dictionary / hash map, where a later entry with the same key overwrites the earlier one.
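
A minimal sketch of why the last entry wins, assuming the POS-less lookup is built by inserting each lexicon row into a plain dict keyed on the lemma. The rows, POS labels and semantic tags below are made up for illustration, and build_lemma_lookup is a hypothetical name, not the tagger's actual code.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon rows in file order: (lemma, POS, semantic tags).
# These are placeholders, not real USAS lexicon data.
rows: List[Tuple[str, str, List[str]]] = [
    ("example", "noun", ["X1"]),
    ("example", "prep", ["Y2", "Z3"]),
]


def build_lemma_lookup(
    rows: List[Tuple[str, str, List[str]]]
) -> Dict[str, List[str]]:
    """Build a lemma -> semantic tags lookup that ignores POS."""
    lookup: Dict[str, List[str]] = {}
    for lemma, _pos, tags in rows:
        # Assigning to an existing key replaces its value, so the
        # row that comes last in the file silently wins.
        lookup[lemma] = tags
    return lookup


lookup = build_lemma_lookup(rows)
print(lookup["example"])
```

Under these assumptions the tags from the first "example" row are lost, which is the same behaviour described above for the lexicon TSV file.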

Below is an example of the problem:

Take the French word sauf. The USAS French semantic lexicon has two entries for this word because it has two possible POS tags. At the moment, if the Rule Based Tagger is given this word with no POS information, or with a POS tag that is not in the lexicon, it assigns the tags [A1.8-, Z5], as those are the tags for the last sauf entry in the lexicon.

Solutions

  1. Keep it as it is.
  2. Assign all of the semantic tags from all entries for that word. With this solution we need to decide on the order of the tags, e.g. which semantic tag should come first in the list and therefore be treated as the most likely one? We also need to ensure we do not duplicate semantic tags.
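
Solution 2 could be sketched roughly as below. This assumes one particular answer to the ordering question, namely that tags keep the order in which they first appear in the lexicon file; other orderings are equally possible. The function name merge_lexicon_tags and the example rows are hypothetical and not part of the existing tagger.

```python
from typing import Dict, List, Tuple


def merge_lexicon_tags(
    rows: List[Tuple[str, str, List[str]]]
) -> Dict[str, List[str]]:
    """Collect the semantic tags of every entry for a lemma.

    Tags keep first-seen file order (an assumption, see above) and
    each tag is stored only once per lemma.
    """
    lookup: Dict[str, List[str]] = {}
    for lemma, _pos, tags in rows:
        merged = lookup.setdefault(lemma, [])
        for tag in tags:
            if tag not in merged:  # skip tags already collected
                merged.append(tag)
    return lookup


# Hypothetical rows: the two entries share the tag "T2".
example_rows = [
    ("word", "noun", ["T1", "T2"]),
    ("word", "prep", ["T2", "T3"]),
]
merged_lookup = merge_lexicon_tags(example_rows)
print(merged_lookup["word"])
```

Here the shared tag is kept once and the first entry's tags come first, so the open design question is only whether file order is a sensible proxy for the most likely semantic tag.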

apmoore1 added the enhancement (New feature or request) label on Nov 25, 2021