Exact Match #109
Comments
Hi @AlpUygur, just add the bad words to the KeywordProcessor using the add_keyword method, and make sure case_sensitive=True is set. I hope this solves your issue.
Kind Regards, |
Hello, thanks for your answer, but it didn't work in my case. When I try to add the words in a for loop it says
and I did not want to add all 694 of them by hand. |
Can you share a sample that fails? |
For example, I am looking for "am" in the text. It finds "am" when there is "cam" in the text. |
This should never happen. Can you pick that line, make a working example,
and share it?
--
Vikash
|
@AlpUygur this only happens when the "c" in "cam" is, for whatever reason, not part of non_word_boundaries. Depending on the character script of your input text, this can happen.
import string
non_word_boundaries = set(string.digits + string.ascii_letters + '_')
Then check whether that "c" is in non_word_boundaries. If it is not, you have to manually add the missing characters to the non_word_boundaries of your KeywordProcessor instance via |
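The check suggested in the comment above can be sketched with the standard library alone. This is only an illustration: the set is copied from the comment, and the helper name `chars_outside_boundaries` is made up for this example, not part of flashtext.

```python
import string

# Default set as quoted in the comment above: ASCII letters, digits, underscore.
non_word_boundaries = set(string.digits + string.ascii_letters + '_')


def chars_outside_boundaries(text, boundaries=non_word_boundaries):
    """Return the letters/digits of `text` that are NOT in `boundaries`.

    Hypothetical helper: any character listed here is treated as a word
    boundary, so a keyword next to it can match inside a longer word.
    """
    return sorted({ch for ch in text if ch.isalnum() and ch not in boundaries})


print(chars_outside_boundaries("cam"))         # []
print(chars_outside_boundaries("akşamüzeri"))  # ['ü', 'ş']
```

An empty result for "cam" means every character is already covered by the default set. For non-ASCII text, the returned characters are candidates for adding to non_word_boundaries.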
Output: The text file and bad words are here. I looked at the file and there is no standalone word "am" in it, but it still finds it. "am" only occurs inside some other words. |
Can’t be bothered. Re-read my last comment and read up on how flashtext treats word boundaries. |
It did not change anything when I added the non-word boundaries. |
@AlpUygur Just my luck then I guess. :P
from flashtext import KeywordProcessor
# Default non_word_boundaries: "ş" and "ü" count as word boundaries
keyword_processor = KeywordProcessor()
keyword_processor.add_keyword_from_file(r"badwords.txt", encoding="utf-8")
text = "akşamüzeri"
keyword_processor.extract_keywords(text)
>> ["am"]
Changing the non word boundaries:
keyword_processor = KeywordProcessor()
keyword_processor.add_keyword_from_file(r"badwords.txt", encoding="utf-8")
# Treat "ş" and "ü" as parts of a word, not as boundaries
keyword_processor.non_word_boundaries.update(["ş", "ü"])
text = "akşamüzeri"
keyword_processor.extract_keywords(text)
>> [] |
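To see why the two runs above differ, here is a simplified model of the boundary rule in plain Python. It is not flashtext's actual trie-based implementation, and `find_whole_words` is a hypothetical name. The only assumption is that a keyword counts as a match when neither neighbouring character is in non_word_boundaries.

```python
import string

# Same default as quoted earlier in the thread.
DEFAULT_BOUNDARIES = set(string.digits + string.ascii_letters + '_')


def find_whole_words(text, keywords, non_word_boundaries):
    """Return the keywords that occur in `text` as whole words.

    Simplified model (assumption): an occurrence counts only if the
    characters directly before and after it are not in non_word_boundaries.
    """
    found = []
    for kw in keywords:
        start = text.find(kw)
        while start != -1:
            end = start + len(kw)
            left_ok = start == 0 or text[start - 1] not in non_word_boundaries
            right_ok = end == len(text) or text[end] not in non_word_boundaries
            if left_ok and right_ok:
                found.append(kw)
                break
            start = text.find(kw, start + 1)
    return found


text = "akşamüzeri"
print(find_whole_words(text, ["am"], DEFAULT_BOUNDARIES))               # ['am']
print(find_whole_words(text, ["am"], DEFAULT_BOUNDARIES | {"ş", "ü"}))  # []
print(find_whole_words("cam", ["am"], DEFAULT_BOUNDARIES))              # []
```

With the default set, "ş" and "ü" look like separators, so "am" appears to stand alone. Once they are added to the set, "am" is seen as part of the longer word and is skipped, just as "cam" never matches because "c" is an ASCII letter.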
Thanks for the implementation. It is very clear. |
Hi,
I am using flashtext to search for 694 bad words in some documents, so that I can tag whether or not they contain bad language. But I need exact matches, because some words contain a bad word inside them without being bad words themselves. How can I restrict the search to exact matches?