Reworked perform_text_replacement function #34
I initially wanted to make the keyword matching performed by this function, `perform_text_replacements`, a little more robust so that punctuation in the source text wouldn't interfere with matches. I then rewrote the body of the function to simplify keyword matching in the style of 'Ask for Forgiveness' rather than 'Ask for Permission'.

To address the first problem (embedded punctuation) I did the following:

1. Imported the string `punctuation` from the `string` module
2. Created a global scope dictionary `_PUNCTUATION_TO_SPACE` for use with the `str.translate` method
3. Modified the string normalization in `perform_text_replacements` to use `casefold` and `translate` instead of `lower` and `strip`

I chose to replace punctuation in the source text with a space, since removing it outright could concatenate two words when there are no spaces around the punctuation mark. The method `str.casefold` is preferred over `str.lower` when the goal is caseless string comparison, because `casefold` handles Unicode language variants more robustly than `lower` (see https://stackoverflow.com/questions/45745661/python-lower-vs-casefold-in-string-matching-and-converting-to-lowercase/45745761). The translation table dictionary, `_PUNCTUATION_TO_SPACE`, is a global since it is read-only and reconstructing it on every call is unnecessary.

When I inspected the remainder of the function, I found a loop which filtered the list of words for membership in the keys of the global scope dictionary `TEXT_FILTER_REPLIES`. If the resulting list was empty, the function returned None. If the list was not empty, the first item in the list was used to retrieve the value from the dictionary and a formatted string was returned. I found the logic to be correct but difficult to follow.

I restructured that code to loop through each word in the input text and attempt to return the Easter egg phrase. When the word is missing from the dictionary, we handle the `KeyError` exception by continuing to the next word in the list. If we have exhausted the list, we return None as before. I'm reasonably certain this also improves the time-complexity of the function by reducing the number of times the `words` list is iterated, but TBH I haven't confirmed that empirically.
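The approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the contents of `TEXT_FILTER_REPLIES` here are hypothetical stand-ins, and the real function returns a formatted string whose exact format is not shown in this conversation, so the sketch returns the raw dictionary value instead.

```python
from string import punctuation
from typing import Optional

# Hypothetical stand-in for the bot's real TEXT_FILTER_REPLIES mapping.
TEXT_FILTER_REPLIES = {
    "python": "snake",
    "zen": "import this",
}

# Built once at import time: maps every ASCII punctuation code point to a
# space, in the form that str.translate expects.
_PUNCTUATION_TO_SPACE = {ord(char): " " for char in punctuation}


def perform_text_replacements(text: str) -> Optional[str]:
    """Return the reply for the first matching word in text, else None."""
    # casefold for caseless comparison, then turn punctuation into spaces
    # so that "python,zen" splits into two words instead of one.
    words = text.casefold().translate(_PUNCTUATION_TO_SPACE).split()
    for word in words:
        try:
            # Assumption: the real code formats this value before returning.
            return TEXT_FILTER_REPLIES[word]
        except KeyError:
            continue  # not a keyword, try the next word
    return None
```

With these stand-in replies, `perform_text_replacements("I love Python!")` returns `"snake"`, and a text with no keywords returns `None`. The loop stops at the first match, so later words are never looked up.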
Looks good! Nice work @JnyJny. Could you add some test cases? Even if trivial, better than nothing :) Will take another look later and think about some tests too.
I'll take a look into adding tests.
@@ -203,20 +206,17 @@ def perform_bot_cmd(msg, private=True):

def perform_text_replacements(text: str) -> Union[str, None]:
    """Replace first matching word in text with a little easter egg"""
    words = text.lower().split()
    strip_chars = "?!"
    matching_words = [
this was redundant, nice concise solution @JnyJny
Do you remember what happened in Slack that triggered you to fix this? Would be nice to document it in a test. Thanks for improving this part of the code!
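As a starting point for such tests, here is a small pytest-style sketch. It is only an assumption about what the tests could look like: `first_reply` is a hypothetical, simplified stand-in for `perform_text_replacements`, `REPLIES` is a made-up mapping, and the inputs are invented examples rather than the actual Slack message that prompted the fix.

```python
from string import punctuation
from typing import Optional

# Hypothetical reply table; the bot's real TEXT_FILTER_REPLIES is not shown here.
REPLIES = {"python": "snake"}

# str.maketrans builds the same kind of table that str.translate accepts.
_PUNCTUATION_TO_SPACE = str.maketrans(punctuation, " " * len(punctuation))


def first_reply(text: str) -> Optional[str]:
    """Simplified stand-in for the reworked function under test."""
    for word in text.casefold().translate(_PUNCTUATION_TO_SPACE).split():
        if word in REPLIES:
            return REPLIES[word]
    return None


def test_trailing_comma_still_matches():
    # A comma is not in the old strip_chars ("?!"), so this is the kind of
    # input the punctuation handling is meant to cover.
    assert first_reply("I like Python, a lot") == "snake"


def test_punctuation_between_words_does_not_concatenate():
    assert first_reply("python,rocks") == "snake"


def test_match_is_caseless():
    assert first_reply("PYTHON") == "snake"


def test_no_keyword_returns_none():
    assert first_reply("nothing to see here.") is None
```

In the real test suite the stand-in would be replaced by an import of the bot's own function.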
@pybites ping :)
Ok, I forgot what I committed to. Sometimes work work gets in the way of fun work.
This one got lost a bit and now seems to be dangling around. I will pick it up with the next changes and tag @JnyJny in the PR.
Finally, this code is untested, but I have written a short standalone proof of concept to confirm that the changes I've proposed work in isolation from the rest of the bot framework.