Reworked perform_text_replacement function #34

JnyJny · 2020-01-08T15:20:07Z

I initially wanted to make the keyword matching performed by this function, `perform_text_replacements`, a little more robust so that punctuation in the source text wouldn't interfere with matches. I then rewrote the body of the function to simplify keyword matching in the style of 'Ask for Forgiveness' rather than 'Ask for Permission'.

To address the first problem (embedded punctuation) I did the following:

1. Imported the string constant `punctuation` from the `string` module.
2. Created a global scope dictionary `_PUNCTUATION_TO_SPACE` for use with the `str.translate` method.
3. Modified the string normalization in `perform_text_replacements` to use `casefold` and `translate` instead of `lower` and `strip`.

I chose to replace punctuation in the source text with a space, rather than delete it, because deleting it could concatenate words when there are no spaces around the punctuation mark. The method `str.casefold` is preferred over `str.lower` for caseless string comparisons because `casefold` handles Unicode language variants more robustly than `lower` (see https://stackoverflow.com/questions/45745661/python-lower-vs-casefold-in-string-matching-and-converting-to-lowercase/45745761).

The translation table dictionary, _PUNCTUATION_TO_SPACE, is a global since it is read-only and reconstructing it on every call is unnecessary.

When I inspected the remainder of the function, I found a loop which filtered the list of words for membership in the keys of the global scope dictionary `TEXT_FILTER_REPLIES`. If the resulting list was empty, the function returned `None`. If the list was not empty, the first item in the list was used to retrieve the value from the dictionary and a formatted string was returned. I found the logic to be correct but difficult to follow.
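For readers without the diff open, the filter-then-look-up ('Ask for Permission') shape described above might look roughly like this. It is a reconstruction from the description, not the original code: the function name `lbyl_replacement`, the contents of `TEXT_FILTER_REPLIES`, and the reply format are all made up for illustration.

```python
from typing import Optional

# Hypothetical stand-in for the bot's real TEXT_FILTER_REPLIES mapping.
TEXT_FILTER_REPLIES = {"zen": "import this", "braces": "not a chance"}


def lbyl_replacement(text: str) -> Optional[str]:
    """Sketch of the previous approach: filter first, then look up."""
    words = text.lower().split()
    strip_chars = "?!"
    # First pass: keep only the words that are known keys.
    matching_words = [
        word.strip(strip_chars)
        for word in words
        if word.strip(strip_chars) in TEXT_FILTER_REPLIES
    ]
    if not matching_words:
        return None
    first = matching_words[0]
    # Hypothetical reply format.
    return f"{first}: {TEXT_FILTER_REPLIES[first]}"


print(lbyl_replacement("What is zen?"))   # trailing "?" is stripped, so this matches
print(lbyl_replacement("zen, please"))    # the comma is not stripped, so no match
```

The second call shows the kind of miss that motivated the punctuation change: only `?` and `!` are stripped, so any other punctuation attached to a keyword hides it.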

I restructured that code to loop through each word in the input text and attempt to return the Easter egg phrase. When the word is missing from the dictionary, we handle the `KeyError` exception by continuing to the next word in the list. If we have exhausted the list, we return `None` as before. I'm reasonably certain this also improves the time complexity of the function by reducing the number of times the `words` list is iterated, but TBH I haven't confirmed that empirically.
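The restructured loop can be sketched as follows. This is an illustration under assumptions rather than the code in the PR: the function name `eafp_replacement`, the contents of `TEXT_FILTER_REPLIES`, and the reply format are hypothetical, while the `try`/`except KeyError` flow and the normalization follow the description.

```python
from string import punctuation
from typing import Optional

# Hypothetical stand-in for the bot's real TEXT_FILTER_REPLIES mapping.
TEXT_FILTER_REPLIES = {"zen": "import this", "braces": "not a chance"}

# Read-only translation table, built once at module level.
_PUNCTUATION_TO_SPACE = {ord(char): " " for char in punctuation}


def eafp_replacement(text: str) -> Optional[str]:
    """Sketch: return the reply for the first matching word, else None."""
    words = text.casefold().translate(_PUNCTUATION_TO_SPACE).split()
    for word in words:
        try:
            # Hypothetical reply format; the lookup raises KeyError on a miss.
            return f"{word}: {TEXT_FILTER_REPLIES[word]}"
        except KeyError:
            continue
    return None


print(eafp_replacement("Is it Zen,or braces?"))
print(eafp_replacement("nothing to see here"))
```

The loop stops at the first hit, so the word list is walked at most once and only as far as needed.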

Finally, this code is untested within the bot itself, but I have written a short standalone proof of concept to confirm that the changes I've proposed work in isolation from the rest of the bot framework.

pogross · 2020-01-29T12:54:08Z

Looks good! Nice work @JnyJny

Could you add some test cases? Even if trivial, better than nothing :)

Will take another look later and think about some tests too.

JnyJny · 2020-01-29T16:16:11Z

I'll take a look into adding tests.

pybites · 2020-01-31T06:59:00Z

bot/slack.py

@@ -203,20 +206,17 @@ def perform_bot_cmd(msg, private=True):

 def perform_text_replacements(text: str) -> Union[str, None]:
    """Replace first matching word in text with a little easter egg"""
-    words = text.lower().split()
-    strip_chars = "?!"
-    matching_words = [


this was redundant, nice concise solution @JnyJny

Do you remember what happened in Slack that triggered you to fix this? Would be nice to document it in a test. Thanks for improving this part of the code!

pogross · 2020-05-14T20:41:37Z

@pybites ping :)

JnyJny · 2020-05-14T20:55:06Z

Ok, I forgot what I committed to. Sometimes work work gets in the way of fun work.

pogross · 2020-09-22T20:36:49Z

This one got lost a bit and now seems to be dangling around. I will pick it up with the next changes and tag @JnyJny in the PR.

pybites reviewed Jan 31, 2020

pogross linked an issue Sep 9, 2020 that may be closed by this pull request

Only pass command arguments as message to commands #44

Open

pogross removed a link to an issue Sep 12, 2020

Only pass command arguments as message to commands #44

Open

pogross added the "Internal" label (Changes not directly visible to the user: speed improvements, unit tests) Sep 25, 2020

pogross closed this Oct 1, 2020
