Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reworked perform_text_replacement function #34

Closed
wants to merge 1 commit into from
Closed

Reworked perform_text_replacement function #34

wants to merge 1 commit into from

Conversation

JnyJny
Copy link
Contributor

@JnyJny JnyJny commented Jan 8, 2020

I initially wanted to make the keyword matching performed by this function, perform_text_replacement, a little more robust so that punctuation in the source text wouldn't interfere with matches. I then re-wrote the body of the function to simplify keyword matching in the style of 'Ask for Forgiveness' rather than 'Ask for Permission'.

To address the first problem (embedded punctuation) I did the following:

  1. Imported the string 'punctuation' from the string module
  2. Created a global scope dictionary _PUNCTUATION_TO_SPACE for use with the str.translate method
  3. Modified the string normalization in perform_text_replacements to use casefold and translate instead of lower and strip.

I chose to replace punctuation in the source text with a space since there is a possibility of word concatenation if there are no spaces around the punctuation mark. The method str.casefold is preferred over str.lower when the goal is caseless string comparisons in that casefold handles unicode language variants more robustly than lower, Relevant StackOverflow.

The translation table dictionary, _PUNCTUATION_TO_SPACE, is a global since it is read-only and reconstructing it on every call is unnecessary.

When I inspected the remainder of the function, I found a loop which filtered the list of words for membership in the keys of the global scope dictionary TEXT_FILTER_REPLIES. If the resulting list was empty, the function returned None. If the list was not empty, the first item in the list is used to retrieve the value from the dictionary and a formatted string is returned. I found the logic to be correct but difficult to follow.

I restructured that code to loop thru each word in the input text and attempt to return the Easter egg phrase. When the word is missing from the dictionary, we handle the KeyError exception by continuing to the next word in the list. If we have exhausted the list, we return None as before. I'm reasonably certain this also improves the time-complexity of the function by reducing the number of times the words list is iterated, but TBH I haven't confirmed that empirically.

Finally, this code is untested but I have written a short standalone proof of concept to confirm that the changes I've proposed work in isolation from the rest of the bot framework.

I initially wanted to make the keyword matching performed by this function, `perform_text_replacement`, a little more robust so that punctuation in the source text wouldn't interfere with matches. I then re-wrote the body of the function to simplify keyword matching in the style of 'Ask for Forgiveness' rather than 'Ask for Permission'.

To address the first problem (embedded punctuation) I did the following:
1. Imported the string 'punctuation' from the string module
2. Created a global scope dictionary  `_PUNCTUATION_TO_SPACE` for use with the `str.translate` method
3. Modified the string normalization in `perform_text_replacements` to use `casefold` and `translate` instead of `lower` and `strip`. 

I chose to replace punctuation in the source text with a space since there is a possibility of word concatenation if there are no spaces around the punctuation mark.  The method `str.casefold` is preferred over `str.lower` when the goal is caseless string comparisons in that `casefold` handles unicode language variants more robustly than `lower` (see https://stackoverflow.com/questions/45745661/python-lower-vs-casefold-in-string-matching-and-converting-to-lowercase/45745761). 

The translation table dictionary, _PUNCTUATION_TO_SPACE, is a global since it is read-only and reconstructing it on every call is unnecessary.

When I inspected the remainder of the function, I found a loop which filtered the list of words for membership in the keys of the global scope dictionary `TEXT_FILTER_REPLIES`. If the resulting list was empty, the function returned None.  If the list was not empty, the first item in the list is used to retrieve the value from the dictionary and a formatted string is returned. I found the logic to be correct but difficult to follow. 

I restructured that code to loop thru each word in the input text and attempt to return the Easter egg phrase. When the word is missing from the dictionary, we handle the `KeyError` exception by continuing to the next word in the list. If we have exhausted the list, we return None as before.  I'm reasonably certain this also improves the time-complexity of the function by reducing the number of times the `words` list is iterated, but TBH I haven't confirmed that empirically.
@pogross
Copy link
Member

pogross commented Jan 29, 2020

Looks good! Nice work @JnyJny

Could you add some test cases? Even if trivial, better than nothing :)

Will take another look later and think about some tests too.

@JnyJny
Copy link
Contributor Author

JnyJny commented Jan 29, 2020

I'll take a look into adding tests.

@@ -203,20 +206,17 @@ def perform_bot_cmd(msg, private=True):

def perform_text_replacements(text: str) -> Union[str, None]:
"""Replace first matching word in text with a little easter egg"""
words = text.lower().split()
strip_chars = "?!"
matching_words = [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was redundant, nice concise solution @JnyJny

Copy link
Collaborator

@pybites pybites Jan 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you remember what happened in Slack that triggered you to fix this? Would be nice to document it in a test. Thanks for improving this part of the code!

@pogross
Copy link
Member

pogross commented May 14, 2020

@pybites ping :)

@JnyJny
Copy link
Contributor Author

JnyJny commented May 14, 2020

Ok I forgot what I committed to. Sometimes work work gets in the way fun work.

@pogross
Copy link
Member

pogross commented Sep 22, 2020

This one got lost a bit and now seems to be dangling around. I will pick it with the next changes and tag @JnyJny in the PR.

@pogross pogross added the Internal Changes not directly visible to the user: speed improvements, unit tests label Sep 25, 2020
@pogross pogross closed this Oct 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Internal Changes not directly visible to the user: speed improvements, unit tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants