feat(data-masking): add custom mask functionalities #5837

Open. anafalcao wants to merge 12 commits into base: develop

@anafalcao (Collaborator) commented Jan 7, 2025

Issue number:
#5826

Summary

This PR enhances the data masking utility by introducing flexible masking options. The new features allow dynamic, pattern-based, and regex-based masking, giving users greater control over how sensitive data is obscured when using the erase method.

Changes

New flags for erase():

  • dynamic_mask (bool): When set to True, enables dynamic masking, which preserves the original length and structure of the text while replacing its characters with *.
    Example: with dynamic_mask=True, 'Avenue St' becomes '****** **'.
  • custom_mask (str): Specifies a fixed mask string. The mask replaces the entire original value, regardless of its content.
    For example, a custom_mask of "XX-XX" applied to "12345" yields "XX-XX".
  • regex_pattern (str): Defines a regular expression that identifies the parts of the input string to mask. This allows more complex and flexible masking rules. It is used together with mask_format.
  • mask_format (str): Specifies the replacement format for the parts of the string matched by regex_pattern. It can include backreferences (such as \1, \2) to captured groups in the regex pattern, so that some parts of the original string are preserved.
    For example: '[email protected]' could become 'e*****@email.com'
  • masking_rules (dict): Applies a different rule (format) to each data field.
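The flags above can be approximated with the standard library alone. The following is a minimal sketch, not the PR's implementation: the helper names (`dynamic_mask`, `regex_mask`, `apply_rule`, `apply_rules`), the precedence between flags, the `"*****"` fallback, and the sample address `user@example.com` are all assumptions made for illustration.

```python
from __future__ import annotations

import re
from typing import Any

FIXED_MASK = "*****"  # assumed fallback, borrowed from the "fixed mask" example


def dynamic_mask(text: str) -> str:
    # Replace every non-whitespace character, so length and spacing survive.
    return re.sub(r"\S", "*", text)


def regex_mask(text: str, regex_pattern: str, mask_format: str) -> str:
    # Backreferences in mask_format keep the captured groups of each match.
    return re.sub(regex_pattern, mask_format, text)


def apply_rule(value: str, rule: dict[str, Any]) -> str:
    # Hypothetical precedence: dynamic, then custom, then regex, then fixed mask.
    if rule.get("dynamic_mask"):
        return dynamic_mask(value)
    if "custom_mask" in rule:
        return rule["custom_mask"]
    if "regex_pattern" in rule and "mask_format" in rule:
        return regex_mask(value, rule["regex_pattern"], rule["mask_format"])
    return FIXED_MASK


def apply_rules(data: dict[str, Any], masking_rules: dict[str, dict[str, Any]]) -> dict[str, Any]:
    # Fields without a rule are returned unchanged.
    return {
        key: apply_rule(str(value), masking_rules[key]) if key in masking_rules else value
        for key, value in data.items()
    }


rules = {
    "street": {"dynamic_mask": True},
    "card": {"custom_mask": "XX-XX"},
    "email": {"regex_pattern": r"(\w)[\w.-]+@([\w.-]+)", "mask_format": r"\1****@\2"},
}
sample = {"street": "Avenue St", "card": "12345", "email": "user@example.com", "city": "Lisbon"}
print(apply_rules(sample, rules))
```

Treating whitespace as the only preserved "structure" is a guess based on the 'Avenue St' example; the actual behavior for punctuation may differ.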

User experience

Previously, users had limited options for masking sensitive data. The erase() function provided basic masking capabilities, typically replacing entire fields or values with a fixed mask (e.g., '*****').
With the new masking options, users now have much more control over how their sensitive data is obscured. The enhanced erase() function offers a range of flexible masking techniques to suit various use cases, including different techniques for each field:

from aws_lambda_powertools.utilities.data_masking import DataMasking

masker = DataMasking()

# Each field gets its own masking technique
masking_rules = {
    "credit_card": {"custom_mask": "XX"},
    "street": {"dynamic_mask": True},
    "email": {"regex_pattern": r"(\w)[\w.-]+@([\w.-]+)", "mask_format": r"\1****@\2"},
}
masked_data = masker.erase(
    data={"credit_card": "1234-5678-9012-3456", "street": "Avenue St", "email": "[email protected]"},
    masking_rules=masking_rules,
)
# Result: {"credit_card": "XX", "street": "****** **", "email": "u****@example.com"}

Checklist

If your change doesn't seem to apply, please leave them unchecked.

Is this a breaking change?

RFC issue number:

Checklist:

  • Migration process documented
  • Implement warnings (if it can live side by side)

Acknowledgment

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Disclaimer: We value your time and bandwidth. As such, any pull requests created on non-triaged issues might not be successful.

@pull-request-size bot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) on Jan 7, 2025
@github-actions bot added the feature label (new feature or functionality) on Jan 7, 2025
@boring-cyborg bot added the documentation (improvements or additions to documentation) and tests labels on Jan 8, 2025
@anafalcao (Collaborator, Author)

Hi @leandrodamascena! Can I have your help here with mypy? Thanks!

@anafalcao anafalcao marked this pull request as ready for review January 13, 2025 18:09
@anafalcao anafalcao requested a review from a team as a code owner January 13, 2025 18:09
@anafalcao (Collaborator, Author)

Hi @leandrodamascena!
I just converted this PR to Ready for review. I've been running into "Incompatible types in assignment" errors from mypy. Can you take a look?
I also added some tests for the new functionality, but I may need to add more after fixing these type issues.

@leandrodamascena (Contributor) left a comment

Hey @anafalcao! Another round of review. This is nice work; we just need to fix a few things. 🚀

aws_lambda_powertools/utilities/data_masking/base.py (three outdated review threads, resolved)
def erase(self, data: Sequence | Mapping, fields: list[str] | None = None) -> str | list[str] | tuple[str] | dict:
    return self._apply_action(data=data, fields=fields, action=self.provider.erase)

@overload
def erase(self, data: dict[Any, Any], *, masking_rules: dict[str, object]) -> dict[Any, Any]: ...
@leandrodamascena (Contributor)

I'm not sure we need the bare * argument here. Can you try removing this method signature and see if mypy complains?

@anafalcao (Collaborator, Author)

I tried removing it, but that increases the number of mypy errors.

@leandrodamascena (Contributor)

Let's keep this conversation open until we find a solution for this.
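For context on the signature under discussion: a bare * in a parameter list makes every parameter after it keyword-only, and @overload declares alternative call shapes for a single runtime implementation. The sketch below uses a hypothetical `Masker` class with simplified types and placeholder masking logic. It is not the Powertools `DataMasking` class, it has not been checked against the project's mypy configuration, and it is not offered as a fix for the errors mentioned above.

```python
from __future__ import annotations

from typing import Any, overload


class Masker:  # hypothetical stand-in, not the Powertools DataMasking class
    @overload
    def erase(self, data: Any, fields: list[str] | None = None) -> Any: ...
    @overload
    def erase(self, data: dict[Any, Any], *, masking_rules: dict[str, object]) -> dict[Any, Any]: ...

    def erase(self, data, fields=None, *, masking_rules=None):
        # Single runtime implementation behind both overloads.
        if masking_rules is not None:
            # Placeholder logic: mask every field that has a rule.
            return {k: "*****" if k in masking_rules else v for k, v in data.items()}
        if fields is None:
            return "*****"
        return {k: "*****" if k in fields else v for k, v in data.items()}


masker = Masker()
by_rules = masker.erase({"a": "x", "b": "y"}, masking_rules={"a": {}})
by_fields = masker.erase({"a": "x", "b": "y"}, fields=["b"])

# Because of the bare *, masking_rules cannot be passed positionally.
try:
    masker.erase({"a": "x"}, None, {"a": {}})
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

Keeping `masking_rules` keyword-only means a third positional argument is rejected at call time rather than being silently bound to it.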

docs/utilities/data_masking.md (resolved review thread)
@github-actions bot removed the documentation label on Jan 29, 2025
@boring-cyborg bot added the documentation label on Jan 30, 2025
@pull-request-size bot added the size/XL label (denotes a PR that changes 500-999 lines, ignoring generated files) and removed the size/L label on Jan 31, 2025
@github-actions bot removed the documentation label on Jan 31, 2025
Labels: feature, size/XL, tests
Project status: Pending review
Development

Successfully merging this pull request may close these issues.

Feature request: support for custom masking with regx pattern or custom masking chars
2 participants