Commit

Merge branch 'development'
hkage committed Jul 13, 2021
2 parents 87d1004 + 8d53643 commit 783de01
Showing 17 changed files with 745 additions and 271 deletions.
4 changes: 4 additions & 0 deletions .coveragerc
@@ -0,0 +1,4 @@
+[report]
+exclude_lines =
+    # Don't complain if tests don't hit defensive assertion code:
+    raise NotImplementedError
8 changes: 7 additions & 1 deletion CHANGELOG.md
@@ -2,9 +2,15 @@

 ## Development

+## 0.6.0 (2021-07-13)
+
+* [#28](https://github.com/rheinwerk-verlag/postgresql-anonymizer/pull/25): Add json support ([nurikk](https://github.com/nurikk))
+* [#27](https://github.com/rheinwerk-verlag/postgresql-anonymizer/pull/25): Better anonymisation ([nurikk](https://github.com/nurikk))
+* [#25](https://github.com/rheinwerk-verlag/postgresql-anonymizer/pull/25): Remove column specification for `cursor.copy_from` call ([nurikk](https://github.com/nurikk))
+
 ## 0.5.0 (2021-06-30)

-* [#22](https://github.com/rheinwerk-verlag/postgresql-anonymizer/pull/22): Fix table and column name quotes in cursor.copy_from call ([nurikk](https://github.com/nurikk))
+* [#22](https://github.com/rheinwerk-verlag/postgresql-anonymizer/pull/22): Fix table and column name quotes in `cursor.copy_from` call ([nurikk](https://github.com/nurikk))
 * [#23](https://github.com/rheinwerk-verlag/postgresql-anonymizer/pull/23): Allow uniq faker ([nurikk](https://github.com/nurikk))

 ## 0.4.1 (2021-05-27)
49 changes: 32 additions & 17 deletions README.rst
@@ -21,25 +21,30 @@ Features
 * Exclude data for anonymization depending on regular expressions
 * Truncate entire tables for unwanted data

-+----------------+----------------------+-----------------------+----------------------------------+
-| Field          | Value                | Provider              | Output                           |
-+================+======================+=======================+==================================+
-| ``first_name`` | John                 | ``choice``            | (Bob|Larry|Lisa)                 |
-+----------------+----------------------+-----------------------+----------------------------------+
-| ``title``      | Dr.                  | ``clear``             |                                  |
-+----------------+----------------------+-----------------------+----------------------------------+
-| ``street``     | Irving St            | ``faker.street_name`` | Miller Station                   |
-+----------------+----------------------+-----------------------+----------------------------------+
-| ``password``   | dsf82hFxcM           | ``mask``              | XXXXXXXXXX                       |
-+----------------+----------------------+-----------------------+----------------------------------+
-| ``email``      | [email protected]    | ``md5``               | 0cba00ca3da1b283a57287bcceb17e35 |
-+----------------+----------------------+-----------------------+----------------------------------+
-| ``email``      | [email protected]    | ``faker.unique.email``| [email protected]                |
-+----------------+----------------------+-----------------------+----------------------------------+
-| ``ip``         | 157.50.1.20          | ``set``               | 127.0.0.1                        |
-+----------------+----------------------+-----------------------+----------------------------------+
++----------------+----------------------+------------------------+----------------------------------+
+| Field          | Value                | Provider               | Output                           |
++================+======================+========================+==================================+
+| ``first_name`` | John                 | ``choice``             | (Bob|Larry|Lisa)                 |
++----------------+----------------------+------------------------+----------------------------------+
+| ``title``      | Dr.                  | ``clear``              |                                  |
++----------------+----------------------+------------------------+----------------------------------+
+| ``street``     | Irving St            | ``faker.street_name``  | Miller Station                   |
++----------------+----------------------+------------------------+----------------------------------+
+| ``password``   | dsf82hFxcM           | ``mask``               | XXXXXXXXXX                       |
++----------------+----------------------+------------------------+----------------------------------+
+| ``email``      | [email protected]    | ``md5``                | 0cba00ca3da1b283a57287bcceb17e35 |
++----------------+----------------------+------------------------+----------------------------------+
+| ``email``      | [email protected]    | ``faker.unique.email`` | [email protected]                |
++----------------+----------------------+------------------------+----------------------------------+
+| ``phone_num``  | 65923473             | ``md5``as_number: True | 3948293448                       |
++----------------+----------------------+------------------------+----------------------------------+
+| ``ip``         | 157.50.1.20          | ``set``                | 127.0.0.1                        |
++----------------+----------------------+------------------------+----------------------------------+
+| ``uuid_col``   | 00010203-0405-...... | ``uuid4``              | f7c1bd87-4d....                  |
++----------------+----------------------+------------------------+----------------------------------+

 Note: `faker.unique.[provider]` only supported on python3.5+ (Faker library min supported python version)
+Note: `uuid4` - only for (native `uuid4<https://www.postgresql.org/docs/current/datatype-uuid.html>`) columns

 See the `documentation`_ for a more detailed description of the provided anonymization methods.

@@ -75,11 +80,13 @@ Usage
       --dry-run             Don't commit changes made on the database
       --dump-file DUMP_FILE
                             Create a database dump file with the given name
+      --init-sql INIT_SQL   SQL to run before starting anonymization

 Despite the database connection values, you will have to define a YAML schema file, that includes
 all anonymization rules for that database. Take a look at the `schema documentation`_ or the
 `YAML sample schema`_.

+
 Example call::

     $ pganonymize --schema=myschema.yml \
@@ -89,6 +96,14 @@ Example call::
         --host=db.host.example.com \
         -v

+    $ pganonymize --schema=myschema.yml \
+        --dbname=test_database \
+        --user=username \
+        --password=mysecret \
+        --host=db.host.example.com \
+        --init-sql "set search_path to non_public_search_path; set work_mem to '1GB';" \
+        -v
+
 Database dump
 ~~~~~~~~~~~~~

6 changes: 4 additions & 2 deletions pganonymizer/__main__.py
@@ -5,9 +5,11 @@


 def main():
-    from pganonymizer.cli import main
+    from pganonymizer.cli import get_arg_parser, main
+
     try:
-        main()
+        args = get_arg_parser().parse_args()
+        main(args)
         exit_status = 0
     except KeyboardInterrupt:
         exit_status = 1
24 changes: 14 additions & 10 deletions pganonymizer/cli.py
@@ -4,7 +4,6 @@

 import argparse
 import logging
-import sys
 import time

 import yaml
@@ -33,8 +32,7 @@ def list_provider_classes():
         print('{:<10} {}'.format(provider_cls.id, provider_cls.__doc__))


-def main():
-    """Main method"""
+def get_arg_parser():
     parser = argparse.ArgumentParser(description='Anonymize data of a PostgreSQL database')
     parser.add_argument('-v', '--verbose', action='count', help='Increase verbosity')
     parser.add_argument('-l', '--list-providers', action='store_true', help='Show a list of all available providers',
@@ -49,8 +47,13 @@ def main():
     parser.add_argument('--dry-run', action='store_true', help='Don\'t commit changes made on the database',
                         default=False)
     parser.add_argument('--dump-file', help='Create a database dump file with the given name')
+    parser.add_argument('--init-sql', help='SQL to run before starting anonymization', default=False)

+    return parser
+
+
-    args = parser.parse_args()
+def main(args):
+    """Main method"""

     loglevel = logging.WARNING
     if args.verbose:
@@ -59,16 +62,21 @@ def main():

     if args.list_providers:
         list_provider_classes()
-        sys.exit(0)
+        return 0

     schema = yaml.load(open(args.schema), Loader=yaml.FullLoader)

     pg_args = get_pg_args(args)
     connection = get_connection(pg_args)
+    if args.init_sql:
+        cursor = connection.cursor()
+        logging.info('Executing initialisation sql {}'.format(args.init_sql))
+        cursor.execute(args.init_sql)
+        cursor.close()

     start_time = time.time()
     truncate_tables(connection, schema.get('truncate', []))
-    anonymize_tables(connection, schema.get('tables', []), verbose=args.verbose)
+    anonymize_tables(connection, schema.get('tables', []), verbose=args.verbose, dry_run=args.dry_run)

     if not args.dry_run:
         connection.commit()
@@ -79,7 +87,3 @@ def main():

     if args.dump_file:
         create_database_dump(args.dump_file, pg_args)
-
-
-if __name__ == '__main__':
-    main()
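
Note on the `get_arg_parser()` / `main(args)` split: with parsing moved out of `main`, the parser can be exercised with an explicit argument list instead of `sys.argv`, and without a database. A minimal sketch of that idea, assuming a trimmed-down parser with only three of the options from the diff and a hypothetical `describe()` helper standing in for `main(args)`:

```python
import argparse


def get_arg_parser():
    # Trimmed-down stand-in for the parser in the diff (assumption: only
    # three of the real options are reproduced here).
    parser = argparse.ArgumentParser(description='Anonymize data of a PostgreSQL database')
    parser.add_argument('-v', '--verbose', action='count', help='Increase verbosity')
    parser.add_argument('--dry-run', action='store_true', default=False,
                        help="Don't commit changes made on the database")
    parser.add_argument('--init-sql', default=False,
                        help='SQL to run before starting anonymization')
    return parser


def describe(args):
    # Hypothetical replacement for main(args): lists the steps that would
    # run, in order, without opening a connection.
    steps = []
    if args.init_sql:
        steps.append('init-sql: ' + args.init_sql)
    steps.append('anonymize (dry run)' if args.dry_run else 'anonymize and commit')
    return steps


args = get_arg_parser().parse_args(['--dry-run', '--init-sql', "set work_mem to '1GB';"])
print(describe(args))
```

Since `--init-sql` defaults to `False`, the `if args.init_sql:` guard skips the initialisation step whenever the option is omitted.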
5 changes: 1 addition & 4 deletions pganonymizer/constants.py
@@ -4,11 +4,8 @@
 # Default name for the primary key column
 DEFAULT_PRIMARY_KEY = 'id'

-# Delimiter used to buffer and import database data.
-COPY_DB_DELIMITER = '\x1f'
-
 # Filename of the default schema
 DEFAULT_SCHEMA_FILE = 'schema.yml'

 # Default chunk size for data fetch
-DEFAULT_CHUNK_SIZE = 2000
+DEFAULT_CHUNK_SIZE = 100000
19 changes: 18 additions & 1 deletion pganonymizer/providers.py
@@ -1,6 +1,7 @@
 import operator
 import random
 from hashlib import md5
+from uuid import uuid4

 from faker import Faker
 from six import with_metaclass
@@ -111,9 +112,16 @@ class MD5Provider(with_metaclass(ProviderMeta, Provider)):
     """Provider to hash a value with the md5 algorithm."""

     id = 'md5'
+    default_max_length = 8

     def alter_value(self, value):
-        return md5(value.encode('utf-8')).hexdigest()
+        as_number = self.kwargs.get('as_number', False)
+        as_number_length = self.kwargs.get('as_number_length', self.default_max_length)
+        hashed = md5(value.encode('utf-8')).hexdigest()
+        if as_number:
+            return int(hashed, 16) % (10 ** as_number_length)
+        else:
+            return hashed


 class SetProvider(with_metaclass(ProviderMeta, Provider)):
@@ -123,3 +131,12 @@ class SetProvider(with_metaclass(ProviderMeta, Provider)):

     def alter_value(self, value):
         return self.kwargs.get('value')
+
+
+class UUID4Provider(with_metaclass(ProviderMeta, Provider)):
+    """Provider to set a random uuid value."""
+
+    id = 'uuid4'
+
+    def alter_value(self, value):
+        return uuid4()
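
Note on the `as_number` branch of `MD5Provider`: it turns the hex digest into an integer and keeps only its last `as_number_length` decimal digits. A minimal standalone sketch of that arithmetic, assuming a hypothetical helper `md5_as_number` that is not part of the package:

```python
from hashlib import md5


def md5_as_number(value, as_number_length=8):
    # Hypothetical helper mirroring the new branch in alter_value:
    # hex digest -> integer -> remainder modulo 10 ** as_number_length.
    hashed = md5(value.encode('utf-8')).hexdigest()
    return int(hashed, 16) % (10 ** as_number_length)


number = md5_as_number('65923473')
# The same input always yields the same number, and with the default
# length the result is below 10 ** 8.
print(number)
```

Because the result is an `int`, leading zeros are dropped, so it can have fewer digits than `as_number_length`. With the default length of 8 the value is always below `10 ** 8`, so a ten-digit output like the one in the README table would need `as_number_length` set to 10 or more.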