See http://www.confluent.io/stream-processing-cookbook/ksql-recipes/data-masking
-
Docker
-
If running on Mac/Windows, allocate at least 4GB of memory to Docker. To check the current allocation:
docker system info | grep Memory
Should return a value greater than 8GB - if not, the Kafka stack will probably not work.
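The memory check above can be scripted so that it fails loudly instead of relying on reading the number by eye. The following is a minimal sketch, not part of the recipe: the helper names `docker_mem_gib` and `has_enough_memory` are hypothetical, and it assumes `docker system info` reports memory as a line of the form `Total Memory: 7.775GiB` (or a `MiB` value). The threshold is passed as an argument, so you can use whichever minimum applies to your setup.

```shell
# Hypothetical helpers (not part of the recipe). Assumes the Docker output
# contains a line like " Total Memory: 7.775GiB" or " Total Memory: 990.7MiB".

# docker_mem_gib: read `docker system info` output on stdin and print the
# total memory as a number of GiB.
docker_mem_gib() {
  awk '/Total Memory:/ {
    v = $3
    if (v ~ /GiB$/) { sub(/GiB$/, "", v); print v + 0 }
    else if (v ~ /MiB$/) { sub(/MiB$/, "", v); print v / 1024 }
  }'
}

# has_enough_memory MIN_GIB: succeed (exit 0) if the output on stdin reports
# at least MIN_GIB GiB, fail otherwise (including when no value is found).
has_enough_memory() {
  min=$1
  mem=$(docker_mem_gib)
  [ -n "$mem" ] && awk -v m="$mem" -v min="$min" 'BEGIN { exit !(m >= min) }'
}

# Example usage (requires a running Docker daemon):
#   docker system info | has_enough_memory 8 || echo "Allocate more memory to Docker"
```

Piping the real `docker system info` output into `has_enough_memory` gives an exit status that a setup script can act on before launching the stack.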
-
Clone this repository:
git clone https://github.com/confluentinc/ksql-recipes-try-it-at-home.git
-
Launch:
cd ksql-recipes-try-it-at-home/data-masking
docker-compose up -d
-
Run the KSQL CLI:
docker-compose exec ksql-cli ksql http://ksql-server:8088
-
Register the existing purchases topic for use as a KSQL Stream called purchases:
CREATE STREAM purchases \
(order_id INT, customer_name VARCHAR, date_of_birth VARCHAR, \
product VARCHAR, order_total_usd VARCHAR, town VARCHAR, country VARCHAR) \
WITH (KAFKA_TOPIC='purchases', VALUE_FORMAT='JSON');
-
Inspect the first few messages as they arrive:
SELECT * FROM PURCHASES LIMIT 5;
-
Create a new stream (populating a Kafka topic) that drops the PII fields:
CREATE STREAM PURCHASES_NO_PII AS \
SELECT ORDER_ID, PRODUCT, ORDER_TOTAL_USD, TOWN, COUNTRY \
FROM PURCHASES;
-
Create a new stream (populating a Kafka topic) that masks the PII fields:
CREATE STREAM PURCHASES_MASKED_PII AS \
SELECT MASK(CUSTOMER_NAME) AS CUSTOMER_NAME, \
MASK_RIGHT(DATE_OF_BIRTH,12) AS DATE_OF_BIRTH, \
ORDER_ID, PRODUCT, ORDER_TOTAL_USD, TOWN, COUNTRY \
FROM PURCHASES;
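
To get a feel for what the masked values look like before querying the new stream, the default masking rules can be approximated outside KSQL. This is a rough sketch only, not KSQL's implementation. It assumes the default mask characters (uppercase becomes `X`, lowercase becomes `x`, digits become `n`, anything else becomes `-`), ASCII input, and that asking to mask more characters than the string contains masks the whole string. The function names `mask` and `mask_right` are hypothetical shell helpers, and the sample values are invented for illustration.

```shell
# Hypothetical shell approximation of KSQL's default masking; ASCII only.

# mask STRING: replace uppercase with X, lowercase with x, digits with n,
# and every remaining character with -.
mask() {
  printf '%s' "$1" | LC_ALL=C sed \
    -e 's/[A-Z]/X/g' \
    -e 's/[a-z]/x/g' \
    -e 's/[0-9]/n/g' \
    -e 's/[^Xxn]/-/g'
}

# mask_right STRING N: mask only the last N characters of STRING.
# Assumption: if N is at least the string length, the whole string is masked.
mask_right() {
  s=$1
  n=$2
  len=${#s}
  if [ "$n" -ge "$len" ]; then
    mask "$s"
    return
  fi
  keep=$((len - n))
  head=$(printf '%s' "$s" | cut -c1-"$keep")
  tail=$(printf '%s' "$s" | cut -c"$((keep + 1))"-)
  printf '%s%s' "$head" "$(mask "$tail")"
}
```

Under these assumptions, `mask 'Jane Doe-42'` prints `Xxxx-Xxx-nn`, and `mask_right '1985-04-17' 5` prints `1985-nn-nn`. A 10-character date passed with a length of 12, as in the statement above, would come out fully masked as `nnnn-nn-nn`.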