This repository includes an implementation of an ETL pipeline built with Airflow and Docker. The pipelines automate the process of extracting data from various sources, transforming it, and loading the transformed data into a destination.
Links: Medium article
Install and run Docker.
Run the following command to start the services, including Airflow and PostgreSQL:
docker compose up -d
Then, go to http://localhost:8080/ to access the Airflow UI.
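To confirm that the containers are up before opening the UI, and to stop the stack when you are done, the standard Docker Compose commands apply:

docker compose ps
docker compose down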
The following DAGs are included in this repository:
etl_pipeline: downloads a publicly available CSV file from stats.govt.nz, transforms it by selecting a few features of the dataset, and loads the transformed data into a PostgreSQL database. A sketch of how such a DAG could be structured follows.
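The sketch below is a minimal TaskFlow-style illustration of this extract/transform/load shape, assuming Airflow 2.x. The task bodies, the CSV URL, the selected column names, the target table name, and the postgres_default connection ID are all illustrative assumptions, not the repository's actual code.

```python
# Hypothetical sketch of an etl_pipeline DAG; URLs, column names, table
# name, and connection ID are placeholders, not the repo's real values.
from datetime import datetime

import pandas as pd
from airflow.decorators import dag, task


@dag(schedule=None, start_date=datetime(2023, 1, 1), catchup=False)
def etl_pipeline():
    @task
    def extract() -> str:
        # Download the source CSV to a local staging file (URL assumed).
        url = "https://www.stats.govt.nz/assets/Uploads/example.csv"  # placeholder
        path = "/tmp/source.csv"
        pd.read_csv(url).to_csv(path, index=False)
        return path

    @task
    def transform(path: str) -> str:
        # Keep only a few columns of interest (column names assumed).
        df = pd.read_csv(path)
        df = df[["Series_reference", "Period", "Data_value"]]
        out = "/tmp/transformed.csv"
        df.to_csv(out, index=False)
        return out

    @task
    def load(path: str) -> None:
        # Append rows to a Postgres table through the Airflow connection
        # "postgres_default" (connection ID and table name assumed).
        from airflow.providers.postgres.hooks.postgres import PostgresHook

        engine = PostgresHook(postgres_conn_id="postgres_default").get_sqlalchemy_engine()
        pd.read_csv(path).to_sql("etl_data", engine, if_exists="append", index=False)

    load(transform(extract()))


etl_pipeline()
```

Returning file paths from each task lets Airflow pass only small strings between tasks via XCom, keeping the bulk of the data on disk rather than in the metadata database.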