This project is an easy-to-use development environment for Apache Airflow 2. It can be run locally on a variety of OS platforms, with simple steps for spinning up Airflow. The project can also be integrated into an automated continuous integration/continuous delivery (CI/CD) process that uses GCP Cloud Build to build, test, and deploy workflows into GCP Cloud Composer. It is meant to address several "infrastructure challenges" that Airflow developers experience, letting them focus on workflow development rather than platform installation and configuration.
The environment runs locally on Docker containers via Docker Compose. For most of the deployment options, these are the only prerequisites. The code has also been successfully tested within the GCP Cloud Shell/Editor, an ephemeral cloud Linux instance accessible from a web browser. This may be beneficial for those who face "local PC restrictions" and cannot install Docker Engine locally.
Main features of local development using Docker & Docker-Compose:
- Your workspace files are always synchronized with the Docker containers. Combined with an IDE, this makes the development process easier and faster.
- Unit and Integration tests run within a container built from the same image as the Airflow deployment.
The project provides an opinionated Cloud Build CI/CD pipeline for the GCP Cloud Composer service. It integrates natively with the local Airflow development environment and allows developers to automatically stage, test, and deploy their code into a production environment.
Main features of Cloud Build CI/CD pipeline for Composer environment:
- Container caching: reuses the cache of already-built images, speeding up the overall build process.
- Unit and integration tests as steps in the CI stage.
- DAG integrity validation (smoke test); see the sketch after this list.
- Code linting check.
- Custom configuration: environment variables, Airflow configuration, PyPI packages.
- Plugin and DAG deployment into the Composer environment.
- Automatic email notification upon a successful build.
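To make the DAG integrity check concrete, here is a minimal pytest sketch of the kind of smoke test such a CI step can run. The file name and the dags path are illustrative assumptions, not taken from this repository:

```python
# tests/unit/test_dag_integrity.py - a minimal sketch of a DAG integrity
# smoke test; the file name and paths are illustrative assumptions.
from airflow.models import DagBag


def test_dags_import_without_errors():
    # DagBag parses every file under dags/; import_errors collects each
    # file that failed to load (syntax errors, missing imports, cycles).
    dag_bag = DagBag(dag_folder="dags", include_examples=False)
    assert dag_bag.import_errors == {}, f"DAG import failures: {dag_bag.import_errors}"
```

A check like this catches broken DAG files before they ever reach the Composer environment.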
.
├── ci-cd                    # CI/CD deployment configuration
├── dags                     # Airflow DAGs
├── data                     # Airflow data
├── docker                   # Docker configuration
├── gcp-cloud-shell          # Cloud Shell custom scripts
├── helpers                  # Backend scripts
├── logs                     # Airflow logs
├── plugins                  # Airflow plugins
├── tests                    # Tests
├── variables                # Variables for environments
├── .gitignore               # Git ignore rules
├── .pre-commit-config.yaml  # Pre-commit hooks
├── LICENSE                  # Project license
├── README.md                # Readme guidelines
└── docker-compose.yaml      # Docker Compose deployment code
- OS: macOS, Linux Ubuntu, or GCP Cloud Shell. Note: Windows requires the Windows Subsystem for Linux (WSL).
- Code editing/development environment: Visual Studio Code (VS Code)
- Terminal client: Visual Studio Code terminal
Note: Before working with your local development environment, fork the repository so that you have your own branch for development and custom changes.
Note: GCP Cloud Shell has several limitations. Every time a shell session expires or is closed, you have to re-run the Airflow initialization steps given in section #4 (step 4.1).
- 2.1.1 Access GCP Cloud Shell from your browser using your credentials: https://ide.cloud.google.com
- 2.1.2 Open a terminal session (Menu Terminal --> New Terminal), then clone the repo and change into its directory:
  git clone <'Airflow 2 repository'> && cd airflow2-local
- 2.1.3 In the Cloud Shell UI, click Open Folder and select the airflow2-local folder
- 2.1.4 Run the following commands to initialize the environment and install prerequisites:
  chmod +x ./helpers/scripts/cloud-shell-init.sh && ./helpers/scripts/cloud-shell-init.sh
- 2.1.5 Proceed with the installation and initialization steps (sections #3 and #4)
- 2.2.1 Install the latest available version of Docker: https://docs.docker.com/get-docker/
- 2.2.2 Install the latest available version of Docker Compose: https://docs.docker.com/compose/install/
- 2.2.3 Disable Docker Compose v2 experimental features via the CLI:
  docker-compose disable-v2
- 2.2.4 Proceed with the installation and initialization steps (sections #3 and #4)
- 2.3.1 Install the latest available version of Docker Desktop: https://docs.docker.com/get-docker/
- 2.3.2 Disable Docker Compose v2 experimental features via the CLI:
  docker-compose disable-v2
- 2.3.3 Clone the repo:
  git clone <'Airflow 2 repository'>
- 2.3.4 Launch Visual Studio Code and open the folder (Open Folder) with the Airflow 2 code
- 2.3.5 Open a terminal window (Menu Terminal --> New Terminal)
- 2.3.6 Proceed with the installation and initialization steps (sections #3 and #4)
- 2.4.1 Install WSL (Windows Subsystem for Linux): https://docs.microsoft.com/en-us/windows/wsl/install-win10
- 2.4.2 Install the Linux Ubuntu distribution from the Microsoft Store: https://aka.ms/wslstore (this step is part of the previous step)
- 2.4.3 Launch WSL Ubuntu and create a username (airflow) & password (airflow) when prompted
- 2.4.4 In the WSL terminal window, go to /home/airflow and clone the repo:
  cd /home/airflow && git clone <'Airflow 2 repository'>
- 2.4.5 On Windows 10, install the latest available version of Docker Desktop: https://docs.docker.com/get-docker/
- 2.4.6 Once installed, launch Docker Desktop, go to Settings --> Resources --> WSL INTEGRATION and toggle "Ubuntu". Once done, click the "Apply & Restart" button
- 2.4.7 Open a command line in Windows (CMD) and execute the following command to make sure that Ubuntu has been set as the default WSL distribution:
  wsl --setdefault Ubuntu
- 2.4.8 Install (if not already installed) and launch Visual Studio Code
- 2.4.9 From the VS Code extensions tab, search for and install the Remote - WSL plugin
- 2.4.10 In Visual Studio Code, you will now see a green WSL indicator in the bottom left corner; click on it and choose Open Folder in WSL. Windows will prompt you to select a folder; provide the following path: \\wsl$\Ubuntu\home\airflow, and choose the folder with the Airflow code (airflow2-local-cicd)
- 2.4.11 Open a terminal session in VS Code (Menu Terminal --> New Terminal) and run the WSL Docker installation script:
  chmod +x ./helpers/scripts/docker-wls.sh && sudo ./helpers/scripts/docker-wls.sh
- 2.4.12 Proceed with the installation and initialization steps (sections #3 and #4)
- Add your Python dependencies to the docker/requirements-airflow.txt file.
- Adapt and install DAGs into the dags folder.
- Adapt and install plugins into the plugins folder.
- Add variables to Airflow via the variables/docker-airflow-vars.json file (see the sketch after this list for how these variables surface in DAG code).
- Add variables to the Docker containers' ENV via the variables/docker-env-vars file.
- Add variables that contain secrets and API keys to the variables/docker-env-secrets file; this file is covered by .gitignore.
- If a custom Airflow configuration file is ready, uncomment the corresponding line in the Dockerfile to include it in the image: COPY airflow.cfg ${AIRFLOW_HOME}/airflow.cfg
- Optionally add the send_email.py DAG to the .airflowignore file, as this DAG is only used by the CI/CD pipeline (this avoids warnings and errors during unit tests).
- Set the project-id variable in the variables/docker-env-vars or variables/docker-env-secrets file: GCP_PROJECT_ID='<project-id here>'
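As a rough illustration of how values from these files surface in DAG code, here is a hypothetical Python snippet. The my_bucket key is an assumed example; only GCP_PROJECT_ID appears in this project's configuration above:

```python
# Hypothetical snippet showing how values defined in the variables/ files
# become available inside DAG code; "my_bucket" is an illustrative key.
import os

from airflow.models import Variable

# Key/value pairs from variables/docker-airflow-vars.json are loaded as
# Airflow Variables and read from the metadata database with Variable.get().
bucket = Variable.get("my_bucket", default_var=None)

# Entries in variables/docker-env-vars (and docker-env-secrets) are exported
# as container environment variables and read from os.environ.
project_id = os.environ.get("GCP_PROJECT_ID")
```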
- 4.1 Open a terminal and run the following command (you may need to use sudo before it in some cases, such as GCP Cloud Shell, Windows WSL, and cloud Linux VMs):
  ./helpers/scripts/init_airflow.sh
  Note: for GCP Cloud Shell, you must re-run this command every time a shell session expires or ends.
- 4.2 Open a new terminal window and run the following command to make sure that all 3 containers (webserver, scheduler, postgres_db) are running and healthy:
  docker ps
- 4.3 Authenticate to GCP services by running the following script and performing the GCP authentication:
  ./helpers/scripts/gcp-auth.sh
  Note: NOT required if you are working via the GCP Cloud Shell option; you can skip this step.
- Airflow 2 is UP and Running!
- To check if all 3 containers (webserver, scheduler, postgres_db) are running and healthy:
  docker ps
- To stop all Airflow containers (via a new terminal session):
  docker-compose down
- To start Airflow and all services:
  docker-compose up
- To rebuild the containers (if changes were applied to the Dockerfile or Docker Compose file):
  docker-compose down && docker-compose up --build
- To clean up all containers and remove the database:
  docker-compose down --volumes --rmi all
- To run unit tests, navigate to the tests directory and run ./airflow "test command", for example:
  cd tests && ./airflow "pytest tests/unit"
- To run integration tests with GCP, navigate to the tests directory and run ./airflow "test command", for example:
  ./airflow "pytest --tc-file tests/integration/config.ini -v tests/integration"
- To spin up an Ops container with a Bash session:
  ./tests/airflow bash
- To run an Airflow command within the environment, spin up an Ops container with a Bash session, then execute the command, for example:
  airflow dags list
- To launch a Python session in Airflow (see the sketch after this list):
  ./tests/airflow python
- To access the Airflow Web UI: localhost:8080, or Web Preview (GCP Cloud Shell)
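As an example of what such a Python session can be used for, this hypothetical snippet lists the DAGs the environment has parsed; it is an illustration, not code shipped with the repository:

```python
# Run inside the "./tests/airflow python" session to inspect the parsed DAGs.
from airflow.models import DagBag

bag = DagBag()                                  # parses files under dags/
print(sorted(bag.dag_ids))                      # every DAG id that loaded
print(bag.import_errors or "no import errors")  # files that failed to parse
```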
- 7.1 Install the pre-commit app:
  - For Linux/Windows: pip3 install pre-commit
  - For macOS: brew install pre-commit
- 7.2 Run the pre-commit initialization command (inside the same directory where the code was cloned):
  pre-commit install
- 7.3 Run the pre-commit tests:
  pre-commit run --all-files