Define GitHub actions workflows for building and testing dbt #50

Merged
59 commits
88494fd
Define GitHub workflows for building and testing dbt models
jeancochrane Aug 2, 2023
997f3e4
Configure AWS credentials in GitHub Actions build and test workflows
jeancochrane Aug 2, 2023
bc0e000
Centralize dbt env vars in GitHub Actions workflows
jeancochrane Aug 2, 2023
ff4fd81
Tweaks to dbt GitHub Actions workflow definition ahead of testing
jeancochrane Aug 3, 2023
297f7ec
Fix linting problems with dbt GitHub actions and workflows
jeancochrane Aug 3, 2023
0396849
Empty commit to trigger CI build
jeancochrane Aug 3, 2023
f833271
Rename local GitHub actions to match expected dir/action.yaml naming …
jeancochrane Aug 3, 2023
3fb81f6
Add permissions to interact with GitHub OIDC to dbt actions
jeancochrane Aug 3, 2023
5b37bf8
Try new format for build and test cache key on CI
jeancochrane Aug 3, 2023
e8ec9d0
Strip extraneous dollar sign from dbt workflow cache key
jeancochrane Aug 3, 2023
f4ba9de
Try different format for dbt directory paths in workflow env vars
jeancochrane Aug 3, 2023
8530109
Rename dbt workflow env vars to avoid collisions with dbt internal en…
jeancochrane Aug 3, 2023
b26cefc
Make sure STATE_ARGS env var is never empty in build_and_test_dbt wor…
jeancochrane Aug 3, 2023
3ff6d98
Try new format for reading dbt commands from env vars in GitHub workflow
jeancochrane Aug 3, 2023
0607be7
Add step to build_and_test_dbt workflow to test dbt installation
jeancochrane Aug 3, 2023
98a42ef
Try a different quoting scheme for RUN_CMD and TEST_CMD in build_and_…
jeancochrane Aug 3, 2023
03e4dc1
Define build/test commands directly instead of via env vars in dbt wo…
jeancochrane Aug 3, 2023
b99d0da
Log all conditional branches in build_and_test_dbt_models workflow
jeancochrane Aug 3, 2023
829a456
See if removing hyphens from database names appeases dbt-athena on CI
jeancochrane Aug 3, 2023
cb0299b
Merge data-catalog into jeancochrane/31-data-catalog-define-github-ac…
jeancochrane Aug 3, 2023
31de4e0
Temporarily enable dbt debugging to try to figure out AWS permissions
jeancochrane Aug 3, 2023
7b0037a
Try reverting dbt schema naming back to kebab_slugify
jeancochrane Aug 4, 2023
9bf6cac
Remove --debug flag from dbt run call in build_and_test_dbt workflow
jeancochrane Aug 4, 2023
76db394
Bump error thresholds for four dbt tests
jeancochrane Aug 4, 2023
4d552aa
Add step to cleanup resources to build_and_test_dbt workflow
jeancochrane Aug 4, 2023
27f556c
Clean up cleanup_dbt_resources.sh script for use in CI
jeancochrane Aug 7, 2023
15a86d2
Bump allowed errors in dbt tests due to data problems
jeancochrane Aug 7, 2023
7ab4b69
Update build_and_test_dbt workflow to run when PRs are closed
jeancochrane Aug 7, 2023
9304b2f
Try apt-get instead of apt for installing jq in build_and_test_dbt wo…
jeancochrane Aug 7, 2023
b291800
Temporarily disable PR event restriction on dbt cleanup install step …
jeancochrane Aug 7, 2023
b4ec438
Try sudo apt-get for installing jq in build_and_test_dbt workflow
jeancochrane Aug 7, 2023
52fc11a
Remove installation step for dbt cleanup in build_and_test_dbt workflow
jeancochrane Aug 7, 2023
640387f
Enforce jq as a requirement for cleanup_dbt_resources.sh script
jeancochrane Aug 7, 2023
dfe8785
Fix path to cleanup_dbt_resources.sh on CI
jeancochrane Aug 7, 2023
6d685a0
Revert "Remove installation step for dbt cleanup in build_and_test_db…
jeancochrane Aug 7, 2023
1e5331d
Temporarily disable PR event restriction on cleanup in build_and_test…
jeancochrane Aug 7, 2023
620877b
Revert "Enforce jq as a requirement for cleanup_dbt_resources.sh script"
jeancochrane Aug 7, 2023
f2d0508
Revert "Temporarily disable PR event restriction on cleanup in build_…
jeancochrane Aug 7, 2023
cf5995b
Revert "Temporarily disable PR event restriction on dbt cleanup insta…
jeancochrane Aug 7, 2023
d34c83b
Temporarily run test_dbt_models workflow on PRs so we can dispatch it…
jeancochrane Aug 7, 2023
d2909ea
Give more verbose names to dbt workflow jobs
jeancochrane Aug 7, 2023
7538f9e
Revert "Temporarily run test_dbt_models workflow on PRs so we can dis…
jeancochrane Aug 7, 2023
c3a7a62
Try adding push to test_dbt_models workflow definition to test dispatch
jeancochrane Aug 7, 2023
e10540f
Revert "Try adding push to test_dbt_models workflow definition to tes…
jeancochrane Aug 7, 2023
b5cec8d
Add docstring to cleanup_dbt_resources.sh
jeancochrane Aug 7, 2023
68f914c
Run `dbt run` with --defer on CI to inherit built resources
jeancochrane Aug 7, 2023
1d4ec5a
Don't use build cache in test_dbt_models workflow
jeancochrane Aug 7, 2023
8af875c
Try adding push to test_dbt_models workflow definition to test it again
jeancochrane Aug 7, 2023
87353b2
Temporarily add --debug flag to dbt call in test_dbt_models
jeancochrane Aug 7, 2023
37ba0bb
Change `push` to `pull_request` for testing test_dbt_models workflow
jeancochrane Aug 7, 2023
b36fec7
Remove --debug flag from test_dbt_models workflow
jeancochrane Aug 7, 2023
5bdf9a2
Change test_dbt_models workflow to only run on dispatch
jeancochrane Aug 7, 2023
cef322a
Cache dbt and Python requirements in install_dbt_requirements action
jeancochrane Aug 8, 2023
e51c677
Use sed to strip comment lines in load_environment_variables composit…
jeancochrane Aug 8, 2023
f9b5d86
Kebab case dbt build and test workflow names
jeancochrane Aug 8, 2023
704ed13
Factor out cleanup-dbt-resources into its own workflow
jeancochrane Aug 8, 2023
1738374
Set GITHUB_HEAD_REF var in test_dbt_models workflow
jeancochrane Aug 8, 2023
672a7d4
Bump threshold for vw_pin_appeal dbt test failures
jeancochrane Aug 8, 2023
60c2bb1
Factor out composite GitHub action for configure_dbt_environment to s…
jeancochrane Aug 8, 2023
29 changes: 29 additions & 0 deletions .github/actions/configure_dbt_environment/action.yaml
@@ -0,0 +1,29 @@
name: Configure dbt environment
description: Set environment variables based on the active dbt project (CI or prod)
runs:
  using: composite
  steps:
    - name: Configure dbt environment
      run: |
        if [[ $GITHUB_REF_NAME == 'master' ]]; then
          echo "On master branch, setting dbt env to prod"
          {
            echo "TARGET=prod";
            echo "CACHE_KEY=master";
          } >> "$GITHUB_ENV"
        elif [[ $GITHUB_REF_NAME == 'data-catalog' ]]; then
          echo "On data catalog branch, setting dbt env to CI"
          {
            echo "TARGET=ci";
            echo "CACHE_KEY=data-catalog";
            echo "HEAD_REF=data-catalog";
          } >> "$GITHUB_ENV"
        else
          echo "On pull request branch, setting dbt env to CI"
          {
            echo "TARGET=ci";
            echo "CACHE_KEY=$GITHUB_HEAD_REF";
            echo "HEAD_REF=$GITHUB_HEAD_REF"
          } >> "$GITHUB_ENV"
        fi
      shell: bash
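
To try the branch-to-environment mapping outside of a workflow run, the same logic can be sketched as a plain bash function. This is a minimal sketch, not part of the PR: `resolve_dbt_env` is a hypothetical helper name, and its two arguments stand in for `GITHUB_REF_NAME` and `GITHUB_HEAD_REF`. It prints the lines that the action would append to `$GITHUB_ENV`.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the branch-to-environment mapping of the composite action.
# resolve_dbt_env is a hypothetical helper, not something defined in the PR.
# $1 stands in for GITHUB_REF_NAME, $2 for GITHUB_HEAD_REF.
resolve_dbt_env() {
  local ref_name="$1" head_ref="${2:-}"
  if [[ "$ref_name" == 'master' ]]; then
    # Production: no HEAD_REF is emitted on this branch
    printf 'TARGET=prod\nCACHE_KEY=master\n'
  elif [[ "$ref_name" == 'data-catalog' ]]; then
    printf 'TARGET=ci\nCACHE_KEY=data-catalog\nHEAD_REF=data-catalog\n'
  else
    # Any other ref is treated as a pull request branch
    printf 'TARGET=ci\nCACHE_KEY=%s\nHEAD_REF=%s\n' "$head_ref" "$head_ref"
  fi
}

# Example: a pull request run with a made-up head branch name
resolve_dbt_env "50/merge" "my-feature-branch"
```

Note that the master branch sets no `HEAD_REF`, which matches the action: only the `ci` target needs it.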
34 changes: 34 additions & 0 deletions .github/actions/install_dbt_requirements/action.yaml
@@ -0,0 +1,34 @@
name: Install dbt dependencies
description: Installs Python and dbt requirements for a workflow
inputs:
  dbt_project_dir:
    description: Path to the directory containing the dbt project.
    required: false
    default: ./dbt
  requirements_file_path:
    description: Path to Python requirements file.
    required: false
    default: ./dbt/requirements.txt
Comment on lines +3 to +11
Contributor Author

It's not strictly necessary for us to factor out these variables, since we don't currently have plans to reuse this action outside of our current dbt project. However, the GitHub Actions docs recommend using variables instead of hardcoded paths, so I'm doing so here:

"We strongly recommend that actions use variables to access the filesystem rather than using hardcoded file paths."

Member

I actually didn't know you could re-use in-repo actions. I sillily used a composable workflow in PTAXSIM instead. Will fix: ccao-data/ptaxsim#15. Thanks for teaching me something today!

runs:
  using: composite
  steps:
    - name: Setup python
      uses: actions/setup-python@v4
      with:
        python-version: 3.x
        cache: pip

    - name: Install python requirements
      run: python -m pip install -r ${{ inputs.requirements_file_path }}
      shell: bash

    - name: Cache dbt requirements
      uses: actions/cache@v3
      with:
        path: ${{ inputs.dbt_project_dir }}/dbt_packages
        key: dbt-${{ hashFiles(format('{0}/packages.yml', inputs.dbt_project_dir)) }}

    - name: Install dbt requirements
      run: dbt deps
      working-directory: ${{ inputs.dbt_project_dir }}
      shell: bash
16 changes: 16 additions & 0 deletions .github/actions/load_environment_variables/action.yaml
Contributor Author

Similarly, both workflows share a common set of environment variables that are loaded using this action.

@@ -0,0 +1,16 @@
name: Load environment variables
description: Configures environment variables for a workflow
inputs:
  env_var_file_path:
    description: |
      File path to variable file or directory.
      Defaults to ./.github/variables/* if none specified
      and runs against each file in that directory.
    required: false
    default: ./.github/variables/*
runs:
  using: composite
  steps:
    # Use sed to strip comment lines
    - run: sed "/#/d" ${{ inputs.env_var_file_path }} >> "$GITHUB_ENV"
      shell: bash
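
The `sed "/#/d"` expression deletes every line that contains a `#` anywhere, not only lines that start with one. That is harmless for the current `dbt.env`, but it would also drop a variable whose value contains `#`. The sketch below compares the pattern from the action with a stricter, anchored alternative. The sample file contents and both function names are made up for illustration and are not part of the PR.

```shell
#!/usr/bin/env bash
# Hypothetical variable file contents, for illustration only.
sample=$'# Cache settings\nCACHE_NAME=dbt-cache\nCOLOR=#ff0000\nPROJECT_DIR=dbt'

# The pattern used in the action: drop every line that contains a "#".
strip_any_hash() { sed '/#/d'; }

# A stricter alternative (an assumption, not what the PR does):
# drop only lines whose first non-blank character is "#".
strip_comment_lines() { sed '/^[[:space:]]*#/d'; }

echo "With /#/d:"
printf '%s\n' "$sample" | strip_any_hash

echo "With the anchored pattern:"
printf '%s\n' "$sample" | strip_comment_lines
```

With the first pattern the `COLOR` line disappears along with the comment; with the anchored pattern only the comment line is removed.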
35 changes: 35 additions & 0 deletions .github/scripts/cleanup_dbt_resources.sh
@@ -0,0 +1,35 @@
#!/usr/bin/env bash
# Clean up dbt resources created by a CI run or by local development.
#
# Takes one argument representing the target environment to clean up,
# one of `dev` or `ci`. E.g.:
#
# ./cleanup_dbt_resources.sh dev
#
# Assumes that jq is installed and available on the caller's path.
set -euo pipefail

if [[ "$#" -eq 0 ]]; then
    echo "Missing first argument representing dbt target"
    exit 1
fi

if [ "$1" == "prod" ]; then
    echo "Target cannot be 'prod'"
    exit 1
fi

schemas_json=$(dbt --quiet list --resource-type model --target "$1" \
    --output json --output-keys schema) || (echo "Error in dbt call" && exit 1)
schemas=$(echo "$schemas_json"| sort | uniq | jq ' .schema') || (\
    echo "Error in schema parsing" && exit 1
)

echo "Deleting the following schemas from Athena:"
echo
echo "$schemas"

echo "$schemas" | xargs -i bash -c 'aws glue delete-database --name {} || exit 255'

echo
echo "Done!"
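
The script's header comment says the argument should be one of `dev` or `ci`, while the guard itself only rejects `prod`, so any other string is passed through to dbt. If a stricter check were wanted, an allowlist could look like the sketch below. `validate_target` is a hypothetical helper and is not part of the PR.

```shell
#!/usr/bin/env bash
# Sketch only: allowlist the target instead of denylisting "prod".
# validate_target is a hypothetical helper, not defined in the PR.
validate_target() {
  case "${1:-}" in
    dev|ci)
      return 0
      ;;
    '')
      echo "Missing first argument representing dbt target" >&2
      return 1
      ;;
    *)
      echo "Target must be one of 'dev' or 'ci', got '$1'" >&2
      return 1
      ;;
  esac
}

# Example usage
if validate_target "ci"; then echo "ci accepted"; fi
if ! validate_target "prod" 2>/dev/null; then echo "prod rejected"; fi
```

The trade-off is that an allowlist has to be updated whenever a new non-production target is added, whereas the denylist in the script does not.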
3 changes: 3 additions & 0 deletions .github/variables/dbt.env
@@ -0,0 +1,3 @@
CACHE_NAME=dbt-cache
MANIFEST_DIR=dbt/target
PROJECT_DIR=dbt
83 changes: 83 additions & 0 deletions .github/workflows/build_and_test_dbt.yaml
Contributor Author

This workflow represents the build and test that should run on every PR and push to our main branch.

@@ -0,0 +1,83 @@
name: build-and-test-dbt

on:
  pull_request:
    branches: [master, data-catalog]
Contributor Author

There are a number of references to data-catalog here that cover for the fact that we don't expect this workflow to be pulled into master immediately, and that we want to treat data-catalog as a CI environment until we're ready to merge data-catalog into master (#55). Part of that merge issue will be stripping out the references to data-catalog in these workflows and ensuring that they are ready to be run against master.

  push:
    branches: [master, data-catalog]

jobs:
  build-and-test-dbt:
    runs-on: ubuntu-latest
    # These permissions are needed to interact with GitHub's OIDC Token endpoint
    # so that we can authenticate with AWS
    permissions:
      id-token: write
      contents: read
Comment on lines +12 to +16
Contributor Author

More details on these permissions settings here.

    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Install dbt requirements
        uses: ./.github/actions/install_dbt_requirements

      - name: Load environment variables
        uses: ./.github/actions/load_environment_variables

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: ${{ secrets.AWS_IAM_ROLE_TO_ASSUME_ARN }}
          aws-region: us-east-1

      - name: Configure dbt environment
        uses: ./.github/actions/configure_dbt_environment

      - name: Cache dbt manifest
        id: cache
        uses: actions/cache@v3
        with:
          path: ${{ env.MANIFEST_DIR }}
          key: ${{ env.CACHE_NAME }}-${{ env.CACHE_KEY }}
          restore-keys: |
            ${{ env.CACHE_NAME }}-data-catalog
            ${{ env.CACHE_NAME }}-master
Comment on lines +42 to +44
Contributor Author

The restore-keys setting here means that we will fall back to the data-catalog or master branch caches if the PR branch cache does not exist (which we only ever expect to happen on the first pushes to a PR, before the PR has had a successful run).
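
The fallback described above can be modelled with a small bash function. This is a simplified sketch, not the cache action's actual code: `resolve_cache` is a hypothetical name, and the real action picks the most recently created cache among prefix matches, while this model takes the first one listed. As far as I understand the `actions/cache` documentation, `cache-hit` reports an exact match on `key`, so a restore through `restore-keys` would not set it to `'true'`. That is worth keeping in mind when reading the `cache-hit` conditions that follow.

```shell
#!/usr/bin/env bash
# Simplified model of cache lookup; resolve_cache is a hypothetical helper.
# Usage: resolve_cache PRIMARY_KEY "RESTORE_KEY ..." "EXISTING_KEY ..."
# Prints "<matched key> <cache-hit>", or nothing on a complete miss.
resolve_cache() {
  local primary="$1" restore_keys="$2" existing="$3" key prefix
  # 1. Look for an exact match on the primary key
  for key in $existing; do
    if [[ "$key" == "$primary" ]]; then
      echo "$key true"
      return 0
    fi
  done
  # 2. Fall back to restore keys, in order, matching by prefix
  for prefix in $restore_keys; do
    for key in $existing; do
      if [[ "$key" == "$prefix"* ]]; then
        echo "$key false"
        return 0
      fi
    done
  done
  return 0
}

# First push to a PR: no branch cache yet, so the data-catalog cache is used
resolve_cache "dbt-cache-my-branch" \
  "dbt-cache-data-catalog dbt-cache-master" \
  "dbt-cache-master dbt-cache-data-catalog"
```

The branch name `my-branch` is made up; the `dbt-cache` prefix comes from `CACHE_NAME` in `.github/variables/dbt.env`.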


      - if: ${{ steps.cache.outputs.cache-hit == 'true' }}
        name: Set command args to build/test modified resources
        run: echo "MODIFIED_RESOURCES_ONLY=true" >> "$GITHUB_ENV"
        shell: bash

      - if: ${{ steps.cache.outputs.cache-hit != 'true' }}
        name: Set command args to build/test all resources
        run: echo "MODIFIED_RESOURCES_ONLY=false" >> "$GITHUB_ENV"
        shell: bash

      - name: Test dbt macros
        run: dbt run-operation test_all
        working-directory: ${{ env.PROJECT_DIR }}
        shell: bash

      - name: Build models
        run: |
          if [[ $MODIFIED_RESOURCES_ONLY == 'true' ]]; then
            echo "Running build on modified resources only"
            dbt run --target "$TARGET" -s state:modified --defer --state target/
Contributor Author

A couple of things are going on here:

  1. --target is ensuring that we're running against the correct environment (CI or prod)
  2. -s state:modified is only selecting resources that have changed since the last run
  3. --defer is instructing dbt to reuse resources that are marked as created in the state file, even if they don't exist in the current environment
    1. For example, if we are on a pull request branch where we have restored the cache from the data-catalog branch, dbt will reuse resources created in the data-catalog environment without needing to recreate them in the PR environment (docs)
  4. --state instructs dbt on where to load the state file, which we keep cached
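
The way these flags combine can be sketched as a function that assembles the command line without running it. `dbt_run_args` is a hypothetical helper, not part of the PR. It only uses the flags already present in the workflow and prints the command so the two branches can be compared side by side.

```shell
#!/usr/bin/env bash
# Sketch only: assemble the dbt command line used in the "Build models" step.
# dbt_run_args is a hypothetical helper, not defined in the PR.
# $1 = dbt target, $2 = "true" to build modified resources only,
# $3 = state directory (defaults to target/, as in the workflow).
dbt_run_args() {
  local target="$1" modified_only="$2" state_dir="${3:-target/}"
  local args=(run --target "$target")
  if [[ "$modified_only" == 'true' ]]; then
    # Select changed resources, defer unchanged ones to the cached state
    args+=(-s state:modified --defer --state "$state_dir")
  fi
  echo "dbt ${args[*]}"
}

dbt_run_args ci true
# prints: dbt run --target ci -s state:modified --defer --state target/
dbt_run_args prod false
# prints: dbt run --target prod
```

Building the arguments in an array keeps the quoting intact, which sidesteps the kind of env-var quoting problems that several earlier commits in this PR worked around.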

          else
            echo "Running build on all resources"
            dbt run --target "$TARGET"
          fi
        working-directory: ${{ env.PROJECT_DIR }}
        shell: bash

      - name: Test models
        run: |
          if [[ $MODIFIED_RESOURCES_ONLY == 'true' ]]; then
            echo "Running tests on modified resources only"
            dbt test --target "$TARGET" -s state:modified --state target/
          else
            echo "Running tests on all resources"
            dbt test --target "$TARGET"
          fi
        working-directory: ${{ env.PROJECT_DIR }}
        shell: bash
42 changes: 42 additions & 0 deletions .github/workflows/cleanup_dbt_resources.yaml
@@ -0,0 +1,42 @@
name: cleanup-dbt-resources

on:
  pull_request:
    branches: [master, data-catalog]
    types: [closed]

jobs:
  cleanup-dbt-resources:
    runs-on: ubuntu-latest
    # These permissions are needed to interact with GitHub's OIDC Token endpoint
    # so that we can authenticate with AWS
    permissions:
      id-token: write
      contents: read
    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Install dbt requirements
        uses: ./.github/actions/install_dbt_requirements

      - name: Install requirements for cleaning up dbt resources
        run: sudo apt-get update && sudo apt-get install jq
        shell: bash

      - name: Load environment variables
        uses: ./.github/actions/load_environment_variables

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: ${{ secrets.AWS_IAM_ROLE_TO_ASSUME_ARN }}
          aws-region: us-east-1

      - name: Configure dbt environment
        uses: ./.github/actions/configure_dbt_environment

      - name: Clean up dbt resources
        run: ../.github/scripts/cleanup_dbt_resources.sh ci
        working-directory: ${{ env.PROJECT_DIR }}
        shell: bash
40 changes: 40 additions & 0 deletions .github/workflows/test_dbt_models.yaml
Contributor Author

This workflow represents the tests that we will want to trigger from our nightly import process (ccao-data/service-sqoop-iasworld#1). It is not fully tested yet as part of this project, since manual workflow dispatch is not possible until the workflow has been pulled into the main branch (source). Instead, I tested the workflow by manually editing the on: key below to run the workflow on pushes to this pull request; test output here.

@@ -0,0 +1,40 @@
name: test-dbt-models

on: workflow_dispatch

jobs:
  test-dbt-models:
    runs-on: ubuntu-latest
    # These permissions are needed to interact with GitHub's OIDC Token endpoint
    # so that we can authenticate with AWS
    permissions:
      id-token: write
      contents: read
    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Install dbt requirements
        uses: ./.github/actions/install_dbt_requirements

      - name: Load environment variables
        uses: ./.github/actions/load_environment_variables

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: ${{ secrets.AWS_IAM_ROLE_TO_ASSUME_ARN }}
          aws-region: us-east-1

      - name: Configure dbt environment
        uses: ./.github/actions/configure_dbt_environment

      - name: Test models
        # Target is currently set to CI because we expect this action to be
        # run against the long-lived data-catalog branch, but we should change
        # this to prod when we merge that branch into master
        run: dbt test --target ci
        working-directory: ${{ env.PROJECT_DIR }}
        shell: bash
        env:
          GITHUB_HEAD_REF: data-catalog
2 changes: 1 addition & 1 deletion dbt/macros/generate_schema_name.sql
@@ -29,7 +29,7 @@
 {%- if target.name == "dev" -%}
 {%- set schema_prefix = "dev_" ~ env_var_func("USER") ~ "_" -%}
 {%- elif target.name == "ci" -%}
-{%- set github_head_ref = kebab_slugify(env_var_func("GITHUB_HEAD_REF")) -%}
+{%- set github_head_ref = kebab_slugify(env_var_func("HEAD_REF")) -%}
 {%- set schema_prefix = "ci_" ~ github_head_ref ~ "_" -%}
 {%- else -%} {%- set schema_prefix = "" -%}
 {%- endif -%}
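
The prefix selection in this hunk can be sketched in bash to show how `HEAD_REF` flows from the workflow into schema names. This is a rough model under stated assumptions: the `slugify` function below is a stand-in for the project's `kebab_slugify` macro, whose exact behavior is not shown in this diff, and `schema_prefix` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Stand-in for the project's kebab_slugify macro (an assumption): lowercase,
# collapse runs of non-alphanumeric characters into "-", trim edge hyphens.
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}

# Hypothetical model of the schema prefix logic in generate_schema_name.
# $1 = dbt target name; reads USER and HEAD_REF from the environment.
schema_prefix() {
  case "$1" in
    dev) echo "dev_${USER:-}_" ;;
    ci)  echo "ci_$(slugify "${HEAD_REF:-}")_" ;;
    *)   echo "" ;;
  esac
}

# Example with the branch name used in the macro's own test mock
HEAD_REF="testuser/feature-branch-1" schema_prefix ci
```

Under this stand-in slugify the example prints `ci_testuser-feature-branch-1_`; the real macro may produce something different.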
2 changes: 1 addition & 1 deletion dbt/macros/tests/test_generate_schema_name.sql
@@ -7,7 +7,7 @@

 {% macro mock_env_var(var_name) %}
 {% if var_name == "USER" %} {{ return("testuser") }}
-{% elif var_name == "GITHUB_HEAD_REF" %} {{ return("testuser/feature-branch-1") }}
+{% elif var_name == "HEAD_REF" %} {{ return("testuser/feature-branch-1") }}
 {% else %} {{ return("") }}
 {% endif %}
 {% endmacro %}
8 changes: 4 additions & 4 deletions dbt/models/default/schema.yml
@@ -31,15 +31,15 @@ models:
 - pin
 - year
 config:
-error_if: ">280655"
+error_if: ">280662"
 # Unique by case number and year
 - unique_combination_of_columns:
 name: vw_pin_appeal_unique_by_case_number_and_year
 combination_of_columns:
 - year
 - case_no
 config:
-error_if: ">365779"
+error_if: ">365894"
 # `change` should be an enum
 - dbt_utils.expression_is_true:
 name: vw_pin_appeal_no_unexpected_change_values
@@ -85,7 +85,7 @@ models:
 case when char_renovation = '1' then true else false end
 )
 config:
-error_if: ">73925"
+error_if: ">73941"
 # TODO: Characteristics columns should adhere to pre-determined criteria
 - name: vw_pin_address_test
 description: '{{ doc("vw_pin_address_test") }}'
@@ -111,7 +111,7 @@ models:
 - mail_address_zipcode_1
 - mail_address_zipcode_2
 config:
-error_if: ">879261"
+error_if: ">880581"
 # TODO: Mailing address changes after validated sale(?)
 # TODO: Site addresses are all in Cook County
 - name: vw_pin_condo_char_test