Overview
We want to use GitHub Actions to run the dbt run and dbt test commands that build and test our DAG, respectively. To do this, we'll need to add GitHub Actions workflow definitions to this repository.
Workflow definitions
Our workflow definition should support two different types of flows:
1. Rebuild all models that have changed since the last cached run, and run their tests
2. Run all the tests, regardless of the result of the last cached run
These two flows will be used in two different ways:
1. The first flow will provide continuous integration (CI) for dbt models in this repository, building and testing models when we make code changes to them; while
2. The second flow will provide a test interface that we can call from the GitHub Actions workflow API to run data integrity checks after pulling fresh source data from the system of record each night.
Caching is important in CI to help speed up development cycles, but it's unnecessary in the context of our nightly data integrity checks, where we want to validate all of the data on each run.
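As a rough illustration of the two flows, here is a minimal Python sketch that maps each flow to the dbt invocations it might run. It assumes the "last cached run" is represented by a cached dbt manifest that we compare against using dbt's `state:modified` selector and `--state` flag; the flow names (`ci`, `test-all`), the function name, and the `state` directory are hypothetical, not something this issue has settled on.

```python
from typing import List


def dbt_commands(flow: str, state_dir: str = "state") -> List[List[str]]:
    """Return the dbt invocations for a flow, in execution order.

    `flow` and `state_dir` are hypothetical names used for illustration.
    """
    if flow == "ci":
        # Flow 1: build and test only what differs from the cached manifest.
        selector = ["--select", "state:modified", "--state", state_dir]
        return [["dbt", "run", *selector], ["dbt", "test", *selector]]
    if flow == "test-all":
        # Flow 2: ignore any cache and run every test.
        return [["dbt", "test"]]
    raise ValueError(f"unknown flow: {flow!r}")
```

A workflow step could pass each returned list to `subprocess.run(cmd, check=True)`, so that a failing build stops the tests from running.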
Cache behavior
The CI workflow (1 above) should exhibit the following cache behavior:
- On every PR:
  - Run the build and tests for models that have changed since the last commit to master OR since the last successful workflow run for this PR
    - In other words: the first workflow run for any PR should use the cache from the master branch, and subsequent runs should use the cache from the first successful run on the PR branch
  - These builds and tests should run in a separate development environment, ideally one that is created exclusively for the PR and not shared by other PRs; we should use the same environment scheme set up in #28 ([Data catalog] Add production profile to the dbt configuration)
  - The master branch cache should never be updated by this flow
- On commits to master:
  - Run builds and tests for models that have changed since the last commit to master
  - These builds and tests should run against the prod Athena environment
  - The master branch cache should be updated when this flow succeeds
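The cache rules above can be summarized as a small decision function. This is only a sketch of the intended behavior, not a workflow definition: the cache key names, the `dev` target name, and the `plan_cache` helper are all hypothetical, and `prod` is the only name taken from this issue. The event names mirror the GitHub Actions `pull_request` and `push` triggers.

```python
from dataclasses import dataclass
from typing import List, Optional

MASTER_KEY = "dbt-cache-master"  # hypothetical cache key for the master branch


@dataclass(frozen=True)
class CachePlan:
    restore_keys: List[str]  # tried in order; the first existing cache wins
    save_key: str            # cache to write when the run succeeds
    target: str              # dbt target to run against


def plan_cache(event: str, pr_number: Optional[int] = None) -> CachePlan:
    """Decide which cache to restore and which to update for a workflow run."""
    if event == "pull_request":
        if pr_number is None:
            raise ValueError("pull_request runs need a PR number")
        pr_key = f"dbt-cache-pr-{pr_number}"  # hypothetical per-PR key
        # Prefer this PR's own cache; fall back to master on the first run.
        # The save key is the PR key, so master's cache is never touched.
        return CachePlan([pr_key, MASTER_KEY], pr_key, "dev")
    if event == "push":
        # Commits to master compare against, and then refresh, master's cache.
        return CachePlan([MASTER_KEY], MASTER_KEY, "prod")
    raise ValueError(f"unsupported event: {event!r}")
```

Listing the PR key ahead of the master key gives the fallback described above: a PR with no cache of its own starts from master's, and later runs pick up whatever the PR last saved.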
AWS credentials
To run dbt commands against Athena from a GitHub Actions workflow, we'll need to inject valid AWS credentials into the workflow. Credentials should be stored as encrypted secrets, and their permissions should be restricted as much as possible to limit the damage if they are ever leaked. See the dbt-athena docs for a list of the permissions the adapter requires.
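One inexpensive safeguard is to fail fast when the secrets were not injected, before dbt tries to connect. The sketch below assumes the credentials reach the job through the standard AWS environment variables read by boto3; whether we use static keys or an assumed role is still open, and the helper name is hypothetical.

```python
import os
from typing import List, Mapping

# Standard environment variables read by boto3; assumed to be populated
# from the repository's encrypted secrets.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_aws_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required AWS variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

A workflow step could call `missing_aws_vars()` and exit non-zero if the list is non-empty. It reports only variable names, so secret values never appear in the logs.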
Incremental testing
Since our CI tests will only be useful if they can distinguish between expected and unexpected data integrity issues, this issue depends on #32.