Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define GitHub actions workflows for building and testing dbt #50

Conversation

jeancochrane
Copy link
Contributor

@jeancochrane jeancochrane commented Aug 3, 2023

This PR sets up GitHub Actions workflows for two particular cases that we want to support when building and testing dbt:

  1. A fast build-and-test workflow that runs on pull requests and pushes to main branches (build_and_test_dbt)
    • This workflow will provide CI/CD when developing against this repo
    • Tests will be run against isolated databases that get torn down when the PR is merged
  2. A workflow that runs the full test suite on manual dispatch (test_dbt_models)

Closes #31.

@jeancochrane jeancochrane force-pushed the jeancochrane/31-data-catalog-define-github-actions-workflows-for-building-the-dbt-dag-and-running-tests branch from 4dbf505 to f833271 Compare August 3, 2023 15:37
@jeancochrane jeancochrane force-pushed the jeancochrane/31-data-catalog-define-github-actions-workflows-for-building-the-dbt-dag-and-running-tests branch from 5f43129 to 3ff6d98 Compare August 3, 2023 16:58
@jeancochrane jeancochrane force-pushed the jeancochrane/31-data-catalog-define-github-actions-workflows-for-building-the-dbt-dag-and-running-tests branch from 7be8890 to 567f63f Compare August 3, 2023 18:03
@jeancochrane jeancochrane force-pushed the jeancochrane/31-data-catalog-define-github-actions-workflows-for-building-the-dbt-dag-and-running-tests branch from 567f63f to 98a42ef Compare August 3, 2023 18:08
@jeancochrane
Copy link
Contributor Author

jeancochrane commented Aug 7, 2023

This is finally ready for a review! Probably best if both @dfsnow and @wrridgeway take a look, since the workflows introduced in this PR will operate as the runners of our data integrity tests.

Copy link
Member

@dfsnow dfsnow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall amazing work @jeancochrane! This couldn't have been easy to debug. Just a couple outstanding blockers and we're good to go.

.github/actions/load_environment_variables/action.yaml Outdated Show resolved Hide resolved
.github/scripts/cleanup_dbt_resources.sh Show resolved Hide resolved
.github/workflows/build_and_test_dbt.yaml Outdated Show resolved Hide resolved
Comment on lines +3 to +11
inputs:
dbt_project_dir:
description: Path to the directory containing the dbt project.
required: false
default: ./dbt
requirements_file_path:
description: Path to Python requirements file.
required: false
default: ./dbt/requirements.txt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually didn't know you could re-use in-repo actions. I sillily used a composable workflow in PTAXSIM instead. Will fix: ccao-data/ptaxsim#15. Thanks for teaching me something today!

.github/workflows/build_and_test_dbt.yaml Outdated Show resolved Hide resolved
.github/workflows/build_and_test_dbt.yaml Outdated Show resolved Hide resolved
.github/workflows/test_dbt_models.yaml Show resolved Hide resolved
@@ -31,15 +31,15 @@ models:
- pin
- year
config:
error_if: ">280655"
error_if: ">280659"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (non-blocking): Some of these are likely just under-specified tests, but I agree it's going to be a problem. Grab @wrridgeway this week for a review/debugging session of the current tests. He knows more than anyone but Mirella about the horrors of our changing data.

@jeancochrane jeancochrane force-pushed the jeancochrane/31-data-catalog-define-github-actions-workflows-for-building-the-dbt-dag-and-running-tests branch from 4148bb1 to cef322a Compare August 8, 2023 15:13
This is necessary to support running test_dbt_models against the data-catalog branch,
which is our plan while we continue to develop on that long-lived branch.
@jeancochrane
Copy link
Contributor Author

Closing this temporarily to test the new cleanup-dbt-resources workflow!

@jeancochrane jeancochrane reopened this Aug 8, 2023
@jeancochrane jeancochrane requested a review from dfsnow August 8, 2023 16:32
@jeancochrane
Copy link
Contributor Author

This is ready for another look @dfsnow! I took just about all of your advice; the only thread with an open question is #50 (comment).

@jeancochrane
Copy link
Contributor Author

Actually, hold off on that review @dfsnow -- my testing on the docs branch (#57) is revealing that there may be a problem with the way we're setting things up for the data-catalog branch here.

@jeancochrane jeancochrane marked this pull request as draft August 8, 2023 17:11
…et HEAD_REF env var

It turns out that we can't override the GITHUB_HEAD_REF variable in a GitHub Actions workflow,
which means we can't set it in order to treat the data-catalog branch environment as a CI target.

To get around this, this commit changes up the generate_schema_name macro to read from a new
HEAD_REF env var instead of GITHUB_HEAD_REF, and then factors out a configure_dbt_environment
composite action that our dbt workflows can use to set this environment variable appropriately.
@jeancochrane jeancochrane marked this pull request as ready for review August 8, 2023 17:54
@jeancochrane
Copy link
Contributor Author

Alright @dfsnow, this is ready again! I made one change that wasn't part of our PR discussion after realizing that we can't override the GITHUB_ENV_VAR environment variable in order to run the ci target on the data-catalog branch. More context in the commit message for 60c2bb1:

It turns out that we can't override the GITHUB_HEAD_REF variable in a GitHub Actions workflow,
which means we can't set it in order to treat the data-catalog branch environment as a CI target.

To get around this, this commit changes up the generate_schema_name macro to read from a new
HEAD_REF env var instead of GITHUB_HEAD_REF, and then factors out a configure_dbt_environment
composite action that our dbt workflows can use to set this environment variable appropriately.

@dfsnow dfsnow self-requested a review August 8, 2023 18:03
Copy link
Member

@dfsnow dfsnow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great @jeancochrane, thanks for making those changes! Merge at your leisure

@jeancochrane jeancochrane merged commit 9574a38 into data-catalog Aug 8, 2023
@jeancochrane jeancochrane deleted the jeancochrane/31-data-catalog-define-github-actions-workflows-for-building-the-dbt-dag-and-running-tests branch August 8, 2023 19:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants