-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adjust dbt profiles and schema macro to support prod/CI targets #44
Adjust dbt profiles and schema macro to support prod/CI targets #44
Conversation
# Prefix all generated data by schema, so that we can delete it when the | ||
# PR is merged | ||
s3_data_naming: schema_table |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am actually not totally sure if this works yet, since we're not yet using dbt to manage our CTAs, but the idea here is that eventually when dbt builds CTAs and stores them in S3 the schema_table
config will tell it to store those files in the configured s3_data_dir
bucket with the path {s3_data_dir}/{schema}/{table}/
(docs). That way when we're ready to have CD clean up the resources generated for the PR, we can use the schema name defined by the generate_schema_name
to delete everything in {s3_data_dir}/{schema}/
and leave resources that are being used by other PRs intact.
6c0fa37
to
902f61d
Compare
Requesting both @dfsnow and @wrridgeway since this PR makes some major architectural choices about how we plan to structure dbt environments, but I only need a detailed review from one of you! |
@@ -48,14 +48,24 @@ Build the models to create views in our Athena warehouse: | |||
dbt run | |||
``` | |||
|
|||
By default, all `dbt` commands will run against the `dev` environment, which | |||
namespaces the resources it creates by prefixing target database names with | |||
your Unix `$USER` name (e.g. `jecochr-default` for the `default` database when |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question (blocking): How does this work when the name of a branch includes an unexpected character (i.e. the /
in jeancochrane/28-data-catalog-add-production-profile-to-the-dbt-configuration
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question! I think we can replace special characters (and uppercase letters, which are not allowed in dbt-athena
) using Jinja template filters. I'll take a stab at that now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dfsnow This spiraled a little bit, but I was able to make it work in 181e516! It required a new custom macro, but that led to me realize that we were starting to build a lot of logic into a collection of untested macros, so I took a minute to design a quick unit testing framework for our macros. Curious what you think!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! Nice work and definitely needed since this is getting more complex. I'm a dumb dumb and commented on a commit instead of the PR. Linking here: 181e516#r123228596
Overall, I think this looks great. My one note is that it might be worthwhile to prefix the prefix with something that identifies a dev table so that we can easily grep them using the CLI (I suggest So |
…scores See comment thread here for reasoning: 181e516#r123228596
This PR adjusts our dbt profiles and our custom
generate_schema_name
macro to support production and CI targets for the data catalog.The way this configuration works is:
--target
flag that is accepted by all dbt commands. If no flag is provided, the default environment isdev
.dev
environment adds a prefix to the database name matching to the caller's Unix username, e.g. the output database will be calleddev-jecochr-default
for targets built into the database nameddefault
whendbt run
is run on my machine.ci
environment adds a prefix matching to the name of the branch, e.g.ci-feature-branch-1-default
, with special characters removed and all letters forcibly lowercased to matchdbt-athena
's requirements.$GITHUB_BASE_REF
env variable (docs). We will also detect when the pull request has been closed or merged (docs) and use that as a trigger to cleanup any resources that were created for the PR.prod
environment does not add any prefix, e.g.default
.A quick test confirming this behavior on my machine:
This PR also adds a new framework for unit testing macros. Macro tests can now be run via the following command:
$ dbt run-operation test_all
Sample output:
Closes #28.