Adjust dbt profiles and schema macro to support prod/CI targets #44

Merged
4 changes: 2 additions & 2 deletions dbt/dbt_project.yml
@@ -28,6 +28,6 @@ models:
   athena:
     +materialized: view
     default:
-      +schema: dbt-test-default
dfsnow marked this conversation as resolved.
+      +schema: default
     location:
-      +schema: dbt-test-location
+      +schema: location
32 changes: 29 additions & 3 deletions dbt/macros/generate_schema_name.sql
@@ -1,15 +1,41 @@
--- Override the default schema naming to remove the dbt-added prefix.
+-- Override the default schema naming to remove the autogenerated prefix
+-- and replace it with our own namespacing on dev and CI.
 -- See: https://docs.getdbt.com/docs/build/custom-schemas
 {% macro generate_schema_name(custom_schema_name, node) -%}

+    {#
+        According to the dbt docs linked above, this is required to be set by
+        the built-in macro that we are overriding, but we don't actually use it
+    #}
     {%- set default_schema = target.schema -%}
+
+    {%- if target.name == "dev" -%}
+        {%- set schema_prefix = env_var("USER") -%}
+    {%- elif target.name == "ci" -%}
+        {%- set schema_prefix = env_var("GITHUB_BASE_REF") -%}
jeancochrane marked this conversation as resolved.
+    {%- else -%}
+        {%- set schema_prefix = "" -%}
+    {%- endif -%}
+
     {%- if custom_schema_name is none -%}

-        {{ default_schema }}
+        {#
+            The default schema name is not allowed, since we use subdirectory
+            organization to map tables/views to their Athena database
+        #}
+        {{ exceptions.raise_compiler_error(
+            "Missing schema definition for " ~ node.name ~ ". " ~
+            "Its containing subdirectory is probably missing a `+schema` " ~
+            "attribute under the `models` config in dbt_project.yml."
+        ) }}

     {%- else -%}

-        {{ custom_schema_name | trim }}
+        {%- set full_schema_name -%}
+            {{ schema_prefix ~ "-" ~ custom_schema_name | trim }}
+        {%- endset -%}
+
+        {{ full_schema_name }}

     {%- endif -%}
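
To make the branching in the macro above easier to follow, here is a minimal Python sketch that mirrors its logic. It is an illustration only and not part of the PR: the function signature, the `env` mapping standing in for dbt's `env_var`, the `ValueError` standing in for `exceptions.raise_compiler_error`, and the sample user name are all assumptions made for the example.

```python
from typing import Mapping, Optional


def generate_schema_name(
    custom_schema_name: Optional[str],
    node_name: str,
    target_name: str,
    env: Mapping[str, str],  # hypothetical stand-in for dbt's env_var()
) -> str:
    """Mirror the macro's branching in plain Python (illustrative only)."""
    # Pick a namespace prefix based on the active target.
    if target_name == "dev":
        schema_prefix = env["USER"]
    elif target_name == "ci":
        schema_prefix = env["GITHUB_BASE_REF"]
    else:
        schema_prefix = ""

    # A model without a `+schema` config is treated as a configuration error.
    if custom_schema_name is None:
        raise ValueError(
            f"Missing schema definition for {node_name}. "
            "Its containing subdirectory is probably missing a `+schema` "
            "attribute under the `models` config in dbt_project.yml."
        )

    # In the Jinja expression the `trim` filter applies to the custom schema
    # name only, so only that part is stripped before concatenation.
    return schema_prefix + "-" + custom_schema_name.strip()


print(generate_schema_name("default", "my_model", "dev", {"USER": "jdoe"}))
print(generate_schema_name("location", "my_model", "prod", {}))
```

Under these assumptions the first call yields `jdoe-default`. The second call yields `-location`: with an empty prefix, the expression as written in this commit appears to keep the leading hyphen on targets other than dev and ci.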

19 changes: 19 additions & 0 deletions dbt/profiles.yml
dfsnow marked this conversation as resolved.
@@ -11,3 +11,22 @@ athena:
       # "database" here corresponds to a Glue data catalog
       database: awsdatacatalog
       threads: 5
+    ci:
+      type: athena
+      s3_staging_dir: s3://ccao-dbt-athena-ci-us-east-1/results/
+      s3_data_dir: s3://ccao-dbt-athena-ci-us-east-1/data/
+      region_name: us-east-1
+      schema: dbt-test
+      database: awsdatacatalog
+      # Prefix all generated data by schema, so that we can delete it when the
+      # PR is merged
+      s3_data_naming: schema_table
Comment on lines +21 to +23
Contributor Author


I'm not totally sure this works yet, since we're not yet using dbt to manage our CTAS tables. The idea is that once dbt builds CTAS tables and stores them in S3, the schema_table config will tell it to store those files under the configured s3_data_dir at the path {s3_data_dir}/{schema}/{table}/ (docs). That way, when we're ready to have CD clean up the resources generated for a PR, we can use the schema name produced by the generate_schema_name macro to delete everything in {s3_data_dir}/{schema}/ and leave resources that are being used by other PRs intact.
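
The cleanup idea in the comment above can be sketched in a few lines of Python. This is a hypothetical illustration of the path layout only: the helper names, the example schema and table names, and the in-memory key list are assumptions, and nothing here calls S3 or reflects how dbt-athena builds paths internally.

```python
from typing import Iterable, List


def table_data_prefix(s3_data_dir: str, schema: str, table: str) -> str:
    """Build the {s3_data_dir}/{schema}/{table}/ location described above."""
    return f"{s3_data_dir.rstrip('/')}/{schema}/{table}/"


def schema_cleanup_prefix(s3_data_dir: str, schema: str) -> str:
    """Build the {s3_data_dir}/{schema}/ prefix a cleanup job would target."""
    # The trailing slash keeps "branch-a-default/" from also matching a
    # sibling schema such as "branch-a-default-v2/".
    return f"{s3_data_dir.rstrip('/')}/{schema}/"


def keys_to_delete(all_keys: Iterable[str], s3_data_dir: str, schema: str) -> List[str]:
    """Select only the objects that live under one schema's prefix."""
    prefix = schema_cleanup_prefix(s3_data_dir, schema)
    return [key for key in all_keys if key.startswith(prefix)]


data_dir = "s3://ccao-dbt-athena-ci-us-east-1/data/"
# Hypothetical objects written by two different PRs.
keys = [
    table_data_prefix(data_dir, "branch-a-default", "my_table") + "part-0.parquet",
    table_data_prefix(data_dir, "branch-b-default", "my_table") + "part-0.parquet",
]
print(keys_to_delete(keys, data_dir, "branch-a-default"))
```

Because each PR's schema name becomes its own path segment, deleting by the `{s3_data_dir}/{schema}/` prefix would remove only that PR's objects and leave the other PR's data in place.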

+      threads: 5
+    prod:
+      type: athena
+      s3_staging_dir: s3://ccao-athena-results-us-east-1/
+      s3_data_dir: s3://ccao-athena-data-us-east-1/
+      region_name: us-east-1
+      schema: default
+      database: awsdatacatalog
+      threads: 5