
Streamline our assets containing information about features #324

Open
jeancochrane opened this issue Jan 13, 2025 · 0 comments

Once #315 lands, we'll have four different places where we store information about features:

  • The vars dict CSV file that powers the ccao::vars_dict object, which we use as a crosswalk between the different names for our features in different data sources
  • The model/schema.yml dbt config file, which records descriptions for features
  • The params.yaml file in this repo, whose model$predictor$all array records the canonical list of input features for the model
  • The README for this repo, which pulls information from all of the above sources to produce the Features Used table and the docs/data-dict.csv file

Because feature information is scattered across three different repos, maintaining our features is confusing and brittle. We should consolidate some of these data sources so that they're easier to maintain. I think the lowest-hanging fruit is probably moving the variable crosswalk to the dbt DAG (related to ccao-data/ccao#30).
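To make the brittleness concrete, here's a minimal sketch of the kind of cross-check we currently have to do by hand whenever a feature is added or renamed. It assumes each source has already been parsed into plain Python structures. The function name `check_feature_sources`, the `var_name_model` column, and the shapes of the inputs are hypothetical stand-ins, not the actual layouts of the vars dict CSV, `model/schema.yml`, or `params.yaml`.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureReport:
    """Canonical predictors that are missing from one of the other sources."""
    missing_from_crosswalk: list[str] = field(default_factory=list)
    missing_description: list[str] = field(default_factory=list)


def check_feature_sources(
    predictors: list[str],
    crosswalk_rows: list[dict[str, str]],
    schema_descriptions: dict[str, str],
) -> FeatureReport:
    """Compare the canonical predictor list against the other sources.

    Hypothetical inputs:
      predictors: stand-in for model$predictor$all in params.yaml
      crosswalk_rows: stand-in for rows of the vars dict CSV, where the
        assumed column "var_name_model" holds the model-side name
      schema_descriptions: stand-in for feature -> description as pulled
        from model/schema.yml
    """
    crosswalk_names = {row["var_name_model"] for row in crosswalk_rows}
    report = FeatureReport()
    for name in predictors:
        if name not in crosswalk_names:
            report.missing_from_crosswalk.append(name)
        # A missing or blank description both count as undocumented.
        if not schema_descriptions.get(name, "").strip():
            report.missing_description.append(name)
    return report


# Illustrative (made-up) inputs showing how the sources can drift apart.
report = check_feature_sources(
    predictors=["char_bldg_sf", "char_yrblt", "loc_example"],
    crosswalk_rows=[{"var_name_model": "char_bldg_sf"}, {"var_name_model": "char_yrblt"}],
    schema_descriptions={"char_bldg_sf": "Building square footage", "char_yrblt": ""},
)
print(report.missing_from_crosswalk)  # ['loc_example']
print(report.missing_description)  # ['char_yrblt', 'loc_example']
```

If the crosswalk moved into the dbt DAG alongside the descriptions, a check like this could plausibly become a dbt test rather than a separate script that spans repos.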
