NTD time series data: create warehouse tables #3665

Open
wants to merge 6 commits into
base: main
from

Conversation

charlie-costanzo
Member

@charlie-costanzo charlie-costanzo commented Jan 27, 2025

Description

Following up on #3655, this PR creates new warehouse tables (staging and mart) and associated yml files for the new NTD time series endpoints that we are ingesting.

Type of change

  • New feature

How has this been tested?

Screenshot 2025-01-28 at 10 29 44 AM

Post-merge follow-ups

  • No action required

github-actions bot commented Jan 27, 2025

Warehouse report 📦

Checks/potential follow-ups

Checks indicate the following action items may be necessary.

  • For new models, do they all have a surrogate primary key that is tested to be not-null and unique?
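As a rough illustration of what this check asks for, the sketch below builds a surrogate key by hashing a set of natural-key columns and then reports null or duplicated keys. It is plain Python rather than the project's SQL, and the column names (`ntd_id`, `mode`) and sample rows are hypothetical. In the warehouse itself this would normally be expressed with dbt's built-in `unique` and `not_null` tests on the key column.

```python
import hashlib
from collections import Counter


def surrogate_key(row, key_columns):
    """Hash the natural-key columns into one identifier; None if any part is missing."""
    parts = [row.get(col) for col in key_columns]
    if any(part is None for part in parts):
        return None
    joined = "-".join(str(part) for part in parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def key_problems(rows, key_columns):
    """Count null surrogate keys and list the keys that occur more than once."""
    keys = [surrogate_key(row, key_columns) for row in rows]
    null_count = sum(key is None for key in keys)
    counts = Counter(key for key in keys if key is not None)
    duplicates = sorted(key for key, n in counts.items() if n > 1)
    return {"null_keys": null_count, "duplicate_keys": duplicates}


# Hypothetical rows: one duplicate natural key and one missing key part.
rows = [
    {"ntd_id": "90001", "mode": "MB"},
    {"ntd_id": "90001", "mode": "LR"},
    {"ntd_id": "90001", "mode": "MB"},
    {"ntd_id": "90002", "mode": None},
]
report = key_problems(rows, ["ntd_id", "mode"])
```

A model passes the check only when `null_keys` is zero and `duplicate_keys` is empty.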

New models 🌱

calitp_warehouse.mart.ntd_funding_and_expenses.fct_capital_expenditures_time_series_facilities

calitp_warehouse.mart.ntd_funding_and_expenses.fct_capital_expenditures_time_series_other

calitp_warehouse.mart.ntd_funding_and_expenses.fct_capital_expenditures_time_series_rolling_stock

calitp_warehouse.mart.ntd_funding_and_expenses.fct_capital_expenditures_time_series_total

calitp_warehouse.mart.ntd_funding_and_expenses.fct_operating_and_capital_funding_time_series_capital_federal

calitp_warehouse.mart.ntd_funding_and_expenses.fct_operating_and_capital_funding_time_series_capital_local

calitp_warehouse.mart.ntd_funding_and_expenses.fct_operating_and_capital_funding_time_series_capital_other

calitp_warehouse.mart.ntd_funding_and_expenses.fct_operating_and_capital_funding_time_series_capital_state

calitp_warehouse.mart.ntd_funding_and_expenses.fct_operating_and_capital_funding_time_series_capital_total

calitp_warehouse.mart.ntd_funding_and_expenses.fct_operating_and_capital_funding_time_series_decommissioned_operatingfares

calitp_warehouse.mart.ntd_funding_and_expenses.fct_operating_and_capital_funding_time_series_decommissioned_operatingother

calitp_warehouse.mart.ntd_funding_and_expenses.fct_operating_and_capital_funding_time_series_operating_federal

calitp_warehouse.mart.ntd_funding_and_expenses.fct_operating_and_capital_funding_time_series_operating_local

calitp_warehouse.mart.ntd_funding_and_expenses.fct_operating_and_capital_funding_time_series_operating_other

calitp_warehouse.mart.ntd_funding_and_expenses.fct_operating_and_capital_funding_time_series_operating_state

calitp_warehouse.mart.ntd_funding_and_expenses.fct_operating_and_capital_funding_time_series_operating_total

calitp_warehouse.mart.ntd_funding_and_expenses.fct_operating_and_capital_funding_time_series_summary_total

calitp_warehouse.mart.ntd_funding_and_expenses.fct_service_data_and_operating_expenses_time_series_by_mode_drm

calitp_warehouse.mart.ntd_funding_and_expenses.fct_service_data_and_operating_expenses_time_series_by_mode_fares

calitp_warehouse.mart.ntd_funding_and_expenses.fct_service_data_and_operating_expenses_time_series_by_mode_opexp_ga

calitp_warehouse.mart.ntd_funding_and_expenses.fct_service_data_and_operating_expenses_time_series_by_mode_opexp_nvm

calitp_warehouse.mart.ntd_funding_and_expenses.fct_service_data_and_operating_expenses_time_series_by_mode_opexp_total

calitp_warehouse.mart.ntd_funding_and_expenses.fct_service_data_and_operating_expenses_time_series_by_mode_opexp_vm

calitp_warehouse.mart.ntd_funding_and_expenses.fct_service_data_and_operating_expenses_time_series_by_mode_opexp_vo

calitp_warehouse.mart.ntd_funding_and_expenses.fct_service_data_and_operating_expenses_time_series_by_mode_pmt

calitp_warehouse.mart.ntd_funding_and_expenses.fct_service_data_and_operating_expenses_time_series_by_mode_upt

calitp_warehouse.mart.ntd_funding_and_expenses.fct_service_data_and_operating_expenses_time_series_by_mode_voms

calitp_warehouse.mart.ntd_funding_and_expenses.fct_service_data_and_operating_expenses_time_series_by_mode_vrh

calitp_warehouse.mart.ntd_funding_and_expenses.fct_service_data_and_operating_expenses_time_series_by_mode_vrm

calitp_warehouse.mart.gtfs.fct_vehicle_locations_grouped

calitp_warehouse.staging.ntd_funding_and_expenses.stg_ntd__capital_expenditures_time_series__facilities

calitp_warehouse.staging.ntd_funding_and_expenses.stg_ntd__capital_expenditures_time_series__other

calitp_warehouse.staging.ntd_funding_and_expenses.stg_ntd__capital_expenditures_time_series__rolling_stock

calitp_warehouse.staging.ntd_funding_and_expenses.stg_ntd__capital_expenditures_time_series__total

calitp_warehouse.staging.ntd_funding_and_expenses.stg_ntd__operating_and_capital_funding_time_series__capital_federal

calitp_warehouse.staging.ntd_funding_and_expenses.stg_ntd__operating_and_capital_funding_time_series__capital_local

calitp_warehouse.staging.ntd_funding_and_expenses.stg_ntd__operating_and_capital_funding_time_series__capital_other

calitp_warehouse.staging.ntd_funding_and_expenses.stg_ntd__operating_and_capital_funding_time_series__capital_state

calitp_warehouse.staging.ntd_funding_and_expenses.stg_ntd__operating_and_capital_funding_time_series__capital_total

calitp_warehouse.staging.ntd_funding_and_expenses.stg_ntd__operating_and_capital_funding_time_series__decommissioned_operatingfares

calitp_warehouse.staging.ntd_funding_and_expenses.stg_ntd__operating_and_capital_funding_time_series__decommissioned_operatingother

calitp_warehouse.staging.ntd_funding_and_expenses.stg_ntd__operating_and_capital_funding_time_series__operating_federal

calitp_warehouse.staging.ntd_funding_and_expenses.stg_ntd__operating_and_capital_funding_time_series__operating_local

calitp_warehouse.staging.ntd_funding_and_expenses.stg_ntd__operating_and_capital_funding_time_series__operating_other

calitp_warehouse.staging.ntd_funding_and_expenses.stg_ntd__operating_and_capital_funding_time_series__operating_state

calitp_warehouse.staging.ntd_funding_and_expenses.stg_ntd__operating_and_capital_funding_time_series__operating_total

calitp_warehouse.staging.ntd_funding_and_expenses.stg_ntd__operating_and_capital_funding_time_series__summary_total

calitp_warehouse.staging.ntd_funding_and_expenses.stg_ntd__service_data_and_operating_expenses_time_series_by_mode__drm

calitp_warehouse.staging.ntd_funding_and_expenses.stg_ntd__service_data_and_operating_expenses_time_series_by_mode__fares

calitp_warehouse.staging.ntd_funding_and_expenses.stg_ntd__service_data_and_operating_expenses_time_series_by_mode__opexp_ga

calitp_warehouse.staging.ntd_funding_and_expenses.stg_ntd__service_data_and_operating_expenses_time_series_by_mode__opexp_nvm

calitp_warehouse.staging.ntd_funding_and_expenses.stg_ntd__service_data_and_operating_expenses_time_series_by_mode__opexp_total

calitp_warehouse.staging.ntd_funding_and_expenses.stg_ntd__service_data_and_operating_expenses_time_series_by_mode__opexp_vm

calitp_warehouse.staging.ntd_funding_and_expenses.stg_ntd__service_data_and_operating_expenses_time_series_by_mode__opexp_vo

calitp_warehouse.staging.ntd_funding_and_expenses.stg_ntd__service_data_and_operating_expenses_time_series_by_mode__pmt

calitp_warehouse.staging.ntd_funding_and_expenses.stg_ntd__service_data_and_operating_expenses_time_series_by_mode__upt

calitp_warehouse.staging.ntd_funding_and_expenses.stg_ntd__service_data_and_operating_expenses_time_series_by_mode__voms

calitp_warehouse.staging.ntd_funding_and_expenses.stg_ntd__service_data_and_operating_expenses_time_series_by_mode__vrh

calitp_warehouse.staging.ntd_funding_and_expenses.stg_ntd__service_data_and_operating_expenses_time_series_by_mode__vrm

DAG

Legend (in order of precedence)

| Resource type | Indicator | Resolution |
| --- | --- | --- |
| Large table-materialized model | Orange | Make the model incremental |
| Large model without partitioning or clustering | Orange | Add partitioning and/or clustering |
| View with more than one child | Yellow | Materialize as a table or incremental |
| Incremental | Light green | |
| Table | Green | |
| View | White | |

@charlie-costanzo charlie-costanzo self-assigned this Jan 28, 2025
@charlie-costanzo charlie-costanzo added the data-pipeline-ingestion-and-modeling Ingesting, parsing and modeling data. Evan Siroky is product owner. label Jan 28, 2025
@charlie-costanzo charlie-costanzo marked this pull request as ready for review January 28, 2025 15:41
@charlie-costanzo charlie-costanzo force-pushed the ntd-time-series-warehouse-tables branch from 2e5ff06 to 10c94a6 Compare January 28, 2025 17:13
Comment on lines +12 to +44
_1992,
_1993,
_1994,
_1995,
_1996,
_1997,
_1998,
_1999,
_2000,
_2001,
_2002,
_2003,
_2004,
_2005,
_2006,
_2007,
_2008,
_2009,
_2010,
_2011,
_2012,
_2013,
_2014,
_2015,
_2016,
_2017,
_2018,
_2019,
_2020,
_2021,
_2022,
_2023,
_2023_mode_status,
Member

When each year is a column in a database table, it is really tedious to write a query that selects every one of these columns for a Metabase question. Is there a way to transform these tables so that we have a single year column instead of one column per year? This would create more rows, but it would make the tables much easier to query.

Member Author



Hey @evansiroky, I was definitely planning on adding that in the next round of work on these tables. My plan was to get these into the warehouse quickly as MVPs in their current form, then iterate on the modeling based on what's most pressing or useful. That said, I can make these modeling changes before merging this PR if preferred.

Contributor

Charlie, there is an example you can follow to pivot the table and build it the way Evan is asking: int_ntd__monthly_ridership_with_adjustments_vrh.sql
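To make the requested reshape concrete, here is a minimal sketch of turning one wide row (one column per year) into one record per year. It is written in plain Python purely as an illustration and is not the dbt model referenced above; the identifier columns `ntd_id` and `mode`, the output names `year` and `value`, and the sample values are all assumptions.

```python
import re

# Matches plain year columns such as `_1992`; `_2023_mode_status` does not match.
YEAR_COLUMN = re.compile(r"^_(\d{4})$")


def wide_to_long(row, id_columns):
    """Turn one wide row into one record per `_YYYY` column.

    Columns that are neither identifiers nor plain year columns
    (for example `_2023_mode_status`) are left out of the long records.
    """
    ids = {name: row[name] for name in id_columns}
    records = []
    for name, value in row.items():
        match = YEAR_COLUMN.match(name)
        if match is None:
            continue
        records.append({**ids, "year": int(match.group(1)), "value": value})
    return sorted(records, key=lambda record: record["year"])


# Hypothetical wide row with three year columns and a status column.
wide_row = {
    "ntd_id": "90001",
    "mode": "MB",
    "_1992": 10,
    "_1993": 12,
    "_2023": None,
    "_2023_mode_status": "Existing",
}
long_rows = wide_to_long(wide_row, ["ntd_id", "mode"])
```

With the data in this shape, a dashboard query can filter or group on a single `year` field instead of naming 32 separate columns. How to carry `_2023_mode_status` into the long table is a separate modeling decision that this sketch leaves open.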

models:
- name: dim_capital_expenditures_time_series_read_me_data_dictionary
- name: dim_operating_and_capital_funding_time_series_read_me_data_dictionary
- name: dim_service_data_and_operating_expenses_time_series_by_mode_read_me_data_dictionary
Contributor

With cost reduction in mind, I don't think we need the tables with "read_me" in the name. If you query the external tables, many of the columns are null, so we would be spending time generating tables that are not useful:

dim_capital_expenditures_time_series_read_me_data_dictionary
dim_operating_and_capital_funding_time_series_read_me_data_dictionary
dim_service_data_and_operating_expenses_time_series_by_mode_read_me_data_dictionary
stg_ntd__capital_expenditures_time_series__read_me_data_dictionary
stg_ntd__operating_and_capital_funding_time_series__read_me_data_dictionary
stg_ntd__service_data_and_operating_expenses_time_series_by_mode__read_me_data_dictionary
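One way to back up the "many null columns" observation before dropping these models is to measure the share of nulls per column on a sample of rows. The sketch below does that in plain Python; the sample rows and the column names `field_name`, `description`, and `notes` are hypothetical and do not reflect the actual read_me tables.

```python
def null_fraction_by_column(rows):
    """Return the share of None values per column across a list of row dicts."""
    if not rows:
        return {}
    columns = sorted({name for row in rows for name in row})
    return {
        col: sum(row.get(col) is None for row in rows) / len(rows)
        for col in columns
    }


def all_null_columns(rows):
    """List the columns whose values are None in every row."""
    fractions = null_fraction_by_column(rows)
    return [col for col, share in fractions.items() if share == 1.0]


# Hypothetical sample pulled from a data-dictionary style table.
sample_rows = [
    {"field_name": "ntd_id", "description": None, "notes": None},
    {"field_name": "mode", "description": "Mode code", "notes": None},
]
fractions = null_fraction_by_column(sample_rows)
empty_columns = all_null_columns(sample_rows)
```

A table whose columns are mostly or entirely null is a reasonable candidate for removal on cost grounds.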

Labels
data-pipeline-ingestion-and-modeling Ingesting, parsing and modeling data. Evan Siroky is product owner.

3 participants