Add mart_gtfs.fct_vehicle_locations_grouped #3660

Merged

tiffanychu90 merged 17 commits into main from fct_vehicle_locations_grouped on Jan 28, 2025

tiffanychu90 (Member) commented Jan 23, 2025

Description

Describe your changes and why you're making them. Please include the context, motivation, and relevant dependencies.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

How has this been tested?

jovyan@jupyter-tiffanychu90 ~/data-infra/warehouse (fct_vehicle_locations_grouped) $ poetry run dbt run -s fct_vehicle_locations_grouped+
19:18:18  Running with dbt=1.5.1
19:18:30  [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:
- models.calitp_warehouse.mart.ad_hoc
19:18:31  Found 477 models, 1013 tests, 0 snapshots, 0 analyses, 852 macros, 0 operations, 12 seed files, 178 sources, 4 exposures, 0 metrics, 0 groups
19:18:31  
19:18:49  Concurrency: 8 threads (target='dev')
19:18:49  
19:18:49  1 of 1 START sql incremental model tiffany_mart_gtfs.fct_vehicle_locations_grouped  [RUN]
19:19:18  1 of 1 OK created sql incremental model tiffany_mart_gtfs.fct_vehicle_locations_grouped  [SCRIPT (24.6 GiB processed) in 28.27s]
19:19:18  
19:19:18  Finished running 1 incremental model in 0 hours 0 minutes and 46.22 seconds (46.22s).
19:19:18  
19:19:18  Completed successfully
19:19:18  
19:19:18  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
jovyan@jupyter-tiffanychu90 ~/data-infra/warehouse (fct_vehicle_locations_grouped) $ poetry run dbt run -s +fct_vehicle_locations_grouped+
19:21:47  Running with dbt=1.5.1
19:21:50  [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:
- models.calitp_warehouse.mart.ad_hoc
19:21:51  Found 477 models, 1013 tests, 0 snapshots, 0 analyses, 852 macros, 0 operations, 12 seed files, 178 sources, 4 exposures, 0 metrics, 0 groups
19:21:51  
19:21:55  Concurrency: 8 threads (target='dev')
19:21:55  
19:21:55  1 of 19 START sql view model tiffany_staging.stg_gtfs_rt__vehicle_positions .... [RUN]
19:21:55  2 of 19 START sql view model tiffany_staging.stg_gtfs_schedule__agency ......... [RUN]
19:21:55  3 of 19 START sql view model tiffany_staging.stg_gtfs_schedule__download_outcomes  [RUN]
19:21:55  4 of 19 START sql view model tiffany_staging.stg_gtfs_schedule__file_parse_outcomes  [RUN]
19:21:55  5 of 19 START sql view model tiffany_staging.stg_gtfs_schedule__unzip_outcomes . [RUN]
19:21:55  6 of 19 START sql view model tiffany_staging.stg_transit_database__gtfs_datasets  [RUN]
19:21:56  6 of 19 OK created sql view model tiffany_staging.stg_transit_database__gtfs_datasets  [CREATE VIEW (0 processed) in 1.13s]
19:21:56  7 of 19 START sql table model tiffany_staging.int_transit_database__gtfs_datasets_dim  [RUN]
19:21:56  1 of 19 OK created sql view model tiffany_staging.stg_gtfs_rt__vehicle_positions  [CREATE VIEW (0 processed) in 1.16s]
19:21:56  3 of 19 OK created sql view model tiffany_staging.stg_gtfs_schedule__download_outcomes  [CREATE VIEW (0 processed) in 1.17s]
19:21:56  2 of 19 OK created sql view model tiffany_staging.stg_gtfs_schedule__agency .... [CREATE VIEW (0 processed) in 1.21s]
19:21:56  5 of 19 OK created sql view model tiffany_staging.stg_gtfs_schedule__unzip_outcomes  [CREATE VIEW (0 processed) in 1.33s]
19:21:56  4 of 19 OK created sql view model tiffany_staging.stg_gtfs_schedule__file_parse_outcomes  [CREATE VIEW (0 processed) in 1.34s]
19:21:56  8 of 19 START sql view model tiffany_staging.int_gtfs_schedule__grouped_feed_file_parse_outcomes  [RUN]
19:21:58  8 of 19 OK created sql view model tiffany_staging.int_gtfs_schedule__grouped_feed_file_parse_outcomes  [CREATE VIEW (0 processed) in 1.40s]
19:21:58  9 of 19 START sql view model tiffany_staging.int_gtfs_schedule__joined_feed_outcomes  [RUN]
19:21:59  9 of 19 OK created sql view model tiffany_staging.int_gtfs_schedule__joined_feed_outcomes  [CREATE VIEW (0 processed) in 1.38s]
19:21:59  10 of 19 START sql table model tiffany_mart_gtfs.dim_schedule_feeds ............ [RUN]
19:22:01  7 of 19 OK created sql table model tiffany_staging.int_transit_database__gtfs_datasets_dim  [CREATE TABLE (5.0k rows, 5.7 GiB processed) in 4.66s]
19:22:01  11 of 19 START sql table model tiffany_mart_transit_database.bridge_schedule_dataset_for_validation  [RUN]
19:22:01  12 of 19 START sql table model tiffany_mart_transit_database.dim_gtfs_datasets . [RUN]
19:22:03  11 of 19 OK created sql table model tiffany_mart_transit_database.bridge_schedule_dataset_for_validation  [CREATE TABLE (2.9k rows, 517.7 KiB processed) in 2.29s]
19:22:03  12 of 19 OK created sql table model tiffany_mart_transit_database.dim_gtfs_datasets  [CREATE TABLE (5.0k rows, 1.6 MiB processed) in 2.55s]
19:22:03  13 of 19 START sql table model tiffany_staging.int_transit_database__urls_to_gtfs_datasets  [RUN]
19:22:06  13 of 19 OK created sql table model tiffany_staging.int_transit_database__urls_to_gtfs_datasets  [CREATE TABLE (4.9k rows, 879.3 KiB processed) in 2.18s]
19:23:13  10 of 19 OK created sql table model tiffany_mart_gtfs.dim_schedule_feeds ....... [CREATE TABLE (15.9k rows, 12.3 GiB processed) in 73.82s]
19:23:13  14 of 19 START sql table model tiffany_mart_gtfs.fct_daily_schedule_feeds ...... [RUN]
19:23:17  14 of 19 OK created sql table model tiffany_mart_gtfs.fct_daily_schedule_feeds . [CREATE TABLE (303.1k rows, 3.1 MiB processed) in 4.27s]
19:23:17  15 of 19 START sql view model tiffany_mart_gtfs.fct_vehicle_positions_messages . [RUN]
19:23:18  15 of 19 OK created sql view model tiffany_mart_gtfs.fct_vehicle_positions_messages  [CREATE VIEW (0 processed) in 1.05s]
19:23:18  16 of 19 START sql incremental model tiffany_staging.int_gtfs_rt__vehicle_positions_trip_day_map_grouping  [RUN]
19:26:10  16 of 19 OK created sql incremental model tiffany_staging.int_gtfs_rt__vehicle_positions_trip_day_map_grouping  [SCRIPT (15.9 GiB processed) in 171.41s]
19:26:10  17 of 19 START sql table model tiffany_mart_gtfs.fct_vehicle_positions_trip_summaries  [RUN]
19:26:30  17 of 19 OK created sql table model tiffany_mart_gtfs.fct_vehicle_positions_trip_summaries  [CREATE TABLE (2.4m rows, 24.5 GiB processed) in 20.00s]
19:26:30  18 of 19 START sql incremental model tiffany_mart_gtfs.fct_vehicle_locations ... [RUN]
19:28:57  18 of 19 OK created sql incremental model tiffany_mart_gtfs.fct_vehicle_locations  [SCRIPT (10.7 GiB processed) in 147.27s]
19:28:57  19 of 19 START sql incremental model tiffany_mart_gtfs.fct_vehicle_locations_grouped  [RUN]
19:29:15  19 of 19 OK created sql incremental model tiffany_mart_gtfs.fct_vehicle_locations_grouped  [SCRIPT (25.9 GiB processed) in 17.37s]
19:29:15  
19:29:15  Finished running 9 view models, 7 table models, 3 incremental models in 0 hours 7 minutes and 23.76 seconds (443.76s).
19:29:15  
19:29:15  Completed successfully
19:29:15  
19:29:15  Done. PASS=19 WARN=0 ERROR=0 SKIP=0 TOTAL=19

Tests: No tests? Is this right?

poetry run dbt test -s fct_vehicle_locations_grouped
19:47:21  Running with dbt=1.5.1
19:47:24  [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:
- models.calitp_warehouse.mart.ad_hoc
19:47:25  Found 477 models, 1013 tests, 0 snapshots, 0 analyses, 852 macros, 0 operations, 12 seed files, 178 sources, 4 exposures, 0 metrics, 0 groups
19:47:25  
19:47:25  Nothing to do. Try checking your model configs and model specification args

Docs:

jovyan@jupyter-tiffanychu90 ~/data-infra/warehouse (fct_vehicle_locations_grouped) $ poetry run dbt docs generate
19:38:45  Running with dbt=1.5.1
19:38:48  [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:
- models.calitp_warehouse.mart.ad_hoc
19:38:49  Found 477 models, 1013 tests, 0 snapshots, 0 analyses, 852 macros, 0 operations, 12 seed files, 178 sources, 4 exposures, 0 metrics, 0 groups
19:38:49  
19:39:08  Concurrency: 8 threads (target='dev')
19:39:08  
19:39:52  Building catalog
19:40:19  Catalog written to /home/jovyan/data-infra/warehouse/target/catalog.json

Post-merge follow-ups

Document any actions that must be taken post-merge to deploy or otherwise implement the changes in this PR (for example, running a full refresh of some incremental model in dbt). If these actions will take more than a few hours after the merge or if they will be completed by someone other than the PR author, please create a dedicated follow-up issue and link it here to track resolution.

  • No action required
  • Actions required (specified below)

tiffanychu90 marked this pull request as draft January 23, 2025 19:01

github-actions bot commented Jan 23, 2025

Warehouse report 📦

Checks/potential follow-ups

Checks indicate the following action items may be necessary.

  • For new models, do they all have a surrogate primary key that is tested to be not-null and unique?

New models 🌱

calitp_warehouse.mart.gtfs.fct_vehicle_locations_grouped

DAG

Legend (in order of precedence)

Resource type | Indicator | Resolution
Large table-materialized model | Orange | Make the model incremental
Large model without partitioning or clustering | Orange | Add partitioning and/or clustering
View with more than one child | Yellow | Materialize as a table or incremental
Incremental | Light green |
Table | Green |
View | White |

tiffanychu90 marked this pull request as ready for review January 23, 2025 19:46

tiffanychu90 (Member, Author)

@vevetron: I see this note: "For new models, do they all have a surrogate primary key that is tested to be not-null and unique?" Right now I'm using the key column from fct_vehicle_locations, because I want to be able to check that this derived table can be linked back to fct_vehicle_locations. Do I need to make a new surrogate primary key?

vevetron (Contributor)

I'm not sure, let me look into this. @charlie-costanzo might have an immediate answer.

vevetron (Contributor)

@tiffanychu90 I can't think of a reason we need to make new surrogate keys. If we are using this data, we either use vehicle_locations or vehicle_locations_grouped.

I think the current approach, using the subset of keys, is a good one. The keys in fct_vehicle_positions are all unique, so this new table's keys should all be unique as well.
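A minimal Python sketch of the key reasoning above, assuming each row of the grouped table carries exactly one `key` value copied from fct_vehicle_locations. The in-memory key lists and the function name `check_derived_keys` are hypothetical stand-ins for the warehouse tables. The checks mirror what dbt's built-in `not_null`, `unique`, and `relationships` tests would assert. Uniqueness carries over from the parent only if no parent key is reused by more than one grouped row, so it is still worth testing.

```python
from collections import Counter
from typing import Iterable, Optional


def check_derived_keys(
    parent_keys: Iterable[Optional[str]],
    derived_keys: Iterable[Optional[str]],
) -> dict:
    """Check derived-table keys the way dbt's not_null, unique, and
    relationships tests would, using in-memory lists instead of SQL."""
    parent = set(parent_keys)
    derived = list(derived_keys)
    non_null = [k for k in derived if k is not None]
    counts = Counter(non_null)
    return {
        # not_null: every derived row has a key
        "not_null": len(non_null) == len(derived),
        # unique: no non-null key appears on more than one derived row
        "unique": all(n == 1 for n in counts.values()),
        # relationships: every non-null derived key exists in the parent
        "relationships": set(non_null) <= parent,
    }


# Hypothetical example: grouped rows keep one parent key each
parent = ["k1", "k2", "k3", "k4"]
grouped = ["k1", "k3"]
print(check_derived_keys(parent, grouped))
# {'not_null': True, 'unique': True, 'relationships': True}
```

If two grouped rows reused the same parent key, `unique` would come back False even though the parent keys are unique, which is the case a dbt `unique` test on the new table would catch.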

tiffanychu90 requested review from vevetron and removed request for evansiroky January 24, 2025 21:13

tiffanychu90 (Member, Author)

Cool, then I think this PR is ready to merge. I ran the tests and they seemed OK, so I'm not sure what caused the earlier failure where a proportion test was not met. I guess we'll find out after merging!

vevetron force-pushed the fct_vehicle_locations_grouped branch from 051a2cf to 1522494 January 24, 2025 21:30

vevetron (Contributor) left a comment

LGTM

vevetron (Contributor)

Please squash before merging.

tiffanychu90 merged commit eb42d3d into main Jan 28, 2025
4 checks passed
tiffanychu90 deleted the fct_vehicle_locations_grouped branch January 28, 2025 16:55
2 participants