Implemented OrcaVault psa schema #17

victorskl · 2025-01-11T13:17:13Z

* Persistent Staging Area (psa) is the data warehouse layer where data is archived and its change history is tracked. Data tables in this area also act as an intermediate storage location for downstream transformations.
* Used the `spreadsheet_library_tracking_metadata` table as an additional data source feeding the vault layer, for change history and data consolidation.
* Typically, psa data tables are processed by dbt with append-only incremental materialization, and the data is sourced from their tsa schema table counterparts.
* Added the `next.sh` script and the `tsa.truncate_tables()` db function to simulate incremental data loading from tsa to psa in the local dev setup.
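As a rough illustration of the append-only tsa to psa flow, here is a minimal Python sketch. It assumes a psa row is appended only when its content hash differs from the latest stored version for the same key (a common change-history pattern; the PR itself only states append-only incremental materialization with dbt). The `run_load` and `truncate_tsa` helpers, the `library_id` key and the `hash_diff`/`load_datetime` columns are hypothetical stand-ins for the dbt model, `next.sh` and `tsa.truncate_tables()`, not their actual implementation.

```python
import hashlib
import json
from datetime import datetime

# Hypothetical business key; the real psa models define their own keys.
KEY = "library_id"


def row_hash(row: dict) -> str:
    """Hash a staged row's columns so changed records can be detected."""
    return hashlib.md5(json.dumps(row, sort_keys=True).encode("utf-8")).hexdigest()


def run_load(tsa_rows: list[dict], psa_rows: list[dict], load_datetime: datetime) -> int:
    """Append new or changed tsa rows to psa; never update or delete psa rows.

    Returns the number of rows appended in this run.
    """
    # Latest hash per key; psa is kept in load order, so later entries win.
    latest = {rec[KEY]: rec["hash_diff"] for rec in psa_rows}
    appended = 0
    for row in tsa_rows:
        h = row_hash(row)
        if latest.get(row[KEY]) != h:
            psa_rows.append({**row, "hash_diff": h, "load_datetime": load_datetime})
            latest[row[KEY]] = h
            appended += 1
    return appended


def truncate_tsa(tsa_rows: list[dict]) -> None:
    """Hypothetical analogue of tsa.truncate_tables(): empty staging between runs."""
    tsa_rows.clear()


if __name__ == "__main__":
    # Hypothetical rows, for illustration only.
    tsa = [{"library_id": "A", "assay": "WGS"}, {"library_id": "B", "assay": "WTS"}]
    psa: list[dict] = []
    print(run_load(tsa, psa, datetime(2025, 1, 10)))  # 2: both keys are new
    truncate_tsa(tsa)  # roughly what next.sh simulates between batches
    tsa.extend([{"library_id": "A", "assay": "WGS"}, {"library_id": "B", "assay": "ctTSO"}])
    print(run_load(tsa, psa, datetime(2025, 1, 11)))  # 1: only B changed
    print(len(psa))  # 3: the earlier version of B is kept as history
```

Keeping superseded versions instead of overwriting them is what lets the downstream vault layer reconstruct change history.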


victorskl · 2025-01-11T13:18:34Z

Related #15

* Story: Let Glue the Google LIMS! (continued) As discussed in #20, we now source the `tsa.spreadsheet_google_lims` staging data table with dbt and feed it into the downstream warehouse psa schema.
* The technical steps are now mainly inherited from the framework implemented in PR #17.
* With psa, Google LIMS is incrementally loaded with differential data records on each daily scheduled run of the dbt ELT job.
* Of chief note: since Google LIMS preserves a "timestamp" date column, we use it as the (replay) historical time for each row record. For the initial cutover data extraction, the warehouse load datetime is derived from this timestamp column.
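The replay idea for the Google LIMS cutover can be sketched in Python as follows, assuming the "timestamp" column holds dates in `YYYY-MM-DD` form (the actual sheet format is not stated here). Rows are grouped by their own timestamp and emitted oldest first, so each batch could be loaded with that timestamp as its load datetime. The `replay_batches` name and the `fallback` parameter for rows without a timestamp are hypothetical.

```python
from datetime import datetime
from itertools import groupby
from typing import Iterator


def replay_batches(
    rows: list[dict],
    fallback: datetime,
    ts_col: str = "timestamp",
    fmt: str = "%Y-%m-%d",  # assumed date format of the sheet column
) -> Iterator[tuple[datetime, list[dict]]]:
    """Yield (load_datetime, rows) batches in chronological order.

    Each row's own timestamp becomes its load datetime; rows with an empty
    timestamp fall back to a hypothetical cutover datetime.
    """

    def load_dt(row: dict) -> datetime:
        raw = (row.get(ts_col) or "").strip()
        return datetime.strptime(raw, fmt) if raw else fallback

    # Stable sort keeps the original sheet order within the same timestamp.
    keyed = sorted(((load_dt(r), r) for r in rows), key=lambda pair: pair[0])
    for ts, group in groupby(keyed, key=lambda pair: pair[0]):
        yield ts, [row for _, row in group]
```

Feeding these batches one after another into an append-only psa load would make the initial cutover look as if the rows had been loaded on their original dates, after which the daily dbt run only adds differential records.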

victorskl self-assigned this Jan 11, 2025

victorskl added the enhancement New feature or request label Jan 11, 2025

victorskl added this pull request to the merge queue Jan 11, 2025

Merged via the queue into main with commit 0c4dcff Jan 11, 2025
4 checks passed

victorskl deleted the implement-psa-schema-spreadsheet-library-tracking-metadata branch January 11, 2025 13:22

victorskl mentioned this pull request Jan 12, 2025

Implemented OrcaVault psa schema spreadsheet Google LIMS #21

Merged
