Implemented OrcaVault psa schema #17

Merged

Conversation

victorskl
Member

  • The Persistent Staging Area (psa) is a data warehouse layer where data is
    kept archived and its change history is tracked. Tables in this area also
    act as intermediate storage that feeds downstream transformations.
  • Used the spreadsheet_library_tracking_metadata table as another data
    source feeding the vault layer, for change history and data consolidation.
  • Typically, psa tables are processed by dbt with append-only incremental
    materialization, with data sourced from the counterpart table in the tsa schema.
  • Added the next.sh script and the tsa.truncate_tables() db function to simulate
    incremental data loading from tsa to psa in a local dev setup.
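
As an illustration only, the append-only tsa to psa flow described above can be simulated in a few lines of Python with in-memory SQLite. This is a sketch under assumptions, not the dbt models from this PR: the column names, the hash-based change detection, and the `load_tsa` / `tsa_to_psa` helpers are hypothetical, and the `DELETE` merely stands in for what `tsa.truncate_tables()` might do.

```python
import hashlib
import sqlite3

TABLE = "spreadsheet_library_tracking_metadata"


def setup() -> sqlite3.Connection:
    """Create in-memory tsa and psa schemas (columns are hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS tsa")
    conn.execute("ATTACH DATABASE ':memory:' AS psa")
    conn.execute(f"CREATE TABLE tsa.{TABLE} (library_id TEXT, sample_id TEXT)")
    conn.execute(
        f"CREATE TABLE psa.{TABLE} ("
        "library_id TEXT, sample_id TEXT, record_hash TEXT, load_datetime TEXT)"
    )
    return conn


def row_hash(row) -> str:
    """Hash all column values of a row (assumes non-null text values)."""
    return hashlib.sha256("|".join(row).encode()).hexdigest()


def load_tsa(conn, rows) -> None:
    """Truncate and reload the transient staging table."""
    conn.execute(f"DELETE FROM tsa.{TABLE}")  # stand-in for tsa.truncate_tables()
    conn.executemany(f"INSERT INTO tsa.{TABLE} VALUES (?, ?)", rows)


def tsa_to_psa(conn, load_datetime: str) -> int:
    """Append rows whose hash is not yet in psa; never update or delete."""
    seen = {h for (h,) in conn.execute(f"SELECT record_hash FROM psa.{TABLE}")}
    new_rows = []
    staged = conn.execute(f"SELECT library_id, sample_id FROM tsa.{TABLE}").fetchall()
    for row in staged:
        h = row_hash(row)
        if h not in seen:
            seen.add(h)
            new_rows.append((*row, h, load_datetime))
    conn.executemany(f"INSERT INTO psa.{TABLE} VALUES (?, ?, ?, ?)", new_rows)
    return len(new_rows)


conn = setup()
load_tsa(conn, [("L1", "S1"), ("L2", "S2")])
first = tsa_to_psa(conn, "2025-01-11")   # both rows are new
load_tsa(conn, [("L1", "S1"), ("L2", "S2-renamed"), ("L3", "S3")])
second = tsa_to_psa(conn, "2025-01-12")  # one changed row, one new row
```

Because psa only ever receives inserts, the earlier version of the changed row stays in place, which is what preserves the change history.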

@victorskl
Member Author

Related #15

@victorskl victorskl self-assigned this Jan 11, 2025
@victorskl victorskl added the enhancement New feature or request label Jan 11, 2025
@victorskl victorskl added this pull request to the merge queue Jan 11, 2025
Merged via the queue into main with commit 0c4dcff Jan 11, 2025
4 checks passed
@victorskl victorskl deleted the implement-psa-schema-spreadsheet-library-tracking-metadata branch January 11, 2025 13:22
victorskl added a commit that referenced this pull request Jan 12, 2025
* Story: Let Glue the Google LIMS! (continue)
  As discussed in #20, we now source the `tsa.spreadsheet_google_lims` staging
  table with dbt and feed it into the downstream warehouse psa schema.
* The technical steps are now mostly inherited from the framework implemented in PR #17.
* With psa, Google LIMS is loaded incrementally: each daily scheduled dbt ELT run
  loads only the differential data records.
* Note: since Google LIMS preserves a "timestamp" date column, we use it
  as the (replay) historical time for each row record. The warehouse load datetime is
  derived from this timestamp column, as the initial cutover data extraction date.
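
One possible reading of the timestamp rule above, sketched in Python. Everything here is an assumption for illustration: the `derive_load_datetime` helper is hypothetical, the `YYYY-MM-DD` date format and UTC timezone are guesses, and the fallback to the extraction time for rows without a timestamp is not stated in the commit message.

```python
from datetime import datetime, timezone
from typing import Optional


def derive_load_datetime(
    row_timestamp: Optional[str], extraction_time: datetime
) -> datetime:
    """Use the row's own timestamp as its historical load datetime.

    Falls back to the extraction time when the row carries no timestamp
    (an assumed behaviour, not confirmed by the source).
    """
    if row_timestamp:
        parsed = datetime.strptime(row_timestamp, "%Y-%m-%d")  # assumed format
        return parsed.replace(tzinfo=timezone.utc)
    return extraction_time


cutover = datetime(2025, 1, 12, tzinfo=timezone.utc)
replayed = derive_load_datetime("2024-03-05", cutover)
fallback = derive_load_datetime(None, cutover)
```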