[FEATURE] Support for Delta table history #163
base: main
- Added describe_history() method to DeltaTableStep, enabling fetching of Delta table history as a Spark DataFrame.
- Added is_date_stale() method to assess if data in a specified table is stale based on defined time intervals or a specific refresh day.
- Added DTInterval class for efficient management of date and time intervals.
…e describe_history() and added some log messages for debugging
Why not For the Upd.: Yes, just tested with |
Looking great! Just a few small things
if err_msg.startswith("[table_or_view_not_found]") or err_msg.startswith("table or view not found"):
    if self.create_if_not_exists:
Would it be possible to put these logs back in another spot? I kind of like it when the "create table" process gives info about what it is doing.
"""

if not any((months, weeks, days, hours, minutes, seconds)) and dt_interval is None:
    raise ValueError(
Add a Raises section to the docstring as well, for completeness' sake.
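For illustration, a docstring with a Raises section could look like the sketch below. Only the `if not any(...)` condition comes from the diff above; the function name `validate_interval`, the parameter defaults, the error message, and the numpy-style docstring layout are all assumptions.

```python
from typing import Optional


def validate_interval(
    months: int = 0,
    weeks: int = 0,
    days: int = 0,
    hours: int = 0,
    minutes: int = 0,
    seconds: int = 0,
    dt_interval: Optional[object] = None,
) -> None:
    """Check that at least one time interval was supplied.

    Hypothetical helper for illustration, not the PR's actual method.

    Raises
    ------
    ValueError
        If every interval argument is zero and ``dt_interval`` is None.
    """
    # Same guard as in the reviewed diff; the message text is made up.
    if not any((months, weeks, days, hours, minutes, seconds)) and dt_interval is None:
        raise ValueError("Specify at least one interval or pass dt_interval.")
```

With the section in place, the failure mode shows up in generated docs and in editor tooltips rather than only in the source.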
Description
- Added describe_history() method to DeltaTableStep, enabling fetching of Delta table history as a Spark DataFrame.
- Added is_date_stale() method to assess if data in a specified table is stale based on defined time intervals or a specific intended refresh day.
- Added DTInterval class for management of date and time intervals.
Motivation and Context
This change makes it possible to retrieve a Delta table's history (based on the Delta log) as a Spark DataFrame. It also provides a means of checking the staleness of data within Delta tables based on defined time intervals and specific weekdays designated for refreshing.
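The staleness idea can be sketched in plain Python, independent of Spark. This is a rough illustration under stated assumptions, not the PR's implementation: the function `is_stale`, its parameters, and the exact weekday semantics are hypothetical. In practice the last-modified timestamp would presumably be taken from the `timestamp` column of the table history.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_stale(
    last_modified: datetime,
    now: datetime,
    max_age: timedelta,
    refresh_weekday: Optional[int] = None,
) -> bool:
    """Return True if data last written at `last_modified` counts as stale.

    Hypothetical helper; names and semantics are assumptions, not the PR's API.
    `refresh_weekday` uses datetime.weekday() numbering (Monday is 0).
    """
    if refresh_weekday is not None:
        # Find the most recent occurrence of the refresh day (today included).
        days_back = (now.weekday() - refresh_weekday) % 7
        last_refresh_day = (now - timedelta(days=days_back)).date()
        # Stale if nothing was written on or after that day.
        return last_modified.date() < last_refresh_day
    # Otherwise compare the age of the last write with the allowed interval.
    return now - last_modified > max_age
```

Passing `now` explicitly instead of calling `datetime.now()` inside the function keeps the check deterministic, which makes it straightforward to unit test with pytest.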
How Has This Been Tested?
All methods and classes have been unit tested with pytest.
Types of changes
Checklist: