Implement `time_unit` option for `decode_cf_timedelta` #3

spencerkclark · 2025-01-13T00:16:14Z

My main remaining concern with pydata#9618 is that timedelta decoding / encoding is inconsistent with what is done for datetimes, i.e.:

It is possible to decode into non-nanosecond resolution timedelta values without specifying the time_unit, which would be a breaking change, e.g.:

>>> times = [1, 2, 3]
>>> xr.coding.times.decode_cf_timedelta(times, "seconds")
array([1, 2, 3], dtype='timedelta64[s]')

Encoding large non-nanosecond resolution timedelta values can silently lead to overflow, e.g.:

>>> time = np.timedelta64(300 * 365, "D").astype("timedelta64[s]")
>>> xr.coding.times.encode_cf_timedelta(time)
(array(-9223372036854775808), 'nanoseconds')

I have not yet gone the step of exposing the option to change the time_unit through CFTimedeltaCoder through the decode_timedelta argument, but this PR implements what I think is needed to bring the internals of non-nanosecond timedelta decoding / encoding up to speed with datetime decoding / encoding (I can do that later if you want).

Similar to datetime decoding, the default behavior is to always return nanosecond resolution values, or raise an error if this would lead to overflow. As in datetime decoding, if a time_unit is specified that is coarser than the encoded timedeltas, the time_unit is updated to prevent precision loss.

I added a few tests, which exercise this aspect of the code, including some that check overflow-safety and others that check we can accurately roundtrip large timedeltas. With these changes the above examples work as expected:

>>> times = [1, 2, 3]
>>> xr.coding.times.decode_cf_timedelta(times, "seconds")
array([1000000000, 2000000000, 3000000000], dtype='timedelta64[ns]')

>>> time = np.timedelta64(300 * 365, "D").astype("timedelta64[s]")
>>> xr.coding.times.encode_cf_timedelta(time)
(array(109500), 'days')

kmuehlbauer · 2025-01-13T07:20:24Z

That's great @spencerkclark! 👍 💯 This should definitely be part of pydata#9618 for sake of consistency and backwards compatibility. So I'm inclined to merge this.

spencerkclark · 2025-01-14T03:01:34Z

Sounds good! We can address adding time_unit to the CFTimedeltaCoder once pydata#9618 is merged.

Co-authored-by: Stephan Hoyer <[email protected]> Co-authored-by: Deepak Cherian <[email protected]> Co-authored-by: Spencer Clark <[email protected]> Co-authored-by: Spencer Clark <[email protected]> Co-authored-by: Stephan Hoyer <[email protected]> Co-authored-by: Spencer Clark <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix scalar handling for timedelta based indexer * remove stale error message and "ignore:Converting non-default" in testsuite * add per review suggestions * add/remove todo * rename timeunit -> format * return "ns" resolution per default for timedeltas, if not specified * Be specific on types/dtpyes * add comment * add suggestions from code review * fix docs * fix test which isn't run for numpy2 atm * add notes on to_datetime section, update examples showing usage of 'as_unit' * use np.timedelta64 for to_timedelta example, update as_unit example, update note * remove note * Apply suggestions from code review Co-authored-by: Deepak Cherian <[email protected]> * refactor timedelta decoding to _numbers_to_timedelta and res-use it within decode_cf_timedelta * fix conventions test, add todo * run times through pd.Timestamp to catch possible overflows * fix tests for cftime_to_nptime * fix cftime_to_nptime in cftimeindex * introduce pd.Timestamp instance check * warn if out-of-bound datetimes are encoded with standard calendar, fall back to cftime encoding, add fix for cftime issue where python datetimes are not encoded correctly with date2num. * fix time-coding.rst, add reference to time-series.rst. * try to fix typing, ignore one * try to fix docs * revert doc-changes * Add a non-ns test for polyval, polyfit * more doc cosmetics * add whats-new.rst entry * add/fix coder docstring * add xr.date_range example as suggested per review * Apply suggestions from code review Co-authored-by: Spencer Clark <[email protected]> * Implement `time_unit` option for `decode_cf_timedelta` (#3) * Fix timedelta encoding overflow issue; always decode to ns resolution * Implement time_unit for decode_cf_timedelta * Reduce diff * fix typing * use nanmin/nanmax, catch numpy RuntimeWarnings * Apply suggestions from code review Co-authored-by: Kai Mühlbauer <[email protected]> --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Stephan Hoyer <[email protected]> Co-authored-by: Deepak Cherian <[email protected]> Co-authored-by: Spencer Clark <[email protected]> Co-authored-by: Deepak Cherian <[email protected]>

spencerkclark added 3 commits January 12, 2025 17:13

Fix timedelta encoding overflow issue; always decode to ns resolution

122b19b

Implement time_unit for decode_cf_timedelta

99932e9

Reduce diff

97bb93e

spencerkclark mentioned this pull request Jan 13, 2025

Relax nanosecond datetime restriction in CF time decoding pydata/xarray#9618

Merged

9 tasks

kmuehlbauer merged commit ffc1828 into kmuehlbauer:any-time-resolution-2 Jan 13, 2025

spencerkclark deleted the 2025-01-12-minimal-changes branch January 14, 2025 03:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement `time_unit` option for `decode_cf_timedelta` #3

Implement `time_unit` option for `decode_cf_timedelta` #3

spencerkclark commented Jan 13, 2025 •

edited

Loading

kmuehlbauer commented Jan 13, 2025

spencerkclark commented Jan 14, 2025

Implement time_unit option for decode_cf_timedelta #3

Implement time_unit option for decode_cf_timedelta #3

Conversation

spencerkclark commented Jan 13, 2025 • edited Loading

kmuehlbauer commented Jan 13, 2025

spencerkclark commented Jan 14, 2025

Implement `time_unit` option for `decode_cf_timedelta` #3

Implement `time_unit` option for `decode_cf_timedelta` #3

spencerkclark commented Jan 13, 2025 •

edited

Loading