Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement time_unit option for decode_cf_timedelta #3

Conversation

spencerkclark
Copy link

@spencerkclark spencerkclark commented Jan 13, 2025

My main remaining concern with pydata#9618 is that timedelta decoding / encoding is inconsistent with what is done for datetimes, i.e.:

  • It is possible to decode into non-nanosecond resolution timedelta values without specifying the time_unit, which would be a breaking change, e.g.:
>>> times = [1, 2, 3]
>>> xr.coding.times.decode_cf_timedelta(times, "seconds")
array([1, 2, 3], dtype='timedelta64[s]')
  • Encoding large non-nanosecond resolution timedelta values can silently lead to overflow, e.g.:
>>> time = np.timedelta64(300 * 365, "D").astype("timedelta64[s]")
>>> xr.coding.times.encode_cf_timedelta(time)
(array(-9223372036854775808), 'nanoseconds')

I have not yet gone the step of exposing the option to change the time_unit through CFTimedeltaCoder through the decode_timedelta argument, but this PR implements what I think is needed to bring the internals of non-nanosecond timedelta decoding / encoding up to speed with datetime decoding / encoding (I can do that later if you want).

Similar to datetime decoding, the default behavior is to always return nanosecond resolution values, or raise an error if this would lead to overflow. As in datetime decoding, if a time_unit is specified that is coarser than the encoded timedeltas, the time_unit is updated to prevent precision loss.

I added a few tests, which exercise this aspect of the code, including some that check overflow-safety and others that check we can accurately roundtrip large timedeltas. With these changes the above examples work as expected:

>>> times = [1, 2, 3]
>>> xr.coding.times.decode_cf_timedelta(times, "seconds")
array([1000000000, 2000000000, 3000000000], dtype='timedelta64[ns]')
>>> time = np.timedelta64(300 * 365, "D").astype("timedelta64[s]")
>>> xr.coding.times.encode_cf_timedelta(time)
(array(109500), 'days')

@kmuehlbauer
Copy link
Owner

That's great @spencerkclark! 👍 💯 This should definitely be part of pydata#9618 for sake of consistency and backwards compatibility. So I'm inclined to merge this.

@kmuehlbauer kmuehlbauer merged commit ffc1828 into kmuehlbauer:any-time-resolution-2 Jan 13, 2025
@spencerkclark spencerkclark deleted the 2025-01-12-minimal-changes branch January 14, 2025 03:00
@spencerkclark
Copy link
Author

Sounds good! We can address adding time_unit to the CFTimedeltaCoder once pydata#9618 is merged.

kmuehlbauer added a commit that referenced this pull request Jan 16, 2025
Co-authored-by: Stephan Hoyer <[email protected]>
Co-authored-by: Deepak Cherian <[email protected]>
Co-authored-by: Spencer Clark <[email protected]>
Co-authored-by: Spencer Clark <[email protected]>
Co-authored-by: Stephan Hoyer <[email protected]>
Co-authored-by: Spencer Clark <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix scalar handling for timedelta based indexer

* remove stale error message and "ignore:Converting non-default" in testsuite

* add per review suggestions

* add/remove todo

* rename timeunit -> format

* return "ns" resolution per default for timedeltas, if not specified

* Be specific on types/dtpyes

* add comment

* add suggestions from code review

* fix docs

* fix test which isn't run for numpy2 atm

* add notes on to_datetime section, update examples showing usage of 'as_unit'

* use np.timedelta64 for to_timedelta example, update as_unit example, update note

* remove note

* Apply suggestions from code review

Co-authored-by: Deepak Cherian <[email protected]>

* refactor timedelta decoding to _numbers_to_timedelta and res-use it within decode_cf_timedelta

* fix conventions test, add todo

* run times through pd.Timestamp to catch possible overflows

* fix tests for cftime_to_nptime

* fix cftime_to_nptime in cftimeindex

* introduce pd.Timestamp instance check

* warn if out-of-bound datetimes are encoded with standard calendar, fall back to cftime encoding, add fix for cftime issue where python datetimes are not encoded correctly with date2num.

* fix time-coding.rst, add reference to time-series.rst.

* try to fix typing, ignore one

* try to fix docs

* revert doc-changes

* Add a non-ns test for polyval, polyfit

* more doc cosmetics

* add whats-new.rst entry

* add/fix coder docstring

* add xr.date_range example as suggested per review

* Apply suggestions from code review

Co-authored-by: Spencer Clark <[email protected]>

* Implement `time_unit` option for `decode_cf_timedelta` (#3)

* Fix timedelta encoding overflow issue; always decode to ns resolution

* Implement time_unit for decode_cf_timedelta

* Reduce diff

* fix typing

* use nanmin/nanmax, catch numpy RuntimeWarnings

* Apply suggestions from code review

Co-authored-by: Kai Mühlbauer <[email protected]>

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Stephan Hoyer <[email protected]>
Co-authored-by: Deepak Cherian <[email protected]>
Co-authored-by: Spencer Clark <[email protected]>
Co-authored-by: Deepak Cherian <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants