[Feature]: Retain bounds and compute time point for group averaging operations #565

pochedls · 2023-11-13T18:09:12Z

Is your feature request related to a problem?

Time bounds are dropped when computing group averages and the time point is set to the beginning of the averaging period.

Note that time bounds exist in the initial dataset:

# import xcdat
import xcdat as xc
# open dataset
dpath = '/p/user_pub/work/CMIP6/CMIP/E3SM-Project/E3SM-2-0/historical/r1i1p1f1/Amon/ts/gr/v20220830/'
ds = xc.open_mfdataset(dpath)
# show time bounds present
ds.time_bnds

<xarray.DataArray 'time_bnds' (time: 1980, bnds: 2)>
dask.array<concatenate, shape=(1980, 2), dtype=object, chunksize=(600, 2), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) object 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
Dimensions without coordinates: bnds

But the bounds disappear for the group average values:

# compute annual averages
ds = ds.temporal.group_average('ts', freq='year')
# extract time_bnds
ds.time_bnds

AttributeError: 'Dataset' object has no attribute 'time_bnds'

And the time point for each group average is at the beginning of the period:

# inspect time values
ds.time.values

array([cftime.DatetimeNoLeap(1850, 1, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeNoLeap(1851, 1, 1, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeNoLeap(1852, 1, 1, 0, 0, 0, 0, has_year_zero=True),
...

Describe the solution you'd like

Ideally we could return time_bnds with group averaging calculations. I think the returned bounds could be the lowermost and uppermost bounds of the averaged data.
The returned time points could then be the mean of these returned time bounds, which would be more representative than a time point at the beginning of the averaged period.
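A minimal sketch of the proposal above, assuming each group is given as a list of (lower, upper) bounds. It uses the standard-library `datetime` rather than `cftime`, and the helper name `group_bounds_and_midpoint` is hypothetical; this is not xCDAT's implementation.

```python
from datetime import datetime


def group_bounds_and_midpoint(bounds):
    """Return ((lower, upper), midpoint) for one group of (lower, upper) bounds."""
    lower = min(lo for lo, _ in bounds)  # lowermost bound in the group
    upper = max(hi for _, hi in bounds)  # uppermost bound in the group
    midpoint = lower + (upper - lower) / 2  # mean of the two group bounds
    return (lower, upper), midpoint


# Illustrative monthly bounds for 1850: start of each month to start of the next.
month_starts = [datetime(1850, m, 1) for m in range(1, 13)] + [datetime(1851, 1, 1)]
monthly_bounds = list(zip(month_starts[:-1], month_starts[1:]))

annual_bounds, annual_time = group_bounds_and_midpoint(monthly_bounds)
print(annual_bounds)  # (1850-01-01 00:00, 1851-01-01 00:00)
print(annual_time)    # 1850-07-02 12:00
```

With these inputs the annual time point lands mid-year instead of on 1850-01-01. This only holds when the input bounds line up with the averaging period.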

Describe alternatives you've considered

No response

Additional context

No response

taylor13 · 2024-01-12T18:40:20Z

To be clear about what the bounds should be on the mean, consider a daily mean computed from four 6-hourly mean samples centered at 3Z, 9Z, 15Z, and 21Z. If the 6-hour means have bounds 0-6Z, 6-12Z, 12-18Z, and 18-24Z, then we want the daily mean to extend from the beginning of the interval represented by the first sample (i.e., from 0Z) to the end of the last sample (i.e., to 24Z, or 0Z of the next day). So for a daily mean for the first day of this year, the bounds would be 2024-01-01 0:00:00 and 2024-01-02 0:00:00, while the coordinate value would be 2024-01-01 12:00:00.

taylor13 · 2024-01-12T19:03:45Z

Also, in the above example, one could simply average the four 6-hour time means because they were fully contained within a single day (and fully covered all the hours of the day). If the 6-hourly time-mean samples were centered instead on 0Z, 6Z, 12Z, and 18Z, then to form a daily mean extending from 0-24Z, you would need to compute the mean as (0.5*x0 + x1 + x2 + x3 + 0.5*x4)/4, where x4 is the sample centered on 0Z of the following day. That is, each sample should be weighted by the time interval overlapping the daily time interval of interest.
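As a worked check of those weights, assume each 6-hourly sample has bounds 3 hours either side of its center, so $x_0$ spans 21Z of the previous day to 3Z and $x_4$ spans 21Z to 3Z of the next day. The overlaps $w_i$ with the 0-24Z day are then 3, 6, 6, 6, and 3 hours:

$$
\bar{x} = \frac{\sum_i w_i x_i}{\sum_i w_i}
        = \frac{3x_0 + 6x_1 + 6x_2 + 6x_3 + 3x_4}{24}
        = \frac{0.5x_0 + x_1 + x_2 + x_3 + 0.5x_4}{4}
$$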

tomvothecoder · 2025-01-15T18:16:16Z

@pochedls Can you update this issue with the edge cases from the meeting?

pochedls · 2025-01-15T20:03:34Z

In my comment on @tomvothecoder's PR, I said I thought we could take the upper and lower bound for each group. Say you wanted a JJA average and the bounds for each month were at the start and the end of the month (e.g., June had bounds of 6/1 00:00 and 7/1 00:00):

...     |     M     |     J     |     J     |     A     |     S     |     ...     |
     ...            |                 JJA               |                 ...

Then you would simply take the June lower bound (6/1 00:00) as the lower JJA bound and the August upper bound (9/1 00:00) as the upper JJA bound.

This won't work if the bounds of the input samples do not line up with the group averaging period. For example, if you had pentad data of 5-day intervals:

... |A |B |C |D |E |F |G |H |I |J |K |L |M |N |O |P |Q |R |S |T |U |V |W |X |Y |Z |  ...
     ...            |                 JJA               |                 ...

In this case, pentads F and N are only partially in the JJA season. F could have bounds like ("2020-05-28 00:00:00", "2020-06-02 00:00:00"). Taking its lower bound (2020-05-28 00:00:00) as the JJA lower bound would be incorrect; the bound should be set to 2020-06-01 00:00:00.

An algorithm to do this is not immediately obvious to me.

taylor13 · 2025-01-15T20:28:47Z

When you compute the monthly (or seasonal) mean from data representing 5-day means, you would need to:

1. Find all pentads with upper and/or lower bounds falling within the target period.
2. Consider the first contributing pentad and if its lower bound is less than the beginning of the target period (e.g., it spans days from the preceding month to the target month), reset it to the beginning of the target period (e.g., the beginning of the month).
3. Consider the last contributing pentad and if its upper bound is greater than the end of the target period (e.g., it extends beyond the target period to the next month), reset it to the end of the target period (e.g., the end of the month).
4. Define weights for each sample as the time interval it spans, i.e., from the beginning (possibly now adjusted) time bound to the end (possibly adjusted) time bound.
5. Compute the weighted time mean from the pentads with those weights (pentads falling fully within the target period would be equally weighted, but the pentads at the beginning and end of the period might be down-weighted).
6. For the resulting monthly (or seasonal) weighted mean, define its time bounds with the lower bound set to the (possibly now adjusted) beginning bound of the first sample contributing to the mean, and the upper bound set to the (possibly now adjusted) end bound of the last sample contributing to the mean.
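The steps above can be sketched in plain Python. This is an illustration under stated assumptions, not xCDAT's implementation: it uses the standard-library `datetime` instead of `cftime`, the function name `overlap_weighted_mean` and the sample values are hypothetical, and contributing samples are selected by testing for any overlap with the target period (slightly broader than checking whether a bound falls inside it).

```python
from datetime import datetime, timedelta


def overlap_weighted_mean(values, bounds, period_start, period_end):
    """Weighted mean over [period_start, period_end] plus the resulting bounds.

    Each sample is weighted by the length of its (lower, upper) bounds
    interval after clipping it to the target period.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    lower = upper = None
    for value, (lo, hi) in zip(values, bounds):
        # Steps 2-3: reset bounds that extend beyond the target period.
        lo_clipped = max(lo, period_start)
        hi_clipped = min(hi, period_end)
        if hi_clipped <= lo_clipped:
            continue  # Step 1: sample does not overlap the target period.
        # Step 4: weight is the (possibly adjusted) interval length.
        weight = (hi_clipped - lo_clipped).total_seconds()
        weighted_sum += weight * value
        total_weight += weight
        # Step 6: track the outermost adjusted bounds.
        lower = lo_clipped if lower is None else min(lower, lo_clipped)
        upper = hi_clipped if upper is None else max(upper, hi_clipped)
    if total_weight == 0.0:
        raise ValueError("no samples overlap the target period")
    # Step 5: weighted time mean.
    return weighted_sum / total_weight, (lower, upper)


# Seven illustrative pentads starting 2020-05-28; the first and last straddle June.
first = datetime(2020, 5, 28)
pentad_bounds = [
    (first + timedelta(days=5 * i), first + timedelta(days=5 * (i + 1)))
    for i in range(7)
]
pentad_values = [6.0, 12.0, 12.0, 12.0, 12.0, 12.0, 18.0]

june_mean, june_bounds = overlap_weighted_mean(
    pentad_values, pentad_bounds, datetime(2020, 6, 1), datetime(2020, 7, 1)
)
print(june_mean, june_bounds)
```

Here the first pentad contributes 1 day and the last contributes 4 days, so the weighted June mean is about 12.6, whereas an unweighted mean of the seven pentads would give 12.0. The returned bounds are 2020-06-01 00:00 and 2020-07-01 00:00.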

pochedls · 2025-01-15T20:36:02Z

Although @taylor13's comment is seemingly off-topic (this issue is on generating bounds), he is probably right that the averaging functionality may need to be refactored as a related issue. I think @taylor13's points are documented in this issue.

github-project-automation bot added this to xCDAT Development Nov 13, 2023

github-project-automation bot moved this to Todo in xCDAT Development Nov 13, 2023

tomvothecoder added the type: enhancement New enhancement request label Dec 19, 2023

pochedls mentioned this issue Jan 12, 2024

[Bug]: temporal.group_average daily to monthly assigns wrong time index (and bounds) #586

Open

pochedls mentioned this issue Nov 21, 2024

Add support for custom seasons spanning calendar years #423

Merged

14 tasks

tomvothecoder added this to the FY25Q1 (10/01/24 - 12/31/24) milestone Nov 21, 2024

tomvothecoder self-assigned this Nov 21, 2024

tomvothecoder linked a pull request Nov 22, 2024 that will close this issue

Add temporal bounds and center times for group_average() API #717

Draft

9 tasks

tomvothecoder modified the milestones: FY25Q1 (10/01/24 - 12/31/24), FY25 Q2 (01/01/25 - 03/31/25) Jan 17, 2025
