Gap problem for date matching? #85

rod-glover · 2022-01-05T17:24:35Z

I have the following TODO in the date matching function in src/utils/portals-common/portals-common.js:

// TODO: Coalesce adjacent histories from a station. NB: tricky.
//  If we don't do this, and we use strict date matching, then a station with
//  several histories fully covering an interval will not be selected, even
//  though it should be. The question of what "adjacent" means is a bit tricky
//  ... would depend in part on history.freq to distinguish too-large gaps,
//  but we already know that attribute isn't always an accurate reflection of
//  actual observations.

In a PR, the following discussion ensued:

We don't care about gaps. Users of the data necessarily have to assume that there will be gaps... that's just the nature of weather data collection.

Originally posted by @jameshiebert in #83 (comment)

That's interesting. First, let's eliminate what I'm calling "strict" date matching, which is the condition
min_obs_time < uiStartDate < uiEndDate < max_obs_time, plus allowance for nils
meaning complete containment in the observation interval. This is currently not used.

Currently, in this app, legacy PDP matching is used, which is the looser condition
min_obs_time < uiEndDate && uiStartDate < max_obs_time, plus allowance for nils
meaning overlap with the observation interval.

Currently, for both legacy PDP and this app, date matching is done separately for each history in a station. If this isn't right, it can be adjusted in this app.
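For reference, the two conditions could be sketched as predicates like the ones below. This is only an illustration: the function names, the shape of the `history` and `station` objects, and the reading of "allowance for nils" (a nil bound is treated as imposing no constraint) are assumptions, not the actual code in portals-common.js.

```javascript
// Hypothetical helper: null or undefined means "no constraint" (assumption).
const isNil = (x) => x === null || x === undefined;

// "Strict" matching: the UI interval lies entirely inside the observation
// interval. Assumes the UI already guarantees uiStartDate < uiEndDate.
function matchesStrict(history, uiStartDate, uiEndDate) {
  const { min_obs_time: min, max_obs_time: max } = history;
  const startOk = isNil(uiStartDate) || isNil(min) || min < uiStartDate;
  const endOk = isNil(uiEndDate) || isNil(max) || uiEndDate < max;
  return startOk && endOk;
}

// Legacy PDP matching: the UI interval merely overlaps the observation interval.
function matchesOverlap(history, uiStartDate, uiEndDate) {
  const { min_obs_time: min, max_obs_time: max } = history;
  const beginsBeforeEnd = isNil(uiEndDate) || isNil(min) || min < uiEndDate;
  const endsAfterStart = isNil(uiStartDate) || isNil(max) || uiStartDate < max;
  return beginsBeforeEnd && endsAfterStart;
}

// Matching is applied separately to each history of a station.
// `station.histories` is a hypothetical field name.
function matchingHistories(station, uiStartDate, uiEndDate) {
  return station.histories.filter((h) => matchesOverlap(h, uiStartDate, uiEndDate));
}
```

Date objects compare correctly with `<` here because they are converted to their numeric time values.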

Originally posted by @rod-glover in #83 (comment)

What I meant about gaps is this: Consider a station with multiple, say 2, histories spanning dates A to B (hx1) and C to D (hx2).

Suppose the user specifies start and end dates s, e such that A < s < B < C < e < D. The legacy matching rule will operate as follows:

hx1: A < e && s < B === true: match
hx2: C < e && s < D === true: match

That's as desired.

But when we have, say, A < B < s < e < C < D then:

hx1: A < e && s < B === false: no match
hx2: C < e && s < D === false: no match

When both the start and end dates fall into the gap between the histories' intervals, the matching fails for both histories. Even if only one of them falls in the gap, one of the two histories will not match.
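The cases above can be checked with a small script. The years below are made-up values standing in for A, B, C, D, and `legacyMatch` is a hypothetical name for the overlap rule without the nil handling.

```javascript
// Overlap rule as written above: min < e && s < max (nil handling omitted).
const legacyMatch = (hx, s, e) => hx.min < e && s < hx.max;

// Made-up example values: A=1990, B=2000, C=2010, D=2020.
const hx1 = { min: 1990, max: 2000 };
const hx2 = { min: 2010, max: 2020 };

// Case 1: A < s < B < C < e < D. Both histories match.
console.log(legacyMatch(hx1, 1995, 2015), legacyMatch(hx2, 1995, 2015)); // true true

// Case 2: A < B < s < e < C < D (both dates in the gap). Neither matches.
console.log(legacyMatch(hx1, 2002, 2008), legacyMatch(hx2, 2002, 2008)); // false false

// Case 3: only s in the gap. hx1 fails, hx2 matches.
console.log(legacyMatch(hx1, 2002, 2015), legacyMatch(hx2, 2002, 2015)); // false true
```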

If we decide that the two histories really form one contiguous period, then this matching rule is incorrect. It will bite us if there are large gaps (C - B) into which a start or end date could easily fall. In that case, we'd want a test that used only A to D as the interval, and matched both histories on that basis. That's not hard, but it's different from what we have now.
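A test on the A-to-D interval might look roughly like this. It is a sketch under assumptions: the function names are invented here, times are plain numbers (for example epoch milliseconds or years), and histories with no observations (nil `min_obs_time` or `max_obs_time`) are simply ignored when computing the span.

```javascript
// Compute the overall observation span of a station from its histories.
// Returns null if no history has observations.
function stationSpan(histories) {
  const mins = histories.map((h) => h.min_obs_time).filter((t) => t != null);
  const maxs = histories.map((h) => h.max_obs_time).filter((t) => t != null);
  if (mins.length === 0 || maxs.length === 0) return null;
  return { start: Math.min(...mins), end: Math.max(...maxs) };
}

// Match all histories of a station if the UI interval overlaps the
// station-wide span; otherwise match none.
function matchStationBySpan(histories, uiStartDate, uiEndDate) {
  const span = stationSpan(histories);
  if (span === null) return [];
  const overlaps = span.start < uiEndDate && uiStartDate < span.end;
  return overlaps ? histories : [];
}
```

With histories covering 1990 to 2000 and 2010 to 2020, a request for 2002 to 2008 would then select both histories, because it overlaps the combined span 1990 to 2020.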

Also, it gets more complicated if we decide that a "large" gap (C - B) means something different than a small gap.

Also, maybe there's a complication with multiple histories that overlap: A < C < B < D or the like.
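If large and small gaps were to be treated differently, one possible approach is to coalesce histories into periods, merging two histories only when the gap between them is at most some tolerance. The sketch below is one such approach, not a decided design: `coalesceHistories`, `maxGap`, and the output shape are hypothetical, and times are assumed to be plain numbers. Overlapping histories (A < C < B < D) fall out naturally because their "gap" is negative.

```javascript
// Merge histories into contiguous periods. Two histories are considered
// adjacent when the gap between them is <= maxGap (same units as the times).
function coalesceHistories(histories, maxGap) {
  const sorted = histories
    .filter((h) => h.min_obs_time != null && h.max_obs_time != null)
    .sort((a, b) => a.min_obs_time - b.min_obs_time);

  const periods = [];
  for (const h of sorted) {
    const last = periods[periods.length - 1];
    if (last && h.min_obs_time - last.end <= maxGap) {
      // Small gap or overlap: extend the current period.
      last.end = Math.max(last.end, h.max_obs_time);
      last.members.push(h);
    } else {
      // First history, or gap too large: start a new period.
      periods.push({ start: h.min_obs_time, end: h.max_obs_time, members: [h] });
    }
  }
  return periods;
}
```

Each resulting period could then be tested with the overlap rule, and all of its `members` selected together when it matches. How to choose `maxGap` is the open question, since history.freq is not always reliable.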

Questions:

Does the gap problem matter?
Legacy PDP's effective definition of "station" is "history", and effectively ignores meta_station records. SDP treats history records linked by a common station as related.

Originally posted by @rod-glover in #83 (comment)

One further note: Because they are drawn from station_obs_stats_mv, which is updated directly from observations, min_obs_time and max_obs_time are never null (oops: except if there are no observations), and therefore the checking for nulls in the matching rules can (maybe) be simplified accordingly.

But they are not like edate, in which null means "ongoing". Ongoing or not, these values are non-null except when there are no observations at all for that history.

Originally posted by @rod-glover in #83 (comment)

rod-glover mentioned this issue Jan 5, 2022

Fix discrepancies with legacy portal in station / history counts #83

Merged
