Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

0.5.x: Timezone conversion result (LocalResult) #1448

Open
pitdicker opened this issue Feb 19, 2024 · 52 comments
Open

0.5.x: Timezone conversion result (LocalResult) #1448

pitdicker opened this issue Feb 19, 2024 · 52 comments

Comments

@pitdicker
Copy link
Collaborator

pitdicker commented Feb 19, 2024

Part of this has been discussed before in #716 and #1049, and was my motivation to start contributing to chrono in #1017.

The calculation to go from a local time (wall clock time) to an instant in UTC can end up in three cases:

  • a single, simple result
  • an ambiguous result when the clock is turned backwards during a transition due to for example DST.
  • no result when the clock is turned forwards during a transition due to for example DST.

Our current LocalResult type matches those three cases:

pub enum LocalResult<T> {
    Single(T),
    Ambiguous(T, T), /* first, last */
    None,
}

The type of T is usually an offset or DateTime.

As we discovered this simple definition has some issues.

No way to distinguish errors from the None case

The type that implements TimeZone and returns a LocalResult may not be completely under our/your control. Prime example is our Local type. It wraps platform APIs, which can fail for various cases. Our current code returns LocalResult::None on such a failure.

Now if a method returns LocalResult::None there is no way for the user to know if it is because the datetime falls within a gap, or because there is an error.

Not enough info to handle the None case

Suppose we want to write a function that returns exactly how many seconds have passed since midnight, taking into account DST transitions.
That should be simple. Take the supplied DateTime, create a NaiveDateTime with the time set to 00:00:00, use TimeZone::from_local_datetime to create a DateTime at midnight and subtract.

What if some government decided to do a DST transition right at midnight, say from 23:30 to 00:30? TimeZone::from_local_datetime would return LocalResult::None, and our function can't return a result. However both 23:30 (from before moving the clock forwards) and 00:30 map to the same time in UTC, and that is the time you could consider midnight for this calculation.

LocalResult::None would be more useful if it returned more info than None, such as the UTC time during the gap and the value of T before and after the transition.

Related problem: wrapping other APIs that don't give enough info

Other APIs may return a single value, even when it would be ambiguous or doesn't exist.
And I suppose there could be API's, like chrono 0.4.x, that can return two results for ambiguous cases but can't return more info for the 'gap' case.

I'm not sure if there is anything special we want to do here. But it may be useful for callers to know that some platform or library may return bogus results around transitions, with no way to detect that.

When to use a LocalResult

Methods on DateTime currently don't return a LocalResult, but consider any result that falls in a timezone transition as an error. This makes it cumbersome for users to handle these cases correctly. They have to work on a NaiveDateTime and then convert it back with TimeZone::from_local_datetime. Returning a LocalResult would be a big improvement for correctness in my opinion.

(a proposal comes in the next comment)
cc @Zomtir

@pitdicker
Copy link
Collaborator Author

pitdicker commented Feb 19, 2024

Proposal

We should remove the None case, and replace it with InGap and Err.

pub enum LocalResult<T> {
    Single(T),
    Ambiguous(T, T), /* first, last */
    InGap(DateTime<Utc>, T, T), /* time of transition, before, after */
    Err(Error)
}

New is InGap with three arguments. They all describe the transition during which time is missing: two values T map to the Offset or DateTime right when the transition starts, and right when the transition ends. The DateTime<Utc> is needed because if T is an Offset, there is nothing to carry the timestamp value.

Return Result<LocalResult> instead of adding an error variant?

The Utc and FixedOffset types are used a lot, and will always return only LocalResult::Single or an unlikely Error::OutOfRange (in anything that builds upon TimeZone::from_local_datetime). Returning a LocalResult is already mildly inconvenient. Turning that into Result<LocalResult<T>> seems to much.

A LocalResult has helper methods such as the existing single(), earliest() and latest(), and hopefully more after #716. They currently return an Option but are going to return a Result. If we return a Result<LocalResult<T>> that would mean users have to go from that to LocalResult<T>, to Result<T>, to T.

I'd like to put any error inside the LocalResult, as we currently do with the None variant. It seems ergonomic to let users first decide how to deal with timezone transition cases, and then how to handle any remaining error.

Error type

The error type will be shared by all implementors of the TimeZone trait. That makes it relevant to more than just the existing code in chrono; also to implementations in other crates. Unless we want to make it generic or take a boxed error of course, but that seems more complex than we need or make it incompatible with no_std.

Also we need the error type to either be the same as the shared Error type in chrono (#1049), or something that can easily be wrapped or converted to it.

I hope the shared Error type can just become sufficient. That will need a bit more thought to say.

@Zomtir
Copy link
Contributor

Zomtir commented Feb 20, 2024

I agree with the arguments that wrapping Result<LocalResult> is a bit cumbersome. If replacing None with Err proves to be usable, I'm all for that.

Two principles look important to me:

  • Allow the user to conveniently get a good enough approximation of his wanted time
  • Do not let the user think that this function is always "correct"

Therefor I propose a slightly different naming and behaviour. Ambiguous is not meaningful enough to distinguish between a bivalent outcome and a time that does not exist because of a DST gap. Adjusted would contain a correct local time shifted by the size of the gap (shifted = naive + TimeDelta(before, after)). Going from 2:05 to 3:05 is a reasonable outcome without obscuring the fact that it was adjusted.

pub enum LocalResult<T> {
    Single(T),
    Bivalent(T, T), /* first, last */
    Adjusted(T, T, T), /* shifted, before, after */
    Err(Error)
}

I vouch for deprecating/removing the unwrap() function for LocalResult and replacing it with an approx() function that contains a good guess without letting the user believe that everything went 100% correct.

fn approx(self) -> Result<T, Error> {
    match self {
        Single(t) => Ok(t),
        Bivalent(t1,t2) => Ok(t1),
        Adjusted(t0, t1, t2) => Ok(t0),
        Err(e) => Err(e)
    }
}

@pitdicker
Copy link
Collaborator Author

The terms other libraries use to describe what happens during a transition are (https://peps.python.org/pep-0495/#terminology):

  • turning the clock backwards creates a fold in local time, during which the local time is ambiguous.
  • turning the clock forwards creates a gap in local time, during which the local time is missing, or does not exist.

So the current naming seems good to keep.

During a fold there are two clear answers: the earliest and latest result.
During a gap we can describe when the gap happens (UTC timestamp), and how that maps to offsets or datetimes before and after the gap.
The goal is not to dictate what is reasonable, we want to let the caller decide depending on their use case. This info should give callers and helper methods all they need to deal with transitions.

Two principles look important to me:

  • Allow the user to conveniently get a good enough approximation of his wanted time
  • Do not let the user think that this function is always "correct"

I am not sure which parts of dealing with timezone transitions makes you believe the result is an approximation or incorrect.

Of course what helper methods make the most sense besides the current ones is something we can discuss. I think something like earliest_or_before and latest_or_after would cover many uses. Your approx as a 'just get me some answer, I don't care about transitions'-method may also be useful. Continue this discussion on helper methods in #716, to keep this about the info LocalResult needs to carry?

@pitdicker
Copy link
Collaborator Author

I updated the description of InGap. Maybe that helps.

@djc
Copy link
Member

djc commented Feb 20, 2024

The Utc and FixedOffset types are used a lot, and will always return only LocalResult::Single or an unlikely Error::OutOfRange (in anything that builds upon TimeZone::from_local_datetime). Returning a LocalResult is already mildly inconvenient. Turning that into Result<LocalResult<T>> seems to much.

Can we implement a separate trait for these which we can use to "specialize" results for these cases?

@pitdicker
Copy link
Collaborator Author

Can we implement a separate trait for these which we can use to "specialize" results for these cases?

I remember reading that idea in one of the issues, but can't find it. Honestly I have no idea what that would look like, and if it would make chrono any easier to use.

In my opinion returning LocalResult is just good. Currently methods in the TimeZone trait return LocalResult, and I would like to see more methods on DateTime do the same. Making methods on DateTime<Local> or DateTime<SomeTimeZone> have a different return value from DateTime<FixedOffset> and DateTime<Utc> seems strange, if that is even possible.

If methods on DateTime and in the TimeZone trait return LocalResult instead of Result it is less convenient but not all that much. They can still call .unwrap() on the LocalResult, or bubble up errors with .single()?.

@Zomtir
Copy link
Contributor

Zomtir commented Feb 20, 2024

The terms other libraries use to describe what happens during a transition are (https://peps.python.org/pep-0495/#terminology):

* turning the clock backwards creates a _fold_ in local time, during which the local time is _ambiguous_.
* turning the clock forwards creates a _gap_ in local time, during which the local time is _missing_, or does not exist.

Thanks for the link, that was what I was looking for. Fold and Gap are short expressive terms that precisely describe their use case. There is no mention of ambiguous on that page as far as I can see.

Of course there is the argument to make that Ambiguous is backwards compatible. But if the enum is changed anyway, it would be good to have consistent naming. One enum value describes the cause ("gap") and the other describes the outcome ("ambiguous").

The goal is not to dictate what is reasonable, we want to let the caller decide depending on their use case.

Sorry I was cutting short there. Of course the user should be able to fully handle all cases if he intends to (single(), latest, etc). I was more concerned about the unwrap() function remaining, but I will move that discussion to #716. The approx() would be a welcomed additional feature.

The DateTime<Utc> is needed because if T is an Offset, there is nothing to carry the timestamp value.

This is not fully clear to me. Let's assume

let naive_dt = NaiveDatetime::from_ymd(2023,03,12)?.with_hour(2)?.with_minute(30)?;
let local_gap : LocalResult<EST> = est_timezone.from_local_datetime(naive);

let local_dt_with_naive_context = match local_gap {
    ...
    Gap(dt0,dt1,dt2) => yourdesiredoutcome(naive_dt),
};

let local_dt_with_approximate_value = match local_gap {
    ...
    Gap(dt0,dt1,dt2) => dt0,
};

DateTime<Utc> does indeed hold all relevant information, but how is this useful? Since you have to pass naive_dt anyway, you are not gaining new information as long as naive_dt stays in context.

Returning <T,T,T> at least gives the choice to immediately use an "approximate", "before" or "after" value.

UTC wouldn't be the desired time zone anyway, so Gap(NaiveDateTime, T, T) would make more sense if the Gap(T, T, T) approximation is inappropriate.

@Zomtir
Copy link
Contributor

Zomtir commented Feb 20, 2024

Making methods on DateTime<Local> or DateTime<SomeTimeZone> have a different return value from DateTime<FixedOffset> and DateTime<Utc> seems strange, if that is even possible.

Never heard of that either. If LocalResult<T> implements core::result::Result<T, MyError>, this may work. You will definitely be prone to loss of information while downcasting a DateTime<Local> when you didn't intend to.

@pitdicker
Copy link
Collaborator Author

pitdicker commented Feb 20, 2024

Thanks for the link, that was what I was looking for. Fold and Gap are short expressive terms that precisely describe their use case. There is no mention of ambiguous on that page as far as I can see.

There are two terms to describe what happens to the local time: it is folded, or there is a gap. Then there are two terms to describe the result of mapping a local (naive) datetime within a transition to an absolute DateTime: it would be ambiguous during the fold, or missing during a gap.

One enum value describes the cause ("gap") and the other describes the outcome ("ambiguous").

That is an argument. I used InGap (not just gap) instead of Missing or None because the values within the enum variant describe the gap. But I don't mind the name much.

I was more concerned about the unwrap() function remaining

It seems just as useful and controversial as Option::unwrap() 😄.

DateTime<Utc> does indeed hold all relevant information, but how is this useful? Since you have to pass naive_dt anyway, you are not gaining new information as long as naive_dt stays in context.

How about the example in the opening post? I want to know the number of seconds since midnight, but midnight doesn't exist because it falls in a transition gap?

@Zomtir
Copy link
Contributor

Zomtir commented Feb 20, 2024

It seems just as useful and controversial as Option::unwrap() 😄.
Makes sense :)

How about the example in the opening post? I want to know the number of seconds since midnight, but midnight doesn't exist because it falls in a transition gap?

I'm still lost. In the first post you mention that before and after shall map to the same UTC. In the second post, you want the returned UTC to carry the original timestamp value.

23:30 UTC (before/after mapped to UTC) vs 00:00 UTC (naive timestamp)

Edit:

If there is a DST transition from 23:30 to 00:30, is 23:30 even defined? Just like 24:00, the upper bound is usually not included. So <DateTime<UTC>>,T,T would potentially return the same value twice/thrice?

Regarding the minutes since midnight and it is 1:30, I would expect to obtain "+30". Is this correct?

@pitdicker
Copy link
Collaborator Author

pitdicker commented Feb 20, 2024

Sorry, I'm not saying things clear enough 😄.

23:30:00+1:00 is the same instant in time as 00:30:00+2:00, and the same as 22:30:00 UTC.

If local time jumps from 23:30:00+1:00 to 00:30:00+2:00 there is a gap of 1 hour in the values. But both the start and the end are exactly at 22:30:00 UTC.

There are multiple ways to deal with missing values because they fall inside a gap in local time. Let's continue with the time 00:00:00 from the example, where we don't know the offset and want to map it to a UTC value.

  • One way to resolve it is to say that if both the start and the end of the gap happen at 22:30:00 UTC, any in-between time also maps to 22:30:00 UTC.
  • Another way you may want to look at it, is that you want to continue using the offset that the local time had before the gap. Then you would subtract the +1:00 offset from the (non-existent) 00:00:00 time, and the time maps to 23:00:00 UTC.
  • You can also pick the offset after the gap, resulting in 22:00:00 UTC.
  • You can of course always treat it as an error if none of these choices would be appropriate.

With the example of the first post, how much time has elapsed since midnight, I would use the first option.

Edit:
With this proposal we would encode the gap (with peseudo-code) as InGap(2024-02-20 22:30:00 UTC, +1:00, +2:00).

@pitdicker
Copy link
Collaborator Author

So <DateTime<UTC>>,T,T would potentially return the same value twice/thrice?

Nearly so.
In the case of TimeZone::from_local_datetime it could return LocalResult::InGap(DateTime<Utc>, DateTime<FixedOffset>, DateTime<FixedOffset>). The first value is a glorified NaiveDate (with Utc as a ZST), and for the other two only the offset differs.
In the case of TimeZone::offset_from_local_datetime it could return LocalResult::InGap(DateTime<Utc>, FixedOffset, FixedOffset), with no duplicated info.

Because the offset type is a generic (it can also be a LocalOffset fox example #750), and we don't know its relation to T, this seemed the easiest solution to me.

@pitdicker
Copy link
Collaborator Author

If there is a DST transition from 23:30 to 00:30, is 23:30 even defined? Just like 24:00, the upper bound is usually not included.

In chrono at least we treat it as defined. Both values are allowed and map to the same time in UTC. With the conversion from UTC to local time, when there is no offset available, I believe we always pick the value after the transition as canonical. 00:30 in this case.

@Zomtir
Copy link
Contributor

Zomtir commented Feb 20, 2024

With this proposal we would encode the gap (with peseudo-code) as InGap(2024-02-20 22:30:00 UTC, +1:00, +2:00).

My head is spinning, but I think i got it now. The second value would be my preferred choice as user. I approve.

I had twisted it the other way around, which of course makes no sense at all:

Wrong signature Gap(00:00 UTC, 22:30 +2:00, 23:30 + 1:00)

@Zomtir
Copy link
Contributor

Zomtir commented Feb 20, 2024

Making methods on DateTime<Local> or DateTime<SomeTimeZone> have a different return value from DateTime<FixedOffset> and DateTime<Utc> seems strange, if that is even possible.

Never heard of that either. If LocalResult<T> implements core::result::Result<T, MyError>, this may work. You will definitely be prone to loss of information while downcasting a DateTime<Local> when you didn't intend to.

core::result::Result cannot be extended, which was to be expected. The closest I came up with was this

#[derive(Debug)]
pub enum CustomError {
    Invalid,
    Fold,
    Gap,
}

#[derive(Debug)]
pub enum CustomResult {
    Single(u8),
    Fold(u8,u8),
    Gap(u8,u8,u8),
    Err(CustomError),
}

impl std::error::Error for CustomError {}

impl std::fmt::Display for CustomError {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        match self {
            CustomError::Fold => write!(f, "Fold error"),
            CustomError::Gap => write!(f, "Gap error"),
            CustomError::Invalid => write!(f, "Invalid error"),
        }
    }
}

impl From<CustomResult> for Result<u8, CustomError> {
    fn from(custom_enum: CustomResult) -> Self {
        match custom_enum {
            CustomResult::Single(x) => Ok(x),
            CustomResult::Fold(_,_) => Err(CustomError::Fold),
            CustomResult::Gap(_,_,_) => Err(CustomError::Gap),
            CustomResult::Err(e) => Err(e),
        }
    }
}

fn test() {
    let test : Result<u8,CustomError> = CustomResult::Fold(5,6).into();
}

The reverse is obviously not possible. If the trait enforces a return type of Result<u8, Error>, we will never get more information out of it. So the trait has to return CustomResult. But it is possible to turn it into() a Result without much boilerplate.

@pitdicker
Copy link
Collaborator Author

pitdicker commented Feb 20, 2024

I am not sure what you are trying to come up with to be honest.

To make sure: it is a design choice to have a custom LocalResult type. It is not a Result in the sense that methods returning it return a value or an error. It is designed to be an enum that allows users to choose how to work with timezone transitions. Its results are usually not an supposed to be an error but a choice, so it intentionally doesn't support the ? operator.

With planned variants for our Error enum we do want to have variants that LocalResult can map to (without its T values). They will be the error returned from its helper methods and various other methods in chrono:

  • LocalResult::Ambiguous maps to Error::NotEnough (there is not enough info to pick between the two datetimes/offsets).
  • LocalResult::InGap maps to Error::DoesNotExist (this time does not exist).

@Zomtir
Copy link
Contributor

Zomtir commented Feb 20, 2024

I am not sure what you are trying to come up with to be honest.

The proposal by djc was to "specialize" the return type for certain implementations. Once the trait returns Result<T>, it is indeed impossible to tag along additional information (aka the full LocalResult). As you correctly mentioned, there is no big harm in returning LocalResult everywhere, even if Single(T) is the only outcome.

Implementing From(LocalResult<T>) for Result<T,E> supports some usecases, but doesn't make the impossible possible.

Edit: Indeed it is the same mapping with Error:NotEnough and Error::DoesNotExist. From was one example of those "various other methods".

@djc
Copy link
Member

djc commented Feb 21, 2024

You can't specialize the return type. You can have a different set of methods.

I'm still struggling with the composition of all of these cases:

  • Single
  • Fold/ambiguous
  • Gap/missing
  • "Orthogonal" errors

I'm inclined to think that it would be better to use a normal Result<LocalTime, Error> where LocalTime represents the single/fold/gap cases (like the current LocalResult), because this fits better with the rest of Rust's error handling. In comparison, colocating an Error variant in the same enum as single/fold/gap variants feels like a less idiomatic setup. I'd like to get away from the LocalResult naming because, for the same reasons, LocalResult to me represents the Ok part of an actual Result, that is, it is the successfull result of a computation that might have a hard-to-interpret value.

@pitdicker
Copy link
Collaborator Author

pitdicker commented Feb 22, 2024

Better name

Shall we try to pick a better name for LocalResult? If I had to give a description of its function:
"The result of looking up a local NaiveDateTime in the given TimeZone. The type of the return value is typically an offset or a DateTime. Because there can be folds and gaps in the local time of a time zone, the local time given as input may be ambiguous or may not exist."

The problem with the current LocalResult name is that we are trained to think of Result as an error carrying type in rust. I don't like the LocalTime name. If you see the name of such a type without knowing about chrono, you would expect it to be a type like DateTime<Local>.

How about TzLookup?

Type of the generic

Interestingly in chrono 4.19 TimeZone::from_local_datetime was the primary method within the TimeZone trait for the Local type, with TimeZone::offset_from_local_datetime calculationg backwards from that.

Now that we have much more complex and correct implementations it is natural to look up the offset for the given local time in a time zone, and TimeZone::from_local_datetime is implemented using TimeZone::offset_from_local_datetime.

So typically we crate a DateTime<Local> by going from LocalResult<FixedOffset> to LocalResult<DateTime<Local>>.

It seems to me figuring out the offset is always the natural first step.

Possible errors

It is still a bit hard to see all the potential errors because we have more than one stategy for dealing with the problem that we can't return any errors today. We currently often return LocalResult::None.

Another interesting case is that currently we silently fall back to assuming UTC if we can't determine the current time zone on Unix.

The following is what I can come up with given our current code. It seems general enough to make sense as a generic error type for all implementers of the TimeZone trait, and can of course be extended.

/// Error type for time zone lookups.
#[non_exhaustive]
#[derive(Copy, Clone, Debug, Eq, PartialEq)]
pub enum TzLookupError {
    /// Unable to determine the local time zone of the os/platform.
    TimeZoneUnknown,
    /// Error returned by a platform API.
    OsError(i32),
    /// `TZ` environment variable set to an invalid value.
    InvalidTzString,
    /// Unable to locate an IANA time zone database.
    NoTzdb,
    /// The specified time zone is not found (in the database).
    TimeZoneNotFound,
    /// There is an error when reading/validating the time zone data.
    InvalidTimeZoneData,
    /// The result would be out of range.
    OutOfRange,
}

impl From<TzLookupError> for Error {
    fn from(error: TzLookupError) -> Self {
        Error::TzLookupFailure(error)
    }
}

This error type can be wrapped in our Error enum in a new variant, such as Error::TzLookupFailure.

Methods

I'll bite. We currently have five methods on LocalResult:

impl<T> LocalResult<T> {
    /// Returns `Some` only when the conversion result is unique, or `None` otherwise.
    pub fn single(self) -> Option<T>;

    /// Returns `Some` for the earliest possible conversion result, or `None` if none.
    pub fn earliest(self) -> Option<T>;

    /// Returns `Some` for the latest possible conversion result, or `None` if none.
    pub fn latest(self) -> Option<T>;

    /// Maps a `LocalResult<T>` into `LocalResult<U>` with given function.
    pub fn map<U, F: FnMut(T) -> U>(self, mut f: F) -> LocalResult<U>;
}

impl<T: fmt::Debug> LocalResult<T> {
    /// Returns the single unique conversion result, or panics accordingly.
    #[track_caller]
    pub fn unwrap(self) -> T;
}

If the type can carry an error and a way to describe a gap, it could become:

pub enum TzLookup<T> {
    Single(T),
    Ambiguous(T, T), /* first, last */
    InGap(DateTime<Utc>, T, T), /* time of transition, before, after */
    Err(TzLookupError),
}

impl<T> TzLookup<T> {
    /// Returns `Ok` if the time zone lookup has a single result.
    ///
    /// # Errors
    ///
    /// Returns [`Error::Ambiguous`] if the local time falls within fold, and
    /// [`Error::DoesNotExist`] if it falls in a gap in the local time.
    ///
    /// Passes on any error that may have been returned by the type implementing [`TimeZone`].
    pub fn single(self) -> Result<T, Error>;

    /// Returns the earliest possible result of a the time zone lookup.
    ///
    /// # Errors
    ///
    /// Returns [`Error::DoesNotExist`] if it falls in a gap in the local time.
    ///
    /// Passes on any error that may have been returned by the type implementing [`TimeZone`].
    pub fn earliest(self) -> Result<T, Error>;

    /// Returns the latest possible result of a the time zone lookup.
    ///
    /// # Errors
    ///
    /// Returns [`Error::DoesNotExist`] if it falls in a gap in the local time.
    ///
    /// Passes on any error that may have been returned by the type implementing [`TimeZone`].
    pub fn latest(self) -> Result<T, Error>;

    /// Returns the earliest possible result of a the time zone lookup, or the start of a gap if the
    /// local time if it falls within a gap.
    ///
    /// # Errors
    ///
    /// Passes on any error that may have been returned by the type implementing [`TimeZone`].
    pub fn earliest_or_before(self) -> Result<T, Error>;

    /// Returns the latest possible result of a the time zone lookup, or the end of a gap if the
    /// local time if it falls within a gap.
    ///
    /// # Errors
    ///
    /// Passes on any error that may have been returned by the type implementing [`TimeZone`].
    pub fn latest_or_after(self) -> Result<T, Error>;
}

impl<T: fmt::Debug> TzLookup<T> {
    /// Returns a single unique conversion result or panics.
    ///
    /// `unwrap` is best combined with time zone types where the lookup can never fail like [`Utc`]
    /// and [`FixedOffset`]. Note that for [`FixedOffset`] there is a rare case where a resulting
    /// `DateTime` can be out of range.
    ///
    /// # Panics
    ///
    /// Panics if the local time falls within a fold or a gap in the local time, and on any error
    /// that may have been returned by the type implementing [`TimeZone`].
    #[track_caller]
    pub fn unwrap(self) -> T;

    /// Returns a single unique conversion result or panics. Assumes UTC if the type implementing
    /// [`TimeZone`] returns an error.
    ///
    /// # Panics
    ///
    /// Panics if the local time falls within a fold or a gap in the local time.
    #[track_caller]
    pub fn unwrap_or_utc(self) -> T;
}

@pitdicker
Copy link
Collaborator Author

Methods on DateTime currently don't return a LocalResult, but consider any result that falls in a timezone transition as an error. This makes it cumbersome for users to handle these cases correctly. They have to work on a NaiveDateTime and then convert it back with TimeZone::from_local_datetime. Returning a LocalResult would be a big improvement for correctness in my opinion.

I've changed my mind a bit about this. I still believe it would be a big improvement if the return type of many methods on DateTime includes a TzLookup (LocalResult) type. But when the method is fallible for other reasons, such as invalid input, it should return Result<TzLookup>.

@pitdicker
Copy link
Collaborator Author

I'd like to get away from the LocalResult naming because, for the same reasons, LocalResult to me represents the Ok part of an actual Result, that is, it is the successful result of a computation that might have a hard-to-interpret value.

Agreed. To condense the long post above, what about TzLookup?

I'm inclined to think that it would be better to use a normal Result<LocalTime, Error> where LocalTime represents the single/fold/gap cases (like the current LocalResult), because this fits better with the rest of Rust's error handling. In comparison, colocating an Error variant in the same enum as single/fold/gap variants feels like a less idiomatic setup.

Maybe we think about this differently because you view LocalResult more as a type that represents error cases, while I see it more as a natural step between going from a NaiveDateTime to a DateTime?

Or maybe because my idea's heavily leans on the users using the helper methods of LocalResult/TzLookup, which already are going to return a Result?

Including the Error variant to delay its handling to after calling the helper methods doesn't seem less idiomatic to me, while giving better ergonomics.

Also it allows us to provide better helper methods such as unwrap_or_utc() if there is an error, and maybe approx() (which would never fail or panic).

@djc
Copy link
Member

djc commented Feb 22, 2024

TzLookup seems quite indirect -- it doesn't express what the value represents, but instead is related to the process used to derive it.

It is still a bit hard to see all the potential errors because we have more than one stategy for dealing with the problem that we can't return any errors today. We currently often return LocalResult::None.

Another interesting case is that currently we silently fall back to assuming UTC if we can't determine the current time zone on Unix.

Maybe we should dig ourselves out of this first so we have a better sense of how things fit together?

@pitdicker
Copy link
Collaborator Author

Maybe we should dig ourselves out of this first so we have a better sense of how things fit together?

That should not need an awful lot of digging 😆. I'll make a PR in a few days so we have something to shoot at.

@pitdicker
Copy link
Collaborator Author

What would we need to change to propagate the error instead of unwrapping on this line in Local::offset_from_utc_datetime?

Three methods:

  • TimeZone::offset_from_utc_datetime
  • Timezone::from_utc_datetime
  • DateTime::with_timezone

Those methods currently come with the assumption that finding the offset of a UTC datetime in some time zone will never fail.
This is reasonable in a mathematical sense, but ignores that a TimeZone implementation can run into errors, such as our Local type or a more flexible type that uses the OS time zone information.

Within chrono it turns out these methods are used almost always in methods that are already fallible.
Or, for DateTime::with_timezone, there are the infallible DateTime::to_utc and DateTime::fixed_offset methods which cover many uses.

For 0.5 I would just change these methods to return a Result. We could bring fallible alternatives to the 0.4.x branch, but do we want to?

@djc
Copy link
Member

djc commented Mar 1, 2024

For 0.5 I would just change these methods to return a Result. We could bring fallible alternatives to the 0.4.x branch, but do we want to?

I think it might help? We could deprecate the old methods, which might get us feedback sooner about where people run into trouble.

@pitdicker
Copy link
Collaborator Author

How about TzResolution as a new name for LocalResult? Inspired by the DateTimeResolution name of a similar type in eos.

@Zomtir
Copy link
Contributor

Zomtir commented Mar 8, 2024

How about TzResolution as a new name for LocalResult? Inspired by the DateTimeResolution name of a similar type in eos.

Sounds good to me!

@kevincox
Copy link
Contributor

kevincox commented Mar 9, 2024

I'm not sure I understand the structure you have there. Why are all of the _utc timestamps naive?

Also what is transition_utc? Shouldn't the start and end of the gap be identical?

I was more thinking of having two structs. It may be a bit more library code but it would be fairly simple and may be easier for users to understand for the reasons mentioned.

pub enum TzDatetimeResolution<TZ: TimeZone> {
    Single(DateTime<TZ>),
    Ambiguous {
        earliest: DateTime<TZ>,
        latest: DateTime<TZ>,
    },
    InGap {
        /// The instant the gap ends.
        after: DateTime<TZ>,
    },
}

pub enum TzOffsetResolution<T: Offset> {
    Single(T),
    Ambiguous {
        earliest: T,
        latest: T,
    },
    InGap {
        before_offset: T,
        after_offset: T,

        /// The instant the gap ends.
        /// Do we need this? Maybe if they want to know the boundaries of the offset validity there should be a dedicated API for that.
        /// It is a bit weird that we tack this information on for one particular case but not others.
        after_datetime: DateTime<Utc>,
    },
}

But I may be thinking this because my use case is chrono::TimeZone::from_local_datetime so I see that the data needed for the DateTime case is a subset of the offset case so it seems like complexity that is being added on to make one struct fit two roles. Also the helper methods make less sense in this case because what does earliest_or_before mean? How much before. A time gap is instantaneous so there isn't really a before and after.

@pitdicker
Copy link
Collaborator Author

Why are all of the _utc timestamps naive?

Oops, those should be local values.

Also what is transition_utc? Shouldn't the start and end of the gap be identical?

Have you seen the four ways in which you may want to handle a gap that I described in #1448 (comment)?
To do those we not only need the input datetime but also need to know the datetime of the gap.

@pitdicker
Copy link
Collaborator Author

I am not thinking entirely clearly this evening... Ignore what I said about the map method; it is probably wrong.

@kevincox
Copy link
Contributor

kevincox commented Mar 9, 2024

Can you provide an example when you would want to do the middle two options? Continue the offset on either side of the gap? It seems that this would lead to very strange behaviour and I can't think of a use case. Notably the result of your calculation would be erratic based on small time shifts.

Local Time Earlier offset UTC Later offset UTC True UTC
23:29 22:29 22:29 22:29
23:30 22:30 21:30 (jump back) 22:30
00:29 23:29 (future) 22:29 22:30
00:30 22:30 (jump back) 22:30 22:30

It seems like we are trying to combine "offset at local time" and "UTC at local time" and ending up with a more complex solution than we need. But I'm trying to make sure that I understand all of the questions that we are trying to answer with these APIs.

@pitdicker
Copy link
Collaborator Author

pitdicker commented Mar 9, 2024

Can you provide an example when you would want to do the middle two options? Continue the offset on either side of the gap?

No. Option one and four make most sense to my personally. The middle options are what libc, Windows and javascript do though. Even if I can't see it directly, they must have some use cases or not be entirely unreasonable. Making them possible would at least be helpful for compatibility reasons.

For the rest we seem to have a misunderstanding that I can't put my finger on. Is it okay to pause this discussion for now?
The main point of this issue was splitting the None variant in one that has more info about the gap, and one that can encode errors. Shall I cc you once there is some code to shoot at?

@Zomtir
Copy link
Contributor

Zomtir commented Mar 10, 2024

Can you provide an example when you would want to do the middle two options? Continue the offset on either side of the gap?
But I'm trying to make sure that I understand all of the questions that we are trying to answer with these APIs.

If I understood the topic correctly, this is an example where I want the offset before the DST to continue after the gap.

I have two NaiveTime input fields for allowing a user to create an event with a begin and end time. He will put "1:30" as begin time and "2:30" as end time. The chosen date has a DST from 2:00 to 3:00. It is fair to assume that he wants his event to last 1h.

Of course the correct thing is to directly notify him that the LocalTime for the EventEnd does not exist. Or the input fields require an UTC offset as well. But both seem very user unfriendly.

NaiveTime LocalTime (Before) LocalTime (After) UTC Time
Event Begin 1:30 1:30 (+01:00) 2:30 (+02:00) 0:30
Event End 2:30 2:30 (+01:00) 3:30 (+02:00) 1:30
DST Begin 2:00 2:00 (+01:00) 3:00 (+02:00) 1:00
DST End 3:00 3:00 (+01:00) 3:00 (+02:00) 1:00

As developer I would just take his NaiveTimes, calculate the TimeDelta, assume the first DST offset (-01:00) and add the TimeDelta to the LocalTime.

This also works if both times are inside the DST gap.

NaiveTime LocalTime (Before) LocalTime (After) UTC Time
Event Begin 2:10 2:10 (+01:00) 3:10 (+02:00) 1:10
Event End 2:20 2:20 (+01:00) 3:20 (+02:00) 1:20
DST Begin 2:00 2:00 (+01:00) 3:00 (+02:00) 1:00
DST End 3:00 3:00 (+01:00) 3:00 (+02:00) 1:00

Because the choice of picking the DST offset is arbitrary and up to the developer, in cases where the duration between two times matters more than the "correct" time, it is important to stay on the same offset despite a DST transition happening.

@pitdicker
Copy link
Collaborator Author

Interesting example! If a use case can assume it is more important to preserve the duration than the exact end time, you would get more consistent results by coding it with a duration:

fn local_start_end(start: NaiveDateTime, end: NaiveDateTime) -> (DateTime<Local>, DateTime<Local>) {
    let duration = end.duration_since(start).unwrap(); // may fail if end < start
    let start = Local.from_local_datetime(&start).earliest().unwrap(); // may fail if inside a gap
    let end = start + duration; // no problems getting an end time with offset
    (start, end)
}

Or you could choose to only do this fallback in case the end time does not exist.

@pitdicker
Copy link
Collaborator Author

pitdicker commented Mar 10, 2024

Okay. New and complete proposal. I think I now better get @kevincox's concerns.

Base ideas

  • Rename LocalResult to TzResolution to avoid the connotations of *Result.
  • Remove TimeZone::offset_from_* methods. An offset can easily be derived from a DateTime with .offset().
  • Make TzResolution only work on DateTimes and only generic over TimeZone. This should make it easier to reason about.
  • Split the None variant into InGap and Error. This allows us to properly report errors.
  • Encode the start and end of a gap. This allows us to properly handle the current None case.
  • Don't combine the input value with an offset in InGap. Even though other API's do strange thing with a gap, it is okay if we only make easy what makes mathematical sense. With the original input and the returned offsets a user can still create DateTimes however he wants.
  • The error type TzError is shared across all implementations of TimeZone. I am still quite pleased with the variants from 0.5.x: Timezone conversion result (LocalResult) #1448 (comment), and have an implementation that does not blow up the size of chrono::Error.

API

pub trait TimeZone {
    type Offset: Offset;

    fn from_utc_datetime(&self, utc: &NaiveDateTime) -> Result<DateTime<Self>, TzError>;
    fn from_local_datetime(&self, local: &NaiveDateTime) -> TzResolution<Self>;

    /* with_ymd_and_hms, with_timestamp* */
}

#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum TzResolution<Tz: TimeZone> {
    Single(DateTime<Tz>),
    Ambiguous {
        first: DateTime<Tz>,
        last: DateTime<Tz>,
    },
    InGap {
        gap_start: DateTime<Tz>,
        gap_end: DateTime<Tz>,
    },
    Err(TzError),
}

impl<Tz> TzResolution<Tz> {
    /// Returns `Ok` if the time zone lookup has a single result.
    pub fn single(self) -> Result<DateTime<Tz>, Error>;

    /// Returns the earliest possible result.
    pub fn earliest(self) -> Result<DateTime<Tz>, Error>;

    /// Returns the latest possible result.
    pub fn latest(self) -> Result<DateTime<Tz>, Error>;

    /// Returns the earliest possible result or the datetime of the gap.
    pub fn earliest_or_gap(self) -> Result<DateTime<Tz>, Error>;

    /// Returns the latest possible result or the datetime of the gap.
    pub fn latest_or_gap(self) -> Result<DateTime<Tz>, Error>;
}

impl<Tz: fmt::Debug> TzResolution<Tz> {
    /// Returns a single unique conversion result or panics.
    #[track_caller]
    pub fn unwrap(self) -> DateTime<Tz>;
}

/// Error type for time zone lookups.
#[non_exhaustive]
#[derive(Copy, Clone, Debug, Eq, PartialEq)]
pub enum TzError {
    /// Unable to determine the local time zone of the os/platform.
    TimeZoneUnknown,
    /// Error returned by a platform API.
    OsError(OsError),
    /// `TZ` environment variable set to an invalid value.
    InvalidTzString,
    /// Unable to locate an IANA time zone database.
    NoTzdb,
    /// The specified time zone is not found (in the database).
    TimeZoneNotFound,
    /// There is an error when reading/validating the time zone data.
    InvalidTimeZoneData,
    /// The result would be out of range.
    OutOfRange,
}

#[derive(Copy, Clone, Debug, Eq, PartialEq)]
pub struct OsError(/* ...  */);

@kevincox
Copy link
Contributor

Re #1448 (comment): Yeah, that use case is good but also a bit strange. Because if I have an event that ends in a gap I probably want to keep the duration, but if it ends after the gap then you probably want to keep the end time? I think a perfect UI would at least inform the user about the decision that is being made here. I think it is worth trying to make that case very explicit, but it is also nice to make it easy. I think the API proposed should make it easy enough to apply the duration difference to the converted start time as a fallback for landing in a gap.

That proposal looks great to me. I only have one question.

What is TzResolution::InGap#gap_start? If it has the exact same Tz as gap_end then wouldn't it be an invalid time? Or does it somehow have the same Tz but a different offset? Or is it 1s or 1ns before the gap?

@TCeason
Copy link

TCeason commented Jun 24, 2024

Interesting issue. How can the new design deal with a time that does not exist in DST?

e.g.

In timezone 'Asia/Shanghai', time is '1986-05-04 02:01:00'

Will retrun error:

Timezone conversion resulted in no valid time for tz: `Asia/Shanghai`, 

In PG:

postgres=# select current_setting('timezone');
 current_setting 
-----------------
 Asia/Shanghai
(1 row)

postgres=# select to_timestamp('1986-05-04 02:01:00', 'YYYY-MM-DD hh24:mi:ss');
      to_timestamp      
------------------------
 1986-05-04 03:01:00+09
 

@djc
Copy link
Member

djc commented Jun 24, 2024

@TCeason what timezone transition happens in Shanghai in May of 1986? By "no valid time", which of the LocalResult variants do you mean?

@TCeason
Copy link

TCeason commented Jun 24, 2024

@TCeason what timezone transition happens in Shanghai in May of 1986? By "no valid time", which of the LocalResult variants do you mean?

LocalResult::None.

Because of dst in Asia/Shanghai 1986-05-14 02:00:00 is not exists

@djc
Copy link
Member

djc commented Jun 24, 2024

Well, arguably Postgres is wrong and chrono is being more careful/explicit.

@TCeason
Copy link

TCeason commented Jun 24, 2024

Well, arguably Postgres is wrong and chrono is being more careful/explicit.

I'm not sure. Shanghai is +08. PG add one hour and display this value with +09

@djc
Copy link
Member

djc commented Jun 24, 2024

Well, arguably Postgres is wrong and chrono is being more careful/explicit.

I'm not sure. Shanghai is +08. PG add one hour and display this value with +09

No -- due to the DST transition, times between 02:00 and 03:00 did not exist on May 4th 1986.

@TCeason
Copy link

TCeason commented Jun 24, 2024

Well, arguably Postgres is wrong and chrono is being more careful/explicit.

I'm not sure. Shanghai is +08. PG add one hour and display this value with +09

No -- due to the DST transition, times between 02:00 and 03:00 did not exist on May 4th 1986.

Yes, I think that's reasonable. However, in real business scenarios, there may be cases where string is converted to ts. If it's LoadResult::None, How can I try to do a proper conversion for that ts? Convert to the right value?

E.g. in clickhouse give a value that delete a hour.

SELECT toDateTime('1986-05-04 02:01:00');

SELECT toDateTime('1986-05-04 02:01:00')

Query id: 93c03d26-c2ab-433f-a1e1-8cb6244c489b

┌─toDateTime('1986-05-04 02:01:00')─┐

  1. │ 1986-05-04 01:01:00 │
    └───────────────────────────────────┘

@pitdicker
Copy link
Collaborator Author

Don't you have any way to catch invalid values before this point?

@TCeason
Copy link

TCeason commented Jun 24, 2024

Don't you have any way to catch invalid values before this point?

It's hard. Because varchar may have different formats.

Now I try to add a match arm when matching None does some process. But not only this case return None.

The local time does not exist because there is a gap in the local time.
This variant may also be returned if there was an error while resolving the local time,
caused by for example missing time zone data files, an error in an OS API, or overflow.

@dennsva
Copy link

dennsva commented Jul 5, 2024

I would need this feature to be able to order two claimed local times, i.e. two instances of LocalResult. I can implement my chosen behaviour in the Ambiguous case, but the None case currently makes this impossible. Ordering is of course possible through NaiveDateTime if the timezone of the instances is equal, but that is not necessarily the case for me.

I read through the discussion and have a few opinions to add.

  • Above all, i think the returned type should not contain an error variant. The possibility for an error is best represented as Result<TzResolution<impl TimeZone>, impl Error>.
  • I prefer the names Ambiguous and Missing. In any case, I think changing the name Ambiguous is unnecessary, and if keeping that, we should definitely use the corresponding term Missing for gaps.
  • A gap naturally corresponds to exactly one instant in time. Whether it is represented with the offset before or after the gap is irrelevant (but chrono seems to use the latter).
  • I think the length of the gap is out of scope for this enum. It is often not needed and can be calculated by comparing the offset from the DateTime given by missing with the offset from the instant just before (my understanding is that the minimum representable difference is well-defined, a nanosecond).

Here is the API I'd like to see:

pub enum TzResolution<Tz: TimeZone> {
    Single(DateTime<Tz>),
    Ambiguous(
        // Earliest
        DateTime<Tz>,
        // Latest
        DateTime<Tz>,
    ),
    Missing(
        // The instant the gap starts and ends.
        DateTime<Tz>,
    ),
}

trait TimeZone {
    // Panic if there was an error while resolving the local time,
    // caused by for example missing time zone data files,
    // an error in an OS API, or overflow.
    fn offset_from_local_datetime(&self, local: &NaiveDateTime) -> TzResolution<Self>;

    // With result instead of panic
    fn try_offset_from_local_datetime(
        &self,
        local: &NaiveDateTime,
    ) -> Result<TzResolution<Self>, impl Error>;
}

@ariebovenberg
Copy link

ariebovenberg commented Jul 5, 2024

Chiming in here as I’m very interested in being able to more gracefully handle gaps.

Did you have a look at what Temporal (the official new datetime API for JS) is doing in this regard? Its very close to the API you suggest here.
However, the big difference here is that Temporal ‘extrapolates’ datetimes in gaps, instead of truncating to the start or end of the gap. There is a discussion here on why they consider this the best option.

Other reasons for choosing the 'extrapolation' approach:

  • It corresponds to the likely reason for the missing date time: i.e. we forgot to adjust the clock, or adjusted it too early
  • It corresponds to how most other datetime libraries do it (NodaTime, java.time, and Python to name a few)
  • It corresponds with the iCalendar (RFC5545) standard of handling gaps
  • Extrapolation keeps the most salient information about the time (i.e. 2:15am becoming 3:15am keeps the fact that it’s a ‘quarter past’ the hour. Truncating all 2:xx times to 1:59am loses more information IMHO)

The figure in the Python docs here also shows how this ‘extrapolation’ makes sense graphically.

edit: add missing link, spelling

@dennsva
Copy link

dennsva commented Jul 5, 2024

I understand the convenience in extrapolation like that, could that be a separate function? My function name is probably not the best, but:

fn offset_from_local_datetime_rfc5545(&self, local: &NaiveDateTime) -> DateTime<impl TimeZone>;

This function signature is more convenient for those who don't want custom handling for folds and gaps, but also having access to something like TzResolution makes it possible to do something with the timestamp (or two) corresponding to when the (perhaps imaginary) given NaiveDateTime is passed. This information is not available if only the extrapolation is given.

@cynecx
Copy link

cynecx commented Oct 22, 2024

Just a JFYI: https://docs.rs/jiff/latest/jiff/tz/struct.TimeZone.html#method.to_ambiguous_zoned

(I've been using the jiff crate with great success and their API to disambiguate datetime conversion is also very user-friendly imo.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants