Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: Add ToDateTimeOffset and TryToDateTimeOffset to System.Guid #107136

Closed
MagnusSandgren opened this issue Aug 29, 2024 · 13 comments
Closed
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration
Milestone

Comments

@MagnusSandgren
Copy link

Background and motivation

Dotnet 9 preview adds support for Guid.CreateVersion7() and Guid.CreateVersion7(DateTimeOffset timestamp). Alongside the new static methods comes the properties Version and Variant to further support the concept of distinct types of uuids represented by the Guid.

While there exists a way to create a Guid on version 7 and passing down a DateTimeOffset timestamp, there is no native way to retrieve the timestamp from the Guid.

Why would you need it?

One example is as follows. Say you have a web API that allows for user defined identifiers, but only on uuid v7 format within a timeframe. Say this id is used as a primary key in a relational database (postgres). I would argue that not validating the timestamp defeats the purpose as users could create uuids that would massively impact the clustering of the primary key index. The Version and Variant is enough to validate the first part. However, to validate the period one would need to extract the timestamp.

API Proposal

// Add the following to `System.Guid`:
public DateTimeOffset ToDateTimeOffset()
public bool TryToDateTimeOffset(out DateTImeOffset timestamp)

API Usage

var guid7 = Guid.CreateVersion7(DateTimeOffset.Parse("2024-01-01T12:00:00.0000000+00:00"));
var timestamp = guid7.ToDateTimeOffset();
Console.WriteLine(timestamp.ToString("O")); // Output: 2024-01-01T12:00:00.0000000+00:00

Alternative Designs

No response

Risks

No response

@MagnusSandgren MagnusSandgren added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Aug 29, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Aug 29, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-datetime
See info in area-owners.md if you want to be subscribed.

Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

@julealgon
Copy link

Isn't the date used only as part of the Guid data? I think calling a method ToDateTime is kinda misleading in that way.

Why not just expose a nullable property on the struct?

var timestamp = guid7.Timestamp;

@MagnusSandgren
Copy link
Author

Thanks for the feedback @julealgon

Isn't the date used only as part of the Guid data? I think calling a method ToDateTime is kinda misleading in that way.

I see what you mean. The new properties makes more sense because, as far as I understand, version and variant are part of the fundamental uuid standard. I may be mistaken.

However the timestamp in the 6 first bytes (with big-endian ordering) is only valid for v7. But the Guid type is not version specific. And so in that way both the methods I suggested, as well as the property you suggested, is a little misleading. Something specific on something generic.

I would still argue the benefit of having some way to retrieve the timestamp from the v7 Guid as part of dotnet for such things as validation purposes. I suspect it will be a highly copied extension method otherwise. And it would encapsulate the fact that the underlying byte array in the Guid is on little-endian format, which is confusing to say the least.

It makes more sense to me to have it as methods instead as a property, because of the "something specific on something generic"-argument. If the Guid is a well formatted v7 uuid both methods work as expected. However if it's not, the try variant won't throw an exception and the other will.

Maybe the methods should have more explicit names? Maybe it is enough to be specific in the xml docs?

@tannergooding
Copy link
Member

tannergooding commented Aug 29, 2024

Given an arbitrary GUID, even one that is Version == 7 and Variant == 0b10xx, it is impossible to fully extract the timestamp due to there being optional information that is scenario/domain specific.

That is, per the RFC we have (roughly):

    // v7
    //     48-bit timestamp millisecond ticks since Unix Epoch
    //     12-bits of random data -or- submillisecond timestamp (M3)
    //         M3 - Scale remaining precision to fit into the 12-bits at even intervals
    //     62-bits of random data -or- carefully seeded counter (M1 or M2)
    //         M1 - Dedicated counter, simply increment per UUID created within a given tick stamp
    //         M2 - Monotonic random, seed random then increment by random amount within a given tick stamp

Thus, while you have 48 guaranteed bits of precision (assuming the GUID is actually following the RFC and not a truly random assortment of bits, which is also a fairly common usage), you have no way of discerning how many of the 74 additional bits are being used to also represent a portion of the timestamp (if any at all).

This rather seems like the type of thing where if you are wanting to extract the data back out of a Guid, then you are better off using Guid.TryWriteBytes with a 16-byte span (likely specifying bigEndian: true) and extracting the data that is appropriate for your specific scenario. It's not a very complex API to write and avoids any confusion or possible error due to scenario or domain specific nuance or differences.

@Clockwork-Muse
Copy link
Contributor

Wikipedia actually lists the V7 timestamp portion as:

UUIDv7 begins with a 48 bit big-endian Unix Epoch timestamp with approximately millisecond granularity. The timestamp can be shifted by any time shift value.

.... which means I see all sorts of fun implementations in the future....

@MagnusSandgren
Copy link
Author

MagnusSandgren commented Aug 30, 2024

So, attempting to boil the arguments against this API proposal thus far down to a one-liner:

ToDateTimeOffset and TryToDateTimeOffset on System.Guid does not make sense in the runtime because the uuidv7 specification is too lenient/flexible.

Is that about right? If so - does not the same hold true for the new Guid.CreateVersion7() and Guid.CreateVersion7(DateTImeOffset timestamp) methods? This lead me down a rabbit hole where i discovered a comment from you @tannergooding:

... room to expose additional Guid CreateVersion7(...) overloads ...

I would argue there is room to expose additional overloads of the proposed methods to do the same thing as you suggested for CreateVersion7. Or am I mistaken? Also - the only data needed to fill the DateTimeOffset with submillisecond-resolution is M3, right? Which I guess you would have to know the scale of for such overloads 🤔

@tannergooding
Copy link
Member

tannergooding commented Aug 30, 2024

The consideration is that APIs for creating a Guid are easy to rationalize because they are the point of creation and therefore there can be no confusion about what the fields mean.

On the other hand, there is no trivial way to then recover that information back out. At best we're providing APIs that are expecting the user to pass in correct values that track how the state is represented. That is, we'd have to provide something like static DateTimeOffset ExtractVersion7DateTimeOffset(Guid value, int submillisecondBitPrecision) where submillisecondBitPrecision is [0, 12] (if we're going strictly with what the RFC allows, and not covering customizations apps may have opted to do themselves). This can get tricky and cause various issues. You may then want to also extract the counter if one exists, which requires an additional parameter saying how many of the remaining bits are used as a counter, which cannot necessarily be accurately extracted given a seeded counter.

At that point, it almost becomes easier for the domains that do need to extract the DateTimeOffset (and which for whatever reason cannot just compare the Guids directlry) to build their own extract method that understands the exact details of the Guids they're dealing with, including any customizations or deviations if they opted to do something unique for their scenario. This is trivially possible to build today using ``TryWriteBytes` and should never be more than a handful of lines of code.

@MagnusSandgren
Copy link
Author

Thanks for the quick response! You bring some sound arguments to the table @tannergooding 😅 Although some kind of helper methods to extract the DateTImeOffset from the Guid would be helpful, the leeway in the uuidv7 standard is too broad to be able to do so in any meaningful way from the dotnet runtime.

I will let the issue remain open should the community have some input. However, feel free to close it.

@jeffhandley jeffhandley added this to the Future milestone Sep 17, 2024
@jeffhandley jeffhandley added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed untriaged New issue has not been triaged by the area owner labels Sep 17, 2024
@am11
Copy link
Member

am11 commented Sep 23, 2024

Meanwhile this extension method will do the job:

public unsafe static DateTimeOffset ParseDateTimeFromGuidv7(this Guid guid)
{
    // Extract the first 4 bytes (_a field) as an int.
    int a = *(int*)&guid; // Read the first 4 bytes

    // Extract the next 2 bytes (_b field) as a short.
    short b = *(short*)((byte*)&guid + sizeof(int)); // Read the next 2 bytes

    // Combine the values to get the full 48-bit timestamp in milliseconds.
    long unixTimestampMs = ((long)a << 16) | (ushort)b;

    // Convert the Unix timestamp (milliseconds) back to DateTimeOffset.
    return DateTimeOffset.FromUnixTimeMilliseconds(unixTimestampMs);
}

// alternative
public static DateTimeOffset ParseDateTimeFromGuidv7_UsingVector128(this Guid guid)
{
    // Read the GUID into a Vector128<long>
    Vector128<long> vector = Unsafe.ReadUnaligned<Vector128<long>>(
        ref Unsafe.As<Guid, byte>(ref Unsafe.AsRef(in guid)));

    // Extract the first 64 bits (two longs)
    long firstLong = vector.GetElement(0);

    // Directly combine the 32 bits (high) and 16 bits (low) into a 48-bit timestamp
    long unixTimestampMs = ((firstLong & 0xFFFFFFFF) << 16) | ((firstLong >> 32) & 0xFFFF);

    // Convert to DateTimeOffset
    return DateTimeOffset.FromUnixTimeMilliseconds(unixTimestampMs);
}

// usage:
// DateTimeOffset dt = DateTimeOffset.UtcNow;
// Guid guid = Guid.CreateVersion7(dt);
// Console.WriteLine(dt.ToUnixTimeMilliseconds() == guid.ParseDateTimeFromGuidv7().ToUnixTimeMilliseconds());
// Console.WriteLine(dt.ToUnixTimeMilliseconds() == guid.ParseDateTimeFromGuidv7_UsingVector128().ToUnixTimeMilliseconds());

(I was trying to optimize ParseDateTimeFromGuidv7_UsingVector128 since it is not generating any better code as I was hoping for: https://godbolt.org/z/drhKqoEbe)

Note that the calculation of GUDI v7 skips the first 16 bits of since-epoch timestamp milliseconds. Which is not a problem because the first time 16 bit is set is Tue Aug 02 10889 05:31:50.656 (281474976710656) and FromMilliseconds starts throwing ArgumentOutOfRangeException after Fri Dec 31 9999 23:59:59.999 (253402300799999).

cc @gatapia per #103658 (comment)

@tannergooding
Copy link
Member

FYI you can simplify the Vector method quite a bit just by casting it straight to a Vector128:

This isn't safe and can cause undefined behavior on platforms due to a mismatch between the natural alignments of the types. Vector128<T> is likewise not guaranteed to be accelerated and in general shouldn't be used unless Vector128.IsHardwareAccelerated returns true. If it returns false, then an alternative fallback implementation should be used instead.

Unsafe.ReadUnaligned, Vector128.Load, or Vector128.LoadUnsafe will do the right things regardless of alignment. Using Unsafe.BitCast will also do the right thing, since its explicitly aware of the potential for data mismatch and will fallback to doing what is effectively Unsafe.ReadUnaligned in a scenario it would be required.

but this should suffice as a current workaround (and doesn't require /unsafe).

Not requiring unsafe is not a benefit and is often quite the opposite. There are many forms of unsafety and /unsafe (and the general unsafe { } language feature) only flags those involving direct pointer usage today.

APIs involving Unsafe, Marshal, generally living under the System.Runtime.CompilerServices or System.Runtime.InteropServices namespaces, among other considerations can all be unsafe, dangerous, or problematic as well. In many cases these APIs can be more dangerous because they don't require contextual unsafe { } usage and so aren't as visibly dangerous, often leading to the code getting less scrutiny. Even beyond the more "obvious" examples of Unsafe there are other types of unsafe as well, including APIs like ArrayPool which can be dangerous due to "use after free" bugs (you can return an array and still use it after; while something else that rents can then get that same array back, for example) or due to memory not getting zeroed (which is a user option).

Use of such APIs can also cause issues like GC holes or other unintuitive/dangerous problems and there is a general proposal that some kind of [UnsafeCallersOnly] attribute exists such that we can enforce that calling these types of APIs requires the usage of unsafe { } to help highlight and surface such issues.

@Frostchi
Copy link

Wonderful explanation! Thanks 😄

@tannergooding
Copy link
Member

Closing this as per the comments above. This isn't something that can be trivially exposed and is fairly domain/scenario specific due to the way the format is actually defined.

@tannergooding tannergooding closed this as not planned Won't fix, can't repro, duplicate, stale Jan 31, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration
Projects
None yet
Development

No branches or pull requests

8 participants