Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: RuntimeHelpers.GetRawData #28001

Open
MichalStrehovsky opened this issue Nov 27, 2018 · 30 comments
Open

Proposal: RuntimeHelpers.GetRawData #28001

MichalStrehovsky opened this issue Nov 27, 2018 · 30 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Reflection
Milestone

Comments

@MichalStrehovsky
Copy link
Member

Proposal

I suggest we add the following method to System.Runtime.CompilerServices.RuntimeHelpers:

namespace System.Runtime.CompilerServices
{
    public static class RuntimeHelpers
    {
        public static ref byte GetRawData(object o);
    }
}

Given an object instance o, this would return a managed reference to the first field of the type (or first element of an array if o is an array, or first character of a string if o is string). This can be implemented the same way JitHelpers.GetPinningHelper is implemented within the CLR and would have similar use.

Alternatives considered

We could place the helper in System.Runtime.CompilerServices.Unsafe, but that would be troublesome for variable-length types (arrays and string). If we ever add new ones (such as the Utf8string prototype), the API might not do what the user expects. This API needs to be versioned with the underlying runtime.

Example Usage

Create a strongly typed reflection-free field getter and setter:

abstract class FastFieldAccessor
{
    public abstract object GetValue(object o);
    public abstract void SetValue(object o);
}

class FastFieldAccessor<T> : FastFieldAccessor
{
    private IntPtr _offset;

    public FastFieldAccessor(object o, ref T field)
    {
        _offset = Unsafe.ByteOffset(ref RuntimeHelpers.GetRawData(o), ref Unsafe.As<T, byte>(ref field));
    }

    public override object GetValue(object o)
    {
        return Unsafe.As<byte, T>(ref Unsafe.Add(ref RuntimeHelpers.GetRawData(o), _offset));
    }

    public override void SetValue(object o)
    {
        Unsafe.As<byte, T>(ref Unsafe.Add(ref RuntimeHelpers.GetRawData(o), _offset)) = (T)o;
    }
}
@xoofx
Copy link
Member

xoofx commented Nov 27, 2018

Would love to have this actually. Typically, the fact that we don't have access today to JitHelpers.GetRawSzArrayData(this Array array) is annoying because there are many optimizations that CoreCLR can do internally on arrays that we can't do on our own.....

@MichalStrehovsky, what would be the story to calculate the offset field? I'm not a fan of the convoluted code we have to go through today like this:

    class MyClass
    {
        public string field;
        public int field2;
    }

    class MyClassOffset
    {
        private static readonly MyClass Temp = new MyClass(); 
        public static readonly int OffsetField2 = Unsafe.ByteOffset(ref Unsafe.As<string, int>(ref Temp.field), ref Temp.field2).ToInt32();
    }

Wondering if as part of the CLR change required for interface static methods, if we should not introduce more IL opcodes for these kind of scenarios... 🤔

@xoofx
Copy link
Member

xoofx commented Nov 27, 2018

Wondering if as part of the CLR change required for interface static methods, if we should not introduce more IL opcodes for these kind of scenarios...

Thinking more seriously about this, I'm worried that we could miss an opportunity here (we don't have the occasion to have breaking changes to the IL), so should we think more broadly about what would be relevant to bring to these CLR changes? @jkotas what do you think?

@jkotas
Copy link
Member

jkotas commented Nov 27, 2018

I think that GetRawData(object o) is too low-level as public API with unclear semantics around arrays, etc. I would be ok with ref char GetRawData(string s) or ref T GetRawData<T>(T[] s).

The fast field accessors that are shown in example usage should be first class public APIs exposed by the runtime. I believe we have some related issues about this. I do not think we want everybody to be implementing their own version of them from low-level building blocks. We want them to be owned by the runtime so that we can optimize the heck out of them, without compromising type safety.

breaking changes to the IL

I do not think this proposal requires any changes to the IL. Also, note that interface static methods are not a breaking change. It would be a breaking change if existing programs stopped working which is not the case. The fact that new features require new runtimes is not a breaking change.

@xoofx
Copy link
Member

xoofx commented Nov 27, 2018

I do not think this proposal requires any changes to the IL.

Yeah, I was not asking about this issue in particular. For example, calculating the offset of a field from a root object is "quite common" for low level access so a fld.offset opcode would be actually useful, while today we have to go through lots of convoluted unsafe API to get something like this....

@jkotas
Copy link
Member

jkotas commented Nov 27, 2018

For example, calculating the offset of a field from a root object is quite common

What are the scenarios you need the offsets for? I would rather look at what you are trying to do and see what is the best way to address it. Offsets are way too low-level.

@xoofx
Copy link
Member

xoofx commented Nov 27, 2018

What are the scenarios you need the offsets for?

For example: an animation engine, instead of having full codegen for accessing fields of structured data (which incurs an indirect call and hurts instruction cache in the end with too many of them) you can use instead field offsets to manipulate the data efficiently. Today, we need to use the code I showed above to calculate the offset, which incurs to allocate a temporary object to calculate this offset and calling multiple methods from Unsafe, while this offset is accessible from the JIT/AOT compiler relatively easily. In C# this could translate to something like offsetof(MyStruct.field2) with a direct opcode for it...

@jkotas
Copy link
Member

jkotas commented Nov 28, 2018

an animation engine, instead of having full codegen for accessing fields of structured data

Would the type safe field accessors work for this? I would like to have a type safe default that would be used in most cases, and have unsafe escape hatch for cases where the type safety check is too expensive.

@xoofx
Copy link
Member

xoofx commented Nov 28, 2018

Would the type safe field accessors work for this?

Not sure I know this, do you have an example how does it work?

@jkotas
Copy link
Member

jkotas commented Nov 28, 2018

E.g. We can introduce a fully type safe field accessor like this:

struct FieldAccessor<TType, TField> where TType : class
{
    int _offset;

    public ref TField GetValue(TType obj)
    {
        return ref *(obj + _offset); // pseudo code 
    }
}

@xoofx
Copy link
Member

xoofx commented Nov 28, 2018

E.g. We can introduce a fully type safe field accessor like this:

Oh right, this is what I would use in the end once I have the offset.

My concern is more for the calculation of _offset. Today, we don't have an IL instruction for calculating it. We have to go through something like Unsafe.ByteOffset(ref Unsafe.As<string, int>(ref Temp.field), ref Temp.field2).ToInt32(); (and create a temp instance of MyStruct for example) while I'm looking for something like offsetof(MyStruct.field2)

@jkotas
Copy link
Member

jkotas commented Nov 28, 2018

There would be a factory method to get the FieldAccessor from the FieldInfo or RuntimeFieldHandle. The factory would be a slower method - it would validate the types.

@xoofx
Copy link
Member

xoofx commented Nov 28, 2018

There would be a factory method to get the FieldAccessor from the FieldInfo or RuntimeFieldHandle. The factory would be a slower method - it would validate the types.

As long as we can avoid using reflection (and allocating lots of managed memory) to fetch this token, I'm fine with this approach. I assume that we would have proper support in Roslyn to do a ldtoken via something like handleof(xxx) as described in the intrinsics support proposal?

@jkotas
Copy link
Member

jkotas commented Nov 28, 2018

Right.

@MichalStrehovsky
Copy link
Member Author

I assume that we would have proper support in Roslyn to do a ldtoken

And for a cross-cutting feature like this, we can look forward to having it available in .NET Core 5 or 6 :).

I'm proposing to expose a low level building block like this to address the needs people have right now. It's similar to the function pointer support that C# is adding - they could have waited for static delegates, but they want to address problems people are having now, as opposed to selling them on how great things will be 3 years from now.

ML.net is using Reflection.Emit because of what appears to be a lack of building blocks: dotnet/machinelearning#1736

@xoofx
Copy link
Member

xoofx commented Nov 28, 2018

And for a cross-cutting feature like this, we can look forward to having it available in .NET Core 5 or 6 :).

But I thought that handleof is so simple (and similar to nameof) that it should not be much trouble to bring this to Roslyn. What is the alternative if we don't have support baked into Roslyn? (considering Reflection.Emit is a no go)

@jkotas
Copy link
Member

jkotas commented Nov 28, 2018

I'm proposing to expose a low level building block like this to address the needs people have right now

So what would ML.NET do with the low-level GetRawData(object o) ? They would still need to use hacky unsafe code to compute the field offset or hardcode it. If they were ok with doing that, they can also write a 5 lines more of it and add their own our hacky local implementation of GetRawData(object o) too. We do not need to do anything. They have all the low-level unsafe tools they need.

We would not need to wait for handleof to expose the type-safe field accessory. Getting the reflection object to create the type-safe field accessor from would not be as efficient or straightforward as handleof, but that's fine. It would still be a lot more efficient and straightforward compared to Reflection.Emit solutions frequently used to solve this problem today.

they could have waited for static delegates

Not really. It would be ugly for static delegates to solve all the problems (e.g. around interop) that the function pointers are solving.

@GrabYourPitchforks
Copy link
Member

I opened https://github.com/dotnet/corefx/issues/36133 to track the SzArray-specific helper method.

@mjsabby
Copy link
Contributor

mjsabby commented Jul 16, 2019

@MichalStrehovsky We can get this in S.P.C first right? This and also GetObjectSize and HasReferences.

@MichalStrehovsky
Copy link
Member Author

@MichalStrehovsky We can get this in S.P.C first right? This and also GetObjectSize and HasReferences.

Yes, exposing this as a private API on RuntimeHelpers for frozen objects serializer should work. It feels like a better place for it than JitHelpers.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@Sergio0694
Copy link
Contributor

Following up this conversation with @MichalStrehovsky, it'd be nice if we also had an API just like GetArrayDataReference, but for 2D arrays. I mean, ideally it'd be nice to have a built-in way to do that for ND arrays in general, but having that API for at least 2D arrays would be very useful 👍

I know multidimensional arrays in general are not used that often, but I do think they could use some more love from the new APIs (Span<T> and Memory<T> in particular).

Partially related: it'd be nice if MemoryExtensions also had an AsSpan API for 2D arrays. At least that can be worked around already by using MemoryMarshal.CreateSpan(ref T, int), but having a built-in API to achieve the same would make it easier to do for developers not familiar with that class.

I can create issues for these various points, but I thought quickly running this by you guys to get some early feedback could be useful to avoid creating many (potentially unnecessary) issues 😄

@ericstj ericstj removed the untriaged New issue has not been triaged by the area owner label Jul 9, 2020
@BreyerW
Copy link

BreyerW commented Jul 10, 2020

Just wanted to make sure this wont work with struct s at least not in-place correct? Im quite interested in reflection free field acessor but it seems with current impl of GetRawData reflection will be still necesssary when mutating struct in-place right?

Or do you expect some trick/api/pattern to work well in this specific case when mutating struct in-place-like?

@MichalStrehovsky
Copy link
Member Author

Or do you expect some trick/api/pattern to work well in this specific case when mutating struct in-place-like?

If you want to modify struct in-place, there's TypedReference.MakeTypedReference that allows you to pass an object and FieldInfos that specify a "path" to a field. Once you have a TypedReference, you would do __refvalue(myRef, X) = value; to set it (X is the type of the field, and value is the value to assign. __refvalue is an undocumented C# keyword to manipulated TypedReference.

@BreyerW
Copy link

BreyerW commented Jul 10, 2020

@MichalStrehovsky according to documentation on Typed it wont work with non-primitive structs eg any struct that contain array or reference type in general at least if i understand docu correctly. We are still screwed in such case right?

Setting that aside i wonder if it would be possible to introduce an overload like public static ref byte GetRawData<T>(ref T object) where T: struct or something similar to make it smoother to work with struct (it could use typedreference under the hood maybe? But then unmanaged constraint would likely be required and that wouldnt cover 100% cases)

@ericstj ericstj modified the milestones: 5.0.0, 6.0.0 Jul 29, 2020
@ericstj
Copy link
Member

ericstj commented Jul 29, 2020

Given the timeline for 5.0 and the state of this issue, I don't see it making in. Moving to 6.0.

@steveharter
Copy link
Member

Closing; assuming #45152 supersedes this.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 24, 2020
@MichalStrehovsky
Copy link
Member Author

Reopening per #45152 (comment).

@MichalStrehovsky MichalStrehovsky removed this from the 6.0.0 milestone Oct 27, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Oct 27, 2023
@dotnet dotnet unlocked this conversation Oct 27, 2023
@steveharter steveharter added this to the 9.0.0 milestone Oct 27, 2023
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Oct 27, 2023
@steveharter
Copy link
Member

The suggestion at #45152 (comment) also had a way to return an object. Assuming we want to do that with this PR, I suggest we'll also want to:

  • Add a Type parameter to handle raw (unboxed) value types and possibly strings\arrays.
  • Add to the Unsafe class instead of RuntimeHelpers.
  • Might as well add the generic overload.
namespace System.Runtime.CompilerServices;
public class Unsafe
{
+    public static object? GetObject(Type type, ref byte reference, nuint offset = 0);
+    public static T? GetObject<T>(ref byte reference, nuint offset = 0);
}

@MichalStrehovsky
Copy link
Member Author

The suggestion at #45152 (comment) also had a way to return an object. Assuming we want to do that with this PR, I suggest we'll also want to:

I don't know what's the use case for such API - do we have a use for something like this in CoreLib? We certainly have uses for GetRawData. I don't know a place where we'd do inverse. Unless there's a good use case, I'd not muddy this proposal with that.

@steveharter
Copy link
Member

A reference implementation of a GetObject() exists in TypedReference today.

@MichalStrehovsky
Copy link
Member Author

A reference implementation of a GetObject() exists in TypedReference today.

Looking at this again, #45152 (comment) and #28001 (comment) each propose a very different API.

How does #28001 (comment) differ from Unsafe.As? Is there a difference between these two?

Foo x = Unsafe.GetObject<Foo>(ref myRef, 123);

Foo x = Unsafe.As<byte, Foo>(ref Unsafe.AddByteOffset(ref myRef, 123));

@stephentoub stephentoub modified the milestones: 9.0.0, 10.0.0 Jul 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Reflection
Projects
None yet
Development

No branches or pull requests