Enable user code to determine the alignment needs of types #29235

Korporal · 2019-04-11T14:15:31Z

This is to discuss adding a capability for managed code to determine the memory alignment needs of any unmanaged struct. For certain interop scenarios we may need to allocate struct instances outside the AppDomain (for example in user heap accessible to other code, perhaps native code).

These "manually" allocated instances would be exposed to managed code as ref values.

However, when the raw memory is allocated we must ensure that the (artificially generated) ref is aligned in the same way it would be aligned by the CLR, so that these returned ref values are wholly compatible with managed code.

A struct containing just byte fields, for example, can be aligned more flexibly than a struct containing just double or decimal fields. In principle we could craft code to analyze the struct, since we know the alignment needs of all primitives and the padding rules used by the CLR, but to future-proof this, more is needed; SIMD types, for example, seem to have stringent alignment needs.

john-h-k · 2019-04-11T14:36:54Z

Wouldn't the runtime need to expose it? dotnet/coreclr
But definitely seems a nice thing to have for interop
(I fully think this would be a good idea. As an inefficient workaround, it would work to Unsafe.ReadUnaligned<YourStruct> from a returned ref, IIRC.)

Korporal · 2019-04-11T16:03:09Z

@johnkellyoxford - IL provides a very useful opcode that I may have used in the past for something like this. I should look at that code (it's not immediately to hand, by the way), but I believe I used it to determine the in-memory (that is, CLR memory) size of structs. (Why did I think there was an IL opcode for getting a field's offset...)

By creating a small dynamic IL delegate specific for some type, we can determine the alignment I think.

We'd (in explicit IL) create an instance of T (which by definition will be correctly aligned), then get its address and then simply calculate (if that's not too rich a term for this simple operation) how that address is aligned, and we're done (cache this integer in a static dictionary keyed by type).

I think this is one way we ordinary developers could implement this, but having it added to the Unsafe class would make a lot more sense. There's probably no need to restrict T to an unmanaged struct either, though that is our use case.

The only "flaw" here (but a pretty harmless one) is that we may find some type gets aligned on a 16-byte boundary (using the dynamic IL idea above) purely by fluke, when in fact its actual alignment needs might only be 4 bytes, but this is fine really.

john-h-k · 2019-04-11T19:09:31Z

That is an issue because if, for example, you get the pointer value 8192, you might think the type needs 8192-byte alignment, which might cause some issues.
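To make the objection concrete, here is a minimal sketch of what "calculate how that address is aligned" would amount to. The AddressAlignment class and its method name are hypothetical, not an existing API. The method returns the largest power of two that divides an address, which is only an upper bound on what the type stored there actually requires:

```csharp
using System;

public static class AddressAlignment
{
    // Returns the largest power of two that divides the address
    // (isolates the lowest set bit). Returns 0 for a zero address.
    public static ulong ObservedAlignment(ulong address)
        => address & unchecked(~address + 1UL);
}
```

For the address 8192 this yields 8192, and for 8196 it yields 4, regardless of what the type needs. A single sample therefore overestimates whenever the allocation happens to land on a "rounder" address.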

Korporal · 2019-04-11T19:55:58Z

@johnkellyoxford - Hmm, yep nice one, you are correct; I think this needs additional thinking...

john-h-k · 2019-04-11T20:05:40Z

Also, getting a wrong value seems a bit of an error. Easier to call into the runtime to ask for it

Korporal · 2019-04-11T21:42:17Z

@johnkellyoxford - Looking at typical generated IL for this kind of thing reveals little.

For example, this C#:

using System;
public class C {
    public void M() {

        var x = "";
        var d = new Data();
        
        d.a = 8;
    
    }
}

public struct Data
{
    public byte a;
    double b;
}

generates (among other things):

.class public auto ansi beforefieldinit C
    extends [mscorlib]System.Object
{
    // Methods
    .method public hidebysig 
        instance void M () cil managed 
    {
        // Method begins at RVA 0x2050
        // Code size 24 (0x18)
        .maxstack 2
        .locals init (
            [0] string,
            [1] valuetype Data
        )

        IL_0000: nop
        IL_0001: ldstr ""
        IL_0006: stloc.0
        IL_0007: ldloca.s 1
        IL_0009: initobj Data
        IL_000f: ldloca.s 1
        IL_0011: ldc.i4.8
        IL_0012: stfld uint8 Data::a
        IL_0017: ret
    } // end of method C::M

The .locals init ( ) stuff isn't clear; it doesn't seem to be IL and I have no idea what it actually results in. The struct address itself (the address of the instance created in the .locals init) is already determined and used by the ldloca.s opcode, so by the time instruction IL_0007 is encountered the datum's memory has been allocated, an address determined for it and its alignment already taken into consideration.

So it's not clear how the memory for the item Data (local 1) is actually allocated...

john-h-k · 2019-04-11T21:50:29Z

I think you are misunderstanding. .locals init (or more specifically .locals; the init just says they must all be zero-initialized) is where your local variables are declared. ldloca.s 1 is just an instruction like any other, which loads the address of the second local here ([1] valuetype Data); locals can also be named rather than indexed, so ldloca.s 'Data'. The address is created at run time, as it depends on many things (firstly, hard-coded addresses that aren't RVAs won't work great with virtual memory). Given it is a valuetype it will be stack allocated here, with its alignment defined by the OS and architecture ABI. So Data will be 8-byte aligned on Windows x86_64, Windows x86 (32 bit), and Linux x86_64, and 4-byte aligned on Linux x86 (32 bit). Alignment cannot be told from IL* because it is system specific.

*Except in cases of explicit Pack and Size attributes, but won't go into that
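As a rough illustration of the Pack exception mentioned above, the following sketch compares the Data struct from the earlier example with a packed variant. PackedData and LayoutDemo are hypothetical names introduced only for this example. With Pack = 1 the layout is fixed by the attribute rather than by the platform ABI:

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Default sequential layout: the padding after 'a' depends on the platform ABI.
public struct Data
{
    public byte a;
    public double b;
}

// Pack = 1 removes the padding, so 'b' sits at offset 1 on every platform.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct PackedData
{
    public byte a;
    public double b;
}

public static class LayoutDemo
{
    // Platform dependent (typically 16 where double is 8-byte aligned).
    public static int DefaultSize() => Unsafe.SizeOf<Data>();

    // 1 byte + 8 bytes, no padding.
    public static int PackedSize() => Unsafe.SizeOf<PackedData>();

    public static int PackedOffsetOfB() => (int)Marshal.OffsetOf<PackedData>("b");
}
```

Only the packed values are fixed by the source; DefaultSize() is exactly the kind of number that cannot be read off the IL.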

Korporal · 2019-04-11T22:47:56Z

@johnkellyoxford - Yes, I do have only a partial understanding of this, much of it being inferred. It seems then that the code that actually does the allocation of the struct (in the above example) is not written in IL but external to it, and simply presumed when we look at the IL.

So I guess there are primitives that are specific to each platform, and these primitives are the things that are "aware" of the alignment details...very interesting...

PathogenDavid · 2019-04-12T05:24:35Z

> So I guess there are primitives that are specific to each platform, and these primitives are the things that are "aware" of the alignment details...very interesting...

Correct. To put it another way: IL runs on a theoretical virtual machine. The CLR implementation translates the IL representation into one which the target machine (x86, ARM, whatever) understands before it runs. I don't believe the VM ever cares about things like memory alignment, but the target machine might. So it's up to the CLR implementation to care about these things.

(This is why I said in the other issue that the CLR developers might be hesitant to expose something like this. You might in theory have an object allocated with a different alignment depending on whether it was allocated on the stack or the heap. Or maybe special SIMD types trigger special alignment, but only when the runtime thinks SIMD instructions might be used on it. By exposing that implementation detail, they lose flexibility in this regard.)

john-h-k · 2019-04-12T06:51:57Z

@PathogenDavid We have sizeof to be fair. I feel a helper method (rather than IL opcode), like RuntimeHelpers.AlignmentOf<T>() would be best. The fact we have sizeof indicates it can't really change stack<->heap.

john-h-k · 2019-04-12T07:14:06Z

Think this is the field that is key here (the bottom one)

john-h-k · 2019-04-12T07:18:04Z

There are 2 - one for unmanaged, one for managed, I think. Providing explicit methods for both could be done, or just

return MAX(m_LargestAlignmentRequirementOfAllMembers, m_ManagedLargestAlignmentRequirementOfAllMembers);

so you will get the required alignment for passing it between managed/unmanaged code

PathogenDavid · 2019-04-12T09:26:23Z

> @PathogenDavid We have sizeof to be fair. I feel a helper method (rather than IL opcode), like RuntimeHelpers.AlignmentOf<T>() would be best.

I'm not arguing against exposing the alignment, just warning @Korporal that it might be higher-friction than he thinks. I'd actually really like to see CoreCLR make more guarantees around alignment.

> The fact we have sizeof indicates it can't really change stack<->heap.

Why? Size should not affect alignment at all. I can allocate a 32 byte struct at 0x4000000 or 0x4000001. It'll have the same size, but only one of them will be aligned to a 2 byte boundary.
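The point can be checked with a one-line predicate. This is a minimal sketch; AddressCheck and IsAligned are hypothetical names, and the addresses are the two from the paragraph above:

```csharp
using System;

public static class AddressCheck
{
    // True when 'address' is a multiple of 'alignment' (which must be a power of two).
    public static bool IsAligned(ulong address, ulong alignment)
    {
        if (alignment == 0 || (alignment & (alignment - 1)) != 0)
            throw new ArgumentException("Alignment must be a power of two.", nameof(alignment));
        return (address & (alignment - 1)) == 0;
    }
}
```

IsAligned(0x4000000, 2) is true and IsAligned(0x4000001, 2) is false, whatever the size of the struct placed there.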

(Also SizeOf is much more important to interop scenarios than alignment is, so I don't think the early .NET Devs could've gotten away without it even if they wanted to. For instance, Windows is full of structures that need to be initialized with their own size, a quick grep says there's 1,323 such instances of cbSize in the Windows 17763 headers. Reference source says .NET Framework 4.7.2 has 821 uses of Marshal.SizeOf.)

PathogenDavid · 2019-04-12T09:27:13Z

You can link to specific line ranges on GitHub, BTW: https://github.com/dotnet/coreclr/blob/72d49127a0c25e4b931c81e621c2411bfb6633a5/src/vm/class.h#L383-L391

john-h-k · 2019-04-12T10:05:02Z

Hmm, is it best to expose it as GetManagedAlignment<T>() and GetNativeAlignment<T>(), each exposing the respective EEClassLayoutInfo field, or as a single GetAlignment<T>() that does one of the following (and if so, which one)?

- returns the managed alignment
- returns the unmanaged alignment
- returns the greater of the two

WIP - Adds a method which exposes the internal EEClassLayoutInfo alignment members. Namely, returns the max of unmanaged alignment and managed alignment, to allow passing the type between the two types of code. Related: dotnet/corefx#36792

Korporal · 2019-04-12T20:59:43Z

@johnkellyoxford @PathogenDavid

Very interesting guys. Ultimately the goal is to ensure that we can create and return a ref to an unmanaged struct in such a way that no member of the struct is aligned improperly for the platform.

So I don't see any conceptual difference between a "managed alignment" and a "native alignment" in this regard. Ultimately every datum refers to an actual memory address which is by definition a physical thing not a managed thing.

I'd be curious to see a struct example that has a different managed and native alignment. For the time being this is almost academic, because we can simply define the default allocation alignment to be 8 and waste very little memory. But in the future we may see types that need 16-byte or 32-byte alignment, and aligning these on an 8-byte boundary might lead to nasty stuff.

Anyway, what are these "SIMD" types I hear about? Types that do (might?) need alignment > 8?

PathogenDavid · 2019-04-13T06:11:31Z

> I'd be curious to see a struct example that has a different managed and native alignment.

My assumption is that it is for structs which are marshaled, so it wouldn't be relevant in your case.

I'm actually not entirely certain if either of these two fields are what you actually want. They're only used for debug output and field marshaling unless I missed something. They don't appear to directly influence allocation.

> Anyway what are these "SIMD" types I hear about? types that do (might?) need alignment > 8?

SIMD is short for "Single Instruction, Multiple Data".

Multiple data: At a hardware level you have special large registers that are divided up into multiple discrete values.

Single instruction: There are special instructions (big list for x86 here) that operate on multiple values at once using these special registers. For example, the addps xmm0, xmm1 instruction adds four pairs of single-precision floats at once.

In the context of x86, SIMD types are generally the only time when you need to start caring about alignment because some load/store operations working with those registers have alignment requirements.

In .NET, SIMD types are exposed under the System.Numerics and System.Runtime.Intrinsics namespaces. Here is an example using System.Numerics to demonstrate the code gen for SIMD and non-SIMD vector addition. (You can actually see that the JIT emits unaligned load instructions (vmovupd).)
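The linked example is not reproduced in this thread. As a stand-in, here is a small sketch of the same comparison using System.Numerics.Vector4; the SimdDemo class and its method names are hypothetical. Both methods compute the same result, and the difference is only in what the JIT may emit for them:

```csharp
using System.Numerics;

public static class SimdDemo
{
    // Element-wise addition written out by hand: four separate scalar adds in source.
    public static Vector4 AddScalar(Vector4 x, Vector4 y)
        => new Vector4(x.X + y.X, x.Y + y.Y, x.Z + y.Z, x.W + y.W);

    // Vector4's + operator, which the JIT can typically lower to a single
    // packed add on hardware that supports it.
    public static Vector4 AddSimd(Vector4 x, Vector4 y) => x + y;
}
```

Whether the loads and stores around the add are aligned or unaligned variants is a JIT decision, which is why the alignment of the data matters less here than one might expect.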

john-h-k · 2019-04-13T06:14:54Z

Honestly, if I were you, I'd just respect 8 or 16 byte alignment by default and just work with that. It's easiest

PathogenDavid · 2019-04-13T06:35:22Z

@Korporal I am rapidly starting to think that instead of asking for a new API, you need to ask a JIT expert if you even need the API in the first place. I'm pretty confident that you don't.

If you're writing a custom allocator on the native side for some reason, make sure everything is word-aligned. If the native side has stricter alignment requirements for SIMD, make sure it's meeting those requirements. Otherwise I think you're worrying about this way too much.

PathogenDavid · 2019-04-13T06:46:21Z

> Honestly, if I were you, I'd just respect 8 or 16 byte alignment by default and just work with that. It's easiest

CRT's malloc aligns to 8 bytes on 32-bit and 16 on 64-bit.

glibc's malloc is a little more vague with "suitably aligned for any built-in type". (Comment in the implementation says it's double the word size, which is the same as the CRT.)
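The "just pick 8 or 16" approach can be sketched as follows, assuming .NET 6 or later (for NativeMemory.AlignedAlloc and AlignedFree) and a project with unsafe blocks enabled. Sample and AlignedNative are hypothetical names; the 16-byte alignment is the conservative default suggested above, not a value obtained from the runtime:

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical struct standing in for the interop type.
public struct Sample
{
    public byte a;
    public double b;
}

public static unsafe class AlignedNative
{
    // Allocates zeroed native memory for one T at the requested power-of-two alignment.
    // The caller owns the block and must release it with NativeMemory.AlignedFree.
    public static T* Allocate<T>(nuint alignment) where T : unmanaged
    {
        T* p = (T*)NativeMemory.AlignedAlloc((nuint)sizeof(T), alignment);
        *p = default;
        return p;
    }

    // Allocates a Sample outside the managed heap, exposes it as a ref,
    // and reports whether the address is 16-byte aligned and the write landed.
    public static bool Demo()
    {
        Sample* p = Allocate<Sample>(16);
        try
        {
            ref Sample s = ref Unsafe.AsRef<Sample>(p);
            s.a = 8;
            return ((nuint)p % 16) == 0 && p->a == 8;
        }
        finally
        {
            NativeMemory.AlignedFree(p);
        }
    }
}
```

Over-aligning like this wastes a few bytes per allocation but avoids needing to know the exact requirement of T.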

weltkante · 2023-10-25T12:02:18Z

Seems you can now do this calculation yourself and it gets optimized pretty well by the JIT, as mentioned here:

> We added JIT support for some of these patterns in .NET 8 in #81998. See e.g. this example: https://godbolt.org/z/or76frsWs

I updated my interop helper for alignment calculation and ran a few more tests. Structs look very good (a single constant load). Alignment of primitives is not perfectly optimized yet, but still pretty good: https://godbolt.org/z/vTMxasnf7

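using System.Runtime.CompilerServices;

public static class InteropHelper {
    // The offset of Content after a single leading byte equals the padding
    // the runtime inserts in front of a T.
    private struct AlignmentCheck<T> where T : unmanaged {
        public byte Padding;
        public T Content;
    }

    // [SkipLocalsInit] requires unsafe blocks to be enabled for the project.
    [SkipLocalsInit, MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int AlignmentOf<T>() where T : unmanaged {
        Unsafe.SkipInit(out AlignmentCheck<T> container);
        return (int)Unsafe.ByteOffset(ref container.Padding, ref Unsafe.As<T, byte>(ref container.Content));
    }
}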

full example and optimization results

using System.Collections.Generic;
using System.Runtime.CompilerServices;

public static class C {
    private struct AlignmentCheck<T> where T : unmanaged {
        public byte Padding;
        public T Content;
    }

    [SkipLocalsInit, MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static int AlignmentOf<T>() where T : unmanaged {
        Unsafe.SkipInit(out AlignmentCheck<T> container);
        return (int)Unsafe.ByteOffset(ref container.Padding, ref Unsafe.As<T, byte>(ref container.Content));
    }

    public static int GetShortByteKVPAlignment() => AlignmentOf<KeyValuePair<short, byte>>();
    public static int GetDoubleShortKVPAlignment() => AlignmentOf<KeyValuePair<double, short>>();
    public static int GetShortByteVTAlignment() => AlignmentOf<(short, byte)>();
    public static int GetDoubleShortVTAlignment() => AlignmentOf<(double, short)>();
    public static int GetDecimalAlignment() => AlignmentOf<decimal>();
    public static int GetDoubleAlignment() => AlignmentOf<double>();
    public static int GetShortAlignment() => AlignmentOf<short>();
}

optimizes to

C:GetShortByteKVPAlignment():int (FullOpts):
       mov      eax, 2
       ret      

C:GetDoubleShortKVPAlignment():int (FullOpts):
       mov      eax, 8
       ret      

C:GetShortByteVTAlignment():int (FullOpts):
       mov      eax, 4
       ret      

C:GetDoubleShortVTAlignment():int (FullOpts):
       mov      eax, 8
       ret      

C:GetDecimalAlignment():int (FullOpts):
       mov      eax, 8
       ret      

C:GetDoubleAlignment():int (FullOpts):
       sub      rsp, 24
       lea      rax, bword ptr [rsp+0x10]
       lea      rcx, bword ptr [rsp+0x08]
       sub      rax, rcx
       add      rsp, 24
       ret      

C:GetShortAlignment():int (FullOpts):
       push     rax
       lea      rax, bword ptr [rsp+0x02]
       lea      rcx, bword ptr [rsp]
       sub      rax, rcx
       add      rsp, 8
       ret

hamarb123 · 2023-10-25T14:27:46Z

I've used something like this previously:

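using System.Runtime.CompilerServices;

public static class Helpers
{
    private struct AlignHelper<T> where T : unmanaged
    {
        T value;
        byte b;
    }

    // sizeof on a generic struct needs an unsafe context and an unmanaged T.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static unsafe int AlignmentOf<T>() where T : unmanaged => (int)(sizeof(AlignHelper<T>) - sizeof(T));
}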

I don't see why this wouldn't work, and it probably provides better codegen, since there's nothing to pretend to initialise.

tannergooding · 2023-10-25T15:50:52Z

> I've used something like this previously:

Note that this computes the packing of T, which is not necessarily the same as the alignment of T.

There are types, such as Int128, which currently have 16 byte packing, but retain 4 or 8 byte alignment; due to how the GC currently works.

msftgits transferred this issue from dotnet/corefx Feb 1, 2020

msftgits added this to the Future milestone Feb 1, 2020

maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020

ericstj removed the untriaged New issue has not been triaged by the area owner label Jun 25, 2020

joperezr added this to Triage POD for Reflection, META, etc. Nov 2, 2021

joperezr removed this from Triage POD for Reflection, META, etc. Nov 2, 2021

joperezr added this to Triage POD for Reflection, META, etc. Nov 2, 2021

joperezr removed this from Triage POD for Reflection, META, etc. Nov 2, 2021

joperezr moved this to Future in Triage POD for Reflection, META, etc. Nov 2, 2021

joperezr added this to Triage POD for Reflection, META, etc. Nov 2, 2021
