Enable user code to determine the alignment needs of types #29235

Open
Korporal opened this issue Apr 11, 2019 · 23 comments
Labels
area-Meta enhancement Product code improvement that does NOT require public API changes/additions

@Korporal

This is to discuss adding a capability for managed code to determine the memory alignment needs of any unmanaged struct. For certain interop scenarios we may need to allocate struct instances outside the managed heap (for example, in an unmanaged heap that other code, perhaps native code, can access).

These "manually" allocated instances would be exposed to managed code as ref values.

However, when the raw memory is allocated we must ensure that the (artificially generated) ref is aligned the same way the CLR would align it, so that these returned ref values are fully compatible with managed code.

A struct containing only byte fields, for example, can be aligned more flexibly than one containing only double or decimal fields. In principle we could write code to analyze the struct, since we know the alignment needs of all primitives and the padding rules the CLR uses, but future-proofing this needs more than that: SIMD types, for example, seem to have stringent alignment needs.
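A minimal sketch of the scenario described above, assuming .NET 6 or later (for `NativeMemory.AlignedAlloc`) and a project with `AllowUnsafeBlocks` enabled. `NativeBox`, `Allocate` and `Free` are hypothetical names. Note that the caller still has to supply the alignment; that number is exactly what this issue asks for a way to obtain.

```csharp
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical helper: places an unmanaged T outside the GC heap and hands it out as a ref T.
public static unsafe class NativeBox
{
    // 'alignment' must be a power of two. Picking the right value per type is the open question here.
    public static ref T Allocate<T>(nuint alignment) where T : unmanaged
    {
        void* p = NativeMemory.AlignedAlloc((nuint)sizeof(T), alignment);
        ref T value = ref Unsafe.AsRef<T>(p);
        value = default; // AlignedAlloc does not zero the memory
        return ref value;
    }

    // Memory obtained from AlignedAlloc must be released with AlignedFree.
    public static void Free<T>(ref T value) where T : unmanaged
    {
        NativeMemory.AlignedFree(Unsafe.AsPointer(ref value));
    }
}
```

Until a reliable per-type answer exists, passing a generous value such as 16 is the conservative choice.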

@john-h-k
Contributor

Wouldn't the runtime (dotnet/coreclr) need to expose it? It definitely seems like a nice thing to have for interop, though.
(I fully think this would be a good idea, but as an inefficient workaround, reading with Unsafe.ReadUnaligned<YourStruct> from a returned ref should work, iirc.)
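The workaround mentioned above sidesteps alignment by copying instead of handing out a ref. A minimal sketch, assuming the data lives in a managed `byte[]`; `UnalignedAccess`, `ReadAt` and `WriteAt` are hypothetical names, while `Unsafe.ReadUnaligned` and `Unsafe.WriteUnaligned` are the existing `System.Runtime.CompilerServices.Unsafe` APIs:

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical helpers: copy an unmanaged T in or out of a byte buffer at any offset.
// No alignment is required because the value is copied, not referenced in place.
public static class UnalignedAccess
{
    public static T ReadAt<T>(byte[] buffer, int offset) where T : unmanaged
    {
        CheckRange<T>(buffer, offset);
        return Unsafe.ReadUnaligned<T>(ref buffer[offset]);
    }

    public static void WriteAt<T>(byte[] buffer, int offset, T value) where T : unmanaged
    {
        CheckRange<T>(buffer, offset);
        Unsafe.WriteUnaligned(ref buffer[offset], value);
    }

    // Ensures the whole T fits inside the buffer starting at 'offset'.
    private static void CheckRange<T>(byte[] buffer, int offset) where T : unmanaged
    {
        if (offset < 0 || offset > buffer.Length - Unsafe.SizeOf<T>())
            throw new ArgumentOutOfRangeException(nameof(offset));
    }
}
```

The cost is a copy on every access: callers get a T value, not a ref T into the shared memory.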

@Korporal
Author

Korporal commented Apr 11, 2019

@johnkellyoxford - IL provides a very useful opcode that I may have used in the past for something like this. I should look at that code (it's not immediately to hand), but I believe I used it to determine the in-memory (that is, CLR memory) size of structs. (Why did I think there was an IL opcode for getting a field's offset...)

By creating a small dynamic IL delegate specific to some type, I think we can determine the alignment.

We'd (in explicit IL) create an instance of T (which by definition will be correctly aligned), get its address, and then simply calculate (if that's not too rich a term for this simple operation) how that address is aligned, and we're done (caching this integer in a static dictionary keyed by type).

I think this is one way we, ordinary developers, could implement this, but having it added to the Unsafe class would make a lot more sense. There's probably no need to restrict T to an unmanaged struct either, though that is our use case.

The only "flaw" here (but a pretty harmless one) is that, using the dynamic IL idea above, some type may happen to land on a 16-byte boundary purely by fluke when its actual alignment need is only 4 bytes, but this is fine really.

@john-h-k
Contributor

That is an issue: if, for example, you get the pointer value 8192, you might conclude the type needs 8192-byte alignment, which could cause problems.
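To make the 8192 problem concrete: the alignment you can infer from a single address is its lowest set bit, which can be far larger than what the type actually needs. A small sketch (`AddressAlignment` and `ApparentAlignment` are hypothetical names):

```csharp
using System;

// Hypothetical helper: the alignment an address *appears* to have.
public static class AddressAlignment
{
    // Largest power of two that divides 'address' (its lowest set bit).
    // This is only an upper bound on what the type stored there requires.
    public static ulong ApparentAlignment(ulong address)
    {
        if (address == 0)
            throw new ArgumentOutOfRangeException(nameof(address), "0 is divisible by every power of two");
        return address & (~address + 1); // same as address & -address in two's complement
    }
}
```

For example, `ApparentAlignment(8192)` returns 8192 and `ApparentAlignment(24)` returns 8, regardless of which type lives at those addresses.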

@Korporal
Author

Korporal commented Apr 11, 2019

@johnkellyoxford - Hmm, yep nice one, you are correct; I think this needs additional thinking...

@john-h-k
Contributor

Also, a method that can return a wrong value seems error-prone. It's easier to call into the runtime and ask for it.

@Korporal
Author

Korporal commented Apr 11, 2019

@johnkellyoxford - Looking at typical generated IL for this kind of thing reveals little.

For example, this C#:

using System;
public class C {
    public void M() {

        var x = "";
        var d = new Data();
        
        d.a = 8;
    
    }
}

public struct Data
{
    public byte a;
    double b;
}

generates (among other things):

.class public auto ansi beforefieldinit C
    extends [mscorlib]System.Object
{
    // Methods
    .method public hidebysig 
        instance void M () cil managed 
    {
        // Method begins at RVA 0x2050
        // Code size 24 (0x18)
        .maxstack 2
        .locals init (
            [0] string,
            [1] valuetype Data
        )

        IL_0000: nop
        IL_0001: ldstr ""
        IL_0006: stloc.0
        IL_0007: ldloca.s 1
        IL_0009: initobj Data
        IL_000f: ldloca.s 1
        IL_0011: ldc.i4.8
        IL_0012: stfld uint8 Data::a
        IL_0017: ret
    } // end of method C::M

The .locals init ( ) part isn't clear to me; it doesn't look like an IL instruction, and I have no idea what it actually results in. The struct's address (the address of the instance declared in .locals init) is already determined and used by the ldloca.s opcode, so by the time instruction IL_0007 is reached, the datum's memory has been allocated, an address determined for it, and its alignment already taken into account.

So it's not clear how the memory for the item Data (local 1) is actually allocated...

@john-h-k
Contributor

I think you are misunderstanding. .locals init (or more specifically .locals; init just says they must all be zero-initialized) is where your local variables are declared. ldloca.s 1 is just an instruction like any other: it loads the address of the second local, [1] valuetype Data, which could also be referenced by name instead of index, e.g. ldloca.s 'Data'. The address is created at runtime because it depends on many things (for one, hard-coded addresses that aren't RVAs wouldn't work well with virtual memory). Because it is a value type, it will be stack-allocated here, following the OS and architecture's system ABI. So Data will be 8-byte aligned on Windows x86_64, Windows x86 (32-bit), and Linux x86_64, and 4-byte aligned on Linux x86 (32-bit). Alignment cannot be determined from IL* because it is system-specific.

*Except in cases of explicit Pack and Size attributes, but won't go into that
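The platform dependence can be observed from C# without reading any IL. This sketch reuses the Data struct from the earlier example, with `b` made public so the helper can reach it (`DataLayout` and `OffsetOfB` are hypothetical names), and measures how much padding the runtime inserted after `a`:

```csharp
using System.Runtime.CompilerServices;

public struct Data
{
    public byte a;
    public double b;
}

public static class DataLayout
{
    // Byte distance from 'a' to 'b' inside one Data instance. The padding after 'a'
    // reflects the alignment the runtime chose for 'double' on the current platform.
    public static int OffsetOfB()
    {
        Data d = default;
        return (int)Unsafe.ByteOffset(ref d.a, ref Unsafe.As<double, byte>(ref d.b));
    }
}
```

In a 64-bit process this is expected to be 8; on 32-bit platforms it may be smaller, in line with the ABI differences described above.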

@Korporal
Author

@johnkellyoxford - Yes, I do have only a partial understanding of this, much of it being inferred. It seems then that the code that actually does the allocation of the struct (in the above example) is not written in IL but external to it, and simply presumed when we look at the IL.

So I guess there are primitives that are specific to each platform, and these primitives are the things that are "aware" of the alignment details...very interesting...

@PathogenDavid
Contributor

So I guess there are primitives that are specific to each platform, and these primitives are the things that are "aware" of the alignment details...very interesting...

Correct. To put it another way: IL runs on a theoretical virtual machine. The CLR implementation translates the IL representation into one which the target machine (x86, ARM, whatever) understands before it runs. I don't believe the VM ever cares about things like memory alignment, but the target machine might. So it's up to the CLR implementation to care about these things.

(This is why I said in the other issue that the CLR developers might be hesitant to expose something like this. You might in theory have an object allocated with a different alignment depending on whether it was allocated on the stack or the heap. Or maybe special SIMD types trigger special alignment, but only when the runtime thinks SIMD instructions might be used on it. By exposing that implementation detail, they lose flexibility in this regard.)

@john-h-k
Contributor

@PathogenDavid We have sizeof, to be fair. I feel a helper method (rather than an IL opcode), like RuntimeHelpers.AlignmentOf<T>(), would be best. The fact that we have sizeof indicates it can't really change between stack and heap.

@john-h-k
Contributor

[screenshot]
I think this is the key field here (the bottom one).

@john-h-k
Contributor

john-h-k commented Apr 12, 2019

[screenshot]
There are two, I think: one for unmanaged and one for managed. We could provide explicit methods for both, or just

return MAX(m_LargestAlignmentRequirementOfAllMembers, m_ManagedLargestAlignmentRequirementOfAllMembers);

so you would get the alignment required for passing it between managed and unmanaged code.

@PathogenDavid
Contributor

@PathogenDavid We have sizeof to be fair. I feel a helper method (rather than IL opcode), like RuntimeHelpers.AlignmentOf<T>() would be best.

I'm not arguing against exposing the alignment, just warning @Korporal that it might be higher-friction than he thinks. I'd actually really like to see CoreCLR make more guarantees around alignment.

The fact we have sizeof indicates it can't really change stack<->heap.

Why? Size should not affect alignment at all. I can allocate a 32 byte struct at 0x4000000 or 0x4000001. It'll have the same size, but only one of them will be aligned to a 2 byte boundary.
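Alignment is a property of the address, not of the type's size. A minimal sketch of the check for the two addresses above (`AddressChecks` is a hypothetical name):

```csharp
using System;

// Hypothetical helpers: size and alignment are independent properties.
public static class AddressChecks
{
    public static bool IsPowerOfTwo(ulong x) => x != 0 && (x & (x - 1)) == 0;

    // True if 'address' is a multiple of 'alignment' (which must be a power of two).
    public static bool IsAligned(ulong address, ulong alignment)
    {
        if (!IsPowerOfTwo(alignment))
            throw new ArgumentException("alignment must be a power of two", nameof(alignment));
        return (address & (alignment - 1)) == 0;
    }
}
```

`IsAligned(0x4000000, 2)` is true and `IsAligned(0x4000001, 2)` is false, even though the same 32-byte struct could sit at either address.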


(Also, SizeOf is much more important to interop scenarios than alignment is, so I don't think the early .NET devs could've gotten away without it even if they wanted to. For instance, Windows is full of structures that need to be initialized with their own size: a quick grep finds 1,323 such instances of cbSize in the Windows 17763 headers. Reference source says .NET Framework 4.7.2 has 821 uses of Marshal.SizeOf.)

@PathogenDavid
Contributor

You can link to specific line ranges on GitHub, BTW: https://github.com/dotnet/coreclr/blob/72d49127a0c25e4b931c81e621c2411bfb6633a5/src/vm/class.h#L383-L391

@john-h-k
Contributor

john-h-k commented Apr 12, 2019

Hmm, is it best to expose it as GetManagedAlignment<T>() and GetNativeAlignment<T>(), each exposing the respective EEClassLayoutInfo field, or as a single GetAlignment<T>()? If the latter, which of the following should it do?

  • returns the managed alignment
  • returns the unmanaged alignment
  • returns the greater of the two

john-h-k referenced this issue in john-h-k/coreclr Apr 12, 2019
WIP - Adds a method which exposes the internal EEClassLayoutInfo alignment members. Namely, returns the max of unmanaged alignment and managed alignment, to allow passing the type between the two types of code

Related dotnet/corefx#36792
@Korporal
Author

@johnkellyoxford @PathogenDavid

Very interesting, guys. Ultimately the goal is to ensure that we can create and return a ref to an unmanaged struct in such a way that no member of the struct is aligned improperly for the platform.

So I don't see any conceptual difference between a "managed alignment" and a "native alignment" in this regard. Ultimately every datum refers to an actual memory address which is by definition a physical thing not a managed thing.

I'd be curious to see a struct example that has a different managed and native alignment. For the time being this is almost academic, because we can simply make 8 the default allocation alignment and accept a small amount of wasted space. But in the future we may see types that need 16- or 32-byte alignment, and aligning these on an 8-byte boundary might lead to nasty stuff.

Anyway, what are these "SIMD" types I hear about? Types that do (might?) need alignment greater than 8?

@PathogenDavid
Contributor

PathogenDavid commented Apr 13, 2019

I'd be curious to see a struct example that has a different managed and native alignment.

My assumption is that it is for structs which are marshaled, so it wouldn't be relevant in your case.
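One concrete case where the marshaled (native) layout differs from the managed one is a `bool` field. A small sketch (`WithBool` and `LayoutComparison` are hypothetical names) comparing the two sizes:

```csharp
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// 'bool' occupies 1 byte in managed memory, but by default it is marshaled
// as a 4-byte Win32 BOOL when the struct crosses into native code.
public struct WithBool
{
    public bool Flag;
}

public static class LayoutComparison
{
    public static int ManagedSize() => Unsafe.SizeOf<WithBool>(); // managed layout
    public static int NativeSize() => Marshal.SizeOf<WithBool>(); // marshaled layout
}
```

`ManagedSize()` returns 1 and `NativeSize()` returns 4, so a marshaled copy needs 4-byte alignment while the managed one needs only 1. A struct that is never marshaled, as in this use case, only has the managed layout.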

I'm actually not entirely certain if either of these two fields are what you actually want. They're only used for debug output and field marshaling unless I missed something. They don't appear to directly influence allocation.

Anyway what are these "SIMD" types I hear about? types that do (might?) need alignment > 8?

SIMD is short for "Single Instruction, Multiple Data".

Multiple data: At a hardware level you have special large registers that are divided up into multiple discrete values.

Single instruction: There are special instructions (big list for x86 here) that operate on multiple values held in these special registers at once. For example, the addps xmm0, xmm1 instruction adds four pairs of single-precision floats in one go.

In the context of x86, SIMD types are generally the only time when you need to start caring about alignment because some load/store operations working with those registers have alignment requirements.

In .NET, SIMD types are exposed under the System.Numerics and System.Runtime.Intrinsics namespaces. Here is an example using System.Numerics to demonstrate the code gen for SIMD and non-SIMD vector addition. (You can actually see that the JIT emits unaligned load instructions (vmovupd).)
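A minimal sketch of both ideas together, assuming .NET 7 or later (for the `Vector128<T>` operators and `Vector128.LoadAligned`) and `AllowUnsafeBlocks` enabled. `SimdDemo` and `AddFour` are hypothetical names. The four float additions happen in one vector operation, and the aligned loads are only valid because the buffer comes from a 16-byte aligned allocation; on x86 an aligned load from a misaligned address may fault.

```csharp
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

public static unsafe class SimdDemo
{
    // Adds a[0..3] and b[0..3] element-wise with a single Vector128 addition.
    public static float[] AddFour(float[] a, float[] b)
    {
        // Room for two Vector128<float> values; aligned loads need a 16-byte aligned address.
        float* buf = (float*)NativeMemory.AlignedAlloc((nuint)(8 * sizeof(float)), 16);
        try
        {
            for (int i = 0; i < 4; i++) { buf[i] = a[i]; buf[4 + i] = b[i]; }
            Vector128<float> va = Vector128.LoadAligned(buf);
            Vector128<float> vb = Vector128.LoadAligned(buf + 4);
            Vector128<float> sum = va + vb;

            var result = new float[4];
            for (int i = 0; i < 4; i++) result[i] = sum.GetElement(i);
            return result;
        }
        finally
        {
            NativeMemory.AlignedFree(buf);
        }
    }
}
```

`Vector128.Load` would accept any address instead, which corresponds to the unaligned loads visible in the linked code gen.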

@john-h-k
Contributor

Honestly, if I were you, I'd just respect 8- or 16-byte alignment by default and work with that. It's easiest.

@PathogenDavid
Contributor

@Korporal I am rapidly starting to think that instead of asking for a new API, you need to ask a JIT expert if you even need the API in the first place. I'm pretty confident that you don't.

If you're writing a custom allocator on the native side for some reason, make sure everything is word-aligned. If the native side has stricter alignment requirements for SIMD, make sure it meets those requirements. Otherwise I think you're worrying about this way too much.

@PathogenDavid
Contributor

Honestly, if I were you, I'd just respect 8 or 16 byte alignment by default and just work with that. It's easiest

CRT's malloc aligns to 8 bytes on 32-bit and 16 on 64-bit.

glibc's malloc is a little more vague with "suitably aligned for any built-in type". (Comment in the implementation says it's double the word size, which is the same as the CRT.)
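Following that advice, a native-side allocator can simply round every allocation up to a generous default. A sketch of a hypothetical bump allocator (`BumpAllocator`, `Allocate` and `AlignUp` are illustrative names, not an existing API) that uses 16 bytes, matching the 64-bit malloc guarantee above:

```csharp
using System;

// Hypothetical bump allocator that hands out offsets into one pre-allocated block.
// Offsets are only truly aligned if the block's base address is itself aligned
// to at least the largest alignment ever requested.
public sealed class BumpAllocator
{
    public const ulong DefaultAlignment = 16;

    private readonly ulong _capacity;
    private ulong _next;

    public BumpAllocator(ulong capacity) => _capacity = capacity;

    // Rounds 'value' up to the next multiple of 'alignment' (a power of two).
    public static ulong AlignUp(ulong value, ulong alignment) => (value + alignment - 1) & ~(alignment - 1);

    // Returns the offset of a block of 'size' bytes aligned to max(alignment, 16).
    public ulong Allocate(ulong size, ulong alignment = DefaultAlignment)
    {
        ulong a = Math.Max(alignment, DefaultAlignment);
        if ((a & (a - 1)) != 0)
            throw new ArgumentException("alignment must be a power of two", nameof(alignment));
        ulong offset = AlignUp(_next, a);
        if (offset > _capacity || size > _capacity - offset)
            throw new OutOfMemoryException("block exhausted");
        _next = offset + size;
        return offset;
    }
}
```

Allocating 3 bytes, then 8 bytes, then 1 byte with 32-byte alignment yields offsets 0, 16 and 32. The waste is at most alignment - 1 bytes per allocation, which is the trade-off accepted by this approach.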

msftgits transferred this issue from dotnet/corefx Feb 1, 2020
msftgits added this to the Future milestone Feb 1, 2020
maryamariyan added the untriaged label (New issue has not been triaged by the area owner) Feb 23, 2020
ericstj removed the untriaged label (New issue has not been triaged by the area owner) Jun 25, 2020
@weltkante
Contributor

It seems you can now do this calculation yourself, and the JIT optimizes it pretty well, as mentioned here:

We added JIT support for some of these patterns in .NET 8 in #81998. See e.g. this example: https://godbolt.org/z/or76frsWs

I updated my interop helper for alignment calculation and ran a few more tests. Structs look very good (a single constant load); alignment of primitives is not perfectly optimized yet, but still pretty good: https://godbolt.org/z/vTMxasnf7

using System.Runtime.CompilerServices;

public static class InteropHelper {
    private struct AlignmentCheck<T> where T : unmanaged {
        public byte Padding;
        public T Content;
    }

    [SkipLocalsInit, MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int AlignmentOf<T>() where T : unmanaged {
        Unsafe.SkipInit(out AlignmentCheck<T> container);
        return (int)Unsafe.ByteOffset(ref container.Padding, ref Unsafe.As<T, byte>(ref container.Content));
    }
}
Full example and optimization results:
using System.Collections.Generic;
using System.Runtime.CompilerServices;

public static class C {
    private struct AlignmentCheck<T> where T : unmanaged {
        public byte Padding;
        public T Content;
    }

    [SkipLocalsInit, MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static int AlignmentOf<T>() where T : unmanaged {
        Unsafe.SkipInit(out AlignmentCheck<T> container);
        return (int)Unsafe.ByteOffset(ref container.Padding, ref Unsafe.As<T, byte>(ref container.Content));
    }

    public static int GetShortByteKVPAlignment() => AlignmentOf<KeyValuePair<short, byte>>();
    public static int GetDoubleShortKVPAlignment() => AlignmentOf<KeyValuePair<double, short>>();
    public static int GetShortByteVTAlignment() => AlignmentOf<(short, byte)>();
    public static int GetDoubleShortVTAlignment() => AlignmentOf<(double, short)>();
    public static int GetDecimalAlignment() => AlignmentOf<decimal>();
    public static int GetDoubleAlignment() => AlignmentOf<double>();
    public static int GetShortAlignment() => AlignmentOf<short>();
}

optimizes to

C:GetShortByteKVPAlignment():int (FullOpts):
       mov      eax, 2
       ret      

C:GetDoubleShortKVPAlignment():int (FullOpts):
       mov      eax, 8
       ret      

C:GetShortByteVTAlignment():int (FullOpts):
       mov      eax, 4
       ret      

C:GetDoubleShortVTAlignment():int (FullOpts):
       mov      eax, 8
       ret      

C:GetDecimalAlignment():int (FullOpts):
       mov      eax, 8
       ret      

C:GetDoubleAlignment():int (FullOpts):
       sub      rsp, 24
       lea      rax, bword ptr [rsp+0x10]
       lea      rcx, bword ptr [rsp+0x08]
       sub      rax, rcx
       add      rsp, 24
       ret      

C:GetShortAlignment():int (FullOpts):
       push     rax
       lea      rax, bword ptr [rsp+0x02]
       lea      rcx, bword ptr [rsp]
       sub      rax, rcx
       add      rsp, 8
       ret     

@hamarb123
Contributor

I've used something like this previously:

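using System.Runtime.CompilerServices;

public static class Helpers
{
    // T followed by one byte: sizeof rounds sizeof(T) + 1 up to T's packing,
    // so the difference below equals that packing.
    private struct AlignHelper<T> where T : unmanaged
    {
       T value;
       byte b;
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static unsafe int AlignmentOf<T>() where T : unmanaged => sizeof(AlignHelper<T>) - sizeof(T);
}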

I don't see why this wouldn't work, and it probably provides better codegen, since there's nothing to pretend to initialise.
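The same idea can be written without an unsafe context or an unmanaged constraint by using `Unsafe.SizeOf<T>()`. A sketch (`PackingHelpers` and `PackingOf` are hypothetical names); as the next reply points out, this measures how T is packed inside another struct, which is not always the same as its alignment:

```csharp
using System.Runtime.CompilerServices;

public static class PackingHelpers
{
    // A T followed by one byte; the trailing padding reveals T's packing.
    private struct PackHelper<T>
    {
        public T Value;
        public byte Trailing;
    }

    // Size of (T, byte) minus size of T: 1 byte plus padding, i.e. T's packing.
    public static int PackingOf<T>() => Unsafe.SizeOf<PackHelper<T>>() - Unsafe.SizeOf<T>();
}
```

For example, `PackingOf<int>()` returns 4 and `PackingOf<byte>()` returns 1.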

@tannergooding
Member

tannergooding commented Oct 25, 2023

I've used something like this previously:

Note that this computes the packing of T, which is not necessarily the same as its alignment.

There are types, such as Int128, that currently have 16-byte packing but retain 4- or 8-byte alignment due to how the GC currently works.


9 participants