Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CoreCLR on ARMv7 #4345

Closed
benpye opened this issue Jul 1, 2015 · 46 comments
Closed

CoreCLR on ARMv7 #4345

benpye opened this issue Jul 1, 2015 · 46 comments

Comments

@benpye
Copy link
Contributor

benpye commented Jul 1, 2015

This is ultimately probably part of #4296 . Currently I am working on bringing up PAL on ARMv7. This is currently going reasonably well, however I feel it would be wise to open an issue regardless. In addition, Clang and GCC exhibit some different behavior regarding va_lists when on ARM instead of x86/AMD64.

The following shows the issue, it's basically the pal test c_runtime/vprintf/test1 which is where I realised the issue. Under x86/AMD64, this code will compile fine. When targeting ARM however this will fail to compile complaining that a void * is incompatible with va_list. What is the correct action to take here? Currently in my branch I have disabled the test, but this is probably not ideal.

#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    char checkstr[] = "hello world";
    int ret;

    ret = vprintf("hello world", (void*)0);

    if (ret != strlen(checkstr))
    {
        printf("Expected vprintf to return %d, got %d.\n", 
            strlen(checkstr), ret);
        return 1;

    }

    return 0;
}
@benpye benpye changed the title CoreCLR on ARM32 CoreCLR on ARMv7 Jul 1, 2015
@kangaroo
Copy link
Contributor

kangaroo commented Jul 1, 2015

Can't you just change the cast to a (va_list) instead of a (void*)?

@benpye
Copy link
Contributor Author

benpye commented Jul 1, 2015

I don't believe that works, you mean as in the following?

#include <stdio.h>
#include <string.h>
#include <stdarg.h>

int main(int argc, char *argv[])
{
    char checkstr[] = "hello world";
    int ret;

    ret = vprintf("hello world", (va_list)((void*)0));

    if (ret != strlen(checkstr))
    {
        printf("Expected vprintf to return %d, got %d.\n", 
            strlen(checkstr), ret);
        return 1;

    }

    return 0;
}

Clang will give

test3.c:10:34: error: used type 'va_list' (aka '__builtin_va_list') where
      arithmetic or pointer type is required

and an online gcc appears to give

10 : error: conversion to non-scalar type requested

@kangaroo
Copy link
Contributor

kangaroo commented Jul 2, 2015

Wait a sec, I checked the test and it currently is:

 ret = vprintf("hello world", NULL);

Are you not up to date with master?

@shahid-pk
Copy link
Contributor

@kangaroo yup he is not up to date here is his branch https://github.com/benpye/coreclr/tree/linux-arm which is four commits behind.

@kangaroo
Copy link
Contributor

kangaroo commented Jul 2, 2015

This test has never changed on master. This appears to be some local changes?

@benpye
Copy link
Contributor Author

benpye commented Jul 2, 2015

It was so that it built without PAL to ensure it wasn't a PAL related issue. I do believe PAL defines NULL as (void*)0 or something to that effect so I was trying to ensure it wasn't a difference between this "NULL" and PALs NULL. Either experiences the same compile error.

@benpye
Copy link
Contributor Author

benpye commented Jul 2, 2015

Further, this inconsistency with va_list actually appears to cause some of the other test failures too. I believe that at least the SetCurrentDirectoryA test failures are due to this. If we assume that on x86 that va_list is a pointer (as indicated by the ability to cast from a void* ), then when we call NativeVsnprintf (src/pal/src/cruntime/printfcpp.cpp) which then calls vsnprintf, we are passing a pointer to the real object behind va_list. This behaviour is actually assumed by PAL currently. Once this function has returned it is assumed that va_list will of had the relevant arguments taken off of it, leaving the next argument to be the next one to be handled. On x86_64 this works fine, due to the implementation. On ARM this is not the case, it seems reasonable to assume on ARM that va_list is backed by some type, but not a pointer (as we cannot cast a void*). This means that the call to NativeVsnprintf is passing a copy of the va_list, and so when the call returns, the va_list we have is still in the same state, and so we take the same argument again.

This is shown in the test file_io/SetCurrentDirectoryA/test1, here there is a line sprintf(szBuiltDir,"%s%s/", szHomeDir, szDirName) . Stepping through with gdb indicates that szHomeDir="/home/ben/git/coreclr/bin/obj/Linux.arm.Debug/src/pal/tests/palsuite/file_io/SetCurrentDirectoryA/test1/" and szDirName="testing", however after the call szBuildDir="/home/ben/git/coreclr/bin/obj/Linux.arm.Debug/src/pal/tests/palsuite/file_io/SetCurrentDirectoryA/test1//home/ben/git/coreclr/bin/obj/Linux.arm.Debug/src/pal/tests/palsuite/file_io/SetCurrentDirectoryA/test1//", which it can be seen would be the result of sprintf(szBuiltDir,"%s%s/", szHomeDir, szHomeDir).

It seems if PAL is to be ported to architectures other than x86_64, it may be nessecary to provide our own vsnprintf and vfprintf instead of using the system provided versions. Whilst perhaps undesirable, I don't see any other way to ensure that va_list behaves as expected.

The current implementation actually violates the C language specification, section 7.16.3 states that

The object ap may be passed as an argument to another function; if that function invokes the va_arg macro with parameter ap, the value of ap in the calling function is indeterminate and shall be passed to the va_end macro prior to any further reference to ap.

@OtherCrashOverride
Copy link

You should probably indicate what platform and environment you are using for developement. The C compiler used on non-windows is LLVM. It is the same on x86, x64 and ARM if the version matches.

@OtherCrashOverride
Copy link

There are many ARM dev boards out there that run the same version of Ubuntu that x64 linux does. A better approach to development would probably be to use a QEMU ARMv7 emulation on X86/64 so that those without ARM hardware can participate.

@benpye
Copy link
Contributor Author

benpye commented Jul 2, 2015

I've tried to keep the environment as close to that as the standard Linux build environment. I'm actually using a Raspberry Pi 2, with Ubuntu 14.04, and LLVM/Clang 3.5. I'm sure Qemu would be fine too, though currently I'm detecting only armv7l, which is probably a subset of what we can target.

@OtherCrashOverride
Copy link

As I understand it, this is a Linux ARMv7 build. The C/C++ code should use the same compiler on the Linux x64 which is LLVM. If that environment does not have an issue with va_list, then the ARM version should not either since its the same LLVM compiler. This may indicate that cmake configuration is not correct.

@benpye
Copy link
Contributor Author

benpye commented Jul 2, 2015

The behaviour expected of va_list is undefined in the spec, it even states (as I've quoted above), to explicitly not do what PAL is doing. https://github.com/dotnet/coreclr/blob/master/src/pal/src/cruntime/printfcpp.cpp#L1779 shows one case where this is being done. va_list is dependent on the ABI, as I was reading into this some people indicated that this code would not work on x86 instead of x86_64 either, although I haven't tried. The fact it doesn't work on ARM is unsurprising given the considerable difference in the ABI.

@OtherCrashOverride
Copy link

After further research, it seems that on ARM the ABI defines the va_list. However, not everyone (Apple) uses it. So its probably worth investigating abstracting it in CoreCLR to meet the needs of various targets.

@benpye
Copy link
Contributor Author

benpye commented Jul 2, 2015

That doesn't concern us if we stick to the C specification, we shouldn't have to care about the va_list implementation (I do not believe). It doesn't need abstracting I don't think.

@OtherCrashOverride
Copy link

We have to be able to marshal it. So that implies something in the runtime has to have knowledge of how to do that for the target platform.

http://bartdesmet.net/blogs/bart/archive/2006/09/28/4473.aspx
"Calling printf from C# - The tale of the hidden __arglist keyword"

@benpye
Copy link
Contributor Author

benpye commented Jul 3, 2015

benpye/coreclr@94ef624

I have implemented CONTEXT_CaptureContext in assembly for ARM as on AMD64, and this appears to function correctly. Additionally, the unwinding by libunwind appears to work also. I am having trouble however tracking down an issue, when an exception is thrown by RaiseException, it is never caught, instead the following is printed in the terminal

terminate called after throwing an instance of 'PAL_SEHException'
terminate called recursively
Aborted

Any ideas?

EDIT: After some further investigation this turns out to be a libunwind issue. Linking anything with libunwind on ARM seems to break C++ exceptions. This is actually mentioned in a bug report from 2009 on the RedHat tracker https://bugzilla.redhat.com/show_bug.cgi?id=480412 . It's interesting that AMD64 doesn't encounter it, perhaps libunwind is used in Clang there. It's solved by linking gcc_s before libunwind on ARM anyway, so the only PAL test failures now seem to be the result of the faulty vsnprintf implementation and RtlRestoreContext should still be implemented in assembly for the same reason as on AMD64.

@kangaroo
Copy link
Contributor

kangaroo commented Jul 4, 2015

I'm going to guess that the unwinding issue is because libunwind exports its own _Unwind_RaiseException, and various other Unwind* functions. They're probably not binary compatible with gcc, which is causing landing pads to be skipped.

This also makes sense that linking gcc_s first 'fixes' it. If you'd like to confirm, try building libunwind without UnwindLevel1*.c, and link it first. I bet that works too.

EDIT: Sorry, its Unwind-EHABI.cpp. Also the above refers to llvm libunwind, if you're using non gnu, adjust accordingly.

@benpye
Copy link
Contributor Author

benpye commented Jul 4, 2015

I do agree, and gcc_s provides the default unwinding functions. It's apparently because libunwind lacks a function that GCC (and presumably Clang) rely on for exception support, at least, that's what the old bug report seemed to indicate. I can try building my own libunwind but I would at least think linking gcc_s first so that it's exports are used would be better since then we are using the system libunwind, I'll gladly be proven wrong however. Again, I don't see why this issue doesn't show up on the FreeBSD and Linux builds for AMD64, they are all using Clang and libunwind etc...

@kangaroo
Copy link
Contributor

kangaroo commented Jul 4, 2015

Try libunwind from llvm, it seems to have EHABI support:

http://llvm.org/git/libunwind

EDIT: I'm not sure if its binary compatible with whatever gcc_s is doing.

@benpye
Copy link
Contributor Author

benpye commented Jul 4, 2015

It appears that if I build libunwind from llvm, it also works. This does require changing the libunwind build script however, as it cannot find the /usr/share/llvm-3.5/cmake directory, neither do I see any way to include the required libcxxabi headers without modifying the CMakeList file. Lastly, llvm's libunwind does not support unw_get_save_loc so we'd have to take the current OS X path at https://github.com/dotnet/coreclr/blob/master/src/pal/src/exception/seh-unwind.cpp#L99 . A quick search through CoreCLR on GitHub doesn't appear to show anywhere that the context pointers are used, maybe you found somewhere during the OS X port since you have set it to NULL? I'm not sure what we really gain from using LLVM's libunwind here? As I see it we would have to take some effort to integrate it into the build script and it's possible we'd have to maintain a fork of it to modify the build system for libunwind also. It could be used across Linux, FreeBSD and OS X on all architectures, so I suppose there is that to gain. Linking gcc_s is not however pulling in a new library as it's a default library, if that was your concern.

@kangaroo
Copy link
Contributor

kangaroo commented Jul 4, 2015

I was more curious wether LLVM libunwind was binary compatible. We'd like to move to LLVM libunwind in the long term, but have held off due to the lack of context pointer support. That said, if non gnu is incompatible with ARM EHABI, we may want to consider the strategy there. I'll try to carve out some time to take a look at exactly whats going on here.

Lack of context pointers isn't a huge concern. We'll take a minor perf hit due to increased stack pressure, but in the grand scheme of things its not something to worry about.

@benpye
Copy link
Contributor Author

benpye commented Jul 5, 2015

Added RtlContextRestore in assembly, and added floating point register save restore to RtlContextCapture/RtlContextRestore. Hit a slight wall though, whilst our context object has the FP registers, I cannot find where they are defined in ucontext_t, the normal registers are in mcontext_t, but the FP registers appear missing, needed for the equivalent FPREG_ macros used to convert between our context and the libunwind/linux context.

I have assumed VFPv3 here and will be changing the build script to select ARMv7, VFPv3, and importantly no NEON. This appears to be what is required by the jit looking at the register list at least, and is also the baseline for Ubuntu armhf, which if keeping with the AMD64 version, should be our supported distro, so it seems a sensible feature set to target. This will also allow scaleway.io to be used for those who do not have local ARM boards, though I've yet to setup anything there.

EDIT: Bionic has an interesting file, https://android.googlesource.com/platform/bionic/+/c124baa/libc/arch-arm/include/machine/ucontext.h . It indicates only d8-d15 are saved and that there is "no reliable way to extract the FP state from a context_t on ARM" as the registers are not exposed in any clear way.

@kangaroo
Copy link
Contributor

kangaroo commented Jul 5, 2015

On linux kernels, co-processor registers are stored in uc_regspace.

/*
 * struct sigcontext only has room for the basic registers, but struct
 * ucontext now has room for all registers which need to be saved and
 * restored.  Coprocessor registers are stored in uc_regspace.  Each
 * coprocessor's saved state should start with a documented 32-bit magic
 * number, followed by a 32-bit word giving the coproccesor's saved size.
 * uc_regspace may be expanded if necessary, although this takes some
 * coordination with glibc.
 */

@benpye
Copy link
Contributor Author

benpye commented Jul 5, 2015

Hm okay, as the Bionic file stated. It still seems glibc is only saving d8-d15 and fpcsr. Isn't that a problem?

VM contains further "va_list ABI abuse". https://github.com/dotnet/coreclr/blob/master/src/vm/clrvarargs.cpp @OtherCrashOverride was ahead here but need to find how to handle this on ARM, I guess Windows uses the same ABI for ARM as other architectures where Linux it changes.

@OtherCrashOverride
Copy link

Posting this link here for future reference
https://msdn.microsoft.com/en-us/library/dn736986.aspx
"Overview of ARM ABI Conventions" (windows 32bit)

Of note is:
"The instruction set for Windows on ARM is strictly limited to Thumb-2."

"For non-variadic functions, the Windows on ARM ABI follows the ARM rules for parameter passing—this includes the VFP and Advanced SIMD extensions."

It does not mention how variadic functions are called, though.

@benpye
Copy link
Contributor Author

benpye commented Jul 6, 2015

Yeah the code indicates Windows has the same pointer to an object for the va_list. On Linux we are going to have to handle the ARM EABI.

@OtherCrashOverride
Copy link

The refernced ARM article:
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042e/IHI0042E_aapcs.pdf
"Procedure Call Standard for the ARM® Architecture"

has this to say:
page 31 - "7.2 Argument Passing Conventions"
"For variadic functions, float arguments that match the ellipsis (...) are converted to type double."

page 19 - "5.5 Parameter Passing"
"A variadic function is always marshaled as for the base standard."

page 27 - "7.1.4 Additional Types"

struct __va_list
{
    void *__ap;
}

A va_list may address any object in a parameter list.
Consequently, the first object addressed may only have word
alignment (all objects are at least word aligned), but any double
-word aligned object will appear at the correct double-word
alignment in memory. In C++, __va_list is in namespace
std.

@OtherCrashOverride
Copy link

http://llvm.org/viewvc/llvm-project?view=revision&revision=212004

Basic: correct the va_list type on Windows on ARM

Windows on ARM defines va_list as a typedef for char *. Although the semantics
of argument passing for variadic functions matches AAPCS VFP, the wrapped
struct __va_list type is unused. This makes the intrinsic definition for
va_list match that of Visual Studio.

This could be good news. If I interpret this correctly (and I very well may be wrong)
on Windows ARM:

typedef va_list (char *)

for ARM EABI:

struct __va_list
{
    void *__ap;
}
typedef va_list __va_list

So for a given use of va_list on Windows it can be converted to ARM as:

(__va_list)windows_va_list_value;

and converted back as:

(char *)arm_va_list_value;

@benpye
Copy link
Contributor Author

benpye commented Jul 24, 2015

Just so that the current status of this code is known. State in upstream is non functional, PR dotnet/coreclr#1285 is tracking the unwinding support though it is currently blocked by https://llvm.org/bugs/show_bug.cgi?id=24146 as far as I can tell. If you want to work on the JIT then the code is in a state where that is possible, currently the RyuJIT ARM backend is very incomplete, so that's going to be my main focus in the near future. Outside of the JIT there are likely other issues however without unwinding it's going to be difficult to find them as GC will be non functional.

The legacy JIT backend is functional, however the code path will likely be removed in the near future. It can be used by modifying src/jit/CMakeLists.txt , the elseif(CLR_CMAKE_PLATFORM_ARCH_ARM) section which includes files should be changed to read

  set( ARCH_SOURCES
    targetarm.cpp
    unwindarm.cpp
    emitarm.cpp
    codegenlegacy.cpp
    registerfp.cpp
  )
  add_definitions(-DLEGACY_BACKEND) 

This will make the build use the legacy JIT backend as opposed to the RyuJIT backend. Another workaround I've used to try and test the unwinding is to change src/vm/comdelegate.cpp so that BOOL COMDelegate::NeedsWrapperDelegate(MethodDesc* pTargetMD) returns FALSE even on ARM, this will cause issues down the road and something will need to be done here, there is no secure delegate functionality (which is used for the wrapper delegate), when FEATURE_STUBS_AS_IL is defined, as it is on Unix targets.

@benpye
Copy link
Contributor Author

benpye commented Jul 28, 2015

If anyone on the JIT team could give some indication how I would correctly implement calls for locations outside the 24 bit immediate range it would be of great help. In the legacy code regSet.rsGrabReg(RBM_ALLINT); is used to find a register to use for a branch, however in the new backend, this is not avaliable. The AMD64 code uses REG_DEFAULT_HELPER_CALL_TARGET, but only for helper calls, where for the ARM target it's possible to need this for any call, additionally, the ARM target has no register defined as REG_DEFAULT_HELPER_CALL_TARGET, and I am not entirely sure what would need to be done to use a register for it. Also seems that I'd need to fix the todo regarding killing all callee trashed registers in void CodeGen::genCallInstruction(GenTreePtr node).

cc: @BruceForstall

@BruceForstall
Copy link
Member

I haven't looked at this in quite a while. Presumably for JIT the VM should create jump islands for >24 bit branches (we don't support individual functions greater than that size). However, it doesn't actually look like it does that. Check out IMAGE_REL_BASED_THUMB_BRANCH24. For NGEN (crossgen for .NET Core?) it tries compiling everything without requiring that, then if it needs it, it recompiles (see ZapInfo::getRelocTypeHint). Also see arm_Valid_Imm_For_BL() usage for doing "large" calls.

@benpye
Copy link
Contributor Author

benpye commented Jul 28, 2015

Yeah for non NGEN it seems that arm_Valid_Imm_For_BL() will always return false. The existing code for ARM gets a register, loads the address into the register, and does blx r*, which allows for any branch. I can't see a reason why that wouldn't work now other than allocating said register? I don't think it's branches within a method that have caused the issue, instead those to "helper" methods and JIT_WriteBarrier etc. Does this make sense or am I perhaps missing what you mean, I don't believe it's necessary to generate an island.

@BruceForstall
Copy link
Member

So you are trying to implement the ARM32 back-end in the RyuJIT (non-legacy) path? In that case, you need to "reserve" a register in Lowering::TreeNodeInfoInit(), and then "grab" it using something like genRegNumFromMask(treeNode->gtRsvdRegs). If necessary, maybe you could define REG_DEFAULT_HELPER_CALL_TARGET to some non-argument callee trash register (R12?)

@masonwheeler
Copy link
Contributor

@BruceForstall If doing it that way feels like reimplementing the legacy ARM JIT on top of RyuJIT, what would be "The Right Thing To Do" in order to resolve this problem?

@BruceForstall
Copy link
Member

Sorry, I didn't mean to imply that. I was a little confused from the comments about whether @benpye was trying to get the ARM legacy back end to work better, or trying to start getting the new RyuJIT ARM back-end work started. What I suggest in my last comment is the correct way to go for RyuJIT back-end.

@masonwheeler
Copy link
Contributor

Ah, all right. I guess I misread that.

@benpye
Copy link
Contributor Author

benpye commented Nov 13, 2015

Going through to check build status once again. The new dep on liblttng brings in a library liburcu which is broken on ARM in Trusty with clang. We require version 0.7.15 or later with the commit urcu/userspace-rcu@7a3e2ed for it to build else Clang will error.

@masonwheeler
Copy link
Contributor

So Ubuntu is shipping an outdated library that won't build? That's a bit odd...

@kangaroo
Copy link
Contributor

The ubuntu urcu header won't build with newer clang. It builds just fine with gcc and older clangs.

@benpye
Copy link
Contributor Author

benpye commented Nov 14, 2015

Clang 3.5 is too new however? I suppose Ubuntu 16.04 will be okay by the time that releases but that is some time in the future.

@aviviadi
Copy link

Hi,
I tried building coreclr on Rasperry Pi 2 (armv7l) + Ubuntu 14.04.
Building coreclr and corefx on the Pi were successfully built (urcu patch is no longer required, and the Ubuntu's c compiler had to be upgraded, etc).
mscorlib was created under windows machine (built for "arm").
The helloworld program just contained "return 42". No console stuff of any kind.

Executing corerun with the helloworld program segment faulted..

I'd like to know if and what can be done (or what do we need to wait for) in order to make coreclr run on Armv7 ?

We migrate large scale project into AspNetCore, and we need it not just on Linux64 arch, but also on 32 bit systems, mainly on Arm (and the Pi is one of them).

@shahid-pk
Copy link
Contributor

@aviviadi official support for x86 and arm32 will come after version 1 release. The actual work required for arm32 and x86 is in ryujit. The current work on linux arm32 and aarch64 is mostly from community from @benpye and @kangaroo , but now ryujit for aarch64 is mostly functional. as far as my limited understanding goes.

@aviviadi
Copy link

@shahid-pk I will have to wait then. Thank you for your answer.

@manu-st
Copy link

manu-st commented Mar 16, 2016

@aviviadi I just tried a hello world on a Jetson TK1 board with the latest CoreCLR and it works albeit segfaulting on exit. Make sure you only have mscorlib.dll (no mscorlib.ni.dll generated on the host where you build mscorlib), I had the issue and this caused an immediate segfault.

@aviviadi
Copy link

@manu-silicon Well.. that's interesting. As I understand from @shahid-pk and by reading elsewhere - x86/arm32 support is not planned for the near future. I will have time later to try again (I don't remember having mscorlib.ni.dll but I will give it another go).

@jkotas
Copy link
Member

jkotas commented Jul 7, 2016

The ARM progress has been more recently tracked in https://github.com/dotnet/coreclr/issues/3977

@jkotas jkotas closed this as completed Jul 7, 2016
@msftgits msftgits transferred this issue from dotnet/coreclr Jan 30, 2020
@msftgits msftgits added this to the Future milestone Jan 30, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Jan 5, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

10 participants