WIP: Fixes to compile CUDA with nvc++ #910
Conversation
Just for reference, it appears this is currently only supported on Cray systems: https://gitlab.kitware.com/cmake/cmake/-/issues/23003
Before merging this, I'd like to try compiling Phoebus with nvhpc with this fix on Chicoma. Previously, I found that simply removing the relaxed-constexpr flag did not let me compile with nvc++, but maybe things have changed.
I'm using these modules to build AthenaPK and Parthenon on Chicoma. AthenaPK seems to work correctly across multiple GPU nodes. Parthenon mostly works but isn't passing all tests. Here are my modules:
And I'm compiling Parthenon with
While AthenaPK compiles and passes all tests with this setup, Parthenon compiles but does not pass all ctests. I haven't looked too thoroughly into why these tests fail.
I am not able to build this with NVHPC 23.5. I get:
I built this with a custom wrapper script to get it working on a non-Cray system:
There are lots of warnings like this:
I get a lot of those warnings too, but AthenaPK compiles and runs just fine.
Ok, the "illegal use of void type" error looks like it's an internal compiler error in NVHPC 23.5 (https://forums.developer.nvidia.com/t/nvc-openmp-error-inside-llc/178950).
With NVHPC 22.3 on NCSA Delta, I am able to build and run AthenaPK successfully. It looks like NVHPC 23.5 just fails due to a compiler bug. I didn't try to run the Parthenon tests.
I take this back. When running one of the AthenaPK problems (the rand_blast problem), I get different numerical behavior, such that it crashes with a positivity failure at timestep 664:
I don't think this compiler is ready for production use.
I also noticed that with a magnetic jet problem I was running, my NVHPC build failed at a slightly different time than my clang CPU build. I think I'm going to give up on nvc++ for now, too; it's too unreliable.
I tried again with NVHPC 23.9 and get an internal compiler error when compiling
It turns out that nvc++ does not preserve denormals by default (whereas nvcc does). This may explain the weird numerical behavior we were both seeing. You can force it to preserve denormals with the nvc++ compiler option:
@forrestglines @Yurlungur Even though
Sounds good to me. The changes are minimal and shouldn't affect any other compiler.
Looks good to me.
I think it should be considered a compiler bug that it doesn't accept a variable named `restrict`, since the use of `restrict` as a reserved keyword is only part of the ISO C standard and not part of the ISO C++ standard at all. However, changing it avoids confusion in any case.
I'm willing to merge this, but it does make me a little uncomfortable, see below.
```diff
@@ -122,12 +122,12 @@ TaskCollection SparseAdvectionDriver::MakeTaskCollection(BlockList_t &blocks,
       mdudt.get(), beta * dt, mc1.get());

       // do boundary exchange
-      auto restrict =
+      auto restrict_task =
```
Seems totally reasonable to change this.
This var name was changed on develop, so I don't think this is an issue anymore.
```cmake
# make sure the flag is not added when compiling with Clang for Cuda. The flag
# is also not necessary (and not recognized) when using nvc++ to compile Cuda
# code
if (Kokkos_ENABLE_CUDA AND NOT CMAKE_CXX_COMPILER_ID STREQUAL "Clang" AND NOT CMAKE_CXX_COMPILER_ID STREQUAL "NVHPC")
```
This makes me a little nervous, as it may cause problems with downstream codes.
Maybe this condition could be changed to check whether `CMAKE_CUDA_COMPILER_ID` is not equal to `NVHPC` or `Clang` instead?
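A sketch of that alternative condition (assuming the flag is applied near this check; the target name `parthenon` and exact flag-setting line are illustrative, not copied from the real CMakeLists):

```cmake
# Gate on the CUDA (device) compiler rather than the host compiler:
# --expt-relaxed-constexpr is an nvcc-only option, and neither NVHPC's
# nvc++ nor Clang's CUDA front end needs or accepts it.
if (Kokkos_ENABLE_CUDA
    AND NOT CMAKE_CUDA_COMPILER_ID STREQUAL "NVHPC"
    AND NOT CMAKE_CUDA_COMPILER_ID STREQUAL "Clang")
  target_compile_options(parthenon PUBLIC --expt-relaxed-constexpr)
endif()
```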
Sure, that seems like it might be safer. My main worry here is: will the jury-rigged insanity that I (and KHARMA) are currently using to build with nvhpc fail?
Hmm, I don't know. I'm not sure how that works at all without removing this flag, unless you mean you're using nvc++ as the host compiler and nvcc as the device compiler? In that case, I think it should still report `NVIDIA` as the CUDA compiler, rather than `NVHPC`.
I don't think so. I feel like we've discussed this before. The option `--expt-relaxed-constexpr` is specific to the `nvcc` compiler; it's not an option for `nvc++`. I'm double-checking that `nvc++` can compile CUDA code that needs `constexpr`.

Does `CMAKE_CUDA_COMPILER_ID` get set to `NVHPC` or `CLANG`? The current commit also only changes the behavior of `nvc++` builds; I don't think we want to change the behavior of `clang` builds.
If you use clang as the device compiler, it doesn't accept this option either (it doesn't need it). So IMO it might be good to keep that.
I'm not sure if you can force `nvc++` to use `nvcc` as the CUDA compiler; `nvc++` already includes the CUDA toolchain.

I also verified that `nvc++` compiles and runs code that needs `nvcc --expt-relaxed-constexpr`.
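As a concrete illustration of "code that needs `--expt-relaxed-constexpr`", here is a hypothetical minimal kernel (not taken from Parthenon): calling a plain `constexpr` host function from device code is exactly what that nvcc flag relaxes, while `nvc++ -cuda` accepts it without any extra flag.

```cuda
#include <cstdio>

// A constexpr function with no __device__ annotation. Plain nvcc rejects
// calling it from device code unless --expt-relaxed-constexpr is passed;
// nvc++ -cuda compiles it with no extra flag.
constexpr double half(double x) { return 0.5 * x; }

__global__ void kernel(double *out) {
    out[0] = half(4.0);  // host constexpr called in device code
}

int main() {
    double *d = nullptr;
    cudaMallocManaged(&d, sizeof(double));
    kernel<<<1, 1>>>(d);
    cudaDeviceSynchronize();
    std::printf("%f\n", d[0]);  // 2.000000 on a working device
    cudaFree(d);
    return 0;
}
```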
By default, at least on non-Cray systems, it uses `nvcc` as the CUDA compiler. It only uses `nvc++` for device code when `nvc++ -cuda` (or `nvc++ -stdpar` or `nvc++ -openacc`) is used.
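For example (hypothetical file and target names; the flags are the GPU programming-model options of the NVHPC compilers):

```shell
# With one of these flags, nvc++ compiles the device code itself rather
# than leaving CUDA compilation to nvcc:
nvc++ -cuda -gpu=cc80 kernel.cu  -o app   # nvc++ compiles the CUDA code
nvc++ -stdpar=gpu     main.cpp   -o app   # offload C++ parallel algorithms
nvc++ -acc=gpu        solver.cpp -o app   # OpenACC offload
```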
I don't remember the full context for this discussion - is this issue resolved?
@forrestglines just wondering if you've tried to build Parthenon with nvc++ recently?
@BenWibking Sorry for the delay, I've been on and off travel, other projects, and vacation next week. Yes, I've been using
Ideally in the future it will be the best supported compiler for NVIDIA hardware.
Yeah, I got that with 23.9 on x86-64 as well. I would be very interested to know whether it gives the same error with the current internal version of nvhpc. I have gotten a lot of ICEs with nvhpc over the years. With the Cray compiler, it tells you to report the error to Cray and gives you the full command line to output the fully preprocessed source it was trying to compile, which can then be easily reported.
@fglines-nv Just wondering if you have an update on this?
Not yet, I'm working this week on a Dockerfile to regularly build Parthenon with
It looks like a special flag is now needed to get Kokkos to allow compiling with NVHPC: kokkos/kokkos#6499
A few updates on the ICE -- it turns out that using
This change disables compiling CUDA code in Kokkos using

To compile with NVHPC, we should instead compile with
I'm still getting an error with
but I think this is unrelated to NVHPC.
@fglines-nv Is there a compelling reason to compile with NVHPC as the host compiler, if the device code is still compiled with
I'm not sure yet. As far as I've seen, no. It might have benefits for compiling for Grace/ARM.
The error in

If anyone wants to compile with NVHPC as the host compiler, use:
The changes in this branch are unnecessary for that.
PR Summary
Quick fix to the cmake to allow compiling CUDA code with `nvc++`. Although in past experience `nvc++` hasn't worked for Kokkos-based codes, supposedly it will become NVIDIA's favored compiler, with `nvcc` deprecated.

While `nvcc` is likely preferred, `nvc++` and the `nvhpc` stack turned out to be easier to compile AthenaPK with on Chicoma at the moment.

PR Checklist