
WIP: Fixes to compile CUDA with nvc++ #910

Closed
wants to merge 6 commits

Conversation

forrestglines
Collaborator

PR Summary

Quick fix to the CMake configuration to allow compiling CUDA code with nvc++. Although in past experience nvc++ hasn't worked for Kokkos-based codes, it will supposedly become NVIDIA's favored compiler, with nvcc eventually deprecated.

While nvcc is likely still preferred, nvc++ and the nvhpc stack turned out to be the easier way to compile AthenaPK on Chicoma at the moment.

PR Checklist

  • Code passes cpplint
  • New features are documented.
  • Adds a test for any bugs fixed. Adds tests for new features.
  • Code is formatted
  • Changes are summarized in CHANGELOG.md
  • CI has been triggered on Darwin for performance regression tests.
  • Docs build
  • (@lanl.gov employees) Update copyright on changed files

@BenWibking
Collaborator

Just for reference, it appears this is currently only supported on Cray systems: https://gitlab.kitware.com/cmake/cmake/-/issues/23003.

@Yurlungur
Collaborator

Yurlungur commented Jul 16, 2023

Before merging this, I'd like to try compiling Phoebus with nvhpc with this fix on Chicoma. Previously, I found that simply removing the relaxed constexpr flag did not allow me to compile with nvc++, but maybe things have changed.

@forrestglines
Collaborator Author

I'm using these modules to build AthenaPK and Parthenon on Chicoma. AthenaPK seems to work correctly across multiple GPU nodes. Parthenon mostly works but isn't passing all tests.

Here are my modules:

module purge
module load PrgEnv-nvhpc
module load nvhpc/22.3  cray-mpich/8.1.21 craype-accel-nvidia80
module load cmake/3.22.3 python/3.10-anaconda-2023.03  cray-hdf5-parallel/1.12.2.1
export MPICH_GPU_SUPPORT_ENABLED=1

And I'm compiling Parthenon with:

cmake \
  -D CMAKE_BUILD_TYPE=RelWithDebInfo \
  -D Kokkos_ENABLE_CUDA=ON -D Kokkos_ENABLE_CUDA_LAMBDA=ON \
  -D Kokkos_ARCH_AMPERE80=ON \
  -D SERIAL_WITH_MPIEXEC=ON -D TEST_MPIEXEC=srun \
  -D CMAKE_CXX_FLAGS="${PE_MPICH_GTL_DIR_nvidia80} ${PE_MPICH_GTL_LIBS_nvidia80}" \
  $SOURCE_DIR

While AthenaPK compiles and passes all tests with this setup, Parthenon compiles but does not pass all ctests:

The following tests FAILED:
         16 - Device Object Allocation (Subprocess aborted)
         30 - MeshData works as expected for simple packs (Subprocess aborted)
         33 - Swarm memory management (Subprocess aborted)
         49 - regression_test:particle_leapfrog (Failed)
         50 - regression_mpi_test:particle_leapfrog (Failed)
         51 - regression_test:particle_leapfrog_outflow (Failed)
         52 - regression_mpi_test:particle_leapfrog_outflow (Failed)
         58 - regression_mpi_test:advection_convergence (Failed)
         59 - regression_test:output_hdf5 (Failed)
         60 - regression_mpi_test:output_hdf5 (Failed)
         63 - regression_test:bvals (Failed)
         64 - regression_mpi_test:bvals (Failed)
Errors while running CTest

I haven't looked too thoroughly into why these tests fail.

@BenWibking
Collaborator

I am not able to build this with NVHPC 23.5. I get:

NVC++-S-0053-Illegal use of void type (/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/flux_correction.cpp)
NVC++-S-0053-Illegal use of void type (/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/flux_correction.cpp)
NVC++/x86-64 Linux 23.5-0: compilation completed with severe errors
make[2]: *** [parthenon/src/CMakeFiles/parthenon.dir/bvals/comms/flux_correction.cpp.o] Error 2
make[2]: *** Waiting for unfinished jobs....
NVC++-S-0053-Illegal use of void type (/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/boundary_communication.cpp)
NVC++-S-0053-Illegal use of void type (/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/boundary_communication.cpp)
NVC++/x86-64 Linux 23.5-0: compilation completed with severe errors
make[2]: *** [parthenon/src/CMakeFiles/parthenon.dir/bvals/comms/boundary_communication.cpp.o] Error 2

I built this with:
cmake .. -DKokkos_ARCH_AMPERE80=ON -DKokkos_ENABLE_CUDA=ON -DCMAKE_C_COMPILER=nvc -DCMAKE_CXX_COMPILER=$HOME/bin/CC -DCMAKE_CUDA_COMPILER=$HOME/bin/CC -DCMAKE_CUDA_COMPILER_ID=NVHPC

with a custom wrapper script to get it working on a non-Cray system:

#!/bin/bash
nvc++ -cuda -gpu=cc80 --gcc-toolchain=$(which gcc) "$@"

@BenWibking
Collaborator

There are lots of warnings like this:

"/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/flux_correction.cpp", line 84: warning: function "lambda []" captures local object "koff" by reference, will likely cause an illegal memory access when run on the device [no_device_stack]
              Kokkos::TeamThreadRange<>(team_member, idxer.size()), [&](const int idx) {
                                                                       ^

"/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/flux_correction.cpp", line 180: warning: function "lambda []" captures local object "b" by reference, will likely cause an illegal memory access when run on the device [no_device_stack]
                               [&](const int idx) {
                                  ^

@forrestglines
Collaborator Author

There are lots of warnings like this:

"/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/flux_correction.cpp", line 84: warning: function "lambda []" captures local object "koff" by reference, will likely cause an illegal memory access when run on the device [no_device_stack]
              Kokkos::TeamThreadRange<>(team_member, idxer.size()), [&](const int idx) {
                                                                       ^

"/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/flux_correction.cpp", line 180: warning: function "lambda []" captures local object "b" by reference, will likely cause an illegal memory access when run on the device [no_device_stack]
                               [&](const int idx) {
                                  ^

I get a lot of those warnings too but AthenaPK compiles and runs just fine

@BenWibking
Collaborator

Ok, the "illegal use of void type" looks like it's an internal compiler error in NVHPC 23.5 (https://forums.developer.nvidia.com/t/nvc-openmp-error-inside-llc/178950).

@BenWibking
Collaborator

I get a lot of those warnings too but AthenaPK compiles and runs just fine

With NVHPC 22.3 on NCSA Delta, I am able to build and run AthenaPK successfully. It looks like NVHPC 23.5 just fails due to a compiler bug.

I didn't try to run the Parthenon tests.

@BenWibking
Collaborator

I get a lot of those warnings too but AthenaPK compiles and runs just fine

With NVHPC 22.3 on NCSA Delta, I am able to build and run AthenaPK successfully. It looks like NVHPC 23.5 just fails due to a compiler bug.

I didn't try to run the Parthenon tests.

I take this back. When running one of the AthenaPK problems (the rand_blast problem), I get different numerical behavior, such that it crashes with a positivity failure at timestep 664:

Mon Jul 17 14:48:42 CDT 2023:  cycle=664 time=6.2577120848689291e-03 dt=1.0307579117559152e-05 zone-cycles/wsec_step=2.75e+08 wsec_total=8.08e+00 wsec_step=7.61e-03
Mon Jul 17 14:48:42 CDT 2023:  ### PARTHENON ERROR
Mon Jul 17 14:48:42 CDT 2023:    Condition:   w_p > 0.0 || pressure_floor_ > 0.0 || e_floor_ > 0.0
Mon Jul 17 14:48:42 CDT 2023:    Message:     Got negative pressure. Consider enabling first-order flux correction or setting a reasonble pressure or temperature floor.
Mon Jul 17 14:48:42 CDT 2023:    File:        /projects/cvz/bwibking/athenapk-nvhpc-test/src/eos/../eos/adiabatic_glmmhd.hpp
Mon Jul 17 14:48:42 CDT 2023:    Line number: 116

I don't think this compiler is ready for production use.

@forrestglines
Collaborator Author

I also noticed that with a magnetic jet problem I was running, my NVHPC build failed at a slightly different time than my clang CPU build. I think I'm also going to give up on nvc++ for now; it's too unreliable.

@BenWibking
Collaborator

BenWibking commented Oct 13, 2023

I tried again with NVHPC 23.9 and get an internal compiler error when compiling bnd_info.cpp:

[ 13%] Building CXX object parthenon/src/CMakeFiles/parthenon.dir/bvals/comms/bnd_info.cpp.o
"/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/bnd_info.cpp", line 309: warning: variable "idx" was declared but never referenced [declared_but_not_referenced]
        int idx = static_cast<int>(el) % 3;
            ^

Remark: individual warnings can be suppressed with "--diag_suppress <warning-name>"

"/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/bnd_info.cpp", line 349: warning: variable "idx" was declared but never referenced [declared_but_not_referenced]
      int idx = static_cast<int>(el) % 3;
          ^

NVC++-F-0000-Internal compiler error. size of unknown type       0  (/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/bnd_info.cpp)
NVC++/x86-64 Linux 23.9-0: compilation aborted

@BenWibking
Collaborator

I also noticed that with a magnetic jet problem I was running, my NVHPC build failed at a slightly different time than my clang CPU build. I think I'm also going to give up on nvc++ for now; it's too unreliable.

It turns out that nvc++ does not preserve denormals by default (whereas nvcc does). This may explain the weird numerical behavior we were both seeing. You can force it to preserve denormals with the nvc++ compiler option: -Mnodaz.
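
For anyone who wants to try it, a minimal sketch of passing that option through at configure time, adapting the cmake command quoted earlier in this thread (untested; shown only to illustrate where the flag would go):

cmake \
  -D CMAKE_BUILD_TYPE=RelWithDebInfo \
  -D Kokkos_ENABLE_CUDA=ON -D Kokkos_ENABLE_CUDA_LAMBDA=ON \
  -D Kokkos_ARCH_AMPERE80=ON \
  -D CMAKE_CXX_FLAGS="-Mnodaz" \
  $SOURCE_DIR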

@BenWibking
Collaborator

@forrestglines @Yurlungur Even though nvc++ -cuda can't currently compile Parthenon, I would be in favor of merging this in, since this PR will eventually be necessary once the compiler works.

@forrestglines
Collaborator Author

@forrestglines @Yurlungur Even though nvc++ -cuda can't currently compile Parthenon, I would be in favor of merging this in, since this PR will eventually be necessary once the compiler works.

Sounds good to me. The changes are minimal and shouldn't affect any other compiler.

@BenWibking left a comment
Collaborator

Looks good to me.

I think it should be considered a compiler bug that it doesn't accept a variable named restrict, since the usage of restrict as a reserved keyword is only part of the ISO C standard, and not part of the ISO C++ standard at all. However, changing it avoids confusion in any case.

@Yurlungur left a comment
Collaborator

I'm willing to merge this, but it does make me a little uncomfortable; see below.

@@ -122,12 +122,12 @@ TaskCollection SparseAdvectionDriver::MakeTaskCollection(BlockList_t &blocks,
mdudt.get(), beta * dt, mc1.get());

// do boundary exchange
- auto restrict =
+ auto restrict_task =
Collaborator

seems totally reasonable to change this

Collaborator

This var name was changed on develop, so I don't think this is an issue anymore.

# make sure the flag is not added when compiling with Clang for Cuda. The flag
# is also not necessary (and not recognized) when using nvc++ to compile Cuda
# code
if (Kokkos_ENABLE_CUDA AND NOT CMAKE_CXX_COMPILER_ID STREQUAL "Clang" AND NOT CMAKE_CXX_COMPILER_ID STREQUAL "NVHPC")
Collaborator

This makes me a little nervous, as it may cause problems with downstream codes.

Collaborator

@BenWibking commented Nov 20, 2023

Maybe this condition could be changed to check whether CMAKE_CUDA_COMPILER_ID is not equal to NVHPC or Clang instead?
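
As a rough sketch of that suggestion (not the change made in this PR, and it assumes CMAKE_CUDA_COMPILER_ID is actually populated in these builds, which is questioned below):

# hypothetical alternative: key the check off the CUDA compiler rather than the C++ compiler
if (Kokkos_ENABLE_CUDA AND NOT CMAKE_CUDA_COMPILER_ID STREQUAL "NVHPC" AND NOT CMAKE_CUDA_COMPILER_ID STREQUAL "Clang")
  # ... add --expt-relaxed-constexpr here, as in the existing block ...
endif()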

Collaborator

Sure, that seems like it might be safer. My main worry here is whether the jury-rigged insanity that I (and KHARMA) are currently using to build with nvhpc will fail.

Collaborator

Hmm, I don't know. I'm not sure how that works at all without removing this flag, unless you mean you're using nvc++ as the host compiler and nvcc as the device compiler? In that case, I think it should still report NVIDIA as the CUDA compiler, rather than NVHPC.

Collaborator Author

I don't think so. I feel like we've discussed this before. The option --expt-relaxed-constexpr is specific to the nvcc compiler; it's not an option for nvc++. I'm double-checking that nvc++ can compile CUDA code that needs constexpr.

Does CMAKE_CUDA_COMPILER_ID get set to NVHPC or CLANG? The current commit only changes the behavior of nvc++ builds; I don't think we want to change the behavior of clang builds.

Collaborator

If you use clang as the device compiler, it doesn't accept this option either (it doesn't need it). So IMO it might be good to keep that.

Collaborator Author

I'm not sure if you can force nvc++ to use nvcc as the CUDA compiler; nvc++ already includes the CUDA toolchain.

I also verified that nvc++ compiles and runs code that needs nvcc --expt-relaxed-constexpr.

Collaborator

By default, at least on non-Cray systems, it uses nvcc as the CUDA compiler. It only uses nvc++ for device code when nvc++ -cuda (or nvc++ -stdpar or nvc++ -openacc) is used.
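
To make that distinction concrete (file names here are hypothetical; the flags are the ones used earlier in this thread):

# nvc++ as a plain host C++ compiler; device code, if any, is still compiled by nvcc
nvc++ -c host_only.cpp

# nvc++ in single-source CUDA mode: -cuda makes nvc++ itself compile the device code
nvc++ -cuda -gpu=cc80 -c kernels_and_host.cpp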

Collaborator

I don't remember the full context for this discussion - is this issue resolved?

@BenWibking
Collaborator

@forrestglines just wondering if you've tried to build Parthenon with nvc++ recently?

@fglines-nv
Collaborator

@BenWibking Sorry for the delay, I've been on and off travel, other projects, and vacation next week.

Yes, I've been using nvhpc for Grace-Grace since it can give better performance on aarch64. I've returned to NVHPC on GPUs since Venado is starting to become available. I'm getting internal compiler errors there, however, so I might shelve it until the 9th.

NVC++-F-0000-Internal compiler error. size of unknown type       0  (/users/forrestglines/code/parthenon-project/parthenon/src/bvals/comms/bnd_info.cpp)
NVC++/arm Linux 23.9-0: compilation aborted
make[2]: *** [src/CMakeFiles/parthenon.dir/build.make:88: src/CMakeFiles/parthenon.dir/bvals/comms/bnd_info.cpp.o] Error 2
make[1]: *** [CMakeFiles/Makefile2:1837: src/CMakeFiles/parthenon.dir/all] Error 2
make: *** [Makefile:146: all] Error 2

Ideally in the future it will be the best supported compiler for NVIDIA hardware.

@BenWibking
Collaborator

@BenWibking Sorry for the delay, I've been on and off travel, other projects, and vacation next week.

Yes, I've been using nvhpc for Grace-Grace since it can give better performance on aarch64. I've returned to NVHPC on GPUs since Venado is starting to become available. I'm getting internal compiler errors there however so I might shelve it until the 9th.

NVC++-F-0000-Internal compiler error. size of unknown type       0  (/users/forrestglines/code/parthenon-project/parthenon/src/bvals/comms/bnd_info.cpp)
NVC++/arm Linux 23.9-0: compilation aborted
make[2]: *** [src/CMakeFiles/parthenon.dir/build.make:88: src/CMakeFiles/parthenon.dir/bvals/comms/bnd_info.cpp.o] Error 2
make[1]: *** [CMakeFiles/Makefile2:1837: src/CMakeFiles/parthenon.dir/all] Error 2
make: *** [Makefile:146: all] Error 2

Ideally in the future it will be the best supported compiler for NVIDIA hardware.

Yeah, I got that with 23.9 on x86-64 as well. I would be very interested to know whether it gives the same error with the current internal version of nvhpc.

I have gotten a lot of ICEs with nvhpc over the years. With the Cray compiler, it tells you to report the error to Cray and gives you the full command line to output the fully preprocessed source it is trying to compile, which can then be easily tar'd and emailed to them. I would be happy to report my issues if it were that easy with nvhpc.

@BenWibking
Collaborator

@fglines-nv Just wondering if you have an update on this?

@fglines-nv
Collaborator

@fglines-nv Just wondering if you have an update on this?

Not yet; I'm working this week on a Dockerfile to regularly build Parthenon with nvhpc.

@BenWibking
Collaborator

It looks like a special flag is now needed to get Kokkos to allow compiling with NVHPC: kokkos/kokkos#6499

@fglines-nv
Collaborator

A few updates on the ICE -- it turns out that using nvc++ -cuda to compile Kokkos projects (and perhaps most CUDA + C++ code outside of internal projects) is largely unsupported. There's no plan to fix this particular ICE at the moment.

It looks like a special flag is now needed to get Kokkos to allow compiling with NVHPC: kokkos/kokkos#6499

This change disables compiling CUDA code in Kokkos using nvc++ -cuda unless you enable it explicitly. Since Parthenon's Kokkos is from May 2023, it doesn't yet include this check.

To compile with NVHPC, we should instead compile with nvcc_wrapper using nvc++ as the host compiler, i.e.:

export NVCC_WRAPPER_DEFAULT_COMPILER=nvc++
cmake   -DCMAKE_CXX_COMPILER=$SOURCE_DIR/external/Kokkos/bin/nvcc_wrapper ...

I'm still getting an error with

particle_leapfrog.cpp(197): error: identifier "particles_leapfrog::particles_ic" is undefined in device code                                                           

but I think this is unrelated to NVHPC

@BenWibking
Collaborator

@fglines-nv Is there a compelling reason to compile with NVHPC as the host compiler, if the device code is still compiled with nvcc?

@fglines-nv
Collaborator

@fglines-nv Is there a compelling reason to compile with NVHPC as the host compiler, if the device code is still compiled with nvcc?

I'm not sure yet. As far as I've seen, no. It might have benefits for compiling for Grace/ARM.

@fglines-nv
Collaborator

The error in particle_leapfrog.cpp is a different issue unrelated to NVHPC but present with at least CUDA 12.5+. I'll file another bug about that.

If anyone wants to compile with NVHPC as the host compiler, use:

export NVCC_WRAPPER_DEFAULT_COMPILER=nvc++
cmake   -DCMAKE_CXX_COMPILER=$SOURCE_DIR/external/Kokkos/bin/nvcc_wrapper ...

The changes in this branch are unnecessary for that.
