
WIP: Fixes to compile CUDA with nvc++ #910

Closed
wants to merge 6 commits

Conversation

forrestglines
Collaborator

PR Summary

Quick fix to the CMake configuration to allow compiling CUDA code with nvc++. Although in past experience nvc++ hasn't worked for Kokkos-based codes, it will supposedly become NVIDIA's favored compiler, with nvcc eventually deprecated.

While nvcc is likely still preferred, nvc++ and the nvhpc stack turned out to be the easier way to compile AthenaPK on Chicoma at the moment.

PR Checklist

  • Code passes cpplint
  • New features are documented.
  • Adds a test for any bugs fixed. Adds tests for new features.
  • Code is formatted
  • Changes are summarized in CHANGELOG.md
  • CI has been triggered on Darwin for performance regression tests.
  • Docs build
  • (@lanl.gov employees) Update copyright on changed files

@BenWibking
Collaborator

Just for reference, it appears this is currently only supported on Cray systems: https://gitlab.kitware.com/cmake/cmake/-/issues/23003.

@Yurlungur
Collaborator

Yurlungur commented Jul 16, 2023

Before merging this, I'd like to try compiling Phoebus with nvhpc with this fix on Chicoma. Previously, I found that simply removing the relaxed constexpr flag did not allow me to compile with nvc++, but maybe things have changed.

@forrestglines
Collaborator Author

I'm using these modules to build AthenaPK and Parthenon on Chicoma. AthenaPK seems to work correctly across multiple GPU nodes. Parthenon mostly works but isn't passing all tests.

Here are my modules:

module purge
module load PrgEnv-nvhpc
module load nvhpc/22.3  cray-mpich/8.1.21 craype-accel-nvidia80
module load cmake/3.22.3 python/3.10-anaconda-2023.03  cray-hdf5-parallel/1.12.2.1
export MPICH_GPU_SUPPORT_ENABLED=1

And I'm compiling Parthenon with:

cmake \
  -D CMAKE_BUILD_TYPE=RelWithDebInfo \
  -D Kokkos_ENABLE_CUDA=ON -D Kokkos_ENABLE_CUDA_LAMBDA=ON \
  -D Kokkos_ARCH_AMPERE80=ON \
  -D SERIAL_WITH_MPIEXEC=ON -D TEST_MPIEXEC=srun \
  -D CMAKE_CXX_FLAGS="${PE_MPICH_GTL_DIR_nvidia80} ${PE_MPICH_GTL_LIBS_nvidia80}" \
  $SOURCE_DIR

While AthenaPK compiles and passes all tests with this setup, Parthenon compiles but does not pass all ctests:

The following tests FAILED:
         16 - Device Object Allocation (Subprocess aborted)
         30 - MeshData works as expected for simple packs (Subprocess aborted)
         33 - Swarm memory management (Subprocess aborted)
         49 - regression_test:particle_leapfrog (Failed)
         50 - regression_mpi_test:particle_leapfrog (Failed)
         51 - regression_test:particle_leapfrog_outflow (Failed)
         52 - regression_mpi_test:particle_leapfrog_outflow (Failed)
         58 - regression_mpi_test:advection_convergence (Failed)
         59 - regression_test:output_hdf5 (Failed)
         60 - regression_mpi_test:output_hdf5 (Failed)
         63 - regression_test:bvals (Failed)
         64 - regression_mpi_test:bvals (Failed)
Errors while running CTest

I haven't looked too thoroughly into why these tests fail.

@BenWibking
Collaborator

I am not able to build this with NVHPC 23.5. I get:

NVC++-S-0053-Illegal use of void type (/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/flux_correction.cpp)
NVC++-S-0053-Illegal use of void type (/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/flux_correction.cpp)
NVC++/x86-64 Linux 23.5-0: compilation completed with severe errors
make[2]: *** [parthenon/src/CMakeFiles/parthenon.dir/bvals/comms/flux_correction.cpp.o] Error 2
make[2]: *** Waiting for unfinished jobs....
NVC++-S-0053-Illegal use of void type (/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/boundary_communication.cpp)
NVC++-S-0053-Illegal use of void type (/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/boundary_communication.cpp)
NVC++/x86-64 Linux 23.5-0: compilation completed with severe errors
make[2]: *** [parthenon/src/CMakeFiles/parthenon.dir/bvals/comms/boundary_communication.cpp.o] Error 2

I built this with:
cmake .. -DKokkos_ARCH_AMPERE80=ON -DKokkos_ENABLE_CUDA=ON -DCMAKE_C_COMPILER=nvc -DCMAKE_CXX_COMPILER=$HOME/bin/CC -DCMAKE_CUDA_COMPILER=$HOME/bin/CC -DCMAKE_CUDA_COMPILER_ID=NVHPC

with a custom wrapper script to get it working on a non-Cray system:

#!/bin/bash
nvc++ -cuda -gpu=cc80 --gcc-toolchain=$(which gcc) "$@"

@BenWibking
Collaborator

There are lots of warnings like this:

"/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/flux_correction.cpp", line 84: warning: function "lambda []" captures local object "koff" by reference, will likely cause an illegal memory access when run on the device [no_device_stack]
              Kokkos::TeamThreadRange<>(team_member, idxer.size()), [&](const int idx) {
                                                                       ^

"/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/flux_correction.cpp", line 180: warning: function "lambda []" captures local object "b" by reference, will likely cause an illegal memory access when run on the device [no_device_stack]
                               [&](const int idx) {
                                  ^

@forrestglines
Collaborator Author

There are lots of warnings like this:

"/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/flux_correction.cpp", line 84: warning: function "lambda []" captures local object "koff" by reference, will likely cause an illegal memory access when run on the device [no_device_stack]
              Kokkos::TeamThreadRange<>(team_member, idxer.size()), [&](const int idx) {
                                                                       ^

"/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/flux_correction.cpp", line 180: warning: function "lambda []" captures local object "b" by reference, will likely cause an illegal memory access when run on the device [no_device_stack]
                               [&](const int idx) {
                                  ^

I get a lot of those warnings too but AthenaPK compiles and runs just fine

@BenWibking
Collaborator

Ok, the "illegal use of void type" looks like it's an internal compiler error in NVHPC 23.5 (https://forums.developer.nvidia.com/t/nvc-openmp-error-inside-llc/178950).

@BenWibking
Collaborator

I get a lot of those warnings too but AthenaPK compiles and runs just fine

With NVHPC 22.3 on NCSA Delta, I am able to build and run AthenaPK successfully. It looks like NVHPC 23.5 just fails due to a compiler bug.

I didn't try to run the Parthenon tests.

@BenWibking
Collaborator

I get a lot of those warnings too but AthenaPK compiles and runs just fine

With NVHPC 22.3 on NCSA Delta, I am able to build and run AthenaPK successfully. It looks like NVHPC 23.5 just fails due to a compiler bug.

I didn't try to run the Parthenon tests.

I take this back. When running one of the AthenaPK problems (the rand_blast problem), I get different numerical behavior, such that it crashes with a positivity failure at timestep 664:

Mon Jul 17 14:48:42 CDT 2023:  cycle=664 time=6.2577120848689291e-03 dt=1.0307579117559152e-05 zone-cycles/wsec_step=2.75e+08 wsec_total=8.08e+00 wsec_step=7.61e-03
Mon Jul 17 14:48:42 CDT 2023:  ### PARTHENON ERROR
Mon Jul 17 14:48:42 CDT 2023:    Condition:   w_p > 0.0 || pressure_floor_ > 0.0 || e_floor_ > 0.0
Mon Jul 17 14:48:42 CDT 2023:    Message:     Got negative pressure. Consider enabling first-order flux correction or setting a reasonble pressure or temperature floor.
Mon Jul 17 14:48:42 CDT 2023:    File:        /projects/cvz/bwibking/athenapk-nvhpc-test/src/eos/../eos/adiabatic_glmmhd.hpp
Mon Jul 17 14:48:42 CDT 2023:    Line number: 116

I don't think this compiler is ready for production use.

@forrestglines
Collaborator Author

I also noticed that with a magnetic jet problem I was running, my NVHPC build failed at a slightly different time than my clang CPU build. I think I'm also going to give up on nvc++ for now; it's too unreliable.

@BenWibking
Collaborator

BenWibking commented Oct 13, 2023

I tried again with NVHPC 23.9 and get an internal compiler error when compiling bnd_info.cpp:

[ 13%] Building CXX object parthenon/src/CMakeFiles/parthenon.dir/bvals/comms/bnd_info.cpp.o
"/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/bnd_info.cpp", line 309: warning: variable "idx" was declared but never referenced [declared_but_not_referenced]
        int idx = static_cast<int>(el) % 3;
            ^

Remark: individual warnings can be suppressed with "--diag_suppress <warning-name>"

"/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/bnd_info.cpp", line 349: warning: variable "idx" was declared but never referenced [declared_but_not_referenced]
      int idx = static_cast<int>(el) % 3;
          ^

NVC++-F-0000-Internal compiler error. size of unknown type       0  (/mnt/home/wibkingb/athenapk/external/parthenon/src/bvals/comms/bnd_info.cpp)
NVC++/x86-64 Linux 23.9-0: compilation aborted

@BenWibking
Collaborator

I also noticed that with a magnetic jet problem I was running, my NVHPC build failed at a slightly different time than my clang CPU build. I think I'm also going to give up on nvc++ for now; it's too unreliable.

It turns out that nvc++ does not preserve denormals by default (whereas nvcc does). This may explain the weird numerical behavior we were both seeing. You can force it to preserve denormals with the nvc++ compiler option: -Mnodaz.
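
For anyone who wants to try it, a minimal sketch of passing that option through at configure time, adapting the cmake command quoted earlier in this thread (untested; shown only to illustrate where the flag would go):

cmake \
  -D CMAKE_BUILD_TYPE=RelWithDebInfo \
  -D Kokkos_ENABLE_CUDA=ON -D Kokkos_ENABLE_CUDA_LAMBDA=ON \
  -D Kokkos_ARCH_AMPERE80=ON \
  -D CMAKE_CXX_FLAGS="-Mnodaz" \
  $SOURCE_DIR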

@BenWibking
Collaborator

@forrestglines @Yurlungur Even though nvc++ -cuda can't currently compile Parthenon, I would be in favor of merging this in, since this PR will eventually be necessary once the compiler works.

@forrestglines
Collaborator Author

@forrestglines @Yurlungur Even though nvc++ -cuda can't currently compile Parthenon, I would be in favor of merging this in, since this PR will eventually be necessary once the compiler works.

Sounds good to me. The changes are minimal and shouldn't affect any other compiler.

@BenWibking left a comment
Collaborator

Looks good to me.

I think it should be considered a compiler bug that it doesn't accept a variable named restrict, since the usage of restrict as a reserved keyword is only part of the ISO C standard, and not part of the ISO C++ standard at all. However, changing it avoids confusion in any case.

@Yurlungur left a comment
Collaborator

I'm willing to merge this, but it does make me a little uncomfortable; see below.

@@ -122,12 +122,12 @@ TaskCollection SparseAdvectionDriver::MakeTaskCollection(BlockList_t &blocks,
mdudt.get(), beta * dt, mc1.get());

// do boundary exchange
- auto restrict =
+ auto restrict_task =
Collaborator

seems totally reasonable to change this

Collaborator

This var name was changed on develop, so I don't think this is an issue anymore.

# make sure the flag is not added when compiling with Clang for Cuda. The flag
# is also not necessary (and not recognized) when using nvc++ to compile Cuda
# code
if (Kokkos_ENABLE_CUDA AND NOT CMAKE_CXX_COMPILER_ID STREQUAL "Clang" AND NOT CMAKE_CXX_COMPILER_ID STREQUAL "NVHPC")
Collaborator

This makes me a little nervous, as it may cause problems with downstream codes.

Collaborator

@BenWibking commented Nov 20, 2023

Maybe this condition could be changed to check whether CMAKE_CUDA_COMPILER_ID is not equal to NVHPC or Clang instead?
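
As a rough sketch of that suggestion (not the change made in this PR, and it assumes CMAKE_CUDA_COMPILER_ID is actually populated in these builds, which is questioned below):

# hypothetical alternative: key the check off the CUDA compiler rather than the C++ compiler
if (Kokkos_ENABLE_CUDA AND NOT CMAKE_CUDA_COMPILER_ID STREQUAL "NVHPC" AND NOT CMAKE_CUDA_COMPILER_ID STREQUAL "Clang")
  # ... add --expt-relaxed-constexpr here, as in the existing block ...
endif()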

Collaborator

Sure, that seems like it might be safer. My main worry here is whether the jury-rigged insanity that I (and KHARMA) are currently using to build with nvhpc will fail.

Collaborator

Hmm, I don't know. I'm not sure how that works at all without removing this flag, unless you mean you're using nvc++ as the host compiler and nvcc as the device compiler? In that case, I think it should still report NVIDIA as the CUDA compiler, rather than NVHPC.

Collaborator Author

I don't think so. I feel like we've discussed this before. The option --expt-relaxed-constexpr is specific to the nvcc compiler; it's not an option for nvc++. I'm double-checking that nvc++ can compile CUDA code that needs constexpr.

Does CMAKE_CUDA_COMPILER_ID get set to NVHPC or CLANG? The current commit only changes the behavior of nvc++ builds; I don't think we want to change the behavior of clang builds.

Collaborator

If you use clang as the device compiler, it doesn't accept this option either (it doesn't need it). So IMO it might be good to keep that.

Collaborator Author

I'm not sure if you can force nvc++ to use nvcc as the CUDA compiler; nvc++ already includes the CUDA toolchain.

I also verified that nvc++ compiles and runs code that needs nvcc --expt-relaxed-constexpr.

Collaborator

By default, at least on non-Cray systems, it uses nvcc as the CUDA compiler. It only uses nvc++ for device code when nvc++ -cuda (or nvc++ -stdpar or nvc++ -openacc) is used.
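
To make that distinction concrete (file names here are hypothetical; the flags are the ones used earlier in this thread):

# nvc++ as a plain host C++ compiler; device code, if any, is still compiled by nvcc
nvc++ -c host_only.cpp

# nvc++ in single-source CUDA mode: -cuda makes nvc++ itself compile the device code
nvc++ -cuda -gpu=cc80 -c kernels_and_host.cpp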

Collaborator

I don't remember the full context for this discussion - is this issue resolved?

@BenWibking
Collaborator

@forrestglines just wondering if you've tried to build Parthenon with nvc++ recently?

@fglines-nv
Collaborator

@BenWibking Sorry for the delay, I've been on and off travel, other projects, and vacation next week.

Yes, I've been using nvhpc for Grace-Grace since it can give better performance on aarch64. I've returned to NVHPC on GPUs since Venado is starting to become available. I'm getting internal compiler errors there, however, so I might shelve it until the 9th.

NVC++-F-0000-Internal compiler error. size of unknown type       0  (/users/forrestglines/code/parthenon-project/parthenon/src/bvals/comms/bnd_info.cpp)
NVC++/arm Linux 23.9-0: compilation aborted
make[2]: *** [src/CMakeFiles/parthenon.dir/build.make:88: src/CMakeFiles/parthenon.dir/bvals/comms/bnd_info.cpp.o] Error 2
make[1]: *** [CMakeFiles/Makefile2:1837: src/CMakeFiles/parthenon.dir/all] Error 2
make: *** [Makefile:146: all] Error 2

Ideally in the future it will be the best supported compiler for NVIDIA hardware.

@BenWibking
Collaborator

@BenWibking Sorry for the delay, I've been on and off travel, other projects, and vacation next week.

Yes, I've been using nvhpc for Grace-Grace since it can give better performance on aarch64. I've returned to NVHPC on GPUs since Venado is starting to become available. I'm getting internal compiler errors there however so I might shelve it until the 9th.

NVC++-F-0000-Internal compiler error. size of unknown type       0  (/users/forrestglines/code/parthenon-project/parthenon/src/bvals/comms/bnd_info.cpp)
NVC++/arm Linux 23.9-0: compilation aborted
make[2]: *** [src/CMakeFiles/parthenon.dir/build.make:88: src/CMakeFiles/parthenon.dir/bvals/comms/bnd_info.cpp.o] Error 2
make[1]: *** [CMakeFiles/Makefile2:1837: src/CMakeFiles/parthenon.dir/all] Error 2
make: *** [Makefile:146: all] Error 2

Ideally in the future it will be the best supported compiler for NVIDIA hardware.

Yeah, I got that with 23.9 on x86-64 as well. I would be very interested to know whether it gives the same error with the current internal version of nvhpc.

I have gotten a lot of ICEs with nvhpc over the years. With the Cray compiler, it tells you to report the error to Cray and gives you the full command line to output the fully preprocessed source it is trying to compile, which can then be easily tar'd and emailed to them. I would be happy to report my issues if it were that easy with nvhpc.

@BenWibking
Collaborator

@fglines-nv Just wondering if you have an update on this?

@fglines-nv
Collaborator

@fglines-nv Just wondering if you have an update on this?

Not yet; I'm working this week on a Dockerfile to regularly build Parthenon with nvhpc.

@BenWibking
Collaborator

It looks like a special flag is now needed to get Kokkos to allow compiling with NVHPC: kokkos/kokkos#6499

@fglines-nv
Collaborator

A few updates on the ICE -- it turns out that using nvc++ -cuda to compile Kokkos projects (and perhaps most CUDA + C++ code outside of internal projects) is largely unsupported. There's no plan to fix this particular ICE at the moment.

It looks like a special flag is now needed to get Kokkos to allow compiling with NVHPC: kokkos/kokkos#6499

This change disables compiling CUDA code in Kokkos using nvc++ -cuda unless you enable it explicitly. Since Parthenon's Kokkos is from May 2023, it doesn't yet include this check.

To compile with NVHPC, we should instead compile with nvcc_wrapper using nvc++ as the host compiler, i.e.:

export NVCC_WRAPPER_DEFAULT_COMPILER=nvc++
cmake   -DCMAKE_CXX_COMPILER=$SOURCE_DIR/external/Kokkos/bin/nvcc_wrapper ...

I'm still getting an error with

particle_leapfrog.cpp(197): error: identifier "particles_leapfrog::particles_ic" is undefined in device code                                                           

but I think this is unrelated to NVHPC

@BenWibking
Collaborator

@fglines-nv Is there a compelling reason to compile with NVHPC as the host compiler, if the device code is still compiled with nvcc?

@fglines-nv
Collaborator

@fglines-nv Is there a compelling reason to compile with NVHPC as the host compiler, if the device code is still compiled with nvcc?

I'm not sure yet. As far as I've seen, no. It might have benefits for compiling for Grace/ARM.

@fglines-nv
Collaborator

The error in particle_leapfrog.cpp is a different issue unrelated to NVHPC but present with at least CUDA 12.5+. I'll file another bug about that.

If anyone wants to compile with NVHPC as the host compiler, use:

export NVCC_WRAPPER_DEFAULT_COMPILER=nvc++
cmake   -DCMAKE_CXX_COMPILER=$SOURCE_DIR/external/Kokkos/bin/nvcc_wrapper ...

The changes in this branch are unnecessary for that.
