WIP: Fixes to compile CUDA with nvc++ #910

Closed · wants to merge 6 commits
Changes from 4 commits
4 changes: 2 additions & 2 deletions example/sparse_advection/sparse_advection_driver.cpp
@@ -122,12 +122,12 @@ TaskCollection SparseAdvectionDriver::MakeTaskCollection(BlockList_t &blocks,
mdudt.get(), beta * dt, mc1.get());

// do boundary exchange
-  auto restrict =
+  auto restrict_task =
Collaborator:
seems totally reasonable to change this

Collaborator:
This var name was changed on develop, so I don't think this is an issue anymore.

parthenon::AddBoundaryExchangeTasks(update, tl, mc1, pmesh->multilevel);

// if this is the last stage, check if we can deallocate any sparse variables
if (stage == integrator->nstages) {
-    tl.AddTask(restrict, SparseDealloc, mc1.get());
+    tl.AddTask(restrict_task, SparseDealloc, mc1.get());
}
}

8 changes: 5 additions & 3 deletions src/CMakeLists.txt
@@ -301,9 +301,11 @@ endif()
# However, Parthenon heavily relies on it and there is no harm in compiling Kokkos
# without and Parthenon with (via Max Katz on the Kokkos Slack channel).
# Therefore, we don't use the Kokkos_ENABLE_CUDA_CONSTEXPR option but add the flag manually.
-# Also, not checking for NVIDIA as nvcc_wrapper is identified as GNU so we just make sure
-# the flag is not added when compiling with Clang for Cuda.
-if (Kokkos_ENABLE_CUDA AND NOT CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
+# Also, not checking for NVIDIA as nvcc_wrapper is identified as GNU so we just
+# make sure the flag is not added when compiling with Clang for Cuda. The flag
+# is also not necessary (and not recognized) when using nvc++ to compile Cuda
+# code.
+if (Kokkos_ENABLE_CUDA AND NOT CMAKE_CXX_COMPILER_ID STREQUAL "Clang" AND NOT CMAKE_CXX_COMPILER_ID STREQUAL "NVHPC")
Collaborator:
This makes me a little nervous, as it may cause problems with downstream codes.

BenWibking (Collaborator), Nov 20, 2023:
Maybe this condition could be changed to check whether CMAKE_CUDA_COMPILER_ID is not equal to NVHPC or Clang instead?
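A sketch of that suggestion (hypothetical, not part of this PR's diff): gate the flag on the device compiler's identity rather than the host compiler's, so nvcc gets `--expt-relaxed-constexpr` while nvc++ and Clang do not. Note that `CMAKE_CUDA_COMPILER_ID` is only populated when CUDA is enabled as a language in the project:

```cmake
# Hypothetical alternative: key off the CUDA (device) compiler, not the C++
# (host) compiler. CMAKE_CUDA_COMPILER_ID is "NVIDIA" for nvcc, "Clang" for
# clang in CUDA mode, and "NVHPC" for nvc++ -cuda.
if (Kokkos_ENABLE_CUDA AND CMAKE_CUDA_COMPILER_ID STREQUAL "NVIDIA")
  # --expt-relaxed-constexpr is an nvcc-only flag
  target_compile_options(parthenon PUBLIC --expt-relaxed-constexpr)
endif()
```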

Collaborator:
Sure---that seems like it might be safer. My main worry here is will the jury-rigged insanity that I (and KHARMA) are currently using to build with nvhpc fail?

Collaborator:
Hmm, I don't know. I'm not sure how that works at all without removing this flag, unless you mean you're using nvc++ as the host compiler and nvcc as the device compiler? In that case, I think it should still report NVIDIA as the CUDA compiler, rather than NVHPC.

Collaborator (Author):
I don't think so. I feel like we've discussed this before. The option --expt-relaxed-constexpr is specific to the nvcc compiler; it's not an option for nvc++. I'm double-checking that nvc++ can compile CUDA code that needs constexpr.

Does CMAKE_CUDA_COMPILER_ID get set to NVHPC or CLANG? The current commit also only changes the behavior of nvc++ builds, I don't think we want to change the behavior of clang builds.

Collaborator:
If you use clang as the device compiler, it doesn't accept this option either (it doesn't need it). So IMO it might be good to keep that.

Collaborator (Author):
I'm not sure if you can force nvc++ to use nvcc as the CUDA compiler, nvc++ already includes the CUDA toolchain.

I also verified that nvc++ compiles and runs code that needs nvcc --expt-relaxed-constexpr.

Collaborator:
By default, at least on non-Cray systems, it uses nvcc as the CUDA compiler. It only uses nvc++ for device code when nvc++ -cuda (or nvc++ -stdpar or nvc++ -openacc) is used.

Collaborator:
I don't remember the full context for this discussion - is this issue resolved?

target_compile_options(parthenon PUBLIC --expt-relaxed-constexpr)
endif()
