Integrate changes from NERSC GPU hackathon. #713

olupton · 2021-12-13T08:58:32Z

Description
During the NERSC GPU hackathon we used hackathon_main as our "base" branch.
Now that the hackathon is over, we should merge our developments into master, after re-enabling the full test suite and fixing any issues we deferred during the hackathon.

See also:
neuronsimulator/gpuhackathon#4

Summary of hackathon developments:

- Support OpenMP target offload when NMODL and GPU support are enabled.
- Use sensible defaults for the --nwarp parameter, improving the performance of the Hines solver with --cell-permute=2 on GPU.
- Use a Boost memory pool, if Boost is available, to reduce the number of independent CUDA unified memory allocations used for Random123 stream objects. This speeds up initialisation of models using Random123, and also makes it feasible to use Nsight Compute on models using Random123 and to profile initialisation with Nsight Systems.
- Use -cuda when compiling with NVHPC and OpenACC or OpenMP, as recommended on the NVIDIA forums.
- Do not compile for compute capability 6.0 by default, as this is not supported by NVHPC with OpenMP target offload.
- Add new GitLab CI tests so we test CoreNEURON + NMODL with both OpenACC and OpenMP.
- Add the CUDA runtime header search path explicitly, so we don't rely on it being implicit in our NVHPC localrc.
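
The memory pool item above can be illustrated with a minimal sketch. This is not the Boost-based code from this PR: `StreamState`, `StreamPool` and the use of `std::malloc` in place of a CUDA unified memory allocator are all assumptions made for illustration. The only point is that many small objects are carved out of a few large allocations.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical stand-in for a Random123 stream object (not the CoreNEURON type).
struct StreamState {
    unsigned key[2];
    unsigned counter[4];
};

// Minimal fixed-size object pool: one large allocation per chunk instead of
// one allocation per object. std::malloc stands in for a unified memory
// allocator here (assumption for illustration).
class StreamPool {
  public:
    explicit StreamPool(std::size_t objects_per_chunk)
        : objects_per_chunk_(objects_per_chunk == 0 ? 1 : objects_per_chunk) {}
    StreamPool(const StreamPool&) = delete;
    StreamPool& operator=(const StreamPool&) = delete;
    ~StreamPool() {
        for (void* chunk: chunks_) {
            std::free(chunk);
        }
    }

    // Hand out one slot, growing the pool by a whole chunk when it is empty.
    StreamState* allocate() {
        if (free_list_.empty()) {
            grow();
        }
        StreamState* slot = free_list_.back();
        free_list_.pop_back();
        return slot;
    }

    // Return a slot to the pool; no underlying memory is released here.
    void deallocate(StreamState* slot) {
        free_list_.push_back(slot);
    }

    // Number of calls made to the underlying allocator so far.
    std::size_t underlying_allocations() const {
        return chunks_.size();
    }

  private:
    void grow() {
        auto* chunk = static_cast<StreamState*>(
            std::malloc(objects_per_chunk_ * sizeof(StreamState)));
        if (chunk == nullptr) {
            throw std::bad_alloc();
        }
        chunks_.push_back(chunk);
        for (std::size_t i = 0; i < objects_per_chunk_; ++i) {
            free_list_.push_back(chunk + i);
        }
    }

    std::size_t objects_per_chunk_;
    std::vector<void*> chunks_;
    std::vector<StreamState*> free_list_;
};
```

With a chunk size of 1000, requesting 2500 stream objects triggers three underlying allocations rather than 2500, which is the kind of reduction the summary item describes.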

TODO:

- Merge BlueBrain/nmodl#783 (Integrate changes from NERSC GPU hackathon).
- Update the NMODL submodule once BlueBrain/nmodl#783 is merged.

Use certain branches for the SimulationStack CI
CI_BRANCHES:NEURON_BRANCH=master,NMODL_BRANCH=master,

* Simplify unified memory logic.
* Pass -mp=gpu when we pass -acc
* Pass -gpu=lineinfo for better debug information.
* Pass -Minfo=accel,mp for better compile time diagnostics.
* Add nrn_pragma_{acc,omp} macros for single-source Open{ACC,MP} support.
* Call omp_set_default_device.
* Drop cc60 because of OpenMP offload incompatibility.
* Add --gpu to test.
* Default (BB5-valid) CORENRN_EXTERNAL_BENCHMARK_DATA.
* Remove cuda_add_library.
* Don't print number of GPUs when quiet.
* Set OMP_NUM_THREADS=1 for lfp_test.
* Update NMODL to emit nrn_pragma_{acc,omp} macros.

Co-authored-by: Pramod Kumbhar
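
As a rough illustration of the single-source idea behind the nrn_pragma_{acc,omp} macros, here is a minimal sketch. The macro names come from the commit message above, but the definitions below are an assumption and may differ from the ones in CoreNEURON; `SKETCH_STRINGIFY`, `SKETCH_ENABLE_OPENACC`, `SKETCH_ENABLE_OPENMP_OFFLOAD` and `scale` are hypothetical names used only here.

```cpp
#include <cstddef>
#include <vector>

// Turn the pragma text into a string literal, as the _Pragma operator requires.
#define SKETCH_STRINGIFY(x) #x

// SKETCH_ENABLE_* are hypothetical configuration macros (assumption). At most
// one programming model is active; the other macro expands to nothing.
#if defined(SKETCH_ENABLE_OPENMP_OFFLOAD)
#define nrn_pragma_acc(x)
#define nrn_pragma_omp(x) _Pragma(SKETCH_STRINGIFY(omp x))
#elif defined(SKETCH_ENABLE_OPENACC)
#define nrn_pragma_acc(x) _Pragma(SKETCH_STRINGIFY(acc x))
#define nrn_pragma_omp(x)
#else
// Host-only build: both annotations disappear and the loop runs on the CPU.
#define nrn_pragma_acc(x)
#define nrn_pragma_omp(x)
#endif

// One loop body annotated for both models (hypothetical example function).
void scale(std::vector<double>& v, double factor) {
    double* data = v.data();
    const std::size_t n = v.size();
    nrn_pragma_acc(parallel loop copy(data[0:n]))
    nrn_pragma_omp(target teams distribute parallel for map(tofrom: data[0:n]))
    for (std::size_t i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}
```

Built without either configuration macro, the annotations vanish and `scale` is an ordinary host loop, so the same source can be compiled for OpenACC, OpenMP offload or CPU only.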

* Add wrapper functions for using the OpenMP or OpenACC API
* Add -mp=gpu in order to link the GPU runtime with tests as well
* Avoid copying VecPlay members twice, otherwise association fails with OpenMP
  * IvocVect members t_ and y_ were copied twice
  * only discon_indices_ is a pointer, and hence only that needs to be copied

bbpbuildbot · 2021-12-13T09:21:51Z

Logfiles from GitLab pipeline #28910 (:no_entry:) have been uploaded here!

Status and direct links:

bbpbuildbot · 2021-12-13T11:33:53Z

Logfiles from GitLab pipeline #28931 (:white_check_mark:) have been uploaded here!

Status and direct links:

bbpbuildbot · 2021-12-13T14:03:43Z

Logfiles from GitLab pipeline #28992 (:no_entry:) have been uploaded here!

Status and direct links:

* Use #pragma omp instead of runtime API in `cnrn_target_{copyin,delete}`
* Fix `VecPlayContinuous::discon_indices_` device transfer.
* Name `cnrn_target_` wrappers more consistently.

Co-authored-by: Olli Lupton

bbpbuildbot · 2021-12-13T15:46:51Z

Logfiles from GitLab pipeline #29035 (:white_check_mark:) have been uploaded here!

Status and direct links:

bbpbuildbot · 2021-12-14T08:25:50Z

Logfiles from GitLab pipeline #29110 (:white_check_mark:) have been uploaded here!

Status and direct links:

bbpbuildbot · 2021-12-14T10:21:31Z

Logfiles from GitLab pipeline #29143 (:white_check_mark:) have been uploaded here!

Status and direct links:

omp_get_mapped_ptr was added in OpenMP 5.1 and is not widely supported. With this change, calling cnrn_target_deviceptr on a pointer that is not present on the device is a hard error instead of returning nullptr, so we avoid calling it for artificial cells.
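
The calling convention this implies can be sketched with a host-only mock. Nothing below is CoreNEURON code: `MockDevice`, `MockMechanism` and `device_data_or_null` are hypothetical names, and a lookup table stands in for the real device mapping. The sketch only shows the caller-side guard: skip the lookup for artificial cells rather than relying on a nullptr return.

```cpp
#include <stdexcept>
#include <unordered_map>
#include <vector>

// Mock of a host-to-device presence table (assumption for illustration).
class MockDevice {
  public:
    // Record that `host` has a device counterpart at `device`.
    void copyin(const void* host, const void* device) {
        table_[host] = device;
    }

    // Strict lookup: a pointer that is not present is a hard error.
    const void* deviceptr(const void* host) const {
        auto it = table_.find(host);
        if (it == table_.end()) {
            throw std::runtime_error("host pointer is not present on the device");
        }
        return it->second;
    }

  private:
    std::unordered_map<const void*, const void*> table_;
};

// Hypothetical mechanism record; artificial cells have no device data here.
struct MockMechanism {
    bool is_artificial;
    std::vector<double> data;
};

// Caller-side guard: never ask for the device pointer of an artificial cell.
const void* device_data_or_null(const MockDevice& dev, const MockMechanism& m) {
    if (m.is_artificial) {
        return nullptr;
    }
    return dev.deviceptr(m.data.data());
}
```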

bbpbuildbot · 2021-12-16T09:54:24Z

Logfiles from GitLab pipeline #29497 (:white_check_mark:) have been uploaded here!

Status and direct links:

bbpbuildbot · 2021-12-17T11:36:02Z

Logfiles from GitLab pipeline #29728 (:white_check_mark:) have been uploaded here!

Status and direct links:

* Re-enable GitLab CI.
* Add NMODL + OpenACC test.
* Restore {clang,cmake}-format checks.
* Prefer OpenACC with MOD2C.
* Do not enable OpenACC in NMODL + OpenMP mode.
* Convert more #pragma acc to nrn_pragma_acc(...).
* Call cudaSetDevice in OpenMP mode.

Co-authored-by: Ioannis Magkanaris

CMake/OpenAccHelper.cmake

bbpbuildbot · 2021-12-21T19:19:25Z

bbpbuildbot · 2021-12-21T21:07:23Z

iomaganaris

LGTM 👍
Just one question

coreneuron/network/partrans.cpp

bbpbuildbot · 2021-12-22T11:55:06Z

pramodk

LGTM!

Overall, all changes look fine to me. I added one or two comments for clarification.

coreneuron/CMakeLists.txt

coreneuron/permute/cellorder.cu

coreneuron/sim/solve_core.cpp

coreneuron/network/partrans.cpp

bbpbuildbot · 2021-12-22T17:20:06Z

bbpbuildbot · 2021-12-23T09:27:51Z

Summary of changes:
- Support OpenMP target offload when NMODL and GPU support are enabled. (BlueBrain/CoreNeuron#693, BlueBrain/CoreNeuron#704, BlueBrain/CoreNeuron#705, BlueBrain/CoreNeuron#707, BlueBrain/CoreNeuron#708, BlueBrain/CoreNeuron#716, BlueBrain/CoreNeuron#719)
- Use sensible defaults for the --nwarp parameter, improving the performance of the Hines solver with --cell-permute=2 on GPU. (BlueBrain/CoreNeuron#700, BlueBrain/CoreNeuron#710, BlueBrain/CoreNeuron#718)
- Use a Boost memory pool, if Boost is available, to reduce the number of independent CUDA unified memory allocations used for Random123 stream objects. This speeds up initialisation of models using Random123, and also makes it feasible to use Nsight Compute on models using Random123 and to profile initialisation with Nsight Systems. (BlueBrain/CoreNeuron#702, BlueBrain/CoreNeuron#703)
- Use -cuda when compiling with NVHPC and OpenACC or OpenMP, as recommended on the NVIDIA forums. (BlueBrain/CoreNeuron#721)
- Do not compile for compute capability 6.0 by default, as this is not supported by NVHPC with OpenMP target offload.
- Add new GitLab CI tests so we test CoreNEURON + NMODL with both OpenACC and OpenMP. (BlueBrain/CoreNeuron#698, BlueBrain/CoreNeuron#717)
- Add the CUDA runtime header search path explicitly, so we don't rely on it being implicit in our NVHPC localrc.
- Clean up unused code. (BlueBrain/CoreNeuron#711)

Co-authored-by: Pramod Kumbhar
Co-authored-by: Ioannis Magkanaris
Co-authored-by: Christos Kotsalos
Co-authored-by: Nicolas Cornu

CoreNEURON Repo SHA: BlueBrain/CoreNeuron@423ae6c

olupton and others added 11 commits November 23, 2021 09:11

Update nmodl to hackathon_main.

d452e1a

[Hackathon] disable a lot of CI (#694)

8ab49e9

* Disable cmake-format and clang-format checks.
* Disable GitLab CI except for NMODL + GPU.

[Hackathon] Add a temporary option for benchmark data. (#695)

560cc3f

* Add a hackathon-specific argument for benchmarks.
* Add a reference comparison for channel-benchmark.

Minor changes for building on perlmutter (#697)

de4e433

* Create the build/benchmark folder before trying to use it.
* Run nrnivmodl-core in parallel rather than serially (too slow).

Enable OpenMP in CoreNEURON CI. (#698)

81dd5ef

Set by default the number of warps to execute in a large reasonable number and update the related documentation (#700)

3e394c4

Add memory pool for Random123 streams. (#702)

a8bb716

* Add memory pool for Random123 streams. This speeds up initialisation when running on GPU.
* Make Boost optional.

Fix Boost-free compilation. (#703)

9649814

This was a silly bug in #702.

small openacc fixes (#707)

57f7724

olupton mentioned this pull request Dec 13, 2021

Integrate changes from NERSC GPU hackathon. BlueBrain/nmodl#783

Merged

Fixup to make the CI work better while finalising hackathon changes.

56889cc

solve_interleaved2_launcher (CUDA interface): fixing size of blocksPerGrid & threadsPerBlock (#710)

01a39d7

Remove unused GPU code (#711)

78081b4

We prefer selective host-to-device updates.

Fixes and improvements from LLVM/XLC work. (#716)

781d34f

Code fixes for XLC and Clang execution without build system changes. This mainly adds missing OpenMP pragmas and makes cnrn_target_ wrappers visible to NMODL.

GPU implementation improvements (#718)

d03c45f

* Set nwarp to a very large number for optimal parallelization and slightly improve the grid configuration of CUDA solve_interleaved2
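
For readers unfamiliar with how a warp count relates to a CUDA grid configuration, the following host-side sketch shows the usual ceiling-division pattern. It is not the code from #718 or #710: `LaunchConfig`, `make_launch_config` and the block size passed in are hypothetical, and the only fixed fact used is the CUDA warp size of 32 threads.

```cpp
#include <cassert>

// Hypothetical pair of launch parameters, named after the commit title above.
struct LaunchConfig {
    int blocks_per_grid;
    int threads_per_block;
};

// Derive a 1D grid from a number of warps: each warp needs 32 threads, and the
// number of blocks is rounded up so that every warp gets a slot.
LaunchConfig make_launch_config(int nwarp, int threads_per_block) {
    assert(nwarp > 0 && threads_per_block > 0);
    constexpr int warp_size = 32;
    const long long total_threads = static_cast<long long>(nwarp) * warp_size;
    const long long blocks = (total_threads + threads_per_block - 1) / threads_per_block;
    return {static_cast<int>(blocks), threads_per_block};
}
```

For example, 10 warps with 128 threads per block need 320 threads, which rounds up to 3 blocks.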

olupton mentioned this pull request Dec 17, 2021

Re-enable CI tests on hackathon branch #717

Merged

1 task

olupton and others added 2 commits December 17, 2021 14:53

NMODL -> hackathon_main.

9a98f73

olupton commented Dec 21, 2021

CMake/OpenAccHelper.cmake (outdated; resolved)

olupton added 2 commits December 21, 2021 19:48

Add CUDA toolkit includes.

531c4fe

Presumably this was working before because our nvhpc localrc files accidentally included CUDA include directories before BlueBrain/spack#1392.

Fixup cmake-format.

e3aeafc

olupton added 2 commits December 22, 2021 10:23

Compile with -cuda. (#721)

9fddc7d

* Compile NVHPC+Open{ACC,MP} with -cuda.
* Pull in NMODL+Eigen fixes to make this work.

Cleanup CMake for GPU offload.

1fbba17

olupton marked this pull request as ready for review December 22, 2021 09:39

olupton added 2 commits December 22, 2021 11:36

fixup

847d415

fixup the fixup 🤦

53b0c5f

iomaganaris approved these changes Dec 22, 2021

coreneuron/network/partrans.cpp (resolved)

olupton requested a review from pramodk December 22, 2021 12:03

pramodk approved these changes Dec 22, 2021

olupton mentioned this pull request Dec 22, 2021

Investigate lighterweight alternatives to Boost.Pool #728

Open

olupton added 2 commits December 22, 2021 18:00

NMODL -> master after #783.

2c7377c

Drop two OpenMP taskwait directives.

5c5b8a3

olupton closed this Dec 22, 2021

olupton reopened this Dec 22, 2021

olupton closed this Dec 22, 2021

olupton reopened this Dec 22, 2021

olupton closed this Dec 23, 2021

olupton reopened this Dec 23, 2021

This was referenced Dec 23, 2021

Address issues ignored by the reduced hackathon_main CI neuronsimulator/gpuhackathon#4

Closed

Improve OpenMP offload implementation #729

Open

olupton merged commit 423ae6c into master Dec 23, 2021

olupton mentioned this pull request Feb 24, 2022

Improvements for GPU binary wheels #672

Open

5 tasks
