This repository has been archived by the owner on Mar 20, 2023. It is now read-only.

NERSC GPU hackathon (Dec 2021)

Olli Lupton edited this page Dec 2, 2021 · 84 revisions


This page summarises preparations for the hackathon on the 2nd/8th/9th/10th December 2021. We will use CoreNEURON+NMODL with a few sets of test model data.

NOTE: Any issues that need to be handled after the hackathon should be created at https://github.com/neuronsimulator/gpuhackathon/issues

Preparatory Tasks

  • Olli - Install Caliper, Jinja2, PyYAML, Pytest and Sympy on Ascent (via Spack or standalone?). Blocked due to access issue.

  • Pramod - Update the Ascent instructions to enable NMODL (same as NERSC). Blocked due to access issue.

  • Olli - The channel benchmark fails on Ascent while loading the input dataset. Should we regenerate the data on Ascent?

  • Ioannis - Generate and copy the input dataset for solver analysis and optimisation during the hackathon, and document it on this wiki page.

  • Omar - List all OpenACC directives and APIs used in CoreNEURON + NMODL. See https://github.com/neuronsimulator/gpuhackathon/blob/main/overview.md

  • Olli - A common profiling/benchmarking script that can compare and plot OpenMP vs OpenACC runtimes

  • First Day Presentation Preparation

    • Pramod + Olli: 3 min Introduction
    • Ioannis + Christos: Solver introduction to mentors
    • Pramod + Omar + Olli: Compute loop + DSL code generation introduction to mentors

Hackathon Tasks

NEURON GPU Enabled Builds on systems like Perlmutter and Ascent

  • Alex: Check that neuron-gpu-nightly runs on Perlmutter. Does it work with and without MPI?
  • Alex: Check that NEURON + CoreNEURON (without InterViews) and NEURON + CoreNEURON + NMODL (without InterViews) build on Perlmutter & Ascent. Do they work?
    • Alex: Should we automatically disable InterViews on Cray and IBM Power systems, detecting them via CMake?
  • Alex: Check if there is any improvement needed for building from source on Hackathon systems.

OpenMP migration of CoreNEURON NrnThread / model data transfer

Random123 and OpenMP portability

Hines Solver Analysis and Optimisation

  • Add an option in CoreNEURON that only executes the Hines solver?
  • Find a way to dynamically choose a good value for the nwarp variable (used to distribute the cells)
  • Ioannis + Christos: Profile a realistic model and analyse different performance metrics with mentors to understand the limitations.
  • Ioannis + Christos: With the current algorithm, investigate possible performance improvement opportunities
  • Ioannis + Christos: Performance comparison of OpenACC vs OpenMP vs CUDA implementations
  • Ioannis + Christos: Would the solver benefit from using special memory types?
  • Ioannis + Christos: Look into algorithmic improvements to expose more parallelism (if suitable)

NMODL Code Generation & OpenMP Migration

CoreNEURON OpenMP migration

  • Build system changes to enable OpenMP target offload as well as OpenACC
  • Olli: Try a simple OpenMP + OpenACC offload test (e.g. with the built-in ring test)
  • Olli: Port the Hines solver in CoreNEURON to OpenMP (ongoing)
  • Pramod: Measure the performance difference between OpenMP and OpenACC runs
  • Olli: In the CoreNEURON event communication code, start converting the remaining OpenACC pragmas and API calls to OpenMP

Eigen Compatibility Issues

Portable wheels

  • Pramod: Prepare a slide describing the goals: what we currently do and what we would like to do.
  • Pramod: Organise a discussion with one or a few NVIDIA compiler engineers. The central question is ABI compatibility across different CUDA + OpenMP + OpenACC runtimes/compilers.

Unified memory performance (optional / low priority)

  • First, check/fix https://github.com/BlueBrain/CoreNeuron/issues/594
  • Re-measure the slowdown from using unified memory (ringtest + channel-benchmark)
  • Identify which data structures are causing the slowdown. How can we identify this easily with the available tools?

Ascent

These are instructions to build + run on Ascent (login1.ascent.olcf.ornl.gov).

module load nvhpc/21.9 python cmake flex
module swap cuda/10.1.243 cuda/10.2.89
module use /autofs/nccsopen-svm1_proj/gen170/neuron/spack_modules/linux-rhel7-power9le
module load caliper ninja
export NVLOCALRC=/ccsopen/proj/gen170/neuron/nersc-gpu-hackathon-dec-2021/localrc
export PATH=/sw/ascent/gcc/6.4.0/bin:$PATH
# clone repository
cd $HOME
git clone --branch hackathon_main git@github.com:BlueBrain/CoreNeuron.git # or git clone --branch hackathon_main https://github.com/BlueBrain/CoreNeuron.git
cd CoreNeuron && mkdir -p build && cd build
cmake .. -G Ninja \
  -DCORENRN_ENABLE_CALIPER_PROFILING=ON \
  -DCORENRN_ENABLE_GPU=ON \
  -DCORENRN_ENABLE_NMODL=ON \
  -DCMAKE_INSTALL_PREFIX=../install \
  -DCMAKE_CXX_FLAGS="-DR123_USE_SSE=0" \
  -DCMAKE_CUDA_ARCHITECTURES=70 \
  -DCMAKE_CUDA_COMPILER=nvcc \
  -DCORENRN_EXTERNAL_BENCHMARK_DATA=/ccsopen/proj/gen170/neuron/nersc-gpu-hackathon-dec-2021/
cmake --build . --parallel

Running

As $HOME is not writable, create your own directory in the project directory:

mkdir -p /ccsopen/proj/gen170/neuron/nersc-gpu-hackathon-dec-2021/users/$USER
cd /ccsopen/proj/gen170/neuron/nersc-gpu-hackathon-dec-2021/users/$USER

Now run a tiny functional test on the GPU:

  • Allocate a node
bsub -P GEN170 -J neuron -W 90 -nnodes 1 -alloc_flags "gpumps" -Is $SHELL
  • Load the necessary modules:
module load nvhpc/21.9 python cmake
module swap cuda/10.1.243 cuda/10.2.89
module use /autofs/nccsopen-svm1_proj/gen170/neuron/spack_modules/linux-rhel7-power9le
module load caliper
  • Run a simple functional test on the GPU:
OMP_NUM_THREADS=1 jsrun --gpu_per_rs 1 -n 1 $HOME/CoreNeuron/build/bin/ppc64le/special-core -e 1 -d $HOME/CoreNeuron/tests/integration/ring --gpu --mpi
  • Run a channel-benchmark test
NVCOMPILER_ACC_SYNCHRONOUS=1 OMP_NUM_THREADS=1 \
  jsrun --gpu_per_rs 2 -n 2 $HOME/CoreNeuron/build/benchmark/ppc64le/special-core \
  --datpath=/ccsopen/proj/gen170/neuron/nersc-gpu-hackathon-dec-2021/channel-benchmark-all-1320-cells-2-ranks/ \
  --mpi --gpu --cell-permute=2 --tstop=100

Perlmutter

Building

  • The system-wide modules only go up to NVHPC 21.7, which has known issues with NEURON. We have inserted our own installation of NVHPC 21.9 (module nvidia/21.9) into the Cray Programming Environment setup.
  • Note that NVHPC 21.9 is configured to use the system GCC 7.5 standard library, while we have built some other dependencies with GCC 9.2. This seems to be "close enough".
  • Either -tp haswell or -DR123_USE_SSE=0 is required: nvc++ defaults to -tp zen on the Perlmutter nodes, which defines __ABM__ and causes Random123 to try to include intrin.h, which fails.
# clone repository
git clone --branch hackathon_main git@github.com:BlueBrain/CoreNeuron.git # or git clone --branch hackathon_main https://github.com/BlueBrain/CoreNeuron.git
cd CoreNeuron && mkdir -p build && cd build

# allocate node
salloc --nodes 1 --qos interactive --time 01:00:00 --constraint gpu --gpus 4 --account=ntrain9_g

# Use our own hand-crafted modules for cuda 11.4 (from nvhpc/21.9) and PrgEnv-nvidia for nvhpc 21.9
module use /global/cfs/cdirs/ntrain9/neuron/modules
# Also, spack-generated modules for dependencies (caliper, ninja, py-*)
module use /global/cfs/cdirs/ntrain9/neuron/spack_modules/cray-sles15-zen2

# Load modules: prefer CUDA 11.4 from NVHPC/21.9
module swap cuda cuda/11.4.1
module load cmake nvidia/21.9 python caliper ninja py-pytest py-pyyaml py-jinja2 py-sympy

# Build CoreNEURON
cmake .. -G Ninja \
  -DCORENRN_ENABLE_CALIPER_PROFILING=ON  \
  -DCORENRN_ENABLE_GPU=ON \
  -DCORENRN_ENABLE_NMODL=ON \
  -DCORENRN_NMODL_FLAGS="sympy --analytic" \
  -DCORENRN_EXTERNAL_BENCHMARK_DATA=$CFS/ntrain9/neuron/nersc-gpu-hackathon-dec-2021 \
  -DCMAKE_INSTALL_PREFIX=../install \
  -DCMAKE_CUDA_COMPILER=nvcc \
  -DCMAKE_CXX_FLAGS="-DR123_USE_SSE=0" \
  -DCMAKE_CXX_COMPILER=CC \
  -DCMAKE_CUDA_ARCHITECTURES=80 \
  -DCORENRN_NMODL_DIR=/global/cfs/cdirs/ntrain9/neuron/spack/cray-sles15-zen2/gcc-9.3.0/nmodl-0.3.0.20111126-m2kos252sgvxkq7xltv5w35e4irae7gj

cmake --build . --parallel
ctest --output-on-failure -j 16

If you are working on NMODL code generation, omit the -DCORENRN_NMODL_DIR=.. option or provide your own NMODL install directory via -DCORENRN_NMODL_DIR=...

Running in an interactive session

If you haven't already allocated a session:

salloc --nodes 1 --qos interactive --time 01:00:00 --constraint gpu --gpus 2 --account=ntrain9_g

Note that we set NVCOMPILER_ACC_SYNCHRONOUS=1 below so that we get correct timings for individual kernels. Otherwise, because kernels are launched asynchronously, the per-kernel timings are incorrect.

Run a simple ring test. It is not compute-intensive and serves only as a functionality check:

NVCOMPILER_ACC_SYNCHRONOUS=1 OMP_NUM_THREADS=1 CALI_CONFIG=runtime-report,calc.inclusive srun -n 2 benchmark/x86_64/special-core -e 1 --datpath=../tests/integration/ring --mpi --gpu --cell-permute=2

 Duke, Yale, and the BlueBrain Project -- Copyright 1984-2020
 Version : 1.0 84a7f50 (2021-11-25 18:03:04 +0100)

 Additional mechanisms from files
 Ca.mod CaDynamics_DC0.mod CaDynamics_E2.mod Ca_HVA.mod Ca_HVA2.mod Ca_LVAst.mod CoreConfig.mod Ih.mod Im.mod K_Pst.mod K_Tst.mod KdShu2007.mod NaTa_t.mod NaTg.mod NaTs2_t.mod Nap_Et2.mod ProbAMPANMDA_EMS.mod ProbGABAAB_EMS.mod ProfileHelper.mod SK_E2.mod SKv3_1.mod TTXDynamicsSwitch.mod VecStim.mod cacumm.mod cacummb.mod cagk.mod cal2.mod can2.mod cat.mod exp2syn.mod expsyn.mod gap.mod h.mod halfgap.mod hh.mod kadist.mod kaprox.mod kca.mod kd.mod kd2.mod kdb.mod kdrbca1.mod kdrca1.mod km.mod kmb.mod na3n.mod naxn.mod netstim.mod netstim_inhpoisson.mod new_calcium_channels.mod passive.mod pattern.mod stim.mod svclmp.mod

 num_mpi=2
 num_omp_thread=1

 Info : 2 GPUs shared by 2 ranks per node
...

Run a channel-benchmark test

NVCOMPILER_ACC_SYNCHRONOUS=1 OMP_NUM_THREADS=1 CALI_CONFIG=runtime-report,calc.inclusive \
  srun -n 2 benchmark/x86_64/special-core \
  --datpath=$CFS/ntrain9/neuron/nersc-gpu-hackathon-dec-2021/channel-benchmark-all-1320-cells-2-ranks/ \
  --mpi --gpu --cell-permute=2 --tstop=100

 Duke, Yale, and the BlueBrain Project -- Copyright 1984-2020
 Version : 1.0 de4e433 (2021-11-26 08:50:58 +0100)

 Additional mechanisms from files
 Ca.mod CaDynamics_DC0.mod CaDynamics_E2.mod Ca_HVA.mod Ca_HVA2.mod Ca_LVAst.mod CoreConfig.mod Ih.mod Im.mod K_Pst.mod K_Tst.mod KdShu2007.mod NaTa_t.mod NaTg.mod NaTs2_t.mod Nap_Et2.mod ProbAMPANMDA_EMS.mod ProbGABAAB_EMS.mod ProfileHelper.mod SK_E2.mod SKv3_1.mod TTXDynamicsSwitch.mod VecStim.mod cacumm.mod cacummb.mod cagk.mod cal2.mod can2.mod cat.mod exp2syn.mod expsyn.mod gap.mod h.mod halfgap.mod hh.mod kadist.mod kaprox.mod kca.mod kd.mod kd2.mod kdb.mod kdrbca1.mod kdrca1.mod km.mod kmb.mod na3n.mod naxn.mod netstim.mod netstim_inhpoisson.mod new_calcium_channels.mod passive.mod pattern.mod stim.mod svclmp.mod

 num_mpi=2
 num_omp_thread=1

 Info : 2 GPUs shared by 2 ranks per node

....
Solver Time : 16.2698


 Simulation Statistics
 Number of cells: 1320
 Number of compartments: 648360
 Number of presyns: 3961320
 Number of input presyns: 0
 Number of synapses: 3960002
 Number of point processes: 7921322
 Number of transfer sources: 0
 Number of transfer targets: 0
 Number of spikes: 11969
 Number of spikes with non negative gid-s: 11969
Path                                     Min time/rank Max time/rank Avg time/rank Time %
main                                         40.376340     40.378758     40.377549 99.399072
  checkpoint                                  0.000001      0.000002      0.000002  0.000004
  output-spike                                0.004200      0.004211      0.004205  0.010353
  simulation                                 16.269795     16.269813     16.269804 40.052045
    spike-exchange                            0.019603      0.041487      0.030545  0.075194
      spike-exchange                          0.019578      0.041452      0.030515  0.075120
        communication                         0.000886      0.000906      0.000896  0.002206
        imbalance                             0.018654      0.040551      0.029603  0.072874
    timestep                                 16.225238     16.247191     16.236214 39.969356
      state-update                            4.167419      4.176959      4.172189 10.270849
        state-SKv3_1                          0.084673      0.086011      0.085342  0.210090
        state-SK_E2                           0.086975      0.087625      0.087300  0.214910
        state-ProbGABAAB_EMS                  0.176391      0.182205      0.179298  0.441385
        state-ProbAMPANMDA_EMS                0.512746      0.521091      0.516919  1.272520
        state-nax                             0.135361      0.136370      0.135866  0.334466
        state-NaTg                            0.121694      0.121713      0.121704  0.299603
        state-Nap_Et2                         0.091850      0.092020      0.091935  0.226320
        state-na3                             0.126686      0.127354      0.127020  0.312690
        state-K_Tst                           0.083087      0.083141      0.083114  0.204605
        state-K_Pst                           0.086288      0.086488      0.086388  0.212665
        state-kmb                             0.078860      0.079250      0.079055  0.194613
        state-KdShu2007                       0.077877      0.078125      0.078001  0.192018
        state-kdr                             0.094169      0.095289      0.094729  0.233198
        state-kdrb                            0.092489      0.092745      0.092617  0.227999
        state-kdb                             0.089317      0.089467      0.089392  0.220060
        state-kd2                             0.087586      0.092668      0.090127  0.221869
        state-kca                             0.363553      0.364731      0.364142  0.896423
        state-kap                             0.109070      0.109326      0.109198  0.268817
        state-kad                             0.112513      0.113974      0.113244  0.278776
        state-Ih                              0.098592      0.100036      0.099314  0.244485
        state-hd                              0.093146      0.094326      0.093736  0.230754
        state-cat                             0.121305      0.121554      0.121430  0.298928
        state-can                             0.121067      0.122852      0.121960  0.300233
        state-Ca_LVAst                        0.129896      0.130772      0.130334  0.320849
        state-cal                             0.108788      0.109597      0.109192  0.268804
        state-Ca_HVA2                         0.142881      0.143095      0.142988  0.351999
        state-cagk                            0.097391      0.098042      0.097716  0.240553
        state-cacum                           0.234768      0.235131      0.234950  0.578385
        state-cacumb                          0.175342      0.176663      0.176002  0.433273
        state-IClamp                          0.002348      0.002558      0.002453  0.006039
        state-CaDynamics_DC0                  0.113226      0.113724      0.113475  0.279346
        state-pas                             0.002896      0.003032      0.002964  0.007297
      update                                  0.116486      0.117232      0.116859  0.287677
      second-order-cur                        0.001943      0.002165      0.002054  0.005056
      matrix-solver                           6.071176      6.123533      6.097355 15.010108
      setup-tree-matrix                       4.745456      4.748895      4.747175 11.686317
        cur-SKv3_1                            0.104343      0.105509      0.104926  0.258301
        cur-SK_E2                             0.097890      0.098530      0.098210  0.241768
        cur-ProbGABAAB_EMS                    0.252629      0.255678      0.254154  0.625660
        cur-ProbAMPANMDA_EMS                  0.903469      0.903876      0.903672  2.224608
        cur-nax                               0.113687      0.115076      0.114382  0.281578
        cur-NaTg                              0.113117      0.113994      0.113556  0.279544
        cur-Nap_Et2                           0.084684      0.084767      0.084725  0.208572
        cur-na3                               0.095460      0.096301      0.095880  0.236033
        cur-K_Tst                             0.076803      0.076846      0.076825  0.189122
        cur-K_Pst                             0.083091      0.083296      0.083193  0.204801
        cur-kmb                               0.084998      0.085193      0.085095  0.209483
        cur-KdShu2007                         0.079360      0.079602      0.079481  0.195662
        cur-kdr                               0.084818      0.085221      0.085019  0.209296
        cur-kdrb                              0.083571      0.083837      0.083704  0.206058
        cur-kdb                               0.074649      0.075234      0.074942  0.184487
        cur-kd2                               0.074704      0.075905      0.075305  0.185380
        cur-kca                               0.102095      0.102144      0.102119  0.251392
        cur-kap                               0.093734      0.094259      0.093996  0.231395
        cur-kad                               0.096892      0.096943      0.096917  0.238586
        cur-Ih                                0.099069      0.099392      0.099230  0.244280
        cur-hd                                0.079998      0.080359      0.080179  0.197379
        cur-cat                               0.103624      0.103911      0.103767  0.255449
        cur-can                               0.118858      0.119161      0.119010  0.292971
        cur-Ca_LVAst                          0.123353      0.126097      0.124725  0.307041
        cur-cal                               0.122882      0.123123      0.123003  0.302800
        cur-Ca_HVA2                           0.142775      0.143094      0.142934  0.351868
        cur-cagk                              0.122285      0.123171      0.122728  0.302125
        cur-cacum                             0.081801      0.082285      0.082043  0.201969
        cur-cacumb                            0.074288      0.075040      0.074664  0.183803
        cur-IClamp                            0.085479      0.088503      0.086991  0.214149
        cur-CaDynamics_DC0                    0.064449      0.064505      0.064477  0.158726
        cur-ttx_ion                           0.063089      0.063417      0.063253  0.155713
        cur-ca_ion                            0.100195      0.100529      0.100362  0.247065
        cur-k_ion                             0.063816      0.064154      0.063985  0.157515
        cur-na_ion                            0.066107      0.066272      0.066190  0.162941
        cur-pas                               0.109251      0.109293      0.109272  0.268999
      deliver-events                          0.982300      1.005465      0.993883  2.446681
        net-receive-ProbGABAAB_EMS            0.000950      0.001073      0.001012  0.002490
        net-receive-ProbAMPANMDA_EMS          0.003695      0.003919      0.003807  0.009372
        net-buf-receive-ExpSyn                0.003570      0.003671      0.003621  0.008913
        net-buf-receive-Exp2Syn               0.003701      0.003835      0.003768  0.009276
        net-buf-receive-ProbGABAAB_EMS        0.068577      0.070341      0.069459  0.170990
        net-buf-receive-ProbAMPANMDA_EMS      0.090366      0.094045      0.092206  0.226986
        update-net-receive-buf                0.399891      0.406366      0.403129  0.992398
          net-receive-buf-cpu2gpu             0.379896      0.385496      0.382696  0.942098
          net-receive-buf-order               0.003375      0.003533      0.003454  0.008503
        check-threshold                       0.182114      0.184026      0.183070  0.450671
  finitialize                                 2.094235      2.094272      2.094254  5.155510
    spike-exchange                            0.000035      0.105358      0.052696  0.129725
      spike-exchange                          0.000031      0.105354      0.052693  0.129715
        communication                         0.000019      0.000022      0.000020  0.000050
        imbalance                             0.000006      0.105331      0.052669  0.129656
    cur-SKv3_1                                0.000031      0.000032      0.000031  0.000078
    cur-SK_E2                                 0.000029      0.000031      0.000030  0.000074
    cur-ProbGABAAB_EMS                        0.000072      0.000073      0.000073  0.000178
    cur-ProbAMPANMDA_EMS                      0.000265      0.000267      0.000266  0.000655
    cur-nax                                   0.000034      0.000034      0.000034  0.000084
    cur-NaTg                                  0.000032      0.000034      0.000033  0.000081
    cur-Nap_Et2                               0.000025      0.000026      0.000025  0.000063
    cur-na3                                   0.000029      0.000030      0.000029  0.000073
    cur-K_Tst                                 0.000023      0.000024      0.000024  0.000058
    cur-K_Pst                                 0.000025      0.000026      0.000025  0.000063
    cur-kmb                                   0.000026      0.000027      0.000027  0.000065
    cur-KdShu2007                             0.000024      0.000024      0.000024  0.000059
    cur-kdr                                   0.000026      0.000026      0.000026  0.000064
    cur-kdrb                                  0.000025      0.000025      0.000025  0.000062
    cur-kdb                                   0.000023      0.000023      0.000023  0.000057
    cur-kd2                                   0.000024      0.000024      0.000024  0.000059
    cur-kca                                   0.000031      0.000032      0.000031  0.000078
    cur-kap                                   0.000027      0.000028      0.000027  0.000068
    cur-kad                                   0.000028      0.000030      0.000029  0.000071
    cur-Ih                                    0.000029      0.000030      0.000029  0.000073
    cur-hd                                    0.000024      0.000025      0.000024  0.000060
    cur-cat                                   0.000030      0.000032      0.000031  0.000076
    cur-can                                   0.000035      0.000035      0.000035  0.000086
    cur-Ca_LVAst                              0.000035      0.000036      0.000035  0.000087
    cur-cal                                   0.000036      0.000037      0.000036  0.000090
    cur-Ca_HVA2                               0.000039      0.000041      0.000040  0.000098
    cur-cagk                                  0.000037      0.000038      0.000037  0.000092
    cur-cacum                                 0.000025      0.000026      0.000025  0.000063
    cur-cacumb                                0.000024      0.000025      0.000024  0.000060
    cur-IClamp                                0.000027      0.000028      0.000027  0.000068
    cur-CaDynamics_DC0                        0.000021      0.000022      0.000022  0.000053
    cur-ttx_ion                               0.000026      0.000026      0.000026  0.000064
    cur-ca_ion                                0.000028      0.000029      0.000029  0.000070
    cur-k_ion                                 0.000018      0.000019      0.000018  0.000046
    cur-na_ion                                0.000022      0.000023      0.000022  0.000055
    cur-pas                                   0.000034      0.000034      0.000034  0.000084
    update-net-receive-buf                    0.000021      0.000023      0.000022  0.000054
  load-model                                 21.857876     21.860913     21.859395 53.812170
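The report above is long, and when comparing runs (for example OpenACC vs OpenMP builds) it can help to pull out just the mechanism kernels. The following is a minimal sketch, assuming the job output has been captured to a text file (e.g. with `srun ... > run.log 2>&1`) and that the report keeps the column layout shown above: region name followed by Min, Max and Avg time/rank and Time %. The function name top_kernels is hypothetical and not part of CoreNEURON or Caliper.

```shell
#!/usr/bin/env bash
# top_kernels FILE [N]
# Hypothetical helper: list the N slowest mechanism kernels (state-* and cur-*
# regions) in a saved Caliper runtime-report, sorted by "Avg time/rank".
# Assumes each report row is: <region> <min> <max> <avg> <percent>.
top_kernels() {
  local report="$1"
  local count="${2:-10}"
  awk '
    # Keep rows whose name starts with state- or cur- (but not the parent
    # state-update region) and whose four remaining columns are numeric.
    # This skips the header and the "Number of ..." statistics lines.
    NF == 5 && $1 ~ /^(state|cur)-/ && $1 != "state-update" &&
    $2 ~ /^[0-9.]+$/ && $3 ~ /^[0-9.]+$/ &&
    $4 ~ /^[0-9.]+$/ && $5 ~ /^[0-9.]+$/ {
      print $4, $1   # average time per rank, then region name
    }
  ' "$report" | LC_ALL=C sort -rn | head -n "$count"
}
```

For example, `top_kernels run.log 5` lists the five most expensive kernels; for the channel-benchmark run above, the ProbAMPANMDA_EMS kernels head the list. Region names are not qualified by their parent, so the cur-* entries under finitialize share names with those under setup-tree-matrix; the finitialize ones are tiny and sort to the bottom.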

Produce an Nsight Systems profile

To produce a useful profile with Nsight Systems, configure Caliper to emit NVTX markers (CALI_CONFIG=nvtx) and tell Nsight Systems to record regions whose names are not registered strings (NSYS_NVTX_PROFILER_REGISTER_ONLY=0). To avoid profiling model initialisation and setup, you may want to record only the actual simulation (--capture-range=nvtx --nvtx-capture=simulation). Additionally, Nsight Systems seems to have trouble profiling multiple OpenMP host threads that launch GPU kernels concurrently, so you may want to use a single host thread (OMP_NUM_THREADS=1). Taken together, an example prefix could be:

CALI_CONFIG=nvtx OMP_NUM_THREADS=1 nsys profile --env-var NSYS_NVTX_PROFILER_REGISTER_ONLY=0 --cuda-um-gpu-page-faults=true --cuda-um-cpu-page-faults=true --trace=cuda,nvtx,openacc,openmp --capture-range=nvtx --nvtx-capture=simulation ./x86_64/special-core ...
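Because this prefix is long, it may be convenient to wrap it in a shell function. This is a minimal sketch: the function name nsys_simulation and its first argument (the report name, passed to nsys via --output) are hypothetical additions; all other options are exactly those in the prefix above, and the remaining arguments are the special-core command line.

```shell
#!/usr/bin/env bash
# nsys_simulation REPORT COMMAND [ARGS...]
# Hypothetical wrapper around the Nsight Systems prefix above: Caliper emits
# NVTX ranges and nsys records only the "simulation" range.
nsys_simulation() {
  local report="$1"   # name of the nsys report to write (hypothetical argument)
  shift
  CALI_CONFIG=nvtx OMP_NUM_THREADS=1 nsys profile \
    --env-var NSYS_NVTX_PROFILER_REGISTER_ONLY=0 \
    --cuda-um-gpu-page-faults=true \
    --cuda-um-cpu-page-faults=true \
    --trace=cuda,nvtx,openacc,openmp \
    --capture-range=nvtx \
    --nvtx-capture=simulation \
    --output="$report" \
    "$@"
}
```

Usage, e.g. for the ring test from the build directory: `nsys_simulation ringtest benchmark/x86_64/special-core -e 1 --datpath=../tests/integration/ring --gpu --cell-permute=2`.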

Some other notes:

  • During development, use the simple ring test for quick iteration.
  • Install NMODL master into the project space ($CFS/ntrain9/neuron) so that mentors and others can use a standard version for profiling and other tasks not related to code generation.

Additional useful repositories
