Merge branch 'develop' into forrestglines/cuda-with-nvc++-fix
forrestglines committed Jun 28, 2024
2 parents c48f557 + f891c02 commit 9d6fdc6
Showing 97 changed files with 2,797 additions and 1,389 deletions.
18 changes: 18 additions & 0 deletions .github/workflows/ci-extended.yml
@@ -44,10 +44,20 @@ jobs:
path: tst/regression/gold_standard/
key: gold-standard

- name: Set vars based on matrix
id: cmake-vars
run: |
if ${{ matrix.device == 'host' }}; then
echo "enable_asan=ON" >> $GITHUB_OUTPUT
else
echo "enable_asan=OFF" >> $GITHUB_OUTPUT
fi
- name: Configure
run: |
cmake -B build \
-DCMAKE_BUILD_TYPE=Release \
-DENABLE_ASAN=${{ steps.cmake-vars.outputs.enable_asan }} \
-DMACHINE_VARIANT=${{ matrix.device }}-${{ matrix.parallel }}
- name: Build
@@ -60,6 +70,10 @@ jobs:
cd build
# Pick GPU with most available memory
export CUDA_VISIBLE_DEVICES=$(nvidia-smi --query-gpu=memory.free,index --format=csv,nounits,noheader | sort -nr | head -1 | awk '{ print $NF }')
# Sanitizer options (leak detection is disabled)
export ASAN_OPTIONS=abort_on_error=1:fast_unwind_on_malloc=1
export UBSAN_OPTIONS=print_stacktrace=0
export LSAN_OPTIONS=detect_leaks=0
ctest -L performance -LE perf-reg
# run regression tests
@@ -68,6 +82,10 @@ jobs:
cd build
# Pick GPU with most available memory
export CUDA_VISIBLE_DEVICES=$(nvidia-smi --query-gpu=memory.free,index --format=csv,nounits,noheader | sort -nr | head -1 | awk '{ print $NF }')
# Sanitizer options (disable leak detection for MPI runs, due to OpenMPI leaks)
export ASAN_OPTIONS=abort_on_error=1:fast_unwind_on_malloc=1
export UBSAN_OPTIONS=print_stacktrace=0
export LSAN_OPTIONS=detect_leaks=0
ctest -L regression -L ${{ matrix.parallel }} -LE perf-reg --timeout 3600
# Test Ascent integration (only most complex setup with MPI and on device)
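The GPU selection one-liner in the two test steps above sorts the `nvidia-smi` rows by free memory and keeps the index of the top row. A minimal Python sketch of the same selection logic follows; the function name and the sample values are hypothetical, and the input format is assumed to be one `free_memory, index` pair per line, as requested by the `--query-gpu=memory.free,index --format=csv,nounits,noheader` flags in the workflow.

```python
def pick_gpu_with_most_free_memory(csv_text):
    """Return the index of the GPU reporting the most free memory.

    `csv_text` is assumed to hold one "free_memory, index" pair per line,
    mirroring the query used in the workflow step.
    """
    best_free, best_index = None, None
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        free_str, index_str = (field.strip() for field in line.split(","))
        free = int(free_str)
        # Keep the row with the largest free-memory value seen so far.
        if best_free is None or free > best_free:
            best_free, best_index = free, int(index_str)
    if best_index is None:
        raise ValueError("no GPU entries found")
    return best_index


# Hypothetical sample for a three-GPU node (values are made up).
sample = "4021, 0\n15109, 1\n9873, 2\n"
print(pick_gpu_with_most_free_memory(sample))
```

In the workflow the result is exported as `CUDA_VISIBLE_DEVICES`, so the tests only see the selected device.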
25 changes: 24 additions & 1 deletion CHANGELOG.md
@@ -3,6 +3,13 @@
## Current develop

### Added (new features/APIs/variables/...)
- [[PR 1119]](https://github.com/parthenon-hpc-lab/parthenon/pull/1119) Formalize MeshData partitioning.
- [[PR 1128]](https://github.com/parthenon-hpc-lab/parthenon/pull/1128) Add cycle and nbtotal to hst
- [[PR 1099]](https://github.com/parthenon-hpc-lab/parthenon/pull/1099) Functionality for outputting task graphs in GraphViz format.
- [[PR 1091]](https://github.com/parthenon-hpc-lab/parthenon/pull/1091) Add vector wave equation example.
- [[PR 991]](https://github.com/parthenon-hpc-lab/parthenon/pull/991) Add fine fields.
- [[PR 1106]](https://github.com/parthenon-hpc-lab/parthenon/pull/1106) Add CMake options for turning on ASAN and HWASAN
- [[PR 1100]](https://github.com/parthenon-hpc-lab/parthenon/pull/1100) Custom refinement ops propagated to fluxes
- [[PR 1090]](https://github.com/parthenon-hpc-lab/parthenon/pull/1090) SMR with swarms
- [[PR 1079]](https://github.com/parthenon-hpc-lab/parthenon/pull/1079) Address XDMF/Visit Issues
- [[PR 1084]](https://github.com/parthenon-hpc-lab/parthenon/pull/1084) Properly free swarm boundary MPI requests
@@ -18,11 +25,20 @@
- [[PR 1019]](https://github.com/parthenon-hpc-lab/parthenon/pull/1019) Enable output for non-cell-centered variables

### Changed (changing behavior/API/variables/...)
- [[PR 1105]](https://github.com/parthenon-hpc-lab/parthenon/pull/1105) Refactor parameter input for linear solvers
- [[PR 1078]](https://github.com/parthenon-hpc-lab/parthenon/pull/1078) Add reduction fallback in 1D. Add IndexRange overload for 1D par loops
- [[PR 1024]](https://github.com/parthenon-hpc-lab/parthenon/pull/1024) Add .outN. to history output filenames
- [[PR 1004]](https://github.com/parthenon-hpc-lab/parthenon/pull/1004) Allow parameter modification from an input file for restarts

### Fixed (not changing behavior/API/variables/...)
- [[PR 1131]](https://github.com/parthenon-hpc-lab/parthenon/pull/1131) Make deallocation of fine and sparse fields work
- [[PR 1127]](https://github.com/parthenon-hpc-lab/parthenon/pull/1127) Add WithFluxes to IsRefined check
- [[PR 1111]](https://github.com/parthenon-hpc-lab/parthenon/pull/1111) Fix undefined behavior due to bitshift of negative number in LogicalLocation
- [[PR 1092]](https://github.com/parthenon-hpc-lab/parthenon/pull/1092) Updates to DataCollection and MeshData to remove requirement of predefining MeshBlockData
- [[PR 1113]](https://github.com/parthenon-hpc-lab/parthenon/pull/1113) Prevent division by zero
- [[PR 1112]](https://github.com/parthenon-hpc-lab/parthenon/pull/1112) Remove shared_ptr cycle in forest::Tree
- [[PR 1104]](https://github.com/parthenon-hpc-lab/parthenon/pull/1104) Fix reading restarts due to hidden ghost var
- [[PR 1098]](https://github.com/parthenon-hpc-lab/parthenon/pull/1098) Move to symmetrized logical coordinates and fix SMR bug
- [[PR 1095]](https://github.com/parthenon-hpc-lab/parthenon/pull/1095) Add missing include guards in hdf5 restart
- [[PR 1093]](https://github.com/parthenon-hpc-lab/parthenon/pull/1093) Fix forest size for symmetry dimensions
- [[PR 1089]](https://github.com/parthenon-hpc-lab/parthenon/pull/1089) Fix loading restart files without derefinement counter
@@ -41,6 +57,11 @@
- [[PR 1031]](https://github.com/parthenon-hpc-lab/parthenon/pull/1031) Fix bug in non-cell centered AMR

### Infrastructure (changes irrelevant to downstream codes)
- [[PR 1114]](https://github.com/parthenon-hpc-lab/parthenon/pull/1114) Enable sanitizers for extended CI host build
- [[PR 1123]](https://github.com/parthenon-hpc-lab/parthenon/pull/1123) Default initialize ProResInfo.dir
- [[PR 1121]](https://github.com/parthenon-hpc-lab/parthenon/pull/1121) Default initialize BndInfo.dir
- [[PR 1116]](https://github.com/parthenon-hpc-lab/parthenon/pull/1116) Fix NumPy 2.0 test script breakage
- [[PR 1055]](https://github.com/parthenon-hpc-lab/parthenon/pull/1055) Refactor mesh constructors
- [[PR 1066]](https://github.com/parthenon-hpc-lab/parthenon/pull/1066) Re-introduce default loop patterns and exec spaces
- [[PR 1064]](https://github.com/parthenon-hpc-lab/parthenon/pull/1064) Forbid erroneous edge case when adding MeshData on a partition
- [[PR 1035]](https://github.com/parthenon-hpc-lab/parthenon/pull/1035) Fix multigrid infrastructure to work with forest
@@ -50,9 +71,11 @@


### Removed (removing behavior/API/variables/...)

- [[PR 1108]](https://github.com/parthenon-hpc-lab/parthenon/pull/1108) Remove NaN payload tags infrastructure

### Incompatibilities (i.e. breaking changes)
- [[PR 1128]](https://github.com/parthenon-hpc-lab/parthenon/pull/1128) Add cycle and nbtotal to hst
- [[PR 1108]](https://github.com/parthenon-hpc-lab/parthenon/pull/1108) Remove NaN payload tags infrastructure
- [[PR 1026]](https://github.com/parthenon-hpc-lab/parthenon/pull/1026) Particle BCs without relocatable device code
- [[PR 1037]](https://github.com/parthenon-hpc-lab/parthenon/pull/1037) Add SwarmPacks
- [[PR 1042]](https://github.com/parthenon-hpc-lab/parthenon/pull/1042) Use Offset class and clean up of NeighborBlock
16 changes: 15 additions & 1 deletion CMakeLists.txt
@@ -57,6 +57,8 @@ option(CHECK_REGISTRY_PRESSURE "Check the registry pressure for Kokkos CUDA kern
option(TEST_INTEL_OPTIMIZATION "Test intel optimization and vectorization" OFF)
option(TEST_ERROR_CHECKING "Enables the error checking unit test. This test will FAIL" OFF)
option(CODE_COVERAGE "Enable code coverage reporting" OFF)
option(ENABLE_ASAN "Turn on ASAN" OFF)
option(ENABLE_HWASAN "Turn on HWASAN (currently ARM-only)" OFF)

include(cmake/Format.cmake)
include(cmake/Lint.cmake)
@@ -290,7 +292,19 @@ if (Kokkos_ENABLE_CUDA AND "${PARTHENON_ENABLE_GPU_MPI_CHECKS}" )
configure_file(${CMAKE_CURRENT_SOURCE_DIR}/cmake/CTestCustom.cmake.in ${CMAKE_BINARY_DIR}/CTestCustom.cmake @ONLY)
endif()


# option to turn on AddressSanitizer for debugging
if(ENABLE_ASAN)
message(STATUS "Compiling with AddressSanitizer and UndefinedBehaviorSanitizer *enabled*")
add_compile_options(-fsanitize=address -fsanitize=undefined -fno-sanitize-recover=all -fsanitize=float-divide-by-zero -fsanitize=float-cast-overflow -fno-sanitize=null -fno-sanitize=alignment)
add_link_options(-fsanitize=address -fsanitize=undefined)
endif(ENABLE_ASAN)

# option to turn on HWAddressSanitizer for debugging
if(ENABLE_HWASAN)
message(STATUS "Compiling with HWAddressSanitizer *enabled*")
add_compile_options(-fsanitize=hwaddress)
add_link_options(-fsanitize=hwaddress)
endif(ENABLE_HWASAN)


# Build Tests and download Catch2
6 changes: 3 additions & 3 deletions benchmarks/burgers/burgers_diff.py
@@ -1,6 +1,6 @@
#!/usr/bin/env python
# ========================================================================================
# (C) (or copyright) 2023. Triad National Security, LLC. All rights reserved.
# (C) (or copyright) 2024. Triad National Security, LLC. All rights reserved.
#
# This program was produced under U.S. Government contract 89233218CNA000001 for Los
# Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC
@@ -21,7 +21,7 @@
description="Compute difference between two history solvers parthenon VIBE",
)
parser.add_argument("file1", type=str, help="First file in diff")
parser.add_argument("file2", type=str, help="Second fiel in diff")
parser.add_argument("file2", type=str, help="Second file in diff")
parser.add_argument(
"-t", "--tolerance", type=float, default=1e-8, help="Relative tolerance for diff"
)
@@ -54,4 +54,4 @@ def compare_files(file1, file2, tolerance, print_results=True):

if __name__ == "__main__":
args = parser.parse_args()
sys.exit(compare_files(args.file1, args.file1, args.tolerance, True))
sys.exit(compare_files(args.file1, args.file2, args.tolerance, True))
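The last change in this file matters because comparing `file1` against itself can never report a difference. The body of `compare_files` is not shown in this diff, so the following is only a sketch of what a relative-tolerance comparison could look like. It assumes whitespace-separated numeric columns with `#` comment lines; `load_rows` and `compare_rows` are hypothetical names, not functions from the script.

```python
import math


def load_rows(text):
    """Parse whitespace-separated numeric rows, skipping blanks and '#' comments."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rows.append([float(tok) for tok in line.split()])
    return rows


def compare_rows(rows1, rows2, tolerance):
    """Return 0 if all entries agree to the relative `tolerance`, else 1."""
    if len(rows1) != len(rows2):
        return 1
    for r1, r2 in zip(rows1, rows2):
        if len(r1) != len(r2):
            return 1
        for a, b in zip(r1, r2):
            # Purely relative check: |a - b| <= tolerance * max(|a|, |b|).
            if not math.isclose(a, b, rel_tol=tolerance, abs_tol=0.0):
                return 1
    return 0


# Made-up history data: the two runs differ by 1e-9 in the last column.
run_a = load_rows("# cycle time mass\n0 0.0 1.0\n1 0.1 1.0\n")
run_b = load_rows("# cycle time mass\n0 0.0 1.0\n1 0.1 1.000000001\n")
status = compare_rows(run_a, run_b, 1e-8)
```

Returning an integer status mirrors how the script passes the result of `compare_files` to `sys.exit`, where a nonzero value signals failure to the calling shell.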
5 changes: 1 addition & 4 deletions benchmarks/burgers/burgers_package.cpp
@@ -1,5 +1,5 @@
//========================================================================================
// (C) (or copyright) 2020-2023. Triad National Security, LLC. All rights reserved.
// (C) (or copyright) 2020-2024. Triad National Security, LLC. All rights reserved.
//
// This program was produced under U.S. Government contract 89233218CNA000001 for Los
// Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC
@@ -132,7 +132,6 @@ std::shared_ptr<StateDescriptor> Initialize(ParameterInput *pin) {
hst_vars.emplace_back(HstSum, ReduceMass, "MS Mass " + std::to_string(i_octant));
i_octant++;
}
hst_vars.emplace_back(HstSum, MeshCountHistory, "Meshblock count");
pkg->AddParam(parthenon::hist_param_key, hst_vars);

pkg->EstimateTimestepMesh = EstimateTimestepMesh;
@@ -439,6 +438,4 @@ Real MassHistory(MeshData<Real> *md, const Real x1min, const Real x1max, const R
return result;
}

Real MeshCountHistory(MeshData<Real> *md) { return md->NumBlocks(); }

} // namespace burgers_package
3 changes: 1 addition & 2 deletions benchmarks/burgers/burgers_package.hpp
@@ -1,5 +1,5 @@
//========================================================================================
// (C) (or copyright) 2020-2023. Triad National Security, LLC. All rights reserved.
// (C) (or copyright) 2020-2024. Triad National Security, LLC. All rights reserved.
//
// This program was produced under U.S. Government contract 89233218CNA000001 for Los
// Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC
@@ -27,7 +27,6 @@ Real EstimateTimestepMesh(MeshData<Real> *md);
TaskStatus CalculateFluxes(MeshData<Real> *md);
Real MassHistory(MeshData<Real> *md, const Real x1min, const Real x1max, const Real x2min,
const Real x2max, const Real x3min, const Real x3max);
Real MeshCountHistory(MeshData<Real> *md);

// compute the hll flux for Burgers' equation
KOKKOS_INLINE_FUNCTION
7 changes: 7 additions & 0 deletions doc/sphinx/src/interface/metadata.rst
@@ -55,6 +55,13 @@ mutually exclusive:
cell-corners. The variable might be volume-averaged, or defined
pointwise.

For any type of topology, the flag ``Metadata::Fine`` can be set, which
causes the field to have twice the resolution of normal fields. These
fields should support being specified for output, undergoing ghost exchange,
etc., but much of this has not been carefully tested. **As a result,** ``Fine``
**fields should be handled with care in downstream codes and carefully checked
to make sure they are behaving as expected.**
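To make "twice the resolution" concrete, here is a small index-arithmetic sketch. It is not taken from the Parthenon source: it assumes the doubling applies to each active dimension (those with more than one cell) and ignores ghost zones and any index offsets; `fine_shape` and `fine_children` are hypothetical helper names.

```python
def fine_shape(nx, refinement=2):
    """Cell counts of a fine field, given the regular cell counts per dimension.

    Dimensions with a single cell are treated as inactive and left unchanged
    (an assumption of this sketch).
    """
    return tuple(n * refinement if n > 1 else n for n in nx)


def fine_children(i):
    """Fine-cell indices covering regular cell `i` along one active dimension."""
    return (2 * i, 2 * i + 1)


# A 16x16 block in 2D (third dimension inactive).
shape = fine_shape((16, 16, 1))
children = fine_children(3)
```

Under these assumptions each regular cell in 2D is covered by four fine cells, which is the kind of relationship downstream codes would want to verify when checking that ``Fine`` fields behave as expected.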

Variable Behaviors
------------------

7 changes: 7 additions & 0 deletions doc/sphinx/src/interface/refinement_operations.rst
@@ -110,3 +110,10 @@ You must register both prolongation and restriction together. You may,
however, use the default Parthenon structs if desired. Then any variable
registered with this metadata object will use your custom prolongation
and restriction operations.

When a variable with custom operations is enrolled and marked
``Metadata::WithFluxes``, the flux variables created for it will
also have the same custom operations enrolled. In general, the custom operations
will need to behave differently for the variable and for its fluxes; the two cases
can be distinguished inside the custom operations by referring to the
``TopologicalElement`` template parameter.
1 change: 1 addition & 0 deletions example/CMakeLists.txt
@@ -13,6 +13,7 @@

add_subdirectory(stochastic_subgrid)
add_subdirectory(advection)
add_subdirectory(fine_advection)
add_subdirectory(calculate_pi)
add_subdirectory(kokkos_pi)
add_subdirectory(particles)
34 changes: 14 additions & 20 deletions example/advection/advection_driver.cpp
@@ -62,27 +62,21 @@ TaskCollection AdvectionDriver::MakeTaskCollection(BlockList_t &blocks, const in
const Real dt = integrator->dt;
const auto &stage_name = integrator->stage_name;

// first make other useful containers
if (stage == 1) {
for (int i = 0; i < blocks.size(); i++) {
auto &pmb = blocks[i];
// first make other useful containers
auto &base = pmb->meshblock_data.Get();
pmb->meshblock_data.Add("dUdt", base);
for (int s = 1; s < integrator->nstages; s++)
pmb->meshblock_data.Add(stage_name[s], base);
}
}

const int num_partitions = pmesh->DefaultNumPartitions();

auto partitions = pmesh->GetDefaultBlockPartitions();
int num_partitions = partitions.size();
// note that tasks within this region, which contains one tasklist per pack,
// could still be executed in parallel
TaskRegion &single_tasklist_per_pack_region2 = tc.AddRegion(num_partitions);
for (int i = 0; i < num_partitions; i++) {
auto &tl = single_tasklist_per_pack_region2[i];
auto &mc0 = pmesh->mesh_data.GetOrAdd(stage_name[stage - 1], i);
auto &mc1 = pmesh->mesh_data.GetOrAdd(stage_name[stage], i);
// Initialize the base MeshData for this partition
// (this automatically initializes the MeshBlockData objects
// required by this MeshData object)
auto &mbase = pmesh->mesh_data.Add("base", partitions[i]);
// Initialize other MeshData objects based on the base container
auto &mc0 = pmesh->mesh_data.Add(stage_name[stage - 1], mbase);
auto &mc1 = pmesh->mesh_data.Add(stage_name[stage], mbase);
auto &mdudt = pmesh->mesh_data.Add("dUdt", mbase);

const auto any = parthenon::BoundaryType::any;

@@ -119,10 +113,10 @@ TaskCollection AdvectionDriver::MakeTaskCollection(BlockList_t &blocks, const in
TaskRegion &single_tasklist_per_pack_region = tc.AddRegion(num_partitions);
for (int i = 0; i < num_partitions; i++) {
auto &tl = single_tasklist_per_pack_region[i];
auto &mbase = pmesh->mesh_data.GetOrAdd("base", i);
auto &mc0 = pmesh->mesh_data.GetOrAdd(stage_name[stage - 1], i);
auto &mc1 = pmesh->mesh_data.GetOrAdd(stage_name[stage], i);
auto &mdudt = pmesh->mesh_data.GetOrAdd("dUdt", i);
auto &mbase = pmesh->mesh_data.Add("base", partitions[i]);
auto &mc0 = pmesh->mesh_data.Add(stage_name[stage - 1], mbase);
auto &mc1 = pmesh->mesh_data.Add(stage_name[stage], mbase);
auto &mdudt = pmesh->mesh_data.Add("dUdt", mbase);

auto set_flx = parthenon::AddFluxCorrectionTasks(none, tl, mc0, pmesh->multilevel);

5 changes: 3 additions & 2 deletions example/calculate_pi/pi_driver.cpp
@@ -118,14 +118,15 @@ TaskCollection PiDriver::MakeTaskCollection(T &blocks) {
using calculate_pi::ComputeArea;
TaskCollection tc;

const int num_partitions = pmesh->DefaultNumPartitions();
auto partitions = pmesh->GetDefaultBlockPartitions();
const int num_partitions = partitions.size();
ParArrayHost<Real> areas("areas", num_partitions);
TaskRegion &async_region = tc.AddRegion(num_partitions);
{
// asynchronous region where area is computed per partition
for (int i = 0; i < num_partitions; i++) {
TaskID none(0);
auto &md = pmesh->mesh_data.GetOrAdd("base", i);
auto &md = pmesh->mesh_data.Add("base", partitions[i]);
auto get_area = async_region[i].AddTask(none, ComputeArea, md, areas, i);
}
}
28 changes: 28 additions & 0 deletions example/fine_advection/CMakeLists.txt
@@ -0,0 +1,28 @@
#=========================================================================================
# (C) (or copyright) 2024. Triad National Security, LLC. All rights reserved.
#
# This program was produced under U.S. Government contract 89233218CNA000001 for Los
# Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC
# for the U.S. Department of Energy/National Nuclear Security Administration. All rights
# in the program are reserved by Triad National Security, LLC, and the U.S. Department
# of Energy/National Nuclear Security Administration. The Government is granted for
# itself and others acting on its behalf a nonexclusive, paid-up, irrevocable worldwide
# license in this material to reproduce, prepare derivative works, distribute copies to
# the public, perform publicly and display publicly, and to permit others to do so.
#=========================================================================================

get_property(DRIVER_LIST GLOBAL PROPERTY DRIVERS_USED_IN_TESTS)
if( "fine_advection-example" IN_LIST DRIVER_LIST OR NOT PARTHENON_DISABLE_EXAMPLES)
add_executable(
fine_advection-example
advection_driver.cpp
advection_driver.hpp
advection_package.cpp
advection_package.hpp
main.cpp
parthenon_app_inputs.cpp
stokes.hpp
)
target_link_libraries(fine_advection-example PRIVATE Parthenon::parthenon)
lint_target(fine_advection-example)
endif()
5 changes: 5 additions & 0 deletions example/fine_advection/README.md
@@ -0,0 +1,5 @@
This example implements upwind advection of a cell-centered scalar variable defined
on the regular grid and of another cell-centered variable on the fine grid (which
has twice the resolution and is selected using `Metadata::Fine`). The newer type-based
`SparsePack`s are used throughout, and machinery for an update based on the generalized
Stokes' theorem is included.