Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify par_dispatch, par_for_outer & par_for_inner overloads #1142

Open
wants to merge 113 commits into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
113 commits
Select commit Hold shift + click to select a range
6995d11
wrap 3D flat loop abstractions
acreyes Jun 22, 2024
2e61847
add 4D loop and test
acreyes Jun 22, 2024
054ca0e
add specialization for const int &
Jun 22, 2024
4c83c4c
added mdrange loops to par_reduce tests
Jun 22, 2024
0730429
refactor flatloop specialization
acreyes Jun 22, 2024
adb15dd
clean up
acreyes Jun 23, 2024
5db0d34
formatting
acreyes Jun 23, 2024
ba335d7
linting
acreyes Jun 23, 2024
2fa8ad1
templating functor index types
acreyes Jun 23, 2024
b9a95a1
moved to a single functor
acreyes Jun 24, 2024
9033aa3
Update CHANGELOG.md
acreyes Jun 24, 2024
2d450c8
Merge branch 'develop' into acreyes/par_reduce-flatloops
acreyes Jun 25, 2024
503ae0c
first pass, doesn't like 4D loops
acreyes Jun 25, 2024
e01dabf
formatting
acreyes Jun 25, 2024
259115d
added overload for index ranges
acreyes Jun 25, 2024
ad53f40
generic par_dispatch for all flatrange loops
acreyes Jun 26, 2024
442c04f
wrapped 2D MDRange loop
acreyes Jun 26, 2024
a865f00
wrapped rest of MDRange loops
acreyes Jun 27, 2024
dd46b7e
enabled all simd loops
acreyes Jun 27, 2024
7c7ecc0
cleaning up some warnings
Jun 27, 2024
53f0f85
Merge remote-tracking branch 'upstream/develop' into acreyes/par-disp…
acreyes Jul 27, 2024
e9b440d
cleaning up templates & traits
acreyes Jun 27, 2024
2452b48
adding loop collapse patterns
acreyes Jun 29, 2024
188d413
wrapped team policy loops
acreyes Jul 1, 2024
5bb7764
separate inner loop collapses
acreyes Jul 1, 2024
1102470
Wrapping inner par_for loops
acreyes Jul 1, 2024
049bf52
simdfor inner loops
acreyes Jul 2, 2024
9a39c02
formatting
acreyes Jul 27, 2024
08e788f
cleaning up
acreyes Jul 27, 2024
8d1a5ca
infer loop rank from launch bounds rather than functor signature
acreyes Jul 27, 2024
3a32e84
helper type DispatchType to hold useuful parameters
acreyes Jul 27, 2024
c2ac94f
done with par_for_outer
acreyes Jul 27, 2024
d7477c7
cleanup
acreyes Jul 27, 2024
69cda38
adding tests
acreyes Jul 28, 2024
12297d2
completeing tests for par_for & par_reduce. Testing up to 7D loops
acreyes Jul 28, 2024
0905832
fixing sequence_of_ones
acreyes Jul 28, 2024
afc86c1
static_assert for par_scan
acreyes Jul 28, 2024
ceaac8f
modifying test to work on cuda machine
acreyes Jul 29, 2024
de6df61
workaround non-type template parameter usage
acreyes Jul 30, 2024
5954079
Merge branch 'parthenon-hpc-lab:develop' into acreyes/par-dispatch-te…
acreyes Aug 8, 2024
1d7719c
moved TypeList to those in type_list.hpp
acreyes Aug 8, 2024
7b25d89
use correct ThreadVectorRange policy
acreyes Aug 15, 2024
8ab0985
Merge branch 'parthenon-hpc-lab:develop' into acreyes/par-dispatch-te…
acreyes Aug 15, 2024
c16fc6e
cleanup
acreyes Aug 15, 2024
a25ffef
template execution space
acreyes Aug 15, 2024
7d49d4e
linting
acreyes Aug 15, 2024
72aa437
put simd pragma in correct place
acreyes Aug 15, 2024
ed0be07
ContinuousSubListImpl made public
acreyes Aug 15, 2024
fb5ccb2
Borrowing features from upstream/lroberts36/generalize-par-dispatch
acreyes Aug 21, 2024
aae696c
rebuilding dispatch signature
acreyes Aug 21, 2024
8fb8f5c
simdfor using indexer
acreyes Aug 21, 2024
066b3d7
generalized tag disptach
acreyes Aug 21, 2024
54a3c01
dispatch_collapse added for team patterns
acreyes Aug 21, 2024
b2a6e49
par_for_inner with new dispatch
acreyes Aug 21, 2024
5e3e8b2
mdrange dispatch
acreyes Aug 21, 2024
d626fed
array indexer indices
acreyes Aug 22, 2024
565717f
fix team policy scratch size
acreyes Aug 22, 2024
c53ec4f
base_type in functor check
acreyes Aug 22, 2024
14c098a
simdfor inner loop bounds fix
acreyes Aug 22, 2024
f6d9c21
get things working on cuda
acreyes Aug 23, 2024
be6ed04
extra args for hierarchial loops
acreyes Aug 24, 2024
b091860
extra args in team/collapse loops
acreyes Aug 24, 2024
b532cc7
kokkos array in indexer
Aug 24, 2024
8a74cba
cleanup
Aug 24, 2024
64cf179
fix unit test
acreyes Aug 24, 2024
001108f
static_asserts for simdfor loop patterns + seq_for
acreyes Aug 26, 2024
4662b1b
Merge branch 'acreyes/par-dispatch-template-temp' into acreyes/par-di…
acreyes Aug 26, 2024
520811a
Merge branch 'parthenon-hpc-lab:develop' into acreyes/par-dispatch-te…
acreyes Aug 26, 2024
bd9df9c
linting
acreyes Aug 27, 2024
ebae3cd
cleaning up & moving loop bounds to their own header
acreyes Aug 27, 2024
48664ac
fix unit tests for cuda 11.4
acreyes Aug 28, 2024
4887cfc
fixing missing return warnings from if constexpr blocks
acreyes Aug 28, 2024
afc381c
Merge branch 'develop' into acreyes/par-dispatch-template
Yurlungur Sep 5, 2024
842b94d
fix simdfor pattern check
acreyes Sep 5, 2024
beb6847
have Indexer only return Kokkos::Array
acreyes Sep 5, 2024
f68ec09
move kokkos typedefs to their own header
acreyes Sep 5, 2024
9aa1560
collapse pattern infers Nteam from Rank
acreyes Sep 5, 2024
f2cfd91
clean up team/collapse patterns
acreyes Sep 5, 2024
f255f71
making simdfor pattern uniform with others
acreyes Sep 6, 2024
c6f9de1
adding comments and cleaning up LoopPattern* type name use
acreyes Sep 6, 2024
f1403b1
SequentialFor added
acreyes Sep 6, 2024
b31b1dc
use Indexer in unit tests
acreyes Sep 6, 2024
4fbef8f
adding test coverage for outer loop patterns
acreyes Sep 6, 2024
0ee6005
fix par_for_outer tests on gpu
acreyes Sep 6, 2024
6e37753
cleanup scratch memory for par_for_outer test
acreyes Sep 6, 2024
1976164
adding example par_for* test cases
acreyes Sep 6, 2024
7c20e6d
LoopPatternCollapse -> LoopPatternTeamThreadVec
acreyes Sep 7, 2024
8758aef
cleaning up LoopBounds Translator
acreyes Sep 7, 2024
4fa3932
workaround for rtd theme table box line breaks
acreyes Sep 7, 2024
ce39ccc
par_for docs
acreyes Sep 7, 2024
a450c7c
count loop bounds more sensibly
acreyes Sep 7, 2024
85568d6
Revert "workaround for rtd theme table box line breaks"
acreyes Sep 8, 2024
052e39f
ParArray<ND, T> moved to kokkos_types.hpp
acreyes Sep 9, 2024
ae7a3b9
move is_functor -> concepts_lite.hpp
acreyes Sep 9, 2024
562b396
fix a typo
acreyes Sep 9, 2024
034efed
update changelog
acreyes Sep 9, 2024
ccf3e2d
fix scratch pad initialization for gcc 9.4
acreyes Sep 10, 2024
f4b2141
Merge remote-tracking branch 'upstream/develop' into acreyes/par-disp…
acreyes Sep 13, 2024
4cb73d1
Merge branch 'develop' into acreyes/par-dispatch-template
acreyes Sep 24, 2024
fbd2674
Merge remote-tracking branch 'upstream/develop' into acreyes/par-disp…
acreyes Oct 31, 2024
b175f41
Merge branch 'develop' into acreyes/par-dispatch-template
acreyes Nov 1, 2024
11616d8
fix merged test for device
acreyes Nov 13, 2024
1661218
Merge remote-tracking branch 'upstream/develop' into acreyes/par-disp…
acreyes Nov 13, 2024
e6c7b93
Merge branch 'develop' into acreyes/par-dispatch-template
pgrete Dec 2, 2024
28c3bee
fix for HtoD copies in par_for_outer
acreyes Dec 6, 2024
3dfc5cf
add mdrange tiling
acreyes Dec 6, 2024
2de2eab
docs for ParArray<ND, T>
acreyes Dec 6, 2024
206cd31
Update src/kokkos_abstraction.hpp
acreyes Dec 6, 2024
7bc731e
Update doc/sphinx/src/par_for.rst
acreyes Dec 6, 2024
c448372
Update doc/sphinx/src/par_for.rst
acreyes Dec 6, 2024
6152149
Merge branch 'develop' into acreyes/par-dispatch-template
Yurlungur Dec 7, 2024
10f3bc2
check par_reduce for hierarchical patterns
acreyes Dec 7, 2024
279b126
fill tiling array
acreyes Dec 12, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
## Current develop

### Added (new features/APIs/variables/...)
- [[PR 1142]](https://github.com/parthenon-hpc-lab/parthenon/pull/1142) Unify par_dispatch, par_for_outer & par_for_inner overloads
- [[PR 1210]](https://github.com/parthenon-hpc-lab/parthenon/pull/1210) Add cycle based output
- [[PR 1103]](https://github.com/parthenon-hpc-lab/parthenon/pull/1103) Add sparsity to vector wave equation test
- [[PR 1185]](https://github.com/parthenon-hpc-lab/parthenon/pull/1185) Bugfix to particle defragmentation
Expand Down
132 changes: 132 additions & 0 deletions doc/sphinx/src/par_for.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,132 @@
.. par_for:

Parallelism
===========

The loop wrappers documented here abstract the ``Kokkos::parallel_*`` parallel launches. The wrappers
simplify the use of Kokkos `execution policies <https://kokkos.org/kokkos-core-wiki/API/core/Execution-Policies.html>`_
for multidimensional loops through a common interface using loop pattern tags.

Additionally there is a provided ``parthenon::seq_for`` wrapper that uses a similar interface to perform
multidimensional sequential loops.

An example of usage can be found in `the unit
test <https://github.com/parthenon-hpc-lab/parthenon/blob/develop/tst/unit/kokkos_abstraction.cpp>`__

.. list-table:: parallel launches
:widths: 25 25
:header-rows: 1

* - Parthenon
- Kokkos
* - ``par_for``
- ``parallel_for``
* - ``par_reduce``
- ``parallel_reduce``
* - ``par_scan``
- ``parallel_scan``

Parallel launches are passed a string label, a set of inclusive loop bounds, a functor, and any extra arguments needed
for parallel reductions/scans. Optionally a loop pattern tag and an execution space may be provided.
When ommitted the ``DEFAULT_LOOP_PATTERN`` is used.

.. code:: cpp

parthenon::par_for(
loop_pattern_tag, exec_space, PARTHENON_AUTO_LABEL, ks, ke, js, je, is, ie,
KOKKOS_LAMBDA(const int k, const int j, const int i) {
data(k, j, i) += 1.;
});

.. list-table:: parallel launch parameters
:widths: 25 75
:header-rows: 1

* - Parameter
-
* - loop_pattern_tag
- Determines the execution policy. See table below.
* - exec_space
- kokkos execution space
* - loop bounds
- inclusive start/end pairs for the multidimensional loop. Supported types are ``integral`` and ``parthenon::IndexRange``.
Can be extended to accept other types (see below).
* - functor
- Defines the body of the parallel loop.
See `Kokkos programming guide <https://kokkos.org/kokkos-core-wiki/ProgrammingGuide/ParallelDispatch.html#functors>`_
for more.

.. list-table:: Loop Pattern tags
:widths: 40 60
:header-rows: 1

* - Tag
- Execution Policy
* - ``loop_pattern_flatrange_tag``
- Flattens all of the loops into a single ``Kokkos::RangePolicy``
* - ``loop_pattern_simdfor_tag``
- Maps to two C-style loops. The innermost gets decorated with a ``#pragma omp simd`` and the remaining
loops are flattened into a single C-style for loop. Only supported on CPU.
* - ``loop_pattern_mdrange_tag``
- Maps all the loop bounds onto a ``Kokkos::MDRangePolicy``
* - ``LoopPatternTeamThreadVec<Nt, Nv>()``
- Maps onto a hierarchical parallel loop. The ``Nv`` inner loops are flattened onto a ``VectorRange`` policy,
the next ``Nt`` onto a ``ThreadRange`` policy, and the remaining loops are
flattened into an outer ``TeamThreadRange``. The specializations ``loop_pattern_[tpttr|tptvr|tpttrtvr]_tag`` correspond
to ``<1,0>``, ``<0,1>``, ``<1,1>`` respectively.


Cmake Options
-------------

``PAR_LOOP_LAYOUT`` controls the ``DEFAULT_LOOP_PATTERN`` macro.

.. list-table:: ``PAR_LOOP_LAYOUT`` options.
:widths: 25 25
:header-rows: 1

* - ``PAR_LOOP_LAYOUT``
- Pattern Tag
* - "MANUAL1D_LOOP"
- loop_pattern_flatrange_tag
* - "SIMDFOR_LOOP"
- loop_pattern_simdfor_tag
* - "MDRANGE_LOOP"
- loop_pattern_mdrange_tag
* - "TP_TTR_LOOP"
- loop_pattern_tpttr_tag
* - "TP_TVR_LOOP"
- loop_pattern_tptvr_tag
* - "TPTTRTVR_LOOP"
- loop_pattern_tpttrtvr_tag

Adding New Loop Patterns
------------------------

All of the ``par_for*`` overloads get processed into the ``par_dispatch_impl`` struct that
determines the types of the loop pattern, functor, functor arguments, loop bounds, and any
extra arguments need for scans/reductions. The struct implements overloads of the
``par_dispatch_impl::dispatch_impl`` method that are tagged using the ``PatternTag`` ``enum``
to specialize the ``LoopPatternTag`` struct. New loop patterns need to extend this enum and
provide an additional overload.

There is a chance that the requested loop pattern passed through ``parthenon::par_for``, for
example a ``loop_pattern_simdfor_tag`` ``DEFAULT_LOOP_PATTERN`` being used in a ``par_reduce``,
resulting in a conflict. For this reason the ``DispatchType`` type trait provides the
``DispatchType::GetPatternTag()`` method that processes the requested loop pattern and returns
a ``PatternTag`` and provides sensible fallbacks for the loop pattern if there are any conflicts.
In this way ``DEFAULT_LOOP_PATTERN`` can be reliably used.

Adding New Loop Bound Types
---------------------------

All of the loop bounds provided to any parallel wrapper gets processed by the ``LoopBoundTranslator``
to determine the rank of the multidimensional loop and translate the start/end pairs into an array
of ``IndexRange`` s. Each bound type gets processed individually and allows the flexibility to mix
loop bound types as long as they are supported.

New types can be provided by specializing the ``ProcessLoopBound`` struct in the ``parthenon`` namespace.
These structs need to provide a ``GetNumBounds`` method to count the number of start/end bounds contained
in the type, as well as a ``GetIndexRanges`` method to fill the ``IndexRange`` bounds used in the
parallel dispatch.

6 changes: 6 additions & 0 deletions doc/sphinx/src/parthenon_arrays.rst
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,12 @@ where ``LayoutWrapper`` is currently hardcoded to
with the default execution space. If UVM is enabled, it is
``Kokkos::CudaUVMSpace``.

The above ``ParArray#D``s are also aliased with an integral template parameter

.. code:: c++
ParArray3D<Real> myArray("a 3d array", 6, 5, 4);
ParArray<3, Real> myArray2("same type as myArray", 6, 5, 4);

Parthenon Arbitrary-Dimensional Arrays
======================================

Expand Down
2 changes: 1 addition & 1 deletion src/bvals/neighbor_block.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,7 @@ NeighborBlock::NeighborBlock()
offsets(0, 0, 0), ownership(true) {}

NeighborBlock::NeighborBlock(Mesh *mesh, LogicalLocation loc, LogicalLocation origin_loc,
int rank, int gid, std::array<int, 3> offsets_in, int bid,
int rank, int gid, Kokkos::Array<int, 3> offsets_in, int bid,
Yurlungur marked this conversation as resolved.
Show resolved Hide resolved
int target_id, int fi1, int fi2)
: rank{rank}, gid{gid}, bufid{bid}, targetid{target_id}, loc{loc},
origin_loc{origin_loc}, fi1{fi1}, fi2{fi2}, block_size(mesh->GetBlockSize(loc)),
Expand Down
4 changes: 2 additions & 2 deletions src/bvals/neighbor_block.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -69,8 +69,8 @@ struct NeighborBlock {

NeighborBlock();
NeighborBlock(Mesh *mesh, LogicalLocation loc, LogicalLocation origin_loc, int rank,
int gid, std::array<int, 3> offsets, int bid, int target_id, int ifi1,
int ifi2);
int gid, Kokkos::Array<int, 3> offsets_in, int bid, int target_id,
int fi1, int fi2);
};

//----------------------------------------------------------------------------------------
Expand Down
Loading