[Breaking Change] Tasking rewrite (#987)
* trying to use new tasks

* remove debugging

* formatting

* remove raw mpi.hpp include

* style

* more style

* and more style

* ok thats enough

* actually remove the old task stuff

* formatting

* maybe last style commit...

* oops, includes inside parthenon namespace

* update TaskID unit test

* missing header

* port the poisson examples

* try to fix serial builds

* clean up branching in `|` operator of TaskID

Co-authored-by: Jonah Miller <[email protected]>

* rename Queue ThreadQueue

* formatting

* try to fix builds with threads

* update tasking docs

* formatting and update changelog

* address review comments

* style

* add a comment about the dependent variable in Task

* address review comments

* add TaskQualifier to driver prelude

* move using statement

* fix bug in ThreadQueue

* set final_residual in gmg and bicgstab even if they exit by reaching the max iterations

* fix serial case for tasks marked completion and global_sync

---------

Co-authored-by: Jonah Miller <[email protected]>
jdolence and Yurlungur authored Jan 24, 2024
1 parent 4899b6c commit 7855248
Showing 26 changed files with 1,008 additions and 1,270 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,7 @@
## Current develop

### Added (new features/APIs/variables/...)
- [[PR 987]](https://github.com/parthenon-hpc-lab/parthenon/pull/987) New tasking infrastructure and capabilities
- [[PR 969]](https://github.com/parthenon-hpc-lab/parthenon/pull/969) New macro-based auto-naming of profiling regions and kernels
- [[PR 981]](https://github.com/parthenon-hpc-lab/parthenon/pull/981) Add IndexSplit
- [[PR 983]](https://github.com/parthenon-hpc-lab/parthenon/pull/983) Add Contains to SparsePack
@@ -23,6 +24,7 @@
### Removed (removing behavior/API/variables/...)

### Incompatibilities (i.e. breaking changes)
- [[PR 987]](https://github.com/parthenon-hpc-lab/parthenon/pull/987) Change the API for what was IterativeTasks
- [[PR 974]](https://github.com/parthenon-hpc-lab/parthenon/pull/974) Change GetParentPointer to always return T*


3 changes: 3 additions & 0 deletions CMakeLists.txt
@@ -116,6 +116,9 @@ endif()
list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake")
find_package(Filesystem REQUIRED COMPONENTS Experimental Final)

# Require threading for tasks
find_package(Threads)

set(ENABLE_MPI OFF)
set(NUM_MPI_PROC_TESTING "4" CACHE STRING "Number of mpi processors to use when running tests with MPI")
if (NOT PARTHENON_DISABLE_MPI)
210 changes: 120 additions & 90 deletions doc/sphinx/src/tasks.rst
@@ -3,85 +3,84 @@
Tasks
=====

Parthenon's tasking infrastructure is how downstream applications describe
and execute their work. Tasks are organized into a hierarchy of objects.
``TaskCollection``\ s have one or more ``TaskRegion``\ s, ``TaskRegion``\ s have
one or more ``TaskList``\ s, and ``TaskList``\ s can have one or more sublists
(that are themselves ``TaskList``\ s).

Task
----
Though downstream codes never have to interact with the ``Task`` object directly,
it's useful to describe nonetheless. A ``Task`` object is essentially a functor
that stores the necessary data to invoke a downstream code's functions with
the desired arguments. Importantly, however, it also stores information that
relates itself to other tasks, namely the tasks that must be complete before
it should execute and the tasks that may be available to run after it completes.
In other words, ``Task``\ s are nodes in a directed (possibly cyclic) graph, and
each stores the edges that enter it and the edges that leave it.

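
As a rough illustration of the idea above, the sketch below models a task as a stored callable plus its incoming and outgoing edges. ``ToyTask``, ``ToyStatus``, ``MakeToyTask``, ``AddEdge``, and ``IsReady`` are hypothetical names invented for this example; they are not Parthenon's ``Task`` class or its interface.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-ins for illustration only; not Parthenon types.
enum class ToyStatus { complete, incomplete };

struct ToyTask {
  std::function<ToyStatus()> body;  // the bound function together with its arguments
  std::vector<int> dependencies;    // incoming edges: tasks that must finish first
  std::vector<int> dependents;      // outgoing edges: tasks that may run afterwards
  bool done = false;
};

// Capture a callable and its arguments by value for later invocation.
template <class F, class... Args>
ToyTask MakeToyTask(F func, Args... args) {
  ToyTask t;
  t.body = [=]() { return func(args...); };
  return t;
}

// Record that task `before` must complete before task `after` may run.
inline void AddEdge(std::vector<ToyTask> &graph, int before, int after) {
  graph[before].dependents.push_back(after);
  graph[after].dependencies.push_back(before);
}

// A task is ready once every task it depends on has completed.
inline bool IsReady(const ToyTask &t, const std::vector<ToyTask> &graph) {
  for (int d : t.dependencies) {
    if (!graph[d].done) return false;
  }
  return true;
}
```

Storing both edge directions lets a scheduler check readiness from the incoming edges and find candidate follow-up work from the outgoing ones.
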
TaskList
--------
The ``TaskList`` class implements methods to build and execute a set of
tasks with associated dependencies. The class implements a few public
facing member functions that provide useful functionality for downstream
apps:

AddTask
~~~~~~~

``AddTask`` is a templated variadic function that takes the task
function to be executed, the task dependencies (see ``TaskID`` below),
and the arguments to the task function as its arguments. All arguments
are captured by value in a lambda for later execution.

When adding functions that are non-static class member functions, a
slightly different interface is required. The first argument should be
the class-name-scoped name of the function. For example, for a function
named ``DoSomething`` in class ``SomeClass``, the first argument would
be ``&SomeClass::DoSomething``. The second argument should be a pointer
to the object that should invoke this member function. Finally, the
dependencies and function arguments should be provided as described
above.

Examples of both ``AddTask`` calls can be found in the advection example
`here <https://github.com/parthenon-hpc-lab/parthenon/blob/develop/example/advection/advection_driver.cpp>`__.

AddIteration
~~~~~~~~~~~~

``AddIteration`` provides a means of grouping a set of tasks together
that will be executed repeatedly until stopping criteria are satisfied.
``AddIteration`` returns an ``IterativeTasks`` object which provides
overloaded ``AddTask`` functions as described above, but internally
handles the bookkeeping necessary to maintain the association of all the
tasks associated with the iterative process. A special function
``SetCompletionTask``, which behaves identically to ``AddTask``, allows
a task to be defined that evaluates the stopping criteria. The maximum
number of iterations can be controlled through the ``SetMaxIterations``
member function and the number of iterations between evaluating the
stopping criteria can be set with the ``SetCheckInterval`` function.

DoAvailable
~~~~~~~~~~~

``DoAvailable`` loops over the task list once, executing all tasks whose
dependencies are satisfied. Completed tasks are removed from the task
list.

TaskID
------

The ``TaskID`` class implements methods that allow Parthenon to keep
track of tasks, their dependencies, and what remains to be completed.
The main way application code will interact with this object is as a
returned object from ``TaskList::AddTask`` and as an argument to
subsequent calls to ``TaskList::AddTask`` as a dependency for other
tasks. When used as a dependency, ``TaskID`` objects can be combined
with the bitwise or operator (``|``) to specify multiple dependencies.
The ``TaskList`` class stores a vector of all the tasks and sublists (nested
``TaskList``\ s) added to it. Additionally, it stores bookkeeping information
that facilitates the more advanced features described below. Adding tasks and
sublists is the only way to interact with ``TaskList`` objects.

The basic call to ``AddTask`` takes, in order, the task's dependencies, the
function to be executed, and the arguments to that function. ``AddTask`` returns
a ``TaskID`` object that can be used in subsequent calls to ``AddTask`` as a
dependency, either on its own or combined with other ``TaskID``\ s via the ``|``
operator. Use of the ``|`` operator is historical and perhaps a bit misleading, as
it really acts as a logical AND: all tasks combined with ``|`` must be
complete before the dependencies are satisfied. An overload of ``AddTask`` takes
a ``TaskQualifier`` object as the first argument, which specifies certain special,
non-default behaviors. These are described below. Note that the default
constructor of ``TaskID`` produces a special object that, when passed into
``AddTask``, signifies that the task has no dependencies.
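
To make the AND semantics of ``|`` concrete, here is a minimal sketch in which combining IDs forms a set union and the dependency is satisfied only when every member of the set has completed. ``ToyID`` and ``SatisfiedBy`` are hypothetical names assumed for this example; Parthenon's actual ``TaskID`` may be implemented quite differently.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for illustration only; not Parthenon's TaskID.
class ToyID {
 public:
  ToyID() = default;  // default-constructed: represents "no dependencies"
  explicit ToyID(int id) { ids_.insert(id); }

  // Combining two IDs yields the union of their dependency sets.
  ToyID operator|(const ToyID &other) const {
    ToyID out(*this);
    out.ids_.insert(other.ids_.begin(), other.ids_.end());
    return out;
  }

  bool empty() const { return ids_.empty(); }

  // Satisfied only if ALL combined tasks are complete (a logical AND).
  bool SatisfiedBy(const std::set<int> &completed) const {
    for (int id : ids_) {
      if (completed.count(id) == 0) return false;
    }
    return true;
  }

 private:
  std::set<int> ids_;
};
```

In this model a default-constructed ``ToyID`` is trivially satisfied, which mirrors the "no dependencies" role of a default-constructed ``TaskID``.
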

The ``AddSublist`` function adds a nested ``TaskList`` to the ``TaskList`` on
which it is called. The principal use case for this is to add iterative cycles
to the graph, allowing one to execute a series of tasks repeatedly until some
criteria are satisfied. The call takes as arguments the dependencies (via
``TaskID``\ s combined with ``|``) that must be complete before the sublist
executes and a ``std::pair<int, int>`` specifying the minimum
and maximum number of times the sublist should execute. Passing something like
``{min_iters, max_iters}`` as the second argument should suffice, with ``{1, 1}``
leading to a sublist that never cycles. ``AddSublist``
returns a ``std::pair<TaskList&, TaskID>`` which is conveniently accessed via
a structured binding, e.g.

.. code:: cpp

   TaskID none;
   auto [child_list, child_list_id] = parent_list.AddSublist(dependencies, {1, 3});
   auto task_id = child_list.AddTask(none, SomeFunction, arg1, arg2);

In the above example, passing ``none`` as the dependency for the task added to
``child_list`` does not imply that this task can execute at any time since
``child_list`` itself has dependencies that must be satisfied before any of its
tasks can be invoked.
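
The effect of the ``{min_iters, max_iters}`` pair can be sketched with a small stand-alone loop. This is only a model of the control flow, under the assumption that one callable per pass reports whether the stopping criteria are met; ``RunSublist`` and ``ToyStatus`` are hypothetical names and nothing here calls into Parthenon.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for illustration only; not Parthenon's TaskStatus.
enum class ToyStatus { complete, iterate };

// Run passes of a sublist until the check reports completion (honored only once
// the minimum pass count is reached) or the maximum pass count is hit.
// Returns the number of passes executed.
inline int RunSublist(std::pair<int, int> min_max,
                      const std::function<ToyStatus(int)> &check) {
  assert(min_max.first >= 1 && min_max.first <= min_max.second);
  int passes = 0;
  while (passes < min_max.second) {
    ++passes;  // one full pass over the sublist's tasks would happen here
    const ToyStatus status = check(passes);
    if (status == ToyStatus::complete && passes >= min_max.first) break;
  }
  return passes;
}
```

With this model, a ``{1, 1}`` pair always yields exactly one pass, which matches the statement that such a sublist never cycles.
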

TaskRegion
----------

``TaskRegion`` is a lightweight class that wraps
``std::vector<TaskList>``, providing a little extra functionality.
During task execution (described below), all task lists in a
``TaskRegion`` can be operated on concurrently. For example, a
``TaskRegion`` can be used to construct independent task lists for each
``MeshBlock``. Occasionally, it is useful to have a task not be
considered complete until that task completes in all lists of a region.
For example, a global iterative solver cannot be considered complete
until the stopping criteria are satisfied everywhere, which may require
evaluating those criteria in tasks that live in different lists within a
region. An example of this use case is
shown `here <https://github.com/parthenon-hpc-lab/parthenon/blob/develop/example/poisson/poisson_driver.cpp>`__. The mechanism
to mark a task so that dependent tasks will wait until all lists have
completed it is to call ``AddRegionalDependencies``, as shown in the
Poisson example.
Under the hood, a ``TaskRegion`` is a directed, possibly cyclic graph. The graph
is built up incrementally as tasks are added to the ``TaskList``\ s within the
``TaskRegion``, and its construction is completed the first time it is
executed. ``TaskRegion``\ s can have one or more ``TaskList``\ s. The primary reason
for this is to allow flexibility in how work is broken up into tasks (and
eventually kernels). A region with many lists will produce many small
tasks/kernels, but may expose more asynchrony (e.g. MPI communication). A region
with fewer lists will produce more work per kernel (which may be good for GPUs,
for example), but may limit asynchrony. Typically, each list is tied to a unique
partition of the mesh blocks owned by a rank. ``TaskRegion`` only provides a few
public-facing functions:

- ``TaskListStatus Execute(ThreadPool &pool)``: execute the region using a
  ``ThreadPool`` provided by the caller. In practice, ``Execute`` is usually
  called from the ``Execute`` member function of ``TaskCollection``.
- ``TaskList& operator[](const int i)``: return a reference to the ``i``\ th
  ``TaskList`` in the region.
- ``size_t size()``: return the number of ``TaskList``\ s in the region.

TaskCollection
--------------
@@ -120,21 +119,52 @@ is shown below.
.. figure:: figs/TaskDiagram.png
   :alt: Task Diagram

``TaskCollection`` provides two member functions, ``AddRegion`` and
``Execute``.

AddRegion
~~~~~~~~~

``AddRegion`` simply adds a new ``TaskRegion`` to the back of the
collection and returns it as a reference. The integer argument
determines how many task lists make up the region.

Execute
~~~~~~~

Calling the ``Execute`` method on the ``TaskCollection`` executes all
the tasks that have been added to the collection, processing each
``TaskRegion`` in the order they were added, and allowing tasks in
different ``TaskList``\ s but the same ``TaskRegion`` to be executed
concurrently.
``TaskCollection`` provides a few public-facing functions:

- ``TaskRegion& AddRegion(const int num_lists)``: Add and return a reference to
  a new ``TaskRegion`` with the specified number of ``TaskList``\ s.
- ``TaskListStatus Execute(ThreadPool &pool)``: Execute all regions in the
  collection. Regions are executed completely, in the order they were added,
  before moving on to the next region. Task execution will take advantage of
  the provided ``ThreadPool`` to (possibly) execute tasks across ``TaskList``\ s
  in each region concurrently.
- ``TaskListStatus Execute()``: Same as above, but execution will use an
  internally generated ``ThreadPool`` with a single thread.

NOTE: Work remains to make the rest of
Parthenon thread-safe, so it is currently required to use a ``ThreadPool``
with one thread.
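
The collection, region, and list hierarchy, together with the rule that regions run to completion in the order they were added, can be modeled in a few lines. The sketch below is serial and deliberately ignores dependencies and thread pools; ``ToyCollection``, ``ToyRegion``, and ``ToyList`` are hypothetical names that only mirror the shape of the interface described above.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; not Parthenon types.
using ToyList = std::vector<std::function<void()>>;

class ToyRegion {
 public:
  explicit ToyRegion(int num_lists) : lists_(num_lists) {}
  ToyList &operator[](int i) { return lists_[i]; }
  std::size_t size() const { return lists_.size(); }
  // Serial model: run every task of every list in this region.
  void Execute() {
    for (auto &list : lists_) {
      for (auto &task : list) task();
    }
  }

 private:
  std::vector<ToyList> lists_;
};

class ToyCollection {
 public:
  // Add a region with the requested number of lists and return a reference to it.
  ToyRegion &AddRegion(int num_lists) {
    regions_.emplace_back(num_lists);
    return regions_.back();
  }
  // Each region finishes completely before the next one starts.
  void Execute() {
    for (auto &region : regions_) region.Execute();
  }

 private:
  // std::deque keeps earlier references valid when regions are appended.
  std::deque<ToyRegion> regions_;
};
```

In this toy, tasks in a later region never start before an earlier region has finished, regardless of the order in which the tasks themselves were registered.
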

TaskQualifier
-------------

``TaskQualifier``\ s provide a mechanism for downstream codes to alter the default
behavior of specific tasks in certain ways. The qualifiers are described below:

- ``TaskQualifier::local_sync``: Tasks marked with ``local_sync`` synchronize across
  lists in a region on a given MPI rank. Tasks that depend on a ``local_sync``
  marked task gain dependencies from the corresponding task on all lists within
  a region. A typical use for this qualifier is to do a rank-local reduction, for
  example before initiating a global MPI reduction (which should be done only once
  per rank, not once per ``TaskList``). Note that Parthenon links tasks across
  lists in the order they are added to each list, i.e. the ``n``\ th ``local_sync`` task
  in a list is assumed to be associated with the ``n``\ th ``local_sync`` task in all
  lists in the region.
- ``TaskQualifier::global_sync``: Tasks marked with ``global_sync`` implicitly have
  the same semantics as ``local_sync``, but additionally do a global reduction on the
  ``TaskStatus`` to determine if/when execution can proceed on to dependent tasks.
- ``TaskQualifier::completion``: Tasks marked with ``completion`` can lead to exiting
  execution of the owning ``TaskList``. If these tasks return ``TaskStatus::complete``
  and the minimum number of iterations of the list have been completed, the remainder
  of the task list will be skipped (or the iteration stopped). Returning
  ``TaskStatus::iterate`` leads to continued execution/iteration, unless the maximum
  number of iterations has been reached.
- ``TaskQualifier::once_per_region``: Tasks with the ``once_per_region`` qualifier
  will only execute once (per iteration, if relevant) regardless of the number of
  ``TaskList``\ s in the region. This can be useful when, for example, doing MPI
  reductions, printing out some rank-wide state, or calling a ``completion`` task
  that depends on some global condition where all lists would evaluate identical code.

``TaskQualifier``\ s can be combined via the ``|`` operator and all combinations are
supported. For example, you might mark a task ``global_sync | completion | once_per_region``
if it were a task to determine whether an iteration should continue that depended
on some previously reduced quantity.
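
One common way to support combining qualifiers with ``|`` is a bit mask, sketched below. This is an assumption about how such flags could be represented, not a description of Parthenon's ``TaskQualifier``; ``ToyQualifier``, ``Has``, and ``SyncsLocally`` are hypothetical names, and the numeric values are arbitrary.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for illustration only; not Parthenon's TaskQualifier.
enum class ToyQualifier : std::uint8_t {
  normal = 0,
  local_sync = 1,
  global_sync = 2,
  completion = 4,
  once_per_region = 8
};

// Combine qualifiers by OR-ing their bits.
constexpr ToyQualifier operator|(ToyQualifier a, ToyQualifier b) {
  return static_cast<ToyQualifier>(static_cast<std::uint8_t>(a) |
                                   static_cast<std::uint8_t>(b));
}

// Test whether a particular flag is present in a combined qualifier.
constexpr bool Has(ToyQualifier set, ToyQualifier flag) {
  return (static_cast<std::uint8_t>(set) & static_cast<std::uint8_t>(flag)) != 0;
}

// global_sync implies local_sync semantics, so either bit means "sync across lists".
constexpr bool SyncsLocally(ToyQualifier set) {
  return Has(set, ToyQualifier::local_sync) || Has(set, ToyQualifier::global_sync);
}
```

Unlike ``|`` on task IDs, where the combination expresses a conjunction of dependencies, here ``|`` simply accumulates independent behaviors on a single task.
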