diff --git a/CHANGELOG.md b/CHANGELOG.md
index cdfa0c44eb38..38e742cf236b 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -3,6 +3,7 @@
 ## Current develop
 
 ### Added (new features/APIs/variables/...)
+- [[PR 987]](https://github.com/parthenon-hpc-lab/parthenon/pull/987) New tasking infrastructure and capabilities
 - [[PR 969]](https://github.com/parthenon-hpc-lab/parthenon/pull/969) New macro-based auto-naming of profiling regions and kernels
 - [[PR 981]](https://github.com/parthenon-hpc-lab/parthenon/pull/981) Add IndexSplit
 - [[PR 983]](https://github.com/parthenon-hpc-lab/parthenon/pull/983) Add Contains to SparsePack
@@ -23,6 +24,7 @@
 ### Removed (removing behavior/API/varaibles/...)
 
 ### Incompatibilities (i.e. breaking changes)
+- [[PR 987]](https://github.com/parthenon-hpc-lab/parthenon/pull/987) Change the API for what was IterativeTasks
 - [[PR 974]](https://github.com/parthenon-hpc-lab/parthenon/pull/974) Change GetParentPointer to always return T*

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 994399d4bdc0..0f6b0695c5e9 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -116,6 +116,9 @@ endif()
 list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake")
 find_package(Filesystem REQUIRED COMPONENTS Experimental Final)
+# Require threading for tasks
+find_package(Threads REQUIRED)
+
 set(ENABLE_MPI OFF)
 set(NUM_MPI_PROC_TESTING "4" CACHE STRING "Number of mpi processors to use when running tests with MPI")
 if (NOT PARTHENON_DISABLE_MPI)

diff --git a/doc/sphinx/src/tasks.rst b/doc/sphinx/src/tasks.rst
index d4c0b361b7f9..4076d4136095 100644
--- a/doc/sphinx/src/tasks.rst
+++ b/doc/sphinx/src/tasks.rst
@@ -3,85 +3,84 @@
 Tasks
 =====
 
+Parthenon's tasking infrastructure is how downstream applications describe
+and execute their work. Tasks are organized into a hierarchy of objects.
+``TaskCollection``s have one or more ``TaskRegion``s, ``TaskRegion``s have
+one or more ``TaskList``s, and ``TaskList``s can have one or more sublists
+(that are themselves ``TaskList``s).
+
+Task
+----
+
+Though downstream codes never have to interact with the ``Task`` object directly,
+it's useful to describe it nonetheless. A ``Task`` object is essentially a functor
+that stores the necessary data to invoke a downstream code's functions with
+the desired arguments. Importantly, however, it also stores information that
+relates it to other tasks, namely the tasks that must be complete before
+it should execute and the tasks that may be available to run after it completes.
+In other words, each ``Task`` is a node in a directed (possibly cyclic) graph, and
+includes the edges that connect to it and emerge from it.
+
 TaskList
 --------
 
-The ``TaskList`` class implements methods to build and execute a set of
-tasks with associated dependencies. The class implements a few public
-facing member functions that provide useful functionality for downstream
-apps:
-
-AddTask
-~~~~~~~
-
-``AddTask`` is a templated variadic function that takes the task
-function to be executed, the task dependencies (see ``TaskID`` below),
-and the arguments to the task function as it’s arguments. All arguments
-are captured by value in a lambda for later execution.
-
-When adding functions that are non-static class member functions, a
-slightly different interface is required. The first argument should be
-the class-name-scoped name of the function. For example, for a function
-named ``DoSomething`` in class ``SomeClass``, the first argument would
-be ``&SomeClass::DoSomething``. The second argument should be a pointer
-to the object that should invoke this member function. Finally, the
-dependencies and function arguments should be provided as described
-above.
-
-Examples of both ``AddTask`` calls can be found in the advection example
-`here `__.
-
-AddIteration
-~~~~~~~~~~~~
-
-``AddIteration`` provides a means of grouping a set of tasks together
-that will be executed repeatedly until stopping criteria are satisfied.
-``AddIteration`` returns an ``IterativeTasks`` object which provides
-overloaded ``AddTask`` functions as described above, but internally
-handles the bookkeeping necessary to maintain the association of all the
-tasks associated with the iterative process. A special function
-``SetCompletionTask``, which behaves identically to ``AddTask``, allows
-a task to be defined that evaluates the stopping criteria. The maximum
-number of iterations can be controlled through the ``SetMaxIterations``
-member function and the number of iterations between evaluating the
-stopping criteria can be set with the ``SetCheckInterval`` function.
-
-DoAvailable
-~~~~~~~~~~~
-
-``DoAvailable`` loops over the task list once, executing all tasks whose
-dependencies are satisfied. Completed tasks are removed from the task
-list.
-
-TaskID
-------
-
-The ``TaskID`` class implements methods that allow Parthenon to keep
-track of tasks, their dependencies, and what remains to be completed.
-The main way application code will interact with this object is as a
-returned object from ``TaskList::AddTask`` and as an argument to
-subsequent calls to ``TaskList::AddTask`` as a dependency for other
-tasks. When used as a dependency, ``TaskID`` objects can be combined
-with the bitwise or operator (``|``) to specify multiple dependencies.
+The ``TaskList`` class stores a vector of all the tasks and sublists (a nested
+``TaskList``) added to it. Additionally, it stores various bookkeeping
+information that facilitates more advanced features described below. Adding
+tasks and sublists is the only way to interact with ``TaskList`` objects.
+
+The basic call to ``AddTask`` takes the task's dependencies, the function to be
+executed, and the arguments to the function as its arguments. ``AddTask`` returns
+a ``TaskID`` object that can be used in subsequent calls to ``AddTask`` as a
+dependency, either on its own or combined with other ``TaskID``s via the ``|``
+operator. Use of the ``|`` operator is historical and perhaps a bit misleading, as
+it really acts as a logical AND -- that is, all tasks combined with ``|`` must be
+complete before the dependencies are satisfied. An overload of ``AddTask`` takes
+a ``TaskQualifier`` object as the first argument, which specifies certain special,
+non-default behaviors. These will be described below. Note that the default
+constructor of ``TaskID`` produces a special object that, when passed into
+``AddTask``, signifies that the task has no dependencies.
+
+The ``AddSublist`` function adds a nested ``TaskList`` to the ``TaskList`` on
+which it's called. The principal use case for this is to add iterative cycles
+to the graph, allowing one to execute a series of tasks repeatedly until some
+criteria are satisfied. The call takes as arguments the dependencies (via
+``TaskID``s combined with ``|``) that must be complete before the sublist
+executes and a ``std::pair<int, int>`` specifying the minimum
+and maximum number of times the sublist should execute. Passing something like
+``{min_iters, max_iters}`` as the second argument should suffice, with ``{1, 1}``
+leading to a sublist that never cycles. ``AddSublist``
+returns a ``std::pair<TaskList&, TaskID>`` which is conveniently accessed via
+a structured binding, e.g.
+
+.. code:: cpp
+
+  TaskID none;
+  auto [child_list, child_list_id] = parent_list.AddSublist(dependencies, {1,3});
+  auto task_id = child_list.AddTask(none, SomeFunction, arg1, arg2);
+
+In the above example, passing ``none`` as the dependency for the task added to
+``child_list`` does not imply that this task can execute at any time since
+``child_list`` itself has dependencies that must be satisfied before any of its
+tasks can be invoked.
 
 TaskRegion
 ----------
 
-``TaskRegion`` is a lightweight class that wraps
-``std::vector``, providing a little extra functionality.
-During task execution (described below), all task lists in a
-``TaskRegion`` can be operated on concurrently. For example, a
-``TaskRegion`` can be used to construct independent task lists for each
-``MeshBlock``. Occasionally, it is useful to have a task not be
-considered complete until that task completes in all lists of a region.
-For example, a global iterative solver cannot be considered complete
-until the stopping criteria are satisfied everywhere, which may require
-evaluating those criteria in tasks that live in different lists within a
-region. An example of this use case is
-shown `here `__. The mechanism
-to mark a task so that dependent tasks will wait until all lists have
-completed it is to call ``AddRegionalDependencies``, as shown in the
-Poisson example.
+Under the hood, a ``TaskRegion`` is a directed, possibly cyclic graph. The graph
+is built up incrementally as tasks are added to the ``TaskList``s within the
+``TaskRegion``, and its construction is completed the first time it's
+executed. ``TaskRegion``s can have one or more ``TaskList``s. The primary reason
+for this is to allow flexibility in how work is broken up into tasks (and
+eventually kernels). A region with many lists will produce many small
+tasks/kernels, but may expose more asynchrony (e.g. MPI communication). A region
+with fewer lists will produce more work per kernel (which may be good for GPUs,
+for example), but may limit asynchrony. Typically, each list is tied to a unique
+partition of the mesh blocks owned by a rank. ``TaskRegion`` only provides a few
+public-facing functions:
+- ``TaskListStatus Execute(ThreadPool &pool)``: ``TaskRegion``s can be executed, requiring
+that a ``ThreadPool`` be provided by the caller. In practice, ``Execute`` is usually
+called from the ``Execute`` member function of ``TaskCollection``.
+- ``TaskList& operator[](const int i)``: return a reference to the ``i``th
+``TaskList`` in the region.
+- ``size_t size()``: return the number of ``TaskList``s in the region.
 
 TaskCollection
 --------------
@@ -120,21 +119,52 @@
 is shown below.
 
 .. figure:: figs/TaskDiagram.png
    :alt: Task Diagram
 
-``TaskCollection`` provides two member functions, ``AddRegion`` and
-``Execute``.
-
-AddRegion
-~~~~~~~~~
-
-``AddRegion`` simply adds a new ``TaskRegion`` to the back of the
-collection and returns it as a reference. The integer argument
-determines how many task lists make up the region.
-
-Execute
-~~~~~~~
-
-Calling the ``Execute`` method on the ``TaskCollection`` executes all
-the tasks that have been added to the collection, processing each
-``TaskRegion`` in the order they were added, and allowing tasks in
-different ``TaskList``\ s but the same ``TaskRegion`` to be executed
-concurrently.
+``TaskCollection`` provides a few
+public-facing functions:
+- ``TaskRegion& AddRegion(const int num_lists)``: Add and return a reference to
+a new ``TaskRegion`` with the specified number of ``TaskList``s.
+- ``TaskListStatus Execute(ThreadPool &pool)``: Execute all regions in the
+collection. Regions are executed completely, in the order they were added,
+before moving on to the next region. Task execution will take advantage of
+the provided ``ThreadPool`` to (possibly) execute tasks across ``TaskList``s
+in each region concurrently.
+- ``TaskListStatus Execute()``: Same as above, but execution will use an
+internally generated ``ThreadPool`` with a single thread.
+
+NOTE: Work remains to make the rest of
+Parthenon thread-safe, so it is currently required to use a ``ThreadPool``
+with one thread.
+
+TaskQualifier
+-------------
+
+``TaskQualifier``s provide a mechanism for downstream codes to alter the default
+behavior of specific tasks in certain ways. The qualifiers are described below:
+- ``TaskQualifier::local_sync``: Tasks marked with ``local_sync`` synchronize across
+lists in a region on a given MPI rank. Tasks that depend on a ``local_sync``
+marked task gain dependencies from the corresponding task on all lists within
+a region. A typical use for this qualifier is to do a rank-local reduction, for
+example before initiating a global MPI reduction (which should be done only once
+per rank, not once per ``TaskList``). Note that Parthenon links tasks across
+lists in the order they are added to each list, i.e. the ``n``th ``local_sync`` task
+in a list is assumed to be associated with the ``n``th ``local_sync`` task in all
+lists in the region.
+- ``TaskQualifier::global_sync``: Tasks marked with ``global_sync`` implicitly have
+the same semantics as ``local_sync``, but additionally do a global reduction on the
+``TaskStatus`` to determine if/when execution can proceed on to dependent tasks.
+- ``TaskQualifier::completion``: Tasks marked with ``completion`` can lead to exiting
+execution of the owning ``TaskList``. If these tasks return ``TaskStatus::complete``
+and the minimum number of iterations of the list has been completed, the remainder
+of the task list will be skipped (or the iteration stopped). Returning
+``TaskStatus::iterate`` leads to continued execution/iteration, unless the maximum
+number of iterations has been reached.
+- ``TaskQualifier::once_per_region``: Tasks with the ``once_per_region`` qualifier
+will only execute once (per iteration, if relevant) regardless of the number of
+``TaskList``s in the region. This can be useful when, for example, doing MPI
+reductions, printing out some rank-wide state, or calling a ``completion`` task
+that depends on some global condition where all lists would evaluate identical code.
+
+``TaskQualifier``s can be combined via the ``|`` operator and all combinations are
+supported. For example, you might mark a task ``global_sync | completion | once_per_region``
+if it is a task that determines whether an iteration should continue based
+on some previously reduced quantity.
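+
+To make the above concrete, below is a minimal sketch (not drawn from the
+Parthenon sources) of how a downstream code might assemble and execute a
+``TaskCollection`` using these qualifiers. The task functions ``DoWork`` and
+``CheckDone`` are hypothetical placeholders for application callables that
+return a ``TaskStatus``.
+
+.. code:: cpp
+
+  // hypothetical application task functions
+  TaskStatus DoWork(const int i) { /* ... do per-partition work ... */ return TaskStatus::complete; }
+  TaskStatus CheckDone() { /* ... rank-wide check ... */ return TaskStatus::complete; }
+
+  TaskCollection MakeCollection(const int num_lists) {
+    TaskCollection tc;
+    TaskID none; // default-constructed TaskID: no dependencies
+    TaskRegion &region = tc.AddRegion(num_lists);
+    for (int i = 0; i < num_lists; ++i) {
+      auto &tl = region[i];
+      // one independent task per list
+      auto work = tl.AddTask(none, DoWork, i);
+      // synchronize across the lists on this rank, but run the check only once
+      tl.AddTask(TaskQualifier::local_sync | TaskQualifier::once_per_region,
+                 work, CheckDone);
+    }
+    return tc;
+  }
+
+  // e.g., in a driver; uses an internally generated single-thread ThreadPool
+  // TaskListStatus status = MakeCollection(4).Execute();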
diff --git a/example/poisson/poisson_driver.cpp b/example/poisson/poisson_driver.cpp index a94f6874ab70..e2ec03a354d3 100644 --- a/example/poisson/poisson_driver.cpp +++ b/example/poisson/poisson_driver.cpp @@ -70,9 +70,7 @@ TaskCollection PoissonDriver::MakeTaskCollection(BlockList_t &blocks) { // and a kokkos view just for fun AllReduce> *pview_reduce = pkg->MutableParam>>("view_reduce"); - int reg_dep_id; for (int i = 0; i < num_partitions; i++) { - reg_dep_id = 0; // make/get a mesh_data container for the state auto &md = pmesh->mesh_data.GetOrAdd("base", i); auto &mdelta = pmesh->mesh_data.GetOrAdd("delta", i); @@ -81,101 +79,83 @@ TaskCollection PoissonDriver::MakeTaskCollection(BlockList_t &blocks) { //--- Demo a few reductions // pass a pointer to the variable being reduced into - auto loc_red = tl.AddTask(none, poisson_package::SumMass>, md.get(), - &total_mass.val); - // make it a regional dependency so dependent tasks can't execute until all lists do - // this - solver_region.AddRegionalDependencies(reg_dep_id, i, loc_red); - reg_dep_id++; + auto loc_red = + tl.AddTask(TaskQualifier::local_sync, none, + poisson_package::SumMass>, md.get(), &total_mass.val); auto rank_red = tl.AddTask( - none, + TaskQualifier::local_sync, none, [](int *max_rank) { *max_rank = std::max(*max_rank, Globals::my_rank); return TaskStatus::complete; }, &max_rank.val); - solver_region.AddRegionalDependencies(reg_dep_id, i, rank_red); - reg_dep_id++; // start a non-blocking MPI_Iallreduce auto start_global_reduce = - (i == 0 ? tl.AddTask(loc_red, &AllReduce::StartReduce, &total_mass, MPI_SUM) - : none); + tl.AddTask(TaskQualifier::once_per_region, loc_red, &AllReduce::StartReduce, + &total_mass, MPI_SUM); - auto start_rank_reduce = - (i == 0 ? tl.AddTask(rank_red, &Reduce::StartReduce, &max_rank, 0, MPI_MAX) - : none); + auto start_rank_reduce = tl.AddTask(TaskQualifier::once_per_region, rank_red, + &Reduce::StartReduce, &max_rank, 0, MPI_MAX); // test the reduction until it completes auto finish_global_reduce = - tl.AddTask(start_global_reduce, &AllReduce::CheckReduce, &total_mass); - solver_region.AddRegionalDependencies(reg_dep_id, i, finish_global_reduce); - reg_dep_id++; + tl.AddTask(TaskQualifier::local_sync | TaskQualifier::once_per_region, + start_global_reduce, &AllReduce::CheckReduce, &total_mass); auto finish_rank_reduce = - tl.AddTask(start_rank_reduce, &Reduce::CheckReduce, &max_rank); - solver_region.AddRegionalDependencies(reg_dep_id, i, finish_rank_reduce); - reg_dep_id++; + tl.AddTask(TaskQualifier::local_sync | TaskQualifier::once_per_region, + start_rank_reduce, &Reduce::CheckReduce, &max_rank); // notice how we must always pass a pointer to the reduction value // since tasks capture args by value, this would print zero if we just passed in // the val since the tasks that compute the value haven't actually executed yet - auto report_mass = (i == 0 && Globals::my_rank == 0 - ? tl.AddTask( - finish_global_reduce, - [](Real *mass) { - std::cout << "Total mass = " << *mass << std::endl; - return TaskStatus::complete; - }, - &total_mass.val) - : none); - auto report_rank = (i == 0 && Globals::my_rank == 0 - ? 
tl.AddTask( - finish_rank_reduce, - [](int *max_rank) { - std::cout << "Max rank = " << *max_rank << std::endl; - return TaskStatus::complete; - }, - &max_rank.val) - : none); + auto report_mass = tl.AddTask( + TaskQualifier::once_per_region, finish_global_reduce, + [](Real *mass) { + if (Globals::my_rank == 0) std::cout << "Total mass = " << *mass << std::endl; + return TaskStatus::complete; + }, + &total_mass.val); + auto report_rank = tl.AddTask( + TaskQualifier::once_per_region, finish_rank_reduce, + [](int *max_rank) { + if (Globals::my_rank == 0) std::cout << "Max rank = " << *max_rank << std::endl; + return TaskStatus::complete; + }, + &max_rank.val); //--- Begining of tasks related to solving the Poisson eq. auto mat_elem = tl.AddTask(none, poisson_package::SetMatrixElements>, md.get()); - auto &solver = tl.AddIteration("poisson solver"); - solver.SetMaxIterations(max_iters); - solver.SetCheckInterval(check_interval); - solver.SetFailWithMaxIterations(fail_flag); - solver.SetWarnWithMaxIterations(warn_flag); + auto [solver, solver_id] = tl.AddSublist(mat_elem, {1, max_iters}); auto start_recv = solver.AddTask(none, parthenon::StartReceiveBoundaryBuffers, md); - auto update = solver.AddTask(mat_elem, poisson_package::UpdatePhi>, + auto update = solver.AddTask(none, poisson_package::UpdatePhi>, md.get(), mdelta.get()); - auto norm = solver.AddTask(update, poisson_package::SumDeltaPhi>, - mdelta.get(), &update_norm.val); - solver_region.AddRegionalDependencies(reg_dep_id, i, norm); - reg_dep_id++; - auto start_reduce_norm = (i == 0 ? solver.AddTask(norm, &AllReduce::StartReduce, - &update_norm, MPI_SUM) - : none); + auto norm = solver.AddTask(TaskQualifier::local_sync, update, + poisson_package::SumDeltaPhi>, mdelta.get(), + &update_norm.val); + auto start_reduce_norm = + solver.AddTask(TaskQualifier::once_per_region, norm, + &AllReduce::StartReduce, &update_norm, MPI_SUM); auto finish_reduce_norm = - solver.AddTask(start_reduce_norm, &AllReduce::CheckReduce, &update_norm); - auto report_norm = (i == 0 ? 
solver.AddTask( - finish_reduce_norm, - [](Real *norm) { - if (Globals::my_rank == 0) { - std::cout << "Update norm = " << *norm - << std::endl; - } - *norm = 0.0; - return TaskStatus::complete; - }, - &update_norm.val) - : none); + solver.AddTask(TaskQualifier::once_per_region, start_reduce_norm, + &AllReduce::CheckReduce, &update_norm); + auto report_norm = solver.AddTask( + TaskQualifier::once_per_region, finish_reduce_norm, + [](Real *norm) { + if (Globals::my_rank == 0) { + std::cout << "Update norm = " << *norm << std::endl; + } + *norm = 0.0; + return TaskStatus::complete; + }, + &update_norm.val); auto send = solver.AddTask(update, SendBoundaryBuffers, md); @@ -183,24 +163,18 @@ TaskCollection PoissonDriver::MakeTaskCollection(BlockList_t &blocks) { auto setb = solver.AddTask(recv | update, SetBoundaries, md); - auto check = solver.SetCompletionTask( - send | setb | report_norm, poisson_package::CheckConvergence>, - md.get(), mdelta.get()); - // mark task so that dependent tasks (below) won't execute - // until all task lists have completed it - solver_region.AddRegionalDependencies(reg_dep_id, i, check); - reg_dep_id++; - - auto print = none; - if (i == 0) { // only print once - print = tl.AddTask(check, poisson_package::PrintComplete); - } + auto check = solver.AddTask( + TaskQualifier::completion | TaskQualifier::global_sync, send | setb | report_norm, + poisson_package::CheckConvergence>, md.get(), mdelta.get()); + + auto print = tl.AddTask(TaskQualifier::once_per_region, solver_id, + poisson_package::PrintComplete); //--- End of tasks related to solving the Poisson eq // do a vector reduction (everything below here), just for fun // first fill it in auto fill_vec = tl.AddTask( - none, + TaskQualifier::local_sync, none, [](std::vector *vec) { auto &v = *vec; for (int n = 0; n < v.size(); n++) @@ -208,72 +182,64 @@ TaskCollection PoissonDriver::MakeTaskCollection(BlockList_t &blocks) { return TaskStatus::complete; }, &vec_reduce.val); - solver_region.AddRegionalDependencies(reg_dep_id, i, fill_vec); - reg_dep_id++; TaskID start_vec_reduce = - (i == 0 ? tl.AddTask(fill_vec, &AllReduce>::StartReduce, - &vec_reduce, MPI_SUM) - : none); + tl.AddTask(TaskQualifier::once_per_region, fill_vec, + &AllReduce>::StartReduce, &vec_reduce, MPI_SUM); // test the reduction until it completes TaskID finish_vec_reduce = tl.AddTask( - start_vec_reduce, &AllReduce>::CheckReduce, &vec_reduce); - solver_region.AddRegionalDependencies(reg_dep_id, i, finish_vec_reduce); - reg_dep_id++; - - auto report_vec = (i == 0 && Globals::my_rank == 0 - ? 
tl.AddTask( - finish_vec_reduce, - [num_partitions](std::vector *vec) { - auto &v = *vec; - std::cout << "Vec reduction: "; - for (int n = 0; n < v.size(); n++) { - std::cout << v[n] << " "; - } - std::cout << std::endl; - std::cout << "Should be: "; - for (int n = 0; n < v.size(); n++) { - std::cout << n * num_partitions * Globals::nranks - << " "; - } - std::cout << std::endl; - return TaskStatus::complete; - }, - &vec_reduce.val) - : none); + TaskQualifier::once_per_region | TaskQualifier::local_sync, start_vec_reduce, + &AllReduce>::CheckReduce, &vec_reduce); + + auto report_vec = tl.AddTask( + TaskQualifier::once_per_region, finish_vec_reduce, + [num_partitions](std::vector *vec) { + if (Globals::my_rank == 0) { + auto &v = *vec; + std::cout << "Vec reduction: "; + for (int n = 0; n < v.size(); n++) { + std::cout << v[n] << " "; + } + std::cout << std::endl; + std::cout << "Should be: "; + for (int n = 0; n < v.size(); n++) { + std::cout << n * num_partitions * Globals::nranks << " "; + } + std::cout << std::endl; + } + return TaskStatus::complete; + }, + &vec_reduce.val); // And lets do a view reduce too just for fun // The views are filled in the package TaskID start_view_reduce = - (i == 0 ? tl.AddTask(none, &AllReduce>::StartReduce, - pview_reduce, MPI_SUM) - : none); + tl.AddTask(TaskQualifier::once_per_region, none, + &AllReduce>::StartReduce, pview_reduce, MPI_SUM); // test the reduction until it completes TaskID finish_view_reduce = tl.AddTask( - start_view_reduce, &AllReduce>::CheckReduce, pview_reduce); - solver_region.AddRegionalDependencies(reg_dep_id, i, finish_view_reduce); - reg_dep_id++; - - auto report_view = (i == 0 && Globals::my_rank == 0 - ? tl.AddTask( - finish_view_reduce, - [num_partitions](HostArray1D *view) { - auto &v = *view; - std::cout << "View reduction: "; - for (int n = 0; n < v.size(); n++) { - std::cout << v(n) << " "; - } - std::cout << std::endl; - std::cout << "Should be: "; - for (int n = 0; n < v.size(); n++) { - std::cout << n * num_partitions * Globals::nranks - << " "; - } - std::cout << std::endl; - return TaskStatus::complete; - }, - &(pview_reduce->val)) - : none); + TaskQualifier::once_per_region | TaskQualifier::local_sync, start_view_reduce, + &AllReduce>::CheckReduce, pview_reduce); + + auto report_view = tl.AddTask( + TaskQualifier::once_per_region, finish_view_reduce, + [num_partitions](HostArray1D *view) { + if (Globals::my_rank == 0) { + auto &v = *view; + std::cout << "View reduction: "; + for (int n = 0; n < v.size(); n++) { + std::cout << v(n) << " "; + } + std::cout << std::endl; + std::cout << "Should be: "; + for (int n = 0; n < v.size(); n++) { + std::cout << n * num_partitions * Globals::nranks << " "; + } + std::cout << std::endl; + } + return TaskStatus::complete; + }, + &(pview_reduce->val)); } return tc; diff --git a/example/poisson_gmg/parthinput.poisson b/example/poisson_gmg/parthinput.poisson index 7ec0878059ec..57f5febf871b 100644 --- a/example/poisson_gmg/parthinput.poisson +++ b/example/poisson_gmg/parthinput.poisson @@ -25,14 +25,14 @@ multigrid = true nx1 = 64 x1min = -1.0 x1max = 1.0 -ix1_bc = user -ox1_bc = user +ix1_bc = outflow +ox1_bc = outflow nx2 = 64 x2min = -1.0 x2max = 1.0 -ix2_bc = user -ox2_bc = user +ix2_bc = outflow +ox2_bc = outflow nx3 = 1 x3min = 0.0 diff --git a/example/poisson_gmg/poisson_driver.cpp b/example/poisson_gmg/poisson_driver.cpp index 784653237413..656dd3fcac87 100644 --- a/example/poisson_gmg/poisson_driver.cpp +++ b/example/poisson_gmg/poisson_driver.cpp @@ -99,11 +99,10 @@ 
TaskCollection PoissonDriver::MakeTaskCollection(BlockList_t &blocks) { auto zero_u = tl.AddTask(get_rhs, solvers::utils::SetToZero, md); auto solve = zero_u; - auto &itl = tl.AddIteration("Solver"); if (solver == "BiCGSTAB") { - solve = bicgstab_solver->AddTasks(tl, itl, zero_u, i, pmesh, region, reg_dep_id); + solve = bicgstab_solver->AddTasks(tl, zero_u, pmesh, i); } else if (solver == "MG") { - solve = mg_solver->AddTasks(tl, itl, zero_u, i, pmesh, region, reg_dep_id); + solve = mg_solver->AddTasks(tl, zero_u, pmesh, i); } else { PARTHENON_FAIL("Unknown solver type."); } @@ -113,8 +112,7 @@ TaskCollection PoissonDriver::MakeTaskCollection(BlockList_t &blocks) { if (use_exact_rhs) { auto diff = tl.AddTask(solve, solvers::utils::AddFieldsAndStore, md, 1.0, -1.0); - auto get_err = - solvers::utils::DotProduct(diff, region, tl, i, reg_dep_id, &err, md); + auto get_err = solvers::utils::DotProduct(diff, tl, &err, md); tl.AddTask( get_err, [](PoissonDriver *driver, int partition) { diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt index 6d5b67dd207d..d60a37d56355 100644 --- a/src/CMakeLists.txt +++ b/src/CMakeLists.txt @@ -213,10 +213,8 @@ add_library(parthenon solvers/mg_solver.hpp solvers/solver_utils.hpp - tasks/task_id.cpp - tasks/task_id.hpp - tasks/task_list.hpp - tasks/task_types.hpp + tasks/tasks.hpp + tasks/thread_pool.hpp time_integration/butcher_integrator.cpp time_integration/low_storage_integrator.cpp @@ -312,7 +310,7 @@ if (Kokkos_ENABLE_CUDA AND NOT CMAKE_CXX_COMPILER_ID STREQUAL "Clang") target_compile_options(parthenon PUBLIC --expt-relaxed-constexpr) endif() -target_link_libraries(parthenon PUBLIC Kokkos::kokkos) +target_link_libraries(parthenon PUBLIC Kokkos::kokkos Threads::Threads) if (PARTHENON_ENABLE_ASCENT) if (ENABLE_MPI) diff --git a/src/basic_types.hpp b/src/basic_types.hpp index 9898140f82b4..4f827b1c05e0 100644 --- a/src/basic_types.hpp +++ b/src/basic_types.hpp @@ -45,7 +45,7 @@ using Real = double; // X3DIR z, phi, etc... enum CoordinateDirection { NODIR = -1, X0DIR = 0, X1DIR = 1, X2DIR = 2, X3DIR = 3 }; enum class BlockLocation { Left = 0, Center = 1, Right = 2 }; -enum class TaskStatus { fail, complete, incomplete, iterate, skip, waiting }; +enum class TaskStatus { complete, incomplete, iterate }; enum class AmrTag : int { derefine = -1, same = 0, refine = 1 }; enum class RefinementOp_t { Prolongation, Restriction, None }; diff --git a/src/bvals/comms/boundary_communication.cpp b/src/bvals/comms/boundary_communication.cpp index 6f455afb9472..cc093a34f080 100644 --- a/src/bvals/comms/boundary_communication.cpp +++ b/src/bvals/comms/boundary_communication.cpp @@ -34,8 +34,8 @@ #include "mesh/mesh_refinement.hpp" #include "mesh/meshblock.hpp" #include "prolong_restrict/prolong_restrict.hpp" -#include "tasks/task_id.hpp" -#include "tasks/task_list.hpp" + +#include "tasks/tasks.hpp" #include "utils/error_checking.hpp" #include "utils/loop_utils.hpp" @@ -374,8 +374,8 @@ template TaskStatus ProlongateBounds(std::shared_ptr> &); // Adds all relevant boundary communication to a single task list -template -TaskID AddBoundaryExchangeTasks(TaskID dependency, TL_t &tl, +template +TaskID AddBoundaryExchangeTasks(TaskID dependency, TaskList &tl, std::shared_ptr> &md, bool multilevel) { // TODO(LFR): Splitting up the boundary tasks while doing prolongation can cause some // possible issues for sparse fields. 
In particular, the order in which @@ -415,13 +415,11 @@ TaskID AddBoundaryExchangeTasks(TaskID dependency, TL_t &tl, return fbound; } -template TaskID AddBoundaryExchangeTasks( - TaskID, TaskList &, std::shared_ptr> &, bool); -template TaskID AddBoundaryExchangeTasks( - TaskID, IterativeTasks &, std::shared_ptr> &, bool); - -template TaskID AddBoundaryExchangeTasks( - TaskID, TaskList &, std::shared_ptr> &, bool); -template TaskID AddBoundaryExchangeTasks( - TaskID, IterativeTasks &, std::shared_ptr> &, bool); +template TaskID +AddBoundaryExchangeTasks(TaskID, TaskList &, + std::shared_ptr> &, bool); + +template TaskID +AddBoundaryExchangeTasks(TaskID, TaskList &, + std::shared_ptr> &, bool); } // namespace parthenon diff --git a/src/bvals/comms/bvals_in_one.hpp b/src/bvals/comms/bvals_in_one.hpp index dbc48efd821b..22b637a48a74 100644 --- a/src/bvals/comms/bvals_in_one.hpp +++ b/src/bvals/comms/bvals_in_one.hpp @@ -25,8 +25,8 @@ #include "basic_types.hpp" #include "bvals/bvals_interfaces.hpp" #include "coordinates/coordinates.hpp" -#include "tasks/task_id.hpp" -#include "tasks/task_list.hpp" + +#include "tasks/tasks.hpp" #include "utils/object_pool.hpp" namespace parthenon { @@ -72,8 +72,8 @@ TaskStatus ReceiveFluxCorrections(std::shared_ptr> &md); TaskStatus SetFluxCorrections(std::shared_ptr> &md); // Adds all relevant boundary communication to a single task list -template -TaskID AddBoundaryExchangeTasks(TaskID dependency, TL_t &tl, +template +TaskID AddBoundaryExchangeTasks(TaskID dependency, TaskList &tl, std::shared_ptr> &md, bool multilevel); // These tasks should not be called in down stream code diff --git a/src/defs.hpp b/src/defs.hpp index ace72628f035..47e8c28b6b78 100644 --- a/src/defs.hpp +++ b/src/defs.hpp @@ -107,7 +107,7 @@ struct RegionSize { // TODO(felker): C++ Core Guidelines Enum.5: Don’t use ALL_CAPS for enumerators // (avoid clashes with preprocessor macros). Enumerated type definitions in this file and: // io_wrapper.hpp, bvals.hpp, field_diffusion.hpp, -// task_list.hpp, ??? +// tasks.hpp, ??? 
// identifiers for all 6 faces of a MeshBlock constexpr int BOUNDARY_NFACES = 6; diff --git a/src/driver/driver.hpp b/src/driver/driver.hpp index 96937177312b..929ea19c3f2b 100644 --- a/src/driver/driver.hpp +++ b/src/driver/driver.hpp @@ -27,7 +27,7 @@ #include "mesh/mesh.hpp" #include "outputs/outputs.hpp" #include "parameter_input.hpp" -#include "tasks/task_list.hpp" +#include "tasks/tasks.hpp" namespace parthenon { diff --git a/src/driver/multistage.hpp b/src/driver/multistage.hpp index 9cc1fcf5b2c5..d23c894be592 100644 --- a/src/driver/multistage.hpp +++ b/src/driver/multistage.hpp @@ -22,7 +22,7 @@ #include "driver/driver.hpp" #include "mesh/mesh.hpp" #include "parameter_input.hpp" -#include "tasks/task_list.hpp" +#include "tasks/tasks.hpp" #include "time_integration/staged_integrator.hpp" namespace parthenon { diff --git a/src/parthenon/driver.hpp b/src/parthenon/driver.hpp index 60dd5f3bd7b5..eced2c6684d3 100644 --- a/src/parthenon/driver.hpp +++ b/src/parthenon/driver.hpp @@ -26,9 +26,7 @@ #include #include #include -#include -#include -#include +#include #include #include #include @@ -66,6 +64,7 @@ using ::parthenon::TaskCollection; using ::parthenon::TaskID; using ::parthenon::TaskList; using ::parthenon::TaskListStatus; +using ::parthenon::TaskQualifier; using ::parthenon::TaskRegion; using ::parthenon::TaskStatus; using ::parthenon::TaskType; diff --git a/src/solvers/bicgstab_solver.hpp b/src/solvers/bicgstab_solver.hpp index f285646d2923..b82d27bd13dd 100644 --- a/src/solvers/bicgstab_solver.hpp +++ b/src/solvers/bicgstab_solver.hpp @@ -24,8 +24,8 @@ #include "kokkos_abstraction.hpp" #include "solvers/mg_solver.hpp" #include "solvers/solver_utils.hpp" -#include "tasks/task_id.hpp" -#include "tasks/task_list.hpp" + +#include "tasks/tasks.hpp" namespace parthenon { @@ -78,10 +78,10 @@ class BiCGSTABSolver { pkg->AddField(p::name(), m_no_ghost); } - TaskID AddTasks(TaskList &tl, IterativeTasks &itl, TaskID dependence, int i, - Mesh *pmesh, TaskRegion ®ion, int ®_dep_id) { + TaskID AddTasks(TaskList &tl, TaskID dependence, Mesh *pmesh, const int partition) { using namespace utils; - auto &md = pmesh->mesh_data.GetOrAdd("base", i); + TaskID none; + auto &md = pmesh->mesh_data.GetOrAdd("base", partition); iter_counter = 0; // Initialization: x <- 0, r <- rhs, rhat0 <- rhs, @@ -91,12 +91,11 @@ class BiCGSTABSolver { auto copy_r = tl.AddTask(dependence, CopyData, md); auto copy_p = tl.AddTask(dependence, CopyData, md); auto copy_rhat0 = tl.AddTask(dependence, CopyData, md); - auto get_rhat0r_init = - DotProduct(dependence, region, tl, i, reg_dep_id, &rhat0r, md); + auto get_rhat0r_init = DotProduct(dependence, tl, &rhat0r, md); auto initialize = tl.AddTask( + TaskQualifier::once_per_region | TaskQualifier::local_sync, zero_x | zero_u_init | copy_r | copy_p | copy_rhat0 | get_rhat0r_init, - [](BiCGSTABSolver *solver, int partition) { - if (partition != 0) return TaskStatus::complete; + [](BiCGSTABSolver *solver) { solver->rhat0r_old = solver->rhat0r.val; solver->rhat0r.val = 0.0; solver->rhat0v.val = 0.0; @@ -105,28 +104,25 @@ class BiCGSTABSolver { solver->residual.val = 0.0; return TaskStatus::complete; }, - this, i); - region.AddRegionalDependencies(reg_dep_id, i, initialize); - reg_dep_id++; - if (i == 0) { - tl.AddTask(dependence, [&]() { - if (Globals::my_rank == 0) - printf("# [0] v-cycle\n# [1] rms-residual\n# [2] rms-error\n"); - return TaskStatus::complete; - }); - } + this); + tl.AddTask(TaskQualifier::once_per_region, dependence, [&]() { + if (Globals::my_rank == 0) + 
printf("# [0] v-cycle\n# [1] rms-residual\n# [2] rms-error\n"); + return TaskStatus::complete; + }); // BEGIN ITERATIVE TASKS + auto [itl, solver_id] = tl.AddSublist(initialize, {1, params_.max_iters}); // 1. u <- M p - auto precon1 = initialize; + auto precon1 = none; if (params_.precondition) { auto set_rhs = itl.AddTask(precon1, CopyData, md); auto zero_u = itl.AddTask(precon1, SetToZero, md); - precon1 = preconditioner.AddLinearOperatorTasks(region, itl, set_rhs | zero_u, i, - reg_dep_id, pmesh); + precon1 = + preconditioner.AddLinearOperatorTasks(itl, set_rhs | zero_u, partition, pmesh); } else { - precon1 = itl.AddTask(initialize, CopyData, md); + precon1 = itl.AddTask(none, CopyData, md); } // 2. v <- A u @@ -134,8 +130,7 @@ class BiCGSTABSolver { auto get_v = eqs_.template Ax(itl, comm, md); // 3. rhat0v <- (rhat0, v) - auto get_rhat0v = - DotProduct(get_v, region, itl, i, reg_dep_id, &rhat0v, md); + auto get_rhat0v = DotProduct(get_v, itl, &rhat0v, md); // 4. h <- x + alpha u (alpha = rhat0r_old / rhat0v) auto correct_h = itl.AddTask( @@ -156,26 +151,25 @@ class BiCGSTABSolver { this, md); // Check and print out residual - auto get_res = DotProduct(correct_s, region, itl, i, reg_dep_id, &residual, md); + auto get_res = DotProduct(correct_s, itl, &residual, md); auto print = itl.AddTask( - get_res, - [&](BiCGSTABSolver *solver, Mesh *pmesh, int partition) { - if (partition != 0) return TaskStatus::complete; + TaskQualifier::once_per_region, get_res, + [&](BiCGSTABSolver *solver, Mesh *pmesh) { Real rms_res = std::sqrt(solver->residual.val / pmesh->GetTotalCells()); if (Globals::my_rank == 0) printf("%i %e\n", solver->iter_counter * 2 + 1, rms_res); return TaskStatus::complete; }, - this, pmesh, i); + this, pmesh); // 6. u <- M s auto precon2 = correct_s; if (params_.precondition) { auto set_rhs = itl.AddTask(precon2, CopyData, md); auto zero_u = itl.AddTask(precon2, SetToZero, md); - precon2 = preconditioner.AddLinearOperatorTasks(region, itl, set_rhs | zero_u, i, - reg_dep_id, pmesh); + precon2 = + preconditioner.AddLinearOperatorTasks(itl, set_rhs | zero_u, partition, pmesh); } else { precon2 = itl.AddTask(precon2, CopyData, md); } @@ -185,12 +179,12 @@ class BiCGSTABSolver { auto get_t = eqs_.template Ax(itl, pre_t_comm, md); // 8. omega <- (t,s) / (t,t) - auto get_ts = DotProduct(get_t, region, itl, i, reg_dep_id, &ts, md); - auto get_tt = DotProduct(get_t, region, itl, i, reg_dep_id, &tt, md); + auto get_ts = DotProduct(get_t, itl, &ts, md); + auto get_tt = DotProduct(get_t, itl, &tt, md); // 9. 
x <- h + omega u auto correct_x = itl.AddTask( - get_tt | get_ts, + TaskQualifier::local_sync, get_tt | get_ts, [](BiCGSTABSolver *solver, std::shared_ptr> &md) { Real omega = solver->ts.val / solver->tt.val; return AddFieldsAndStore(md, 1.0, omega); @@ -207,29 +201,25 @@ class BiCGSTABSolver { this, md); // Check and print out residual - auto get_res2 = - DotProduct(correct_r, region, itl, i, reg_dep_id, &residual, md); - - if (i == 0) { - get_res2 = itl.AddTask( - get_res2, - [&](BiCGSTABSolver *solver, Mesh *pmesh) { - Real rms_err = std::sqrt(solver->residual.val / pmesh->GetTotalCells()); - if (Globals::my_rank == 0) - printf("%i %e\n", solver->iter_counter * 2 + 2, rms_err); - return TaskStatus::complete; - }, - this, pmesh); - } + auto get_res2 = DotProduct(correct_r, itl, &residual, md); + + get_res2 = itl.AddTask( + TaskQualifier::once_per_region, get_res2, + [&](BiCGSTABSolver *solver, Mesh *pmesh) { + Real rms_err = std::sqrt(solver->residual.val / pmesh->GetTotalCells()); + if (Globals::my_rank == 0) + printf("%i %e\n", solver->iter_counter * 2 + 2, rms_err); + return TaskStatus::complete; + }, + this, pmesh); // 11. rhat0r <- (rhat0, r) - auto get_rhat0r = - DotProduct(correct_r, region, itl, i, reg_dep_id, &rhat0r, md); + auto get_rhat0r = DotProduct(correct_r, itl, &rhat0r, md); // 12. beta <- rhat0r / rhat0r_old * alpha / omega // 13. p <- r + beta * (p - omega * v) auto update_p = itl.AddTask( - get_rhat0r | get_res2, + TaskQualifier::local_sync, get_rhat0r | get_res2, [](BiCGSTABSolver *solver, std::shared_ptr> &md) { Real alpha = solver->rhat0r_old / solver->rhat0v.val; Real omega = solver->ts.val / solver->tt.val; @@ -241,17 +231,16 @@ class BiCGSTABSolver { this, md); // 14. rhat0r_old <- rhat0r, zero all reductions - region.AddRegionalDependencies(reg_dep_id, i, update_p | correct_x); - auto check = itl.SetCompletionTask( + auto check = itl.AddTask( + TaskQualifier::completion | TaskQualifier::once_per_region | + TaskQualifier::global_sync, update_p | correct_x, - [](BiCGSTABSolver *solver, Mesh *pmesh, int partition, int max_iter, - Real res_tol) { - if (partition != 0) return TaskStatus::complete; + [](BiCGSTABSolver *solver, Mesh *pmesh, Real res_tol) { solver->iter_counter++; Real rms_res = std::sqrt(solver->residual.val / pmesh->GetTotalCells()); - if (rms_res < res_tol || solver->iter_counter >= max_iter) { - solver->final_residual = rms_res; - solver->final_iteration = solver->iter_counter; + solver->final_residual = rms_res; + solver->final_iteration = solver->iter_counter; + if (rms_res < res_tol) { return TaskStatus::complete; } solver->rhat0r_old = solver->rhat0r.val; @@ -262,11 +251,9 @@ class BiCGSTABSolver { solver->residual.val = 0.0; return TaskStatus::iterate; }, - this, pmesh, i, params_.max_iters, params_.residual_tolerance); - region.AddGlobalDependencies(reg_dep_id, i, check); - reg_dep_id++; + this, pmesh, params_.residual_tolerance); - return check; + return solver_id; } Real GetSquaredResidualSum() const { return residual.val; } diff --git a/src/solvers/mg_solver.hpp b/src/solvers/mg_solver.hpp index 074663983099..427b1802e7e2 100644 --- a/src/solvers/mg_solver.hpp +++ b/src/solvers/mg_solver.hpp @@ -23,8 +23,8 @@ #include "interface/state_descriptor.hpp" #include "kokkos_abstraction.hpp" #include "solvers/solver_utils.hpp" -#include "tasks/task_id.hpp" -#include "tasks/task_list.hpp" + +#include "tasks/tasks.hpp" namespace parthenon { @@ -82,61 +82,55 @@ class MGSolver { pkg->AddField(D::name(), mu0); } - TaskID AddTasks(TaskList & /*tl*/, 
IterativeTasks &itl, TaskID dependence, - int partition, Mesh *pmesh, TaskRegion ®ion, int ®_dep_id) { + TaskID AddTasks(TaskList &tl, TaskID dependence, Mesh *pmesh, const int partition) { using namespace utils; + TaskID none; + auto [itl, solve_id] = tl.AddSublist(dependence, {1, this->params_.max_iters}); iter_counter = 0; itl.AddTask( - dependence, - [](int partition, int *iter_counter) { - if (partition != 0 || *iter_counter > 0 || Globals::my_rank != 0) - return TaskStatus::complete; + TaskQualifier::once_per_region, none, + [](int *iter_counter) { + if (*iter_counter > 0 || Globals::my_rank != 0) return TaskStatus::complete; printf("# [0] v-cycle\n# [1] rms-residual\n# [2] rms-error\n"); return TaskStatus::complete; }, - partition, &iter_counter); - auto mg_finest = - AddLinearOperatorTasks(region, itl, dependence, partition, reg_dep_id, pmesh); + &iter_counter); + auto mg_finest = AddLinearOperatorTasks(itl, none, partition, pmesh); auto &md = pmesh->mesh_data.GetOrAdd("base", partition); auto calc_pointwise_res = eqs_.template Ax(itl, mg_finest, md); calc_pointwise_res = itl.AddTask( calc_pointwise_res, AddFieldsAndStoreInteriorSelect, md, 1.0, -1.0, false); - auto get_res = DotProduct(calc_pointwise_res, region, itl, - partition, reg_dep_id, &residual, md); + auto get_res = DotProduct(calc_pointwise_res, itl, &residual, md); - auto check = itl.SetCompletionTask( + auto check = itl.AddTask( + TaskQualifier::once_per_region | TaskQualifier::completion | + TaskQualifier::global_sync, get_res, - [](MGSolver *solver, int part, Mesh *pmesh) { - if (part != 0) return TaskStatus::complete; + [](MGSolver *solver, Mesh *pmesh) { solver->iter_counter++; Real rms_res = std::sqrt(solver->residual.val / pmesh->GetTotalCells()); if (Globals::my_rank == 0) printf("%i %e\n", solver->iter_counter, rms_res); - if (rms_res > solver->params_.residual_tolerance && - solver->iter_counter < solver->params_.max_iters) - return TaskStatus::iterate; solver->final_residual = rms_res; solver->final_iteration = solver->iter_counter; + if (rms_res > solver->params_.residual_tolerance) return TaskStatus::iterate; return TaskStatus::complete; }, - this, partition, pmesh); - region.AddGlobalDependencies(reg_dep_id, partition, check); - reg_dep_id++; + this, pmesh); - return check; + return solve_id; } - template - TaskID AddLinearOperatorTasks(TaskRegion ®ion, TL_t &tl, TaskID dependence, - int partition, int ®_dep_id, Mesh *pmesh) { + TaskID AddLinearOperatorTasks(TaskList &tl, TaskID dependence, int partition, + Mesh *pmesh) { using namespace utils; iter_counter = 0; int min_level = 0; int max_level = pmesh->GetGMGMaxLevel(); - return AddMultiGridTasksPartitionLevel(region, tl, dependence, partition, reg_dep_id, - max_level, min_level, max_level, pmesh); + return AddMultiGridTasksPartitionLevel(tl, dependence, partition, max_level, + min_level, max_level, pmesh); } Real GetSquaredResidualSum() const { return residual.val; } @@ -244,10 +238,9 @@ class MGSolver { return tl.AddTask(jacobi3, CopyData, md); } - template - TaskID AddMultiGridTasksPartitionLevel(TaskRegion ®ion, TL_t &tl, TaskID dependence, - int partition, int ®_dep_id, int level, - int min_level, int max_level, Mesh *pmesh) { + TaskID AddMultiGridTasksPartitionLevel(TaskList &tl, TaskID dependence, int partition, + int level, int min_level, int max_level, + Mesh *pmesh) { using namespace utils; auto smoother = params_.smoother; bool do_FAS = params_.do_FAS; @@ -278,10 +271,8 @@ class MGSolver { // Fill fields with restricted values auto 
recv_from_finer = tl.AddTask(dependence, ReceiveBoundBufs, md); - set_from_finer = - tl.AddTask(recv_from_finer, SetBounds, md); - region.AddRegionalDependencies(reg_dep_id, partition, set_from_finer); - reg_dep_id++; + set_from_finer = tl.AddTask( // TaskQualifier::local_sync, // is this required? + recv_from_finer, SetBounds, md); // 1. Copy residual from dual purpose communication field to the rhs, should be // actual RHS for finest level auto copy_u = tl.AddTask(set_from_finer, CopyData, md); @@ -327,19 +318,16 @@ class MGSolver { auto communicate_to_coarse = tl.AddTask(residual, SendBoundBufs, md); - auto coarser = AddMultiGridTasksPartitionLevel(region, tl, communicate_to_coarse, - partition, reg_dep_id, level - 1, - min_level, max_level, pmesh); + auto coarser = AddMultiGridTasksPartitionLevel( + tl, communicate_to_coarse, partition, level - 1, min_level, max_level, pmesh); // 6. Receive error field into communication field and prolongate auto recv_from_coarser = tl.AddTask(coarser, ReceiveBoundBufs, md); auto set_from_coarser = tl.AddTask(recv_from_coarser, SetBounds, md); - auto prolongate = tl.AddTask( + auto prolongate = tl.AddTask( // TaskQualifier::local_sync, // is this required? set_from_coarser, ProlongateBounds, md); - region.AddRegionalDependencies(reg_dep_id, partition, prolongate); - reg_dep_id++; // 7. Correct solution on this level with res_err field and store in // communication field diff --git a/src/solvers/solver_utils.hpp b/src/solvers/solver_utils.hpp index 38ab9cf17889..76f77ec7a298 100644 --- a/src/solvers/solver_utils.hpp +++ b/src/solvers/solver_utils.hpp @@ -272,32 +272,24 @@ TaskStatus DotProductLocal(const std::shared_ptr> &md, return TaskStatus::complete; } -template -TaskID DotProduct(TaskID dependency_in, TaskRegion ®ion, TL_t &tl, int partition, - int ®_dep_id, AllReduce *adotb, +template +TaskID DotProduct(TaskID dependency_in, TaskList &tl, AllReduce *adotb, const std::shared_ptr> &md) { using namespace impl; - auto zero_adotb = (partition == 0 ? tl.AddTask( - dependency_in, - [](AllReduce *r) { - r->val = 0.0; - return TaskStatus::complete; - }, - adotb) - : dependency_in); - region.AddRegionalDependencies(reg_dep_id, partition, zero_adotb); - reg_dep_id++; - auto get_adotb = tl.AddTask(zero_adotb, DotProductLocal, md, adotb); - region.AddRegionalDependencies(reg_dep_id, partition, get_adotb); - reg_dep_id++; - auto start_global_adotb = - (partition == 0 - ? 
tl.AddTask(get_adotb, &AllReduce::StartReduce, adotb, MPI_SUM) - : get_adotb); + auto zero_adotb = tl.AddTask( + TaskQualifier::once_per_region | TaskQualifier::local_sync, dependency_in, + [](AllReduce *r) { + r->val = 0.0; + return TaskStatus::complete; + }, + adotb); + auto get_adotb = tl.AddTask(TaskQualifier::local_sync, zero_adotb, + DotProductLocal, md, adotb); + auto start_global_adotb = tl.AddTask(TaskQualifier::once_per_region, get_adotb, + &AllReduce::StartReduce, adotb, MPI_SUM); auto finish_global_adotb = - tl.AddTask(start_global_adotb, &AllReduce::CheckReduce, adotb); - region.AddRegionalDependencies(reg_dep_id, partition, finish_global_adotb); - reg_dep_id++; + tl.AddTask(TaskQualifier::once_per_region | TaskQualifier::local_sync, + start_global_adotb, &AllReduce::CheckReduce, adotb); return finish_global_adotb; } diff --git a/src/tasks/task_id.cpp b/src/tasks/task_id.cpp deleted file mode 100644 index 70bacbea8f36..000000000000 --- a/src/tasks/task_id.cpp +++ /dev/null @@ -1,157 +0,0 @@ -//======================================================================================== -// Athena++ astrophysical MHD code -// Copyright(C) 2014 James M. Stone and other code contributors -// Licensed under the 3-clause BSD License, see LICENSE file for details -//======================================================================================== -// (C) (or copyright) 2020. Triad National Security, LLC. All rights reserved. -// -// This program was produced under U.S. Government contract 89233218CNA000001 for Los -// Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC -// for the U.S. Department of Energy/National Nuclear Security Administration. All rights -// in the program are reserved by Triad National Security, LLC, and the U.S. Department -// of Energy/National Nuclear Security Administration. The Government is granted for -// itself and others acting on its behalf a nonexclusive, paid-up, irrevocable worldwide -// license in this material to reproduce, prepare derivative works, distribute copies to -// the public, perform publicly and display publicly, and to permit others to do so. -//======================================================================================== -//! \file tasks.cpp -// \brief implementation of the TaskID class - -#include "tasks/task_id.hpp" - -#include -#include -#include -#include -#include - -namespace parthenon { - -// TaskID constructor. Default id = 0. -TaskID::TaskID(int id) { Set(id); } - -void TaskID::Set(int id) { - if (id < 0) throw std::invalid_argument("TaskID requires integer arguments >= 0"); - if (id == 0) { - bitblocks.resize(1); - return; - } - id--; - const int n_myblocks = id / BITBLOCK + 1; - // grow if necessary. 
never shrink - if (n_myblocks > bitblocks.size()) bitblocks.resize(n_myblocks); - bitblocks[n_myblocks - 1].set(id % BITBLOCK); -} - -void TaskID::clear() { - for (auto &bset : bitblocks) { - bset.reset(); - } -} - -bool TaskID::CheckDependencies(const TaskID &rhs) const { - const int n_myblocks = bitblocks.size(); - const int n_srcblocks = rhs.bitblocks.size(); - if (n_myblocks == n_srcblocks) { - for (int i = 0; i < n_myblocks; i++) { - if ((bitblocks[i] & rhs.bitblocks[i]) != rhs.bitblocks[i]) return false; - } - } else if (n_myblocks > n_srcblocks) { - for (int i = 0; i < n_srcblocks; i++) { - if ((bitblocks[i] & rhs.bitblocks[i]) != rhs.bitblocks[i]) return false; - } - } else { - for (int i = 0; i < n_myblocks; i++) { - if ((bitblocks[i] & rhs.bitblocks[i]) != rhs.bitblocks[i]) return false; - } - for (int i = n_myblocks; i < n_srcblocks; i++) { - if (rhs.bitblocks[i].any()) return false; - } - } - return true; -} - -void TaskID::SetFinished(const TaskID &rhs) { - const int n_myblocks = bitblocks.size(); - const int n_srcblocks = rhs.bitblocks.size(); - if (n_myblocks == n_srcblocks) { - for (int i = 0; i < n_myblocks; i++) { - bitblocks[i] ^= rhs.bitblocks[i]; - } - } else if (n_myblocks > n_srcblocks) { - for (int i = 0; i < n_srcblocks; i++) { - bitblocks[i] ^= rhs.bitblocks[i]; - } - } else { - for (int i = 0; i < n_myblocks; i++) { - bitblocks[i] ^= rhs.bitblocks[i]; - } - for (int i = n_myblocks; i < n_srcblocks; i++) { - bitblocks.push_back(rhs.bitblocks[i]); - } - } -} - -bool TaskID::operator==(const TaskID &rhs) const { - const int n_myblocks = bitblocks.size(); - const int n_srcblocks = rhs.bitblocks.size(); - if (n_myblocks == n_srcblocks) { - for (int i = 0; i < n_myblocks; i++) { - if (bitblocks[i] != rhs.bitblocks[i]) return false; - } - } else if (n_myblocks > n_srcblocks) { - for (int i = 0; i < n_srcblocks; i++) { - if (bitblocks[i] != rhs.bitblocks[i]) return false; - } - for (int i = n_srcblocks; i < n_myblocks; i++) { - if (bitblocks[i].any()) return false; - } - } else { - for (int i = 0; i < n_myblocks; i++) { - if (bitblocks[i] != rhs.bitblocks[i]) return false; - } - for (int i = n_myblocks; i < n_srcblocks; i++) { - if (rhs.bitblocks[i].any()) return false; - } - } - return true; -} - -bool TaskID::operator!=(const TaskID &rhs) const { return !operator==(rhs); } - -TaskID TaskID::operator|(const TaskID &rhs) const { - TaskID res; - const int n_myblocks = bitblocks.size(); - const int n_srcblocks = rhs.bitblocks.size(); - res.bitblocks.resize(std::max(n_myblocks, n_srcblocks)); - if (n_myblocks == n_srcblocks) { - for (int i = 0; i < n_myblocks; i++) { - res.bitblocks[i] = bitblocks[i] | rhs.bitblocks[i]; - } - } else if (n_myblocks > n_srcblocks) { - for (int i = 0; i < n_srcblocks; i++) { - res.bitblocks[i] = bitblocks[i] | rhs.bitblocks[i]; - } - for (int i = n_srcblocks; i < n_myblocks; i++) { - res.bitblocks[i] = bitblocks[i]; - } - } else { - for (int i = 0; i < n_myblocks; i++) { - res.bitblocks[i] = bitblocks[i] | rhs.bitblocks[i]; - } - for (int i = n_myblocks; i < n_srcblocks; i++) { - res.bitblocks[i] = rhs.bitblocks[i]; - } - } - return res; -} - -std::string TaskID::to_string() const { - std::string bs; - for (int i = bitblocks.size() - 1; i >= 0; i--) { - bs += bitblocks[i].to_string(); - } - return bs; -} - -} // namespace parthenon diff --git a/src/tasks/task_id.hpp b/src/tasks/task_id.hpp deleted file mode 100644 index 54043001af66..000000000000 --- a/src/tasks/task_id.hpp +++ /dev/null @@ -1,51 +0,0 @@ 
-//======================================================================================== -// (C) (or copyright) 2020. Triad National Security, LLC. All rights reserved. -// -// This program was produced under U.S. Government contract 89233218CNA000001 for Los -// Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC -// for the U.S. Department of Energy/National Nuclear Security Administration. All rights -// in the program are reserved by Triad National Security, LLC, and the U.S. Department -// of Energy/National Nuclear Security Administration. The Government is granted for -// itself and others acting on its behalf a nonexclusive, paid-up, irrevocable worldwide -// license in this material to reproduce, prepare derivative works, distribute copies to -// the public, perform publicly and display publicly, and to permit others to do so. -//======================================================================================== - -#ifndef TASKS_TASK_ID_HPP_ -#define TASKS_TASK_ID_HPP_ - -#include -#include -#include - -#include "basic_types.hpp" - -namespace parthenon { - -//---------------------------------------------------------------------------------------- -//! \class TaskID -// \brief generalization of bit fields for Task IDs, status, and dependencies. - -#define BITBLOCK 16 - -class TaskID { - public: - TaskID() { Set(0); } - explicit TaskID(int id); - - void Set(int id); - void clear(); - bool CheckDependencies(const TaskID &rhs) const; - void SetFinished(const TaskID &rhs); - bool operator==(const TaskID &rhs) const; - bool operator!=(const TaskID &rhs) const; - TaskID operator|(const TaskID &rhs) const; - std::string to_string() const; - - private: - std::vector> bitblocks; -}; - -} // namespace parthenon - -#endif // TASKS_TASK_ID_HPP_ diff --git a/src/tasks/task_list.hpp b/src/tasks/task_list.hpp deleted file mode 100644 index 322d1a788d70..000000000000 --- a/src/tasks/task_list.hpp +++ /dev/null @@ -1,538 +0,0 @@ -//======================================================================================== -// (C) (or copyright) 2023. Triad National Security, LLC. All rights reserved. -// -// This program was produced under U.S. Government contract 89233218CNA000001 for Los -// Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC -// for the U.S. Department of Energy/National Nuclear Security Administration. All rights -// in the program are reserved by Triad National Security, LLC, and the U.S. Department -// of Energy/National Nuclear Security Administration. The Government is granted for -// itself and others acting on its behalf a nonexclusive, paid-up, irrevocable worldwide -// license in this material to reproduce, prepare derivative works, distribute copies to -// the public, perform publicly and display publicly, and to permit others to do so. 
-//======================================================================================== - -#ifndef TASKS_TASK_LIST_HPP_ -#define TASKS_TASK_LIST_HPP_ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include "basic_types.hpp" -#include "task_id.hpp" -#include "task_types.hpp" -#include "utils/error_checking.hpp" -#include "utils/reductions.hpp" - -namespace parthenon { - -enum class TaskListStatus { running, stuck, complete, nothing_to_do }; - -class TaskList; -namespace task_list_impl { -TaskID AddTaskHelper(TaskList *, Task); -} // namespace task_list_impl - -class IterativeTasks { - public: - IterativeTasks() = default; - IterativeTasks(TaskList *tl, int key) : tl_(tl), key_(key) { - max_iterations_ = std::numeric_limits::max(); - } - - // overload to add member functions of class T to task list - // NOTE: we must capture the object pointer - template - TaskID AddTask(TaskID const &dep, TaskStatus (T::*func)(Args1...), U *obj, - Args2 &&...args) { - return this->AddTask_(TaskType::iterative, 1, dep, [=]() mutable -> TaskStatus { - return (obj->*func)(std::forward(args)...); - }); - } - - template - TaskID AddTask(TaskID const &dep, T &&func, Args &&...args) { - return AddTask_(TaskType::iterative, 1, dep, std::forward(func), - std::forward(args)...); - } - - template - TaskID SetCompletionTask(TaskID const &dep, TaskStatus (T::*func)(Args...), U *obj, - Args &&...args) { - return AddTask_(TaskType::completion_criteria, check_interval_, dep, - [=]() mutable -> TaskStatus { - return (obj->*func)(std::forward(args)...); - }); - } - - template - TaskID SetCompletionTask(TaskID const &dep, T &&func, Args &&...args) { - return AddTask_(TaskType::completion_criteria, check_interval_, dep, - std::forward(func), std::forward(args)...); - } - - void SetMaxIterations(const int max) { - assert(max > 0); - max_iterations_ = max; - } - void SetCheckInterval(const int chk) { - assert(chk > 0); - check_interval_ = chk; - } - void SetFailWithMaxIterations(const bool flag) { throw_with_max_iters_ = flag; } - void SetWarnWithMaxIterations(const bool flag) { warn_with_max_iters_ = flag; } - bool ShouldThrowWithMax() const { return throw_with_max_iters_; } - bool ShouldWarnWithMax() const { return warn_with_max_iters_; } - int GetMaxIterations() const { return max_iterations_; } - int GetIterationCount() const { return count_; } - void IncrementCount() { count_++; } - void ResetCount() { count_ = 0; } - void PrintList() { std::cout << "tl_ = " << tl_ << std::endl; } - - private: - template - TaskID AddTask_(const TaskType &type, const int interval, TaskID const &dep, F &&func, - Args &&...args) { - TaskID id(0); - id = task_list_impl::AddTaskHelper( - tl_, Task( - id, dep, - [=, func = std::forward(func)]() mutable -> TaskStatus { - return func(std::forward(args)...); - }, - type, key_)); - return id; - } - TaskList *tl_; - int key_; - int max_iterations_; - unsigned int count_ = 0; - int check_interval_ = 1; - bool throw_with_max_iters_ = false; - bool warn_with_max_iters_ = true; -}; - -class TaskList { - public: - TaskList() = default; - bool IsComplete() { return task_list_.empty(); } - int Size() { return task_list_.size(); } - void MarkRegional(const TaskID &id) { - for (auto &task : task_list_) { - if (task.GetID() == id) { - task.SetRegional(); - break; - } - } - } - void MarkTaskComplete(const TaskID &id) { tasks_completed_.SetFinished(id); } - bool CheckDependencies(const TaskID &id) const { - return 
tasks_completed_.CheckDependencies(id); - } - bool CheckTaskRan(TaskID id) const { - for (auto &task : task_list_) { - if (task.GetID() == id) { - return (task.GetStatus() != TaskStatus::incomplete && - task.GetStatus() != TaskStatus::skip && - task.GetStatus() != TaskStatus::waiting); - } - } - return false; - } - bool CheckStatus(const TaskID &id, TaskStatus status) const { - for (auto &task : task_list_) { - if (task.GetID() == id) return (task.GetStatus() == status); - } - return true; - } - bool CheckTaskCompletion(const TaskID &id) const { - return CheckStatus(id, TaskStatus::complete); - } - void ClearComplete() { - auto task = task_list_.begin(); - while (task != task_list_.end()) { - if (task->GetStatus() == TaskStatus::complete && - task->GetType() != TaskType::iterative && - task->GetType() != TaskType::completion_criteria && !task->IsRegional()) { - task = task_list_.erase(task); - } else { - ++task; - } - } - std::set completed_iters; - for (auto &tsk : task_list_) { - if (tsk.GetType() == TaskType::completion_criteria && - tsk.GetStatus() == TaskStatus::complete && !tsk.IsRegional()) { - completed_iters.insert(tsk.GetKey()); - } - } - for (const auto &key : completed_iters) { - ClearIteration(key); - } - } - void ClearIteration(const int key) { - auto task = task_list_.begin(); - while (task != task_list_.end()) { - if (task->GetKey() == key) { - task = task_list_.erase(task); - } else { - ++task; - } - } - iter_tasks[key].ResetCount(); - } - void ResetIteration(const int key) { - PARTHENON_REQUIRE_THROWS(key < iter_tasks.size(), "Invalid iteration key"); - iter_tasks[key].IncrementCount(); - if (iter_tasks[key].GetIterationCount() == iter_tasks[key].GetMaxIterations()) { - if (iter_tasks[key].ShouldThrowWithMax()) { - PARTHENON_THROW("Iteration " + iter_labels[key] + - " reached maximum allowed cycles without convergence."); - } - if (iter_tasks[key].ShouldWarnWithMax()) { - PARTHENON_WARN("Iteration " + iter_labels[key] + - " reached maximum allowed cycles without convergence."); - } - for (auto &task : task_list_) { - if (task.GetKey() == key && task.GetType() == TaskType::completion_criteria) { - MarkTaskComplete(task.GetID()); - } - } - ClearIteration(key); - return; - } - for (auto &task : task_list_) { - if (task.GetKey() == key) { - if (CheckDependencies(task.GetID())) { - MarkTaskComplete(task.GetID()); - } - task.SetStatus(TaskStatus::incomplete); - } - } - } - void ResetIfNeeded(const TaskID &id) { - for (auto &task : task_list_) { - if (task.GetID() == id) { - if (task.GetType() == TaskType::completion_criteria) { - ResetIteration(task.GetKey()); - } - break; - } - } - } - bool CompleteIfNeeded(const TaskID &id) { - MarkTaskComplete(id); - auto task = task_list_.begin(); - while (task != task_list_.end()) { - if (task->GetID() == id) { - if (task->GetType() == TaskType::completion_criteria) { - ClearIteration(task->GetKey()); - return true; - } else if (task->GetType() == TaskType::single) { - task = task_list_.erase(task); - } else { - task->SetStatus(TaskStatus::waiting); - } - break; - } else { - ++task; - } - } - return false; - } - void DoAvailable() { - auto task = task_list_.begin(); - while (task != task_list_.end()) { - // first skip task if it's complete. 
Possible for iterative tasks - if (task->GetStatus() != TaskStatus::incomplete) { - ++task; - continue; - } - auto dep = task->GetDependency(); - if (CheckDependencies(dep)) { - (*task)(); - if (task->GetStatus() == TaskStatus::complete && !task->IsRegional()) { - MarkTaskComplete(task->GetID()); - } else if (task->GetStatus() == TaskStatus::skip && - task->GetType() == TaskType::completion_criteria) { - ResetIteration(task->GetKey()); - } else if (task->GetStatus() == TaskStatus::iterate && !task->IsRegional()) { - ResetIteration(task->GetKey()); - } - } - ++task; - } - ClearComplete(); - } - bool Validate() const { - std::set iters; - for (auto &task : task_list_) { - if (task.GetType() == TaskType::iterative) iters.insert(task.GetKey()); - } - int num_iters = iters.size(); - int found = 0; - for (auto &iter : iters) { - for (auto &task : task_list_) { - if (task.GetType() == TaskType::completion_criteria && task.GetKey() == iter) { - found++; - break; - } - } - } - bool valid = (found == num_iters); - PARTHENON_REQUIRE_THROWS( - valid, - "Task list validation found iterative tasks without a completion criteria"); - return valid; - } - - TaskID AddTask(Task &tsk) { - TaskID id(tasks_added_ + 1); - tsk.SetID(id); - task_list_.push_back(std::move(tsk)); - tasks_added_++; - return id; - } - - // overload to add member functions of class T to task list - // NOTE: we must capture the object pointer - template - TaskID AddTask(TaskID const &dep, TaskStatus (T::*func)(Args1...), U *obj, - Args2 &&...args) { - return this->AddTask(dep, [=]() mutable -> TaskStatus { - return (obj->*func)(std::forward(args)...); - }); - } - - template - TaskID AddTask(TaskID const &dep, F &&func, Args &&...args) { - TaskID id(tasks_added_ + 1); - task_list_.push_back( - Task(id, dep, [=, func = std::forward(func)]() mutable -> TaskStatus { - return func(std::forward(args)...); - })); - tasks_added_++; - return id; - } - - IterativeTasks &AddIteration(const std::string &label) { - int key = iter_tasks.size(); - iter_tasks[key] = IterativeTasks(this, key); - iter_labels[key] = label; - return iter_tasks[key]; - } - - void Print() { - int i = 0; - std::cout << "TaskList::Print():" << std::endl; - for (auto &t : task_list_) { - std::cout << " " << i << " " << t.GetID().to_string() << " " - << t.GetDependency().to_string() << " " << tasks_completed_.to_string() - << " " << (t.GetStatus() == TaskStatus::incomplete) - << (t.GetStatus() == TaskStatus::complete) - << (t.GetStatus() == TaskStatus::skip) - << (t.GetStatus() == TaskStatus::iterate) - << (t.GetStatus() == TaskStatus::fail) << std::endl; - - i++; - } - } - - protected: - std::map iter_tasks; - std::map iter_labels; - std::list task_list_; - int tasks_added_ = 0; - TaskID tasks_completed_; -}; - -namespace task_list_impl { -// helper function to avoid having to call a member function of TaskList from -// IterativeTasks before TaskList has been defined -inline TaskID AddTaskHelper(TaskList *tl, Task tsk) { return tl->AddTask(tsk); } -} // namespace task_list_impl - -class RegionCounter { - public: - explicit RegionCounter(const std::string &base) : base_(base), cnt_(0) {} - std::string ID() { return base_ + std::to_string(cnt_++); } - - private: - const std::string base_; - int cnt_; -}; - -class TaskRegion { - public: - explicit TaskRegion(const int size) : lists(size) {} - void AddRegionalDependencies(const int reg_dep_id, const int list_index, - const TaskID &id) { - AddRegionalDependencies(std::to_string(reg_dep_id), list_index, id); - } - void 
AddRegionalDependencies(const std::string ®_dep_id, const int list_index, - const TaskID &id) { - AddDependencies(reg_dep_id, list_index, id); - global[reg_dep_id] = false; - } - void AddGlobalDependencies(const int reg_dep_id, const int list_index, - const TaskID &id) { - AddGlobalDependencies(std::to_string(reg_dep_id), list_index, id); - } - void AddGlobalDependencies(const std::string ®_dep_id, const int list_index, - const TaskID &id) { - AddDependencies(reg_dep_id, list_index, id); - global[reg_dep_id] = true; - } - - TaskList &operator[](int i) { return lists[i]; } - - int size() const { return lists.size(); } - - bool Execute() { - for (auto i = 0; i < lists.size(); ++i) { - if (!lists[i].IsComplete()) { - lists[i].DoAvailable(); - } - } - return CheckAndUpdate(); - } - - bool CheckAndUpdate() { - auto it = id_for_reg.begin(); - while (it != id_for_reg.end()) { - auto ®_id = it->first; - bool check = false; - if (HasRun(reg_id) && !all_done[reg_id].active) { - all_done[reg_id].val = IsComplete(reg_id); - if (global[reg_id]) { - all_done[reg_id].StartReduce(MPI_MIN); - } else { - check = true; - } - } - if (global[reg_id] && all_done[reg_id].active) { - auto status = all_done[reg_id].CheckReduce(); - if (status == TaskStatus::complete) { - check = true; - } - } - if (check) { - if (all_done[reg_id].val) { - bool clear = false; - for (auto &lst : it->second) { - clear = lists[lst.first].CompleteIfNeeded(lst.second); - } - if (clear) { - all_done.erase(reg_id); - global.erase(reg_id); - it = id_for_reg.erase(it); - } else { - ++it; - } - } else { - for (auto &lst : it->second) { - lists[lst.first].ResetIfNeeded(lst.second); - } - all_done[reg_id].val = 0; - ++it; - } - } else { - ++it; - } - } - int complete_cnt = 0; - const int num_lists = size(); - for (auto i = 0; i < num_lists; ++i) { - if (lists[i].IsComplete()) complete_cnt++; - } - return (complete_cnt == num_lists); - } - - bool Validate() const { - for (auto &list : lists) { - if (!list.Validate()) return false; - } - return true; - } - - private: - void AddDependencies(const std::string &label, const int list_id, const TaskID &tid) { - id_for_reg[label][list_id] = tid; - lists[list_id].MarkRegional(tid); - all_done[label].val = 0; - } - bool HasRun(const std::string ®_id) { - auto &lvec = id_for_reg[reg_id]; - int n_to_run = lvec.size(); - int n_ran = 0; - for (auto &pair : lvec) { - int list_index = pair.first; - TaskID id = pair.second; - if (lists[list_index].CheckTaskRan(id)) { - n_ran++; - } - } - return n_ran == n_to_run; - } - bool IsComplete(const std::string ®_id) { - auto &lvec = id_for_reg[reg_id]; - int n_to_finish = lvec.size(); - int n_finished = 0; - for (auto &pair : lvec) { - int list_index = pair.first; - TaskID id = pair.second; - if (lists[list_index].CheckTaskCompletion(id)) { - n_finished++; - } - } - return n_finished == n_to_finish; - } - - std::unordered_map> id_for_reg; - std::vector lists; - std::unordered_map> all_done; - std::unordered_map global; -}; - -class TaskCollection { - public: - TaskCollection() = default; - TaskRegion &AddRegion(const int num_lists) { - regions.push_back(TaskRegion(num_lists)); - return regions.back(); - } - TaskListStatus Execute() { - assert(Validate()); - for (auto ®ion : regions) { - bool complete = false; - while (!complete) { - complete = region.Execute(); - } - } - return TaskListStatus::complete; - } - - private: - bool Validate() const { - for (auto ®ion : regions) { - if (!region.Validate()) return false; - } - return true; - } - - std::vector regions; -}; - 
-} // namespace parthenon - -#endif // TASKS_TASK_LIST_HPP_ diff --git a/src/tasks/task_types.hpp b/src/tasks/task_types.hpp deleted file mode 100644 index c3475784e50a..000000000000 --- a/src/tasks/task_types.hpp +++ /dev/null @@ -1,102 +0,0 @@ -//======================================================================================== -// (C) (or copyright) 2021. Triad National Security, LLC. All rights reserved. -// -// This program was produced under U.S. Government contract 89233218CNA000001 for Los -// Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC -// for the U.S. Department of Energy/National Nuclear Security Administration. All rights -// in the program are reserved by Triad National Security, LLC, and the U.S. Department -// of Energy/National Nuclear Security Administration. The Government is granted for -// itself and others acting on its behalf a nonexclusive, paid-up, irrevocable worldwide -// license in this material to reproduce, prepare derivative works, distribute copies to -// the public, perform publicly and display publicly, and to permit others to do so. -//======================================================================================== - -#ifndef TASKS_TASK_TYPES_HPP_ -#define TASKS_TASK_TYPES_HPP_ - -#include // NOLINT [build/c++11] -#include -#include -#include -#include - -#include "basic_types.hpp" -#include "globals.hpp" - -namespace parthenon { - -enum class TaskType { single, iterative, completion_criteria }; - -class Task { - public: - Task(const TaskID &id, const TaskID &dep, std::function func) - : myid_(id), dep_(dep), type_(TaskType::single), key_(-1), func_(std::move(func)), - interval_(1) {} - Task(const TaskID &id, const TaskID &dep, std::function func, - const TaskType &type, const int key) - : myid_(id), dep_(dep), type_(type), key_(key), func_(std::move(func)), - interval_(1) { - assert(key_ >= 0); - assert(type_ != TaskType::single); - } - Task(const TaskID &id, const TaskID &dep, std::function func, - const TaskType &type, const int key, const int interval) - : myid_(id), dep_(dep), type_(type), key_(key), func_(std::move(func)), - interval_(interval) { - assert(key_ >= 0); - assert(type_ != TaskType::single); - assert(interval_ > 0); - } - void operator()() { - if (calls_ == 0) { - // on first call, set start time - start_time_ = std::chrono::high_resolution_clock::now(); - } - - calls_++; - if (calls_ % interval_ == 0) { - // set total runtime of current task, must go into Global namespace because - // functions called by the task functor don't have access to the task itself and - // they may want to check if the task has been running for too long indicating that - // it got stuck in an infinite loop - Globals::current_task_runtime_sec = - std::chrono::duration_cast( - std::chrono::high_resolution_clock::now() - start_time_) - .count() * - 1e-9; - status_ = func_(); - Globals::current_task_runtime_sec = 0.0; - } else { - status_ = TaskStatus::skip; - } - } - void SetID(const TaskID &id) { myid_ = id; } - TaskID GetID() const { return myid_; } - TaskID GetDependency() const { return dep_; } - TaskStatus GetStatus() const { return status_; } - void SetStatus(const TaskStatus &status) { status_ = status; } - TaskType GetType() const { return type_; } - int GetKey() const { return key_; } - void SetRegional() { regional_ = true; } - bool IsRegional() const { return regional_; } - - private: - TaskID myid_; - const TaskID dep_; - const TaskType type_; - const int key_; - TaskStatus status_ = 
TaskStatus::incomplete;
-  bool regional_ = false;
-  bool lb_time_ = false;
-  std::function<TaskStatus()> func_;
-  int calls_ = 0;
-  const int interval_;
-
-  // this is used to record the start time of the task so that we can check how long
-  // the task has been running and detect potential hangs, infinite loops, etc.
-  std::chrono::high_resolution_clock::time_point start_time_;
-};
-
-} // namespace parthenon
-
-#endif // TASKS_TASK_TYPES_HPP_
diff --git a/src/tasks/tasks.hpp b/src/tasks/tasks.hpp
new file mode 100644
index 000000000000..c0960787a53e
--- /dev/null
+++ b/src/tasks/tasks.hpp
@@ -0,0 +1,500 @@
+//========================================================================================
+// (C) (or copyright) 2023. Triad National Security, LLC. All rights reserved.
+//
+// This program was produced under U.S. Government contract 89233218CNA000001 for Los
+// Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC
+// for the U.S. Department of Energy/National Nuclear Security Administration. All rights
+// in the program are reserved by Triad National Security, LLC, and the U.S. Department
+// of Energy/National Nuclear Security Administration. The Government is granted for
+// itself and others acting on its behalf a nonexclusive, paid-up, irrevocable worldwide
+// license in this material to reproduce, prepare derivative works, distribute copies to
+// the public, perform publicly and display publicly, and to permit others to do so.
+//========================================================================================
+#ifndef TASKS_TASKS_HPP_
+#define TASKS_TASKS_HPP_
+
+#include <algorithm>
+#include <array>
+#include <cstdint>
+#include <functional>
+#include <list>
+#include <memory>
+#include <mutex>
+#include <unordered_set>
+#include <utility>
+#include <vector>
+
+#include
+#include
+
+#include "thread_pool.hpp"
+#include "utils/error_checking.hpp"
+
+namespace parthenon {
+
+enum class TaskListStatus { complete }; // doesn't feel like we need this...
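
For orientation, a task below is just a callable returning TaskStatus (defined in
basic_types.hpp). A minimal sketch, with hypothetical functions:

    TaskStatus DoNothing() { return TaskStatus::complete; }
    // inside an iterative sublist, returning TaskStatus::iterate requests
    // another pass through the cycle (count is a hypothetical counter)
    TaskStatus KeepGoing(int *count) {
      return (++(*count) < 10) ? TaskStatus::iterate : TaskStatus::complete;
    }
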
+enum class TaskType { normal, completion };
+
+class TaskQualifier {
+ public:
+  using qualifier_t = uint64_t;
+  TaskQualifier() = delete;
+  TaskQualifier(const qualifier_t n) : flags(n) {} // NOLINT(runtime/explicit)
+
+  static inline constexpr qualifier_t normal{0};
+  static inline constexpr qualifier_t local_sync{1 << 0};
+  static inline constexpr qualifier_t global_sync{1 << 1};
+  static inline constexpr qualifier_t completion{1 << 2};
+  static inline constexpr qualifier_t once_per_region{1 << 3};
+
+  bool LocalSync() const { return flags & local_sync; }
+  bool GlobalSync() const { return flags & global_sync; }
+  bool Completion() const { return flags & completion; }
+  bool Once() const { return flags & once_per_region; }
+
+ private:
+  qualifier_t flags;
+};
+
+// forward declare Task for TaskID
+class Task;
+class TaskID {
+ public:
+  TaskID() : task(nullptr) {}
+  // pointers to Task are implicitly convertible to TaskID
+  TaskID(Task *t) : task(t) {} // NOLINT(runtime/explicit)
+
+  TaskID operator|(const TaskID &other) const {
+    // calling this operator means you're building a TaskID to hold a dependency
+    TaskID result;
+    if (task != nullptr)
+      result.dep.push_back(task);
+    else
+      result.dep.insert(result.dep.end(), dep.begin(), dep.end());
+    if (other.task != nullptr)
+      result.dep.push_back(other.task);
+    else
+      result.dep.insert(result.dep.end(), other.dep.begin(), other.dep.end());
+    return result;
+  }
+
+  const std::vector<Task *> &GetIDs() const { return std::cref(dep); }
+
+  bool empty() const { return (!task && dep.size() == 0); }
+  Task *GetTask() { return task; }
+
+ private:
+  Task *task = nullptr;
+  std::vector<Task *> dep;
+};
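
Since operator| just concatenates task pointers, a combined TaskID depends on every
constituent task, and a default-constructed TaskID means "no dependencies". A hedged
sketch (tl, Foo, Bar, and Baz are hypothetical):

    TaskID a = tl.AddTask(TaskID(), Foo); // no dependencies
    TaskID b = tl.AddTask(TaskID(), Bar);
    tl.AddTask(a | b, Baz);               // Baz waits on Foo AND Bar
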
+
+class Task {
+ public:
+  Task() = default;
+  template <typename TID>
+  Task(TID &&dep, const std::function<TaskStatus()> &func,
+       std::pair<int, int> limits = {1, 1})
+      : f(func), exec_limits(limits) {
+    if (dep.GetIDs().size() == 0 && dep.GetTask()) {
+      dependencies.insert(dep.GetTask());
+    } else {
+      for (auto &d : dep.GetIDs()) {
+        dependencies.insert(d);
+      }
+    }
+    // always add "this" to repeat task if it's incomplete
+    dependent[static_cast<int>(TaskStatus::incomplete)].push_back(this);
+  }
+
+  TaskStatus operator()() {
+    auto status = f();
+    if (task_type == TaskType::completion) {
+      // keep track of how many times it's been called
+      num_calls += (status == TaskStatus::iterate || status == TaskStatus::complete);
+      // enforce minimum number of iterations
+      if (num_calls < exec_limits.first && status == TaskStatus::complete)
+        status = TaskStatus::iterate;
+      // enforce maximum number of iterations
+      if (num_calls == exec_limits.second) status = TaskStatus::complete;
+    }
+    // save the status in the Task object
+    SetStatus(status);
+    return status;
+  }
+  TaskID GetID() { return this; }
+  bool ready() {
+    // check that no dependency is incomplete
+    bool go = true;
+    for (auto &dep : dependencies) {
+      go = go && (dep->GetStatus() != TaskStatus::incomplete);
+    }
+    return go;
+  }
+  void AddDependency(Task *t) { dependencies.insert(t); }
+  std::unordered_set<Task *> &GetDependencies() { return dependencies; }
+  void AddDependent(Task *t, TaskStatus status) {
+    dependent[static_cast<int>(status)].push_back(t);
+  }
+  std::vector<Task *> &GetDependent(TaskStatus status = TaskStatus::complete) {
+    return dependent[static_cast<int>(status)];
+  }
+  void SetType(TaskType type) { task_type = type; }
+  TaskType GetType() { return task_type; }
+  void SetStatus(TaskStatus status) {
+    std::lock_guard<std::mutex> lock(mutex);
+    task_status = status;
+  }
+  TaskStatus GetStatus() {
+    std::lock_guard<std::mutex> lock(mutex);
+    return task_status;
+  }
+  void reset_iteration() { num_calls = 0; }
+
+ private:
+  std::function<TaskStatus()> f;
+  // store a list of tasks that might be available to
+  // run for each possible status this task returns
+  std::array<std::vector<Task *>, 3> dependent;
+  std::unordered_set<Task *> dependencies;
+  std::pair<int, int> exec_limits;
+  TaskType task_type = TaskType::normal;
+  int num_calls = 0;
+  TaskStatus task_status = TaskStatus::incomplete;
+  std::mutex mutex;
+};
+
+class TaskRegion;
+class TaskList {
+  friend class TaskRegion;
+
+ public:
+  TaskList() : TaskList(TaskID(), {1, 1}) {}
+  explicit TaskList(const TaskID &dep, std::pair<int, int> limits)
+      : dependency(dep), exec_limits(limits) {
+    // make a trivial first_task after which others will get launched
+    // simplifies logic for iteration and startup
+    tasks.push_back(std::make_shared<Task>(
+        dependency,
+        [&tasks = tasks]() {
+          for (auto &t : tasks) {
+            t->SetStatus(TaskStatus::incomplete);
+          }
+          return TaskStatus::complete;
+        },
+        exec_limits));
+    first_task = tasks.back().get();
+    // connect list dependencies to this list's first_task
+    for (auto t : first_task->GetDependencies()) {
+      t->AddDependent(first_task, TaskStatus::complete);
+    }
+
+    // make a trivial last_task that tasks dependent on this list's execution
+    // can depend on. Also simplifies exiting completed iterations
+    tasks.push_back(std::make_shared<Task>(
+        TaskID(),
+        [&completion_tasks = completion_tasks]() {
+          for (auto t : completion_tasks) {
+            t->reset_iteration();
+          }
+          return TaskStatus::complete;
+        },
+        exec_limits));
+    last_task = tasks.back().get();
+  }
+
+  template <typename... Args>
+  TaskID AddTask(TaskID dep, Args &&...args) {
+    return AddTask(TaskQualifier::normal, dep, std::forward<Args>(args)...);
+  }
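
AddTask accepts both free functions and non-static member functions; a sketch with
hypothetical names:

    auto id1 = tl.AddTask(dep, SomeFreeFunction, arg);
    auto id2 = tl.AddTask(dep, &SomeClass::SomeMethod, &some_object, arg);

Arguments are captured by value, and the returned TaskID can serve, alone or combined
with |, as the dependency of later tasks.
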
+
+  template <typename... Args>
+  TaskID AddTask(const TaskQualifier tq, TaskID dep, Args &&...args) {
+    // user-space tasks always depend on something. If no dependencies are given,
+    // make the task dependent on the list's first_task
+    if (dep.empty()) dep = TaskID(first_task);
+
+    if (!tq.Once() || (tq.Once() && unique_id == 0)) {
+      AddUserTask(dep, std::forward<Args>(args)...);
+    } else {
+      tasks.push_back(std::make_shared<Task>(
+          dep, [=]() { return TaskStatus::complete; }, exec_limits));
+    }
+
+    Task *my_task = tasks.back().get();
+    TaskID id(my_task);
+
+    if (tq.LocalSync() || tq.GlobalSync() || tq.Once()) {
+      regional_tasks.push_back(my_task);
+    }
+
+    if (tq.GlobalSync()) {
+      bool do_mpi = false;
+#ifdef MPI_PARALLEL
+      // make status, request, and comm for this global task
+      global_status.push_back(std::make_shared<int>(0));
+      global_request.push_back(std::make_shared<MPI_Request>(MPI_REQUEST_NULL));
+      // be careful about the custom deleter so it doesn't call
+      // an MPI function after Finalize
+      global_comm.emplace_back(new MPI_Comm, [&](MPI_Comm *d) {
+        int finalized;
+        PARTHENON_MPI_CHECK(MPI_Finalized(&finalized));
+        if (!finalized) PARTHENON_MPI_CHECK(MPI_Comm_free(d));
+      });
+      // we need another communicator to support multiple in flight non-blocking
+      // collectives where we can't guarantee calling order across ranks
+      PARTHENON_MPI_CHECK(MPI_Comm_dup(MPI_COMM_WORLD, global_comm.back().get()));
+      do_mpi = true;
+#endif // MPI_PARALLEL
+      TaskID start;
+      // only call MPI once per region, on the list with unique_id = 0
+      if (unique_id == 0 && do_mpi) {
+#ifdef MPI_PARALLEL
+        // add a task that starts the Iallreduce on the task statuses
+        tasks.push_back(std::make_shared<Task>(
+            id,
+            [my_task, &stat = *global_status.back(), &req = *global_request.back(),
+             &comm = *global_comm.back()]() {
+              // jump through a couple hoops to figure out statuses of all instances of
+              // my_task across all lists in the enclosing TaskRegion
+              auto dependent = my_task->GetDependent(TaskStatus::complete);
+              assert(dependent.size() == 1);
+              auto mytask = *dependent.begin();
+              stat = 0;
+              for (auto dep : mytask->GetDependencies()) {
+                stat = std::max(stat, static_cast<int>(dep->GetStatus()));
+              }
+              PARTHENON_MPI_CHECK(
+                  MPI_Iallreduce(MPI_IN_PLACE, &stat, 1, MPI_INT, MPI_MAX, comm, &req));
+              return TaskStatus::complete;
+            },
+            exec_limits));
+        start = TaskID(tasks.back().get());
+        // add a task that tests for completion of the Iallreduces of statuses
+        tasks.push_back(std::make_shared<Task>(
+            start,
+            [&stat = *global_status.back(), &req = *global_request.back()]() {
+              int check;
+              PARTHENON_MPI_CHECK(MPI_Test(&req, &check, MPI_STATUS_IGNORE));
+              if (check) {
+                return static_cast<TaskStatus>(stat);
+              }
+              return TaskStatus::incomplete;
+            },
+            exec_limits));
+#endif // MPI_PARALLEL
+      } else { // unique_id != 0
+        // just add empty tasks
+        tasks.push_back(std::make_shared<Task>(
+            id, [&]() { return TaskStatus::complete; }, exec_limits));
+        start = TaskID(tasks.back().get());
+        tasks.push_back(std::make_shared<Task>(
+            start, [my_task]() { return my_task->GetStatus(); }, exec_limits));
+      }
+      // reset id so it now points at the task that finishes the Iallreduce
+      id = TaskID(tasks.back().get());
+      // make the task that starts the Iallreduce point at the one that finishes it
+      start.GetTask()->AddDependent(id.GetTask(), TaskStatus::complete);
+      // for any status != incomplete, my_task should point at the mpi reduction
+      my_task->AddDependent(start.GetTask(), TaskStatus::complete);
+      my_task->AddDependent(start.GetTask(), TaskStatus::iterate);
+      // make the finish Iallreduce task finish on all lists before moving on
+      regional_tasks.push_back(id.GetTask());
+    }
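
To summarize the branch above: on the unique_id == 0 list, a global_sync task is
followed by two hidden tasks that start and then poll a non-blocking MPI_Iallreduce
(MPI_MAX) over the task's status, while the other lists get placeholders, so downstream
tasks on every rank see one reduced status. A typical call, with hypothetical
CheckConvergence and md:

    tl.AddTask(TaskQualifier::global_sync | TaskQualifier::completion, dep,
               CheckConvergence, md);
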
+
+    // connect completion tasks to last_task
+    if (tq.Completion()) {
+      auto t = id.GetTask();
+      t->SetType(TaskType::completion);
+      t->AddDependent(last_task, TaskStatus::complete);
+      completion_tasks.push_back(t);
+    }
+
+    // make connections so tasks point to this task to run next
+    for (auto d : my_task->GetDependencies()) {
+      if (d->GetType() == TaskType::completion) {
+        d->AddDependent(my_task, TaskStatus::iterate);
+      } else {
+        d->AddDependent(my_task, TaskStatus::complete);
+      }
+    }
+    return id;
+  }
+
+  template <typename TID>
+  std::pair<TaskList &, TaskID> AddSublist(TID &&dep, std::pair<int, int> minmax_iters) {
+    sublists.push_back(std::make_shared<TaskList>(dep, minmax_iters));
+    auto &tl = *sublists.back();
+    tl.SetID(unique_id);
+    return std::make_pair(std::ref(tl), TaskID(tl.last_task));
+  }
+
+ private:
+  TaskID dependency;
+  std::pair<int, int> exec_limits;
+  // put these in shared_ptrs so copying TaskList works as expected
+  std::vector<std::shared_ptr<Task>> tasks;
+  std::vector<std::shared_ptr<TaskList>> sublists;
+#ifdef MPI_PARALLEL
+  std::vector<std::shared_ptr<int>> global_status;
+  std::vector<std::shared_ptr<MPI_Request>> global_request;
+  std::vector<std::shared_ptr<MPI_Comm>> global_comm;
+#endif // MPI_PARALLEL
+  // vectors are fine for these
+  std::vector<Task *> regional_tasks;
+  std::vector<Task *> global_tasks;
+  std::vector<Task *> completion_tasks;
+  // special startup and takedown tasks auto added to lists
+  Task *first_task;
+  Task *last_task;
+  // a unique id to support tasks that should only get executed once per region
+  int unique_id;
+
+  Task *GetStartupTask() { return first_task; }
+  size_t NumRegional() const { return regional_tasks.size(); }
+  Task *Regional(const int i) { return regional_tasks[i]; }
+  void SetID(const int id) { unique_id = id; }
+
+  void ConnectIteration() {
+    if (completion_tasks.size() != 0) {
+      auto last = completion_tasks.back();
+      last->AddDependent(first_task, TaskStatus::iterate);
+    }
+    for (auto &tl : sublists)
+      tl->ConnectIteration();
+  }
+
+  template <class T, class U, class... Args1, class... Args2>
+  void AddUserTask(TaskID &dep, TaskStatus (T::*func)(Args1...), U *obj,
+                   Args2 &&...args) {
+    tasks.push_back(std::make_shared<Task>(
+        dep,
+        [=]() mutable -> TaskStatus {
+          return (obj->*func)(std::forward<Args2>(args)...);
+        },
+        exec_limits));
+  }
+
+  template <typename F, class... Args>
+  void AddUserTask(TaskID &dep, F &&func, Args &&...args) {
+    tasks.push_back(std::make_shared<Task>(
+        dep,
+        [=, func = std::forward<F>(func)]() mutable -> TaskStatus {
+          return func(std::forward<Args>(args)...);
+        },
+        exec_limits));
+  }
+};
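
AddSublist is how iterative cycles enter the graph; the returned pair holds the sublist
and a TaskID that downstream tasks can use to wait on the whole cycle. A hedged sketch,
with hypothetical solver functions and data md:

    auto [solver, solved] = tl.AddSublist(start_id, {1, 100}); // 1 to 100 passes
    auto update = solver.AddTask(TaskID(), JacobiSweep, md);
    solver.AddTask(TaskQualifier::completion, update,
                   CheckResidual, md);   // returns iterate or complete
    tl.AddTask(solved, UseSolution, md); // runs once the cycle completes
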
+
+class TaskRegion {
+ public:
+  TaskRegion() = delete;
+  explicit TaskRegion(const int num_lists) : task_lists(num_lists) {
+    for (int i = 0; i < num_lists; i++)
+      task_lists[i].SetID(i);
+  }
+
+  TaskListStatus Execute(ThreadPool &pool) {
+    // for now, require a pool with one thread
+    PARTHENON_REQUIRE_THROWS(pool.size() == 1,
+                             "ThreadPool size != 1 is not currently supported.");
+
+    // first, if needed, finish building the graph
+    if (!graph_built) BuildGraph();
+
+    // declare this so it can call itself
+    std::function<TaskStatus(Task *)> ProcessTask;
+    ProcessTask = [&pool, &ProcessTask](Task *task) -> TaskStatus {
+      auto status = task->operator()();
+      auto next_up = task->GetDependent(status);
+      for (auto t : next_up) {
+        if (t->ready()) {
+          pool.enqueue([t, &ProcessTask]() { return ProcessTask(t); });
+        }
+      }
+      return status;
+    };
+
+    // now enqueue the "first_task" for all task lists
+    for (auto &tl : task_lists) {
+      auto t = tl.GetStartupTask();
+      pool.enqueue([t, &ProcessTask]() { return ProcessTask(t); });
+    }
+
+    // then wait until everything is done
+    pool.wait();
+
+    return TaskListStatus::complete;
+  }
+
+  TaskList &operator[](const int i) { return task_lists[i]; }
+
+  size_t size() const { return task_lists.size(); }
+
+ private:
+  std::vector<TaskList> task_lists;
+  bool graph_built = false;
+
+  void BuildGraph() {
+    // first handle regional dependencies
+    const auto num_lists = task_lists.size();
+    const auto num_regional = task_lists.front().NumRegional();
+    std::vector<Task *> tasks(num_lists);
+    for (int i = 0; i < num_regional; i++) {
+      for (int j = 0; j < num_lists; j++) {
+        tasks[j] = task_lists[j].Regional(i);
+      }
+      std::vector<std::vector<Task *>> reg_dep;
+      for (int j = 0; j < num_lists; j++) {
+        reg_dep.push_back(std::vector<Task *>());
+        for (auto t : tasks[j]->GetDependent(TaskStatus::complete)) {
+          reg_dep[j].push_back(t);
+        }
+      }
+      for (int j = 0; j < num_lists; j++) {
+        for (auto t : reg_dep[j]) {
+          for (int k = 0; k < num_lists; k++) {
+            if (j == k) continue;
+            t->AddDependency(tasks[k]);
+            tasks[k]->AddDependent(t, TaskStatus::complete);
+          }
+        }
+      }
+    }
+
+    // now hook up iterations
+    for (auto &tl : task_lists) {
+      tl.ConnectIteration();
+    }
+
+    graph_built = true;
+  }
+};
+
+class TaskCollection {
+ public:
+  TaskCollection() = default;
+
+  TaskRegion &AddRegion(const int num_lists) {
+    regions.emplace_back(num_lists);
+    return regions.back();
+  }
+  TaskListStatus Execute() {
+    ThreadPool pool(1);
+    return Execute(pool);
+  }
+  TaskListStatus Execute(ThreadPool &pool) {
+    TaskListStatus status;
+    for (auto &region : regions) {
+      status = region.Execute(pool);
+      if (status != TaskListStatus::complete) return status;
+    }
+    return TaskListStatus::complete;
+  }
+
+ private:
+  std::list<TaskRegion> regions;
+};
+
+} // namespace parthenon
+
+#endif // TASKS_TASKS_HPP_
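
Putting the pieces together, a downstream driver built on this header might look like
the following minimal sketch (Predict, Correct, and Demo are hypothetical):

    #include "tasks/tasks.hpp"
    using namespace parthenon;

    TaskStatus Predict(int b) { /* work for block b */ return TaskStatus::complete; }
    TaskStatus Correct(int b) { /* work for block b */ return TaskStatus::complete; }

    void Demo(const int num_blocks) {
      TaskCollection tc;
      TaskRegion &region = tc.AddRegion(num_blocks); // one TaskList per block
      for (int b = 0; b < num_blocks; ++b) {
        TaskList &tl = region[b];
        // local_sync: every list's Predict must finish before any Correct runs
        auto pred = tl.AddTask(TaskQualifier::local_sync, TaskID(), Predict, b);
        tl.AddTask(pred, Correct, b);
      }
      tc.Execute(); // builds the graph, then runs it on a one-thread pool
    }
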
diff --git a/src/tasks/thread_pool.hpp b/src/tasks/thread_pool.hpp
new file mode 100644
index 000000000000..b8f526750230
--- /dev/null
+++ b/src/tasks/thread_pool.hpp
@@ -0,0 +1,139 @@
+//========================================================================================
+// (C) (or copyright) 2023. Triad National Security, LLC. All rights reserved.
+//
+// This program was produced under U.S. Government contract 89233218CNA000001 for Los
+// Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC
+// for the U.S. Department of Energy/National Nuclear Security Administration. All rights
+// in the program are reserved by Triad National Security, LLC, and the U.S. Department
+// of Energy/National Nuclear Security Administration. The Government is granted for
+// itself and others acting on its behalf a nonexclusive, paid-up, irrevocable worldwide
+// license in this material to reproduce, prepare derivative works, distribute copies to
+// the public, perform publicly and display publicly, and to permit others to do so.
+//========================================================================================
+
+#ifndef TASKS_THREAD_POOL_HPP_
+#define TASKS_THREAD_POOL_HPP_
+
+#include <condition_variable>
+#include <functional>
+#include <future>
+#include <memory>
+#include <mutex>
+#include <queue>
+#include <thread>
+#include <type_traits>
+#include <utility>
+#include <vector>
+
+namespace parthenon {
+
+template <typename T>
+class ThreadQueue {
+ public:
+  explicit ThreadQueue(const int num_workers) : nworkers(num_workers), nwaiting(0) {}
+  void push(T q) {
+    std::lock_guard<std::mutex> lock(mutex);
+    queue.push(q);
+    cv.notify_one();
+  }
+  bool pop(T &q) {
+    std::unique_lock<std::mutex> lock(mutex);
+    if (queue.empty()) {
+      nwaiting++;
+      if (waiting && nwaiting == nworkers) {
+        complete = true;
+        complete_cv.notify_all();
+      }
+      cv.wait(lock, [this]() { return exit || !queue.empty(); });
+      nwaiting--;
+      if (exit) return true;
+    }
+    q = queue.front();
+    queue.pop();
+    return false;
+  }
+  void signal_kill() {
+    std::lock_guard<std::mutex> lock(mutex);
+    std::queue<T>().swap(queue);
+    complete = true;
+    exit = true;
+    cv.notify_all();
+  }
+  void signal_exit_when_finished() {
+    std::lock_guard<std::mutex> lock(mutex);
+    exit = true;
+    complete = true;
+    cv.notify_all();
+  }
+  void wait_for_complete() {
+    std::unique_lock<std::mutex> lock(mutex);
+    waiting = true;
+    if (queue.empty() && nwaiting == nworkers) {
+      complete = false;
+      waiting = false;
+      return;
+    }
+    complete_cv.wait(lock, [this]() { return complete; });
+    complete = false;
+    waiting = false;
+  }
+
+ private:
+  const int nworkers;
+  int nwaiting;
+  std::queue<T> queue;
+  std::mutex mutex;
+  std::condition_variable cv;
+  std::condition_variable complete_cv;
+  bool complete = false;
+  bool exit = false;
+  bool waiting = false;
+};
+
+class ThreadPool {
+ public:
+  explicit ThreadPool(const int numthreads = std::thread::hardware_concurrency())
+      : nthreads(numthreads), queue(nthreads) {
+    for (int i = 0; i < nthreads; i++) {
+      auto worker = [&]() {
+        while (true) {
+          std::function<void()> f;
+          auto stop = queue.pop(f);
+          if (stop) break;
+          if (f) f();
+        }
+      };
+      threads.emplace_back(worker);
+    }
+  }
+  ~ThreadPool() {
+    queue.signal_exit_when_finished();
+    for (auto &t : threads) {
+      t.join();
+    }
+  }
+
+  void wait() { queue.wait_for_complete(); }
+
+  void kill() { queue.signal_kill(); }
+
+  template <typename F, typename... Args>
+  std::future<typename std::result_of<F(Args...)>::type> enqueue(F &&f, Args &&...args) {
+    using return_t = typename std::result_of<F(Args...)>::type;
+    auto task = std::make_shared<std::packaged_task<return_t()>>(
+        [=, func = std::forward<F>(f)] { return func(std::forward<Args>(args)...); });
+    std::future<return_t> result = task->get_future();
+    queue.push([task]() { (*task)(); });
+    return result;
+  }
+
+  int size() const { return nthreads; }
+
+ private:
+  const int nthreads;
+  std::vector<std::thread> threads;
+  ThreadQueue<std::function<void()>> queue;
+};
+
+} // namespace parthenon
+
+#endif // TASKS_THREAD_POOL_HPP_
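
The pool itself is generic. As TaskRegion::Execute uses it, callables are enqueued
without extra arguments and results come back as std::futures; a standalone sketch:

    #include "tasks/thread_pool.hpp"

    int main() {
      parthenon::ThreadPool pool(4); // four workers (tasking itself requires 1)
      auto result = pool.enqueue([]() { return 42; });
      pool.wait();                   // block until the queue drains
      return result.get() == 42 ? 0 : 1;
    }
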
diff --git a/tst/style/cpplint.py b/tst/style/cpplint.py
index 4df4a7d26033..c2c402f46295 100755
--- a/tst/style/cpplint.py
+++ b/tst/style/cpplint.py
@@ -7026,11 +7026,11 @@ def FlagCxx11Features(filename, clean_lines, linenum, error):
   # Flag unapproved C++11 headers.
   if include and include.group(1) in (
       "cfenv",
-      "condition_variable",
+      # "condition_variable",
       "fenv.h",
-      "future",
-      "mutex",
-      "thread",
+      # "future",
+      # "mutex",
+      # "thread",
       # "chrono",
       "ratio",
       # "regex",
diff --git a/tst/unit/test_taskid.cpp b/tst/unit/test_taskid.cpp
index 0844226ad48d..14dcc09b500a 100644
--- a/tst/unit/test_taskid.cpp
+++ b/tst/unit/test_taskid.cpp
@@ -19,34 +19,20 @@
 
 #include <catch2/catch.hpp>
 
-#include "tasks/task_id.hpp"
+#include "tasks/tasks.hpp"
 
+using parthenon::Task;
 using parthenon::TaskID;
 
-TEST_CASE("Just check everything", "[CheckDependencies][SetFinished][equal][or]") {
+TEST_CASE("Just check everything", "[GetIDs][empty]") {
   GIVEN("Some TaskIDs") {
-    TaskID a(1);
-    TaskID b(2);
-    TaskID c(BITBLOCK + 1); // make sure we get a task with more than one block
-    TaskID complete;
-
-    TaskID ac = (a | c);
-    bool should_be_false = ac.CheckDependencies(b);
-    bool should_be_truea = ac.CheckDependencies(a);
-    bool should_be_truec = ac.CheckDependencies(c);
-    TaskID abc = (a | b | c);
-    complete.SetFinished(abc);
-    bool equal_true = (complete == abc);
-    bool equal_false = (complete == ac);
-
-    REQUIRE(should_be_false == false);
-    REQUIRE(should_be_truea == true);
-    REQUIRE(should_be_truec == true);
-    REQUIRE(equal_true == true);
-    REQUIRE(equal_false == false);
-
-    WHEN("a negative number is passed") {
-      REQUIRE_THROWS_AS(a.Set(-1), std::invalid_argument);
-    }
+    Task ta, tb;
+    TaskID a(&ta);
+    TaskID b(&tb);
+    TaskID c = a | b;
+    TaskID none;
+
+    REQUIRE(none.empty() == true);
+    REQUIRE(c.GetIDs().size() == 2);
   }
 }
diff --git a/tst/unit/test_tasklist.cpp b/tst/unit/test_tasklist.cpp
index f06ce49c3e99..1790a4eb0ad0 100644
--- a/tst/unit/test_tasklist.cpp
+++ b/tst/unit/test_tasklist.cpp
@@ -19,7 +19,7 @@
 
 // Internal Includes
 #include "basic_types.hpp"
-#include "tasks/task_list.hpp"
+#include "tasks/tasks.hpp"
 
 using parthenon::TaskID;
 using parthenon::TaskList;