
[Breaking Change] Tasking rewrite #987

Merged
merged 34 commits into from
Jan 24, 2024

Conversation

@jdolence (Collaborator) commented Dec 14, 2023

PR Summary

[Breaking Change]
The API for representing iterations through our tasks has changed. AddIteration is replaced with AddSublist, with somewhat different arguments and return values. Additionally, AddCompletionTask is replaced by AddTask with TaskQualifiers specifying that it is a completion task (plus whatever other qualifiers are appropriate). AddRegionalDependencies is similarly replaced by passing the appropriate TaskQualifier to AddTask.
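As a rough before/after sketch of the migration (the exact signatures are assumed from this summary rather than taken from the actual headers, and all names other than AddIteration, AddCompletionTask, AddSublist, AddTask, and TaskQualifier are placeholders):

```cpp
// Old interface (removed in this PR):
//   auto solver = tl.AddIteration("my_solver");
//   ...
//   solver.AddCompletionTask(deps, CheckConvergence, args...);

// New interface (sketch; signatures assumed):
//   a sublist that iterates between min_iters and max_iters times
// auto [itl, sub_id] = tl.AddSublist(deps, {min_iters, max_iters});
//   a completion task, marked via a TaskQualifier instead of a separate call
// auto check = itl.AddTask(TaskQualifier::completion, some_deps,
//                          CheckConvergence, args...);
```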

It's high time we cleaned up the tasking code. I take a crack at that in this PR. For the most part, this should be compatible with downstream codes, but it will be breaking for anyone using IterativeTasks. Some basics:

  • Replaces all the while loops over lists with a more proper graph traversal
  • Gets rid of all the bitmask stuff, totally reimagining what TaskID is
  • Completely removes IterativeTasks in favor of TaskLists owning TaskLists

Fun new features:

  • Graph traversal/task execution is now threaded. You pass a thread pool object to the TaskCollection Execute function and tasks are launched in parallel across the threads in the pool. If you pass no pool to Execute, it constructs a pool with a single thread, so it should execute more or less as it did before.
  • Executing a TaskCollection/TaskRegion/TaskList is no longer self-destructive. This allows for a model where a task collection is built once and executed many times.
  • The functionality in what was IterativeTasks is now more flexible and the interface is cleaner.
    • You specify a min and max number of iterations when adding the "sublist" via the TaskList::AddSublist function. {1,1} means it's just a sublist that won't iterate.
    • You can have arbitrarily nested iterations
    • You can specify more than one completion task, to allow for early aborts from a list
  • New TaskQualifier specifier. You can combine the qualifiers below with the | operator, just like dependencies. The interesting ones are:
    • local_sync: replaces the old regional dependencies machinery. Dependent tasks can't execute until all the lists in your rank's region have executed this task.
    • global_sync: all lists across all ranks must have executed this task before dependent tasks can execute.
    • completion: allows for tasks that conditionally continue execution of the list/cycle back to the beginning (TaskStatus::iterate) or abort execution of the rest of the list/stop iterating (TaskStatus::complete). The min/max number of iterations for a list are always honored regardless of the status returned by completion tasks.
    • once_per_region: for a region with N partitions, a task marked with this qualifier will execute only a single time instead of the usual N times.

I don't add any specific tests because all of the functionality is well tested in our current regression tests. The only exception is multithreaded task execution. For now, this feature shouldn't be used anyway because Parthenon has not been made thread-safe everywhere it needs to be. There is a note to this effect in the docs. Also, just calling Execute() on a TaskCollection (as everyone is presumably doing downstream) leads to single-threaded execution.

PR Checklist

  • Code passes cpplint
  • New features are documented.
  • Adds a test for any bugs fixed. Adds tests for new features.
  • Code is formatted
  • Changes are summarized in CHANGELOG.md
  • Change is breaking (API, behavior, ...)
    • Change is additionally added to CHANGELOG.md in the breaking section
    • PR is marked as breaking
    • Short summary of API changes at the top of the PR (optionally with an automated update/fix script)
  • CI has been triggered on Darwin for performance regression tests.
  • Docs build
  • (@lanl.gov employees) Update copyright on changed files

@jdolence changed the title from "Tasking rewrite" to "WIP: Tasking rewrite" on Dec 14, 2023
@Yurlungur (Collaborator) left a comment

This is much cleaner than the old tasking machinery. Some things I like:

  1. The graph walking is clear and concise, and dependencies are much easier to reason about.
  2. The fact that sublists are recursive means iterations, sublists, and regional dependencies are all easier to reason about.
  3. Threads!!!!

My one concern would be that walking the graph might be expensive in practice, but I suspect it's not compared to everything else we're doing, especially since the task list can now be built once and not rebuilt (which is something that should probably be the default behavior).

src/CMakeLists.txt (outdated; resolved)
- enum class TaskStatus { fail, complete, incomplete, iterate, skip, waiting };
+ enum class TaskStatus { complete, incomplete, iterate };

👍

src/defs.hpp (resolved)
src/tasks/thread_pool.hpp (outdated; resolved)
std::lock_guard<std::mutex> lock(mutex);
std::queue<T>().swap(queue);
complete = true;
exit = true;

I'm kind of shocked exit is not a protected name.

src/tasks/thread_pool.hpp (resolved)
src/tasks/tasks.hpp (outdated; resolved)
src/tasks/tasks.hpp (outdated; resolved)
TaskID GetID() { return this; }
bool ready() {
// check that no dependency is incomplete
bool go = true;

Suggested change:
- bool go = true;
+ // set...
+ bool go = true;

ready... set... go


lol

src/tasks/tasks.hpp (outdated; resolved)
@jdolence changed the title from "WIP: Tasking rewrite" to "[Breaking Change] Tasking rewrite" on Dec 18, 2023
@jdolence (Collaborator, Author) commented:

@pgrete, I'd like to get this merged ASAP. A couple different codes want to use the new mechanism to express iterations. Think you can review soon? Should we just merge? How should we proceed?

@pgrete (Collaborator) left a comment

Thanks for the interface cleanup (and infrastructure below) and sorry for the delay (I lost track of this PR after I reviewed the other after the sync).

I only left some clarifying questions.
Otherwise, a more general question would be whether the new execute machinery benefits from the threaded implementation as is; in other words, if we don't support that feature anyway, could this functionality be stashed until it'll actually be used, so that there's less "to-be-used-in-the-future(-but-maintained-already-now)" code?

CMakeLists.txt (resolved)
doc/sphinx/src/tasks.rst (outdated; resolved)
using namespace utils;
auto &md = pmesh->mesh_data.GetOrAdd("base", i);
TaskID none;
auto &md = pmesh->mesh_data.GetOrAdd("base", partition);

Flagging #959 again.

I'm actually starting to wonder if we should remove support for varying the partition size (i.e., make it technically impossible), as quite a few places are now making implicit assumptions through using GetOrAdd.

PS: This is independent of this PR.

src/tasks/tasks.hpp (outdated; resolved)
if (num_calls < exec_limits.first && status == TaskStatus::complete)
status = TaskStatus::iterate;
// enforce maximum number of iterations
if (num_calls == exec_limits.second) status = TaskStatus::complete;

Would it make sense to have another status like ::ran_out_of_iterations_before_X?


Probably. I suppose that would allow adding some kind of option to warn or abort if an iteration fails to properly complete. That seems like a good idea. I'll either open an issue so I can deal with this later or add the feature if it looks simple enough.

src/tasks/tasks.hpp Show resolved Hide resolved
Comment on lines +402 to +403
PARTHENON_REQUIRE_THROWS(pool.size() == 1,
"ThreadPool size != 1 is not currently supported.")

👍 for being explicit

@pgrete added the breaks-downstream and refactor (an improvement to existing code) labels on Jan 18, 2024
@bprather (Collaborator) commented:

@pgrete I think there's a chance the current threading may already benefit my use case, because I spend a lot of CPU time building variable lists & packs, so there's the potential that any reduced synchronicity speeds things up for me in at least some cases.
I can try to scare up some performance numbers if that would help -- otherwise, I was going to put off moving to this branch/latest Parthenon until outputs/restarts of face-centered fields land, to avoid more fun franken-branch merge conflict bugs.

@jdolence (Collaborator, Author) commented:

> Thanks for the interface cleanup (and infrastructure below) and sorry for the delay (I lost track of this PR after I reviewed the other after the sync).
>
> I only left some clarifying questions. Otherwise, a more general question would be whether the new execute machinery benefits from the threaded implementation as is; in other words, if we don't support that feature anyway, could this functionality be stashed until it'll actually be used, so that there's less "to-be-used-in-the-future(-but-maintained-already-now)" code?

The threads currently provide no value, but I've already started working on making other parts of Parthenon thread-safe to enable nthreads != 1, so I'm hoping this code won't sit around for long in its current state. Given that, I'd rather not do the extra work of removing it here just to add it back very soon. It's a fair point/question though -- I'll try to be quick about getting threads working more fully.

@jdolence enabled auto-merge January 19, 2024 20:36
@jdolence disabled auto-merge January 19, 2024 20:37
@jdolence enabled auto-merge January 19, 2024 20:50
@pgrete (Collaborator) commented Jan 22, 2024

> @pgrete I think there's a chance the current threading may already benefit my use case, because I spend a lot of CPU time building variable lists & packs, so there's the potential that any reduced synchronicity speeds things up for me in at least some cases.

That's actually an interesting/important use case/question: Is our packing machinery thread-safe?

> I can try to scare up some performance numbers if that would help -- otherwise, I was going to put off moving to this branch/latest Parthenon until outputs/restarts of face-centered fields land, to avoid more fun franken-branch merge conflict bugs.

No worries. I think as long as we have the handbrakes on (enforcing nthreads=1) by default for now, don't spend your coding cycles on franken-branches.

@pgrete (Collaborator) commented Jan 22, 2024

Not sure why the CI statuses are not reported. I'm re-triggering.

@pgrete disabled auto-merge January 22, 2024 15:43
@pgrete enabled auto-merge (squash) January 22, 2024 15:44
@pgrete (Collaborator) commented Jan 22, 2024

CI failed and looks like a legit fail.

@jdolence (Collaborator, Author) commented:

> CI failed and looks like a legit fail.

Looks that way, I think. I haven't been able to reproduce it though...

@jdolence disabled auto-merge January 24, 2024 22:12
@jdolence enabled auto-merge (squash) January 24, 2024 22:12
@jdolence merged commit 7855248 into develop Jan 24, 2024
53 checks passed
Labels: breaks-downstream, refactor (an improvement to existing code)
6 participants