
[Breaking Change] Tasking rewrite #987

Merged
merged 34 commits into from
Jan 24, 2024

Conversation

@jdolence (Collaborator) commented Dec 14, 2023

PR Summary

[Breaking Change]
The API for representing iterations through our tasks has changed. AddIteration is replaced with AddSublist, with somewhat different arguments and return values. Additionally, AddCompletionTask is replaced by AddTask with TaskQualifiers specifying that it is a completion task (plus whatever other qualifiers are appropriate). AddRegionalDependencies is similarly replaced by passing the appropriate TaskQualifier to AddTask.
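As a rough before/after sketch of the migration (the exact signatures are assumed from this summary rather than taken from the actual headers, and all names other than AddIteration, AddCompletionTask, AddSublist, AddTask, and TaskQualifier are placeholders):

```cpp
// Old interface (removed in this PR):
//   auto solver = tl.AddIteration("my_solver");
//   ...
//   solver.AddCompletionTask(deps, CheckConvergence, args...);

// New interface (sketch; signatures assumed):
//   a sublist that iterates between min_iters and max_iters times
// auto [itl, sub_id] = tl.AddSublist(deps, {min_iters, max_iters});
//   a completion task, marked via a TaskQualifier instead of a separate call
// auto check = itl.AddTask(TaskQualifier::completion, some_deps,
//                          CheckConvergence, args...);
```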

It's high time we cleaned up the tasking code. I take a crack at that in this PR. For the most part, this should be compatible with downstream codes, but it will be breaking for anyone using IterativeTasks. Some basics:

  • Replaces all the while loops over lists with a more proper graph traversal
  • Gets rid of all the bitmask stuff, totally reimagining what TaskID is
  • Completely removes IterativeTasks in favor of TaskLists owning TaskLists

Fun new features:

  • Graph traversal/task execution is now threaded. You pass a thread pool object to the TaskCollection Execute function and tasks are launched in parallel across the threads in the pool. If you pass no pool to Execute, it constructs a pool with a single thread, so it should execute more or less as it did before.
  • Executing a TaskCollection/TaskRegion/TaskList is no longer self-destructive. This allows for a model where a task collection is built once and executed many times.
  • The functionality in what was IterativeTasks is now more flexible and the interface is cleaner.
    • You specify a min and max number of iterations when adding the "sublist" via the TaskList::AddSublist function. {1,1} means it's just a sublist that won't iterate.
    • You can have arbitrarily nested iterations
    • You can specify more than one completion task, to allow for early aborts from a list
  • New TaskQualifier specifier. You can combine the qualifiers below with the | operator, just like dependencies. The interesting ones are:
    • local_sync: replaces the old regional dependencies machinery. Dependent tasks can't execute until all the lists in your rank's region have executed this task.
    • global_sync: all lists across all ranks must have executed this task before dependent tasks can execute.
    • completion: allows for tasks that conditionally continue execution of the list/cycle back to the beginning (TaskStatus::iterate) or abort execution of the rest of the list/stop iterating (TaskStatus::complete). The min/max number of iterations for a list are always honored regardless of the status returned by completion tasks.
    • once_per_region: for a region with N partitions, a task marked with this qualifier will execute only a single time instead of the usual N times.

I don't add any specific tests because all of the functionality is well tested in our current regression tests. The only exception is multithreaded task execution. For now, this feature shouldn't be used anyway because Parthenon has not been made thread-safe everywhere it needs to be. There is a note to this effect in the docs. Also, just calling Execute() on a TaskCollection (as everyone is presumably doing downstream) leads to single-threaded execution.

PR Checklist

  • Code passes cpplint
  • New features are documented.
  • Adds a test for any bugs fixed. Adds tests for new features.
  • Code is formatted
  • Changes are summarized in CHANGELOG.md
  • Change is breaking (API, behavior, ...)
    • Change is additionally added to CHANGELOG.md in the breaking section
    • PR is marked as breaking
    • Short summary of API changes at the top of the PR (optionally with an automated update/fix script)
  • CI has been triggered on Darwin for performance regression tests.
  • Docs build
  • (@lanl.gov employees) Update copyright on changed files

@jdolence changed the title from "Tasking rewrite" to "WIP: Tasking rewrite" on Dec 14, 2023
@Yurlungur (Collaborator) left a comment

This is much cleaner than the old tasking machinery. Some things I like:

  1. The graph walking is clear and concise, and dependencies are much easier to reason about.
  2. The fact that sublists are recursive means iterations, sublists, and regional dependencies are all easier to reason about.
  3. Threads!!!!

My one concern would be that walking the graph might be expensive in practice, but I suspect it's not compared to everything else we're doing, especially since the task list can now be built once and not rebuilt (which is something that should probably be the default behavior).

src/CMakeLists.txt (outdated; resolved)
- enum class TaskStatus { fail, complete, incomplete, iterate, skip, waiting };
+ enum class TaskStatus { complete, incomplete, iterate };

👍

src/defs.hpp (resolved)
src/tasks/thread_pool.hpp (outdated; resolved)
std::lock_guard<std::mutex> lock(mutex);
std::queue<T>().swap(queue);
complete = true;
exit = true;

I'm kind of shocked exit is not a protected name.

src/tasks/thread_pool.hpp (resolved)
src/tasks/tasks.hpp (outdated; resolved)
src/tasks/tasks.hpp (outdated; resolved)
TaskID GetID() { return this; }
bool ready() {
// check that no dependency is incomplete
bool go = true;

Suggested change:
- bool go = true;
+ // set...
+ bool go = true;

ready... set... go


lol

src/tasks/tasks.hpp (outdated; resolved)
@jdolence changed the title from "WIP: Tasking rewrite" to "[Breaking Change] Tasking rewrite" on Dec 18, 2023
@jdolence (Collaborator, Author) commented:

@pgrete, I'd like to get this merged ASAP. A couple different codes want to use the new mechanism to express iterations. Think you can review soon? Should we just merge? How should we proceed?

@pgrete (Collaborator) left a comment

Thanks for the interface cleanup (and infrastructure below) and sorry for the delay (I lost track of this PR after I reviewed the other after the sync).

I only left some clarifying questions.
Otherwise, a more general question would be whether the new execute machinery benefits from the threaded implementation as is; in other words, if we don't support that feature anyway, could this functionality be stashed until it'll actually be used, so that there's less "to-be-used-in-the-future(-but-maintained-already-now)" code?

CMakeLists.txt (resolved)
doc/sphinx/src/tasks.rst (outdated; resolved)
using namespace utils;
auto &md = pmesh->mesh_data.GetOrAdd("base", i);
TaskID none;
auto &md = pmesh->mesh_data.GetOrAdd("base", partition);

Flagging #959 again.

I'm actually starting to wonder if we should remove support for varying the partition size (i.e., make it technically impossible), as quite a few places are now making implicit assumptions through using GetOrAdd.

PS: This is independent of this PR.

src/tasks/tasks.hpp (outdated; resolved)
if (num_calls < exec_limits.first && status == TaskStatus::complete)
status = TaskStatus::iterate;
// enforce maximum number of iterations
if (num_calls == exec_limits.second) status = TaskStatus::complete;

Would it make sense to have another status like ::ran_out_of_iterations_before_X?


Probably. I suppose that would allow adding some kind of option to warn or abort if an iteration fails to properly complete. That seems like a good idea. I'll either open an issue so I can deal with this later or add the feature if it looks simple enough.

src/tasks/tasks.hpp Show resolved Hide resolved
Comment on lines +402 to +403
PARTHENON_REQUIRE_THROWS(pool.size() == 1,
"ThreadPool size != 1 is not currently supported.")

👍 for being explicit

@pgrete added the breaks-downstream and refactor (an improvement to existing code) labels on Jan 18, 2024
@bprather (Collaborator) commented:

@pgrete I think there's a chance the current threading may already benefit my use case, because I spend a lot of CPU time building variable lists & packs, so there's the potential that any reduced synchronicity speeds things up for me in at least some cases.
I can try to scare up some performance numbers if that would help -- otherwise, I was going to put off moving to this branch/latest Parthenon until outputs/restarts of face-centered fields land, to avoid more fun franken-branch merge conflict bugs.

@jdolence (Collaborator, Author) commented:

> Thanks for the interface cleanup (and infrastructure below) and sorry for the delay (I lost track of this PR after I reviewed the other after the sync).
>
> I only left some clarifying questions. Otherwise, a more general question would be whether the new execute machinery benefits from the threaded implementation as is; in other words, if we don't support that feature anyway, could this functionality be stashed until it'll actually be used, so that there's less "to-be-used-in-the-future(-but-maintained-already-now)" code?

The threads currently provide no value, but I've already started working on making other parts of Parthenon thread-safe to enable nthreads != 1, so I'm hoping this code won't sit around for long in its current state. Given that, I'd rather not do the extra work of removing it here just to add it back very soon. It's a fair point/question though -- I'll try to be quick about getting threads working more fully.

@jdolence enabled auto-merge January 19, 2024 20:36
@jdolence disabled auto-merge January 19, 2024 20:37
@jdolence enabled auto-merge January 19, 2024 20:50
@pgrete (Collaborator) commented Jan 22, 2024

> @pgrete I think there's a chance the current threading may already benefit my use case, because I spend a lot of CPU time building variable lists & packs, so there's the potential that any reduced synchronicity speeds things up for me in at least some cases.

That's actually an interesting/important use case/question: Is our packing machinery thread-safe?

> I can try to scare up some performance numbers if that would help -- otherwise, I was going to put off moving to this branch/latest Parthenon until outputs/restarts of face-centered fields land, to avoid more fun franken-branch merge conflict bugs.

No worries. I think as long as we have the handbrakes on (enforcing nthreads=1) by default for now, don't spend your coding cycles on franken-branches.

@pgrete (Collaborator) commented Jan 22, 2024

Not sure why the CI statuses are not reported. I'm re-triggering.

@pgrete disabled auto-merge January 22, 2024 15:43
@pgrete enabled auto-merge (squash) January 22, 2024 15:44
@pgrete (Collaborator) commented Jan 22, 2024

CI failed and looks like a legit fail.

@jdolence (Collaborator, Author) commented:

> CI failed and looks like a legit fail.

Looks that way, I think. I haven't been able to reproduce it though...

@jdolence disabled auto-merge January 24, 2024 22:12
@jdolence enabled auto-merge (squash) January 24, 2024 22:12
@jdolence merged commit 7855248 into develop Jan 24, 2024
53 checks passed
Labels: breaks-downstream, refactor (an improvement to existing code)
6 participants