Refactor how we interface with performance instrumentation #969
I like this
… post process output from kokkos simple kernel timer.
@pgrete, I want to make sure this doesn't interfere with anything you're doing downstream. It should be really quick to look over, despite the 40 modified files...
Quickly double-checking: given that the "standard" push and pop regions are still called in the KokkosTimer, all existing instrumentation (and its interfaces to other tools) should work as is (minus some changed names), correct?
I like this approach and would like to briefly try in AthenaPK before pushing approve.
I should be able to test prior to the sync on Thursday.
The key difference from the previous approach (when not using plain Kokkos regions, which are still supported) is that PARTHENON_INSTRUMENT_REGION would need to be scoped, as there's no explicit "end" for the macros, right?
Finally, I noticed that you replaced all(?) existing names with the auto-based scheme. How did this translate to your profiling experience in practice? I imagine a couple of places where the explicit names add some value, e.g., for the Step() function in the driver, to differentiate between MultiStage_Step and MultiStageBlockTask_Step, which in the current version would differ only in their line number, if I'm not mistaken.
Yes, that's correct.
Cool, sounds good.
Yes, that's right. We could instead make the calls to
Yes, I replaced all the names (or at least intended to). The nice thing about this is that it makes it straightforward to post-process the Kokkos profiling data (I've really only played with the simple kernel timer output) to get really useful views of performance that are otherwise annoying to get at. For example, how does performance compare across kernels in …

I think your example is correct: they would only differ by the line number. Two comments, though. First, presumably you know which driver you're using, so I'm not sure there's much ambiguity here in practice. Second, the line number does guarantee an unambiguous pointer to the right piece of code. If a developer were lazy, not thinking carefully, or just unaware of other parts of the code, it's possible to give two regions the same string, which might lead to some confusion. There are a couple of places where I'd agree the line numbers are perhaps slightly more annoying. The two that come to mind in our current code are in the …
I've now tested this in AthenaPK, and it works like a charm.
@pgrete I think I've at least responded to all your comments. Added …
LGTM
PR Summary
This PR does primarily two things:

1. … Kokkos::Profiling::pushRegion and Kokkos::Profiling::popRegion. Right now, the PARTHENON_INSTRUMENT macro instantiates a (new) KokkosTimer object that pushes a region on construction, with an auto-generated name of the form name_of_file.cpp::line_number::function_name, and pops the region when the object is destroyed as it goes out of scope. The second macro, PARTHENON_INSTRUMENT_REGION(name), does the same thing but takes a name argument, so you can call the region whatever you like (this version is also required for the raw for-loop overloads of our par_for functions).
2. … PARTHENON_AUTO_LABEL macro that results in a string as above (name_of_file.cpp::line_number::function_name). This macro is now used to label all kernels in Parthenon.

Together, these changes guarantee that results from profiling with, e.g., the kokkos-tools simple kernel timer unambiguously identify the region of code being profiled. They also make it possible to write convenient post-processing tools that manipulate the output of the profiling tools. For example, here's a snippet of the output from a profile of a downstream code, taken with the simple kernel timer and post-processed with the new (badly written) Python script included in this PR:
With the automatic naming conventions, the script has enough info to pull together all the regions/kernels in a function and give an easily digestible breakdown of which functions are important and where time is being spent within them. To make all that work, the downstream code has to adopt usage of the same macros and automatic naming.
I haven't thought this through, but I believe this model will also make it easy to instrument our code(s) in other ways, beyond the Kokkos::Profiling hooks.

There are a lot of changed files in this PR, but the changes are almost entirely trivial. The only nontrivial changes relate to the fact that the objects these macros create need to be properly scoped, which meant some variable declarations had to be moved so that they were at function scope.
I haven't documented this stuff, mostly because I don't think we have any documentation on the profiling stuff right now, and I just don't have the time at the moment to create that from scratch. I'll get to it in another PR.
PR Checklist