Add support for "analysis" restart #963
Overall this kind of capability is a long time coming! Nice to see it's getting some attention. I am very enthusiastic about the concept. A few (perhaps dumb) questions about this design:
- What are the intended hooks as far as doing analysis without launching into the driver? For example, could I not get the same capability implemented by simply writing two drivers in my downstream code, one for time evolution and one for post-processing? Should I think of this as hooks for the downstream code to do just that?
- Why does the mesh need to know anything about whether we're running analysis or not? This seems like something that should be handled only at the parthenon manager level?
This is why I consider this PEP2, so let's first discuss what the solution should look like ;)
My main intention with the current approach was to make something available through Parthenon without any modification of downstream codes.
You're referring to the |
Ah I see. I don't think I understood that. I had the
I think I would prefer if this lived in the manager or the driver than in the mesh.
Yes, with your model I think that makes sense.
I tried turning on/off individual output blocks using the command line, e.g.:
but it seems to always output all of the output blocks:
➜ athenapk git:(forrestglines/parthenon-pr930) ✗ ls -lt | head
total 748596
-rw-r--r-- 1 benwibking 19239961 Dec 1 12:12 parthenon.restart.00005.rhdf
-rw-r--r-- 1 benwibking 21094 Dec 1 12:12 parthenon.restart.00005.rhdf.xdmf
-rw-r--r-- 1 benwibking 9887769 Dec 1 12:12 parthenon.prim.00050.phdf
-rw-r--r-- 1 benwibking 25954 Dec 1 12:12 parthenon.prim.00050.phdf.xdmf
-rw-r--r-- 1 benwibking 170 Dec 1 12:12 parthenon.hst
-rw-r--r-- 1 benwibking 19245585 Dec 1 12:01 parthenon.restart.final.rhdf
-rw-r--r-- 1 benwibking 21094 Dec 1 12:01 parthenon.restart.final.rhdf.xdmf
-rw-r--r-- 1 benwibking 12791828 Dec 1 12:01 parthenon.prim.final.phdf
-rw-r--r-- 1 benwibking 25954 Dec 1 12:01 parthenon.prim.final.phdf.xdmf
This was run with the AthenaPK |
Does this also happen for a non-"final" restart dump?
For a non-"final" restart, it doesn't output at all, no matter what options I give it.
I just tried with a fresh build and it works for me here:
you can also add
Can you double check that your Parthenon submodule is up to date?
Ah, yeah, it was out of date. It works now.
This PR is the source of our hanging woes, @pgrete. The constructor isn't setting the This was causing some MPI ranks to take the analysis output codepath, and some not, which had 3 possible effects:
EDIT: proposing changes wouldn't let me add the new line, so I committed the change; feel free to revert/etc. The formatting issue is in an unrelated area...
Apply a proposed fix for a hang introduced by analysis outputs
I'm not questioning that this solves the issue, I'm just trying to understand "why" because I'm concerned that the issue may be somewhere deeper.
Hmm, yeah it looks like my downstream didn't have those initializers. They were lost when I merged the MG stuff, which also modifies both Mesh() constructors, and choosing Luke's changes overwrote the new constructors without me noticing (thought it was just formatting). I'm not sure if that says more about my lazy merging habits, or a code structure that encourages modifying a single central object for everything from output types to meshblock interaction changes... Either way, my change probably only fixes my case, then. Feel free to revert/merge how you'd like, I'll just pull the results again when everything hits I will say though that I like the structure of initializing the different options for |
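For readers following the thread, here is a minimal sketch of the failure mode described above, under stated assumptions: `MiniMesh` and `analysis_flag_` are hypothetical stand-ins, not Parthenon's actual `Mesh` or its member names. Giving the member a default initializer in the class body means that no constructor can forget to set it, so every MPI rank starts from the same value even if one constructor is later rewritten during a merge.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a class with several constructors.
// This is NOT Parthenon's Mesh; names are for illustration only.
class MiniMesh {
 public:
  // Fresh-start constructor.
  explicit MiniMesh(int nblocks) : nblocks_(nblocks) { assert(nblocks > 0); }

  // Restart constructor: a different code path that relies on the same defaults.
  MiniMesh(int nblocks, const std::string &restart_file)
      : nblocks_(nblocks), restart_file_(restart_file) {
    assert(nblocks > 0);
  }

  bool analysis_flag() const { return analysis_flag_; }
  void set_analysis_flag(bool value) { analysis_flag_ = value; }
  int nblocks() const { return nblocks_; }
  const std::string &restart_file() const { return restart_file_; }

 private:
  int nblocks_;
  std::string restart_file_;
  // Default member initializer: applies to every constructor that does not
  // explicitly initialize this member, so the flag is never left indeterminate.
  bool analysis_flag_ = false;
};
```

With an uninitialized `bool` instead, reading the flag would be undefined behavior and ranks could disagree on which code path to take, which is consistent with the hang reported above.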
After pulling this and using it a couple times, this works nicely with either .rhdf or .phdf files. I think KHARMA needs some more features before we'll use this much, but it definitely works fine and (down to my bad merge) doesn't break anything.
I don't mind at all the implementation being in EvolutionDriver, though I think we should just rename it DefaultDriver or BaseDriver at this point. So, approving.
However, I think there's a way of doing this that avoids adding another member to Mesh and instead uses ParameterInput to store the flag, potentially to the extent of being able to do analysis runs or restarts entirely with input files, e.g. ./downstream -i inputfile.par parthenon/job/do_analysis_only=true parthenon/job/restartfile=prob.outX.XXXXX.rhdf. Beyond being nice for changing input parameters upon analysis/restart in a systematic way (and leaving a record of all parameters, not just whatever's in the file), this might be a starting pattern toward detangling the current Mesh and Outputs code so that not as many features rely on it directly.
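A rough sketch of the idea in this comment, assuming a hypothetical `ParamStore` class (this is not Parthenon's `ParameterInput` API, and the method names are invented for illustration): command-line arguments of the form `block/key=value` are stored as parameters, and the analysis flag is then read back like any other input value instead of living as a member on the mesh.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical minimal parameter store; NOT Parthenon's ParameterInput.
class ParamStore {
 public:
  // Accepts "block/key=value". The block itself may contain '/', and so may
  // the value (e.g. a file path). Returns false for malformed arguments.
  bool ApplyOverride(const std::string &arg) {
    const std::string::size_type eq = arg.find('=');
    if (eq == std::string::npos || eq == 0) return false;
    // Last '/' before the '=' separates the block from the key.
    const std::string::size_type slash = arg.rfind('/', eq);
    if (slash == std::string::npos || slash == 0 || slash + 1 == eq) return false;
    values_[arg.substr(0, eq)] = arg.substr(eq + 1);
    return true;
  }

  std::string GetString(const std::string &block, const std::string &key,
                        const std::string &fallback) const {
    assert(!block.empty() && !key.empty());
    const auto it = values_.find(block + "/" + key);
    return it == values_.end() ? fallback : it->second;
  }

  // Only the literal string "true" counts as true in this sketch.
  bool GetBool(const std::string &block, const std::string &key,
               bool fallback) const {
    return GetString(block, key, fallback ? "true" : "false") == "true";
  }

 private:
  std::map<std::string, std::string> values_;
};
```

A driver or manager could then query `GetBool("parthenon/job", "do_analysis_only", false)` once at startup, which also leaves the flag in the recorded parameter set.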
I now updated the changes following your comments, specifically
I also decided to keep the Would be great if you @Yurlungur @bprather could double check if you're happy with those changes. |
This is a lot cleaner and clearer now. I like it. 👍
Great! Yeah I wasn't (or, I don't think I was) advocating for removing the |
PR Summary
This is a quick and dirty implementation of "analysis" restarts, i.e., restarts that do not enter a driver and only dump a specified subset of outputs.
Restarting works with -a FILE, and an output is enabled for an analysis restart by adding analysis_output=true to its output block.
No other output is written and no data is modified. The approach is also memory friendly, as no extra memory is allocated (e.g., for different stages in a driver).
There is still room for improvement (e.g., by skipping flux allocations, but that resulted in some weird errors when I briefly tested it).
I opened this PR in order to get feedback as this really is just a quick and dirty solution (one can consider this PEP2 ;) ).
It somehow feels wrong to start an EvolutionDriver and then skip "evolution", but this approach allows this to be used by downstream codes without any modification.
What do people think?
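To make the selection rule in the summary concrete, here is a simplified sketch. `OutputBlock` and `OutputsToWrite` are hypothetical names, not taken from the PR, and the sketch ignores per-output timing: on an analysis restart only blocks flagged with `analysis_output` are written, while a regular run considers all of them.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one output block from the input file.
struct OutputBlock {
  std::string name;
  bool analysis_output = false;  // mirrors analysis_output=true in the block
};

// Returns the names of the blocks that would be written for this run.
// Assumption: a regular run considers every block; an analysis restart
// keeps only the blocks that opted in via analysis_output.
std::vector<std::string> OutputsToWrite(const std::vector<OutputBlock> &blocks,
                                        bool analysis_restart) {
  std::vector<std::string> selected;
  for (const OutputBlock &block : blocks) {
    assert(!block.name.empty());
    if (!analysis_restart || block.analysis_output) {
      selected.push_back(block.name);
    }
  }
  return selected;
}
```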
PR Checklist