When any HDF5 outputs are enabled in KHARMA (i.e. dumps and/or restarts) and it is run with multiple MPI ranks, it can hang at the first file write (just after printing `Running X driver...`).
This appears to be an issue with Parthenon, possibly introduced by recent MPI changes for new features. Parthenon PR for a possible fix is here: parthenon-hpc-lab/parthenon#979
Disabling all HDF5 outputs avoids this issue, and since it "only" happens ~80% of the time, it can sometimes be circumvented just by repeatedly restarting the same run. Disabling SMR (either not refining, or using AMR instead) also generally fixes the issue, or at least makes the race condition much less frequent. The hang frequency might also depend on the total number of blocks or the number of blocks per rank, though.
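For reference, the "disable HDF5 outputs" workaround amounts to removing (or commenting out) the output blocks in the input deck. A sketch, assuming a typical Parthenon-style parameter file (block names and `dt` values here are illustrative, not from a specific KHARMA run):

```
# Commenting out the HDF5 dump block avoids the hang at first file write:
# <parthenon/output0>
# file_type = hdf5
# dt = 5.0

# Likewise for the restart-file block:
# <parthenon/output1>
# file_type = rst
# dt = 100.0
```

Note this obviously loses all dump and restart output, so it is only useful for debugging or for runs where the hang is blocking everything else.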