2021.03.04 Meeting Notes
- Individual/group updates
- Scaling update
- IO (performance and default basename handling)
- Review non-WIP PRs
Mostly reviewing pull requests. Debugging CI issues.
Looking into dumping output for sparse variables.
Figured out what was wrong with the build on RZAnsel: a problem with the .cshrc file there.
Mostly working on the physics side of things, but noted the importance of tiling when using MDRangePolicy: significant performance degradation can occur if the wrong tiling is used. See https://github.com/lanl/parthenon/issues/466
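To illustrate the tiling point, here is a minimal sketch in plain Python (not Kokkos, and not taken from the linked issue). It enumerates the tiles that a given tile shape carves out of a multi-dimensional index range and reports what fraction of the tile slots map to real iterations. The function names `tile_ranges` and `tile_utilization` and the example sizes are hypothetical. Actual performance depends on the hardware and the Kokkos back end; this only shows the bookkeeping behind a poorly matched tile.

```python
from itertools import product
from math import ceil, prod


def tile_ranges(extents, tile):
    """Yield (lower, upper) index bounds of each tile covering the range.

    `extents` and `tile` are tuples of equal length; upper bounds are
    exclusive and clipped to the extents, so edge tiles may be partial.
    """
    counts = [ceil(n / t) for n, t in zip(extents, tile)]
    for idx in product(*(range(c) for c in counts)):
        lower = tuple(i * t for i, t in zip(idx, tile))
        upper = tuple(min(lo + t, n) for lo, t, n in zip(lower, tile, extents))
        yield lower, upper


def tile_utilization(extents, tile):
    """Fraction of tile slots that correspond to real iterations."""
    iterations = prod(extents)
    slots = prod(ceil(n / t) * t for n, t in zip(extents, tile))
    return iterations / slots


if __name__ == "__main__":
    block = (16, 16, 16)  # hypothetical mesh block interior (k, j, i)
    for tile in [(1, 1, 16), (1, 1, 32), (4, 4, 4), (3, 3, 3)]:
        n_tiles = sum(1 for _ in tile_ranges(block, tile))
        print(tile, n_tiles, round(tile_utilization(block, tile), 3))
```

In this sketch a tile that is wider than the block in one dimension, or that does not divide the extents evenly, leaves part of every edge tile empty, which is one way a tiling choice can waste work.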
Been working on physics codes. The table interpolation machinery and the tabulated equation of state have been made open source.
Still working on the particles PR.
- Found and fixed "invalid free" bug
- Various quality of life improvements
- "soft disable" outputs
- split init of Parthenon manager
- allow reuse of Parthenon testing framework in downstream codes
- Updated AthenaPK (following Jim's test code) to use two-register integration and various reconstruction methods
- Gave talk at CSE21
- More scaling tests (see later topic)
Will begin work on implementing static mesh in Parthenon.
Has been playing around with physics in two different test codes.
Galen now has performance numbers on Sierra that match what Phil was getting. He has submitted a request for higher priority on Sierra and has scaled the advection test up to 16000 GPUs. He will also be testing on Trinity with just over 500,000 ranks, using a single rank per physical core.
There appears to be an issue when multiple ranks are assigned per GPU: doing so does not appear to improve performance. Galen will test that configuration.
Phil did weak scaling on a uniform grid on Summit. The results did not look as good as Athena's, which seems to be related to the current setup of the driver: communication is not currently overlapped effectively. Only used 64 cubic mesh blocks 500,012 GPUs; the slowdown was less than a factor of 3.
Will begin looking at the restriction and prolongation approach.
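For reference, the slowdown quoted in the scaling update can be related to weak-scaling efficiency as in the short sketch below. It assumes the usual definition (work per rank or GPU held fixed, efficiency equal to baseline time over scaled time). The function names and timings are hypothetical, not measurements from these runs.

```python
def slowdown(t_base, t_scaled):
    """Ratio of scaled-run time to baseline time at fixed work per device."""
    if t_base <= 0 or t_scaled <= 0:
        raise ValueError("timings must be positive")
    return t_scaled / t_base


def weak_scaling_efficiency(t_base, t_scaled):
    """Weak-scaling efficiency: 1.0 is ideal, lower means more overhead."""
    return 1.0 / slowdown(t_base, t_scaled)


if __name__ == "__main__":
    # Hypothetical wall-clock times per cycle, in seconds.
    t_one_node, t_many_nodes = 10.0, 30.0
    print(slowdown(t_one_node, t_many_nodes))
    print(weak_scaling_efficiency(t_one_node, t_many_nodes))
```

Under this definition, a slowdown of less than a factor of 3 corresponds to a weak-scaling efficiency above roughly one third.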
Phil raised the question of how to specify the name of the outputs. The default is parthenon + the jobid, which is not appropriate for downstream codes.
Another issue is IO performance. Restart files currently use parallel HDF5, which is not as performant as MPI IO. Some benchmarking should be done to quantify the difference.
Josh Brown will look into this.