Fix reading restarts due to hidden ghost var #1104

Merged
merged 2 commits into from
Jun 12, 2024

Conversation

pgrete
Collaborator

@pgrete pgrete commented Jun 12, 2024

PR Summary

hasGhost was defined both in restart.hpp and restart_hdf5.hpp.
We only set the value inside the constructor of the derived class, but in the Parthenon manager we work with the base class.
The latter actually sets the data based on bounds defined by hasGhost, so it made a decision based on uninitialized data.
On Frontier I could reliably reproduce the issue (with both the Cray and amdclang/hip compilers), whereas on other machines/compilers hasGhost in the base class happened to be set to 0 by default/accident (which still rendered reading restarts with ghosts unusable in current develop).
Instead of just removing the duplicated declaration, I decided to remove the public variable entirely and go with a getter function.
I also kept the data type as int for backwards compatibility and in case we eventually decide to do something more with this than a bool check.
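
For readers unfamiliar with this kind of bug, here is a minimal sketch of the pattern described above. All class and function names (ReaderBaseOld, ReaderHDF5Old, ReaderBaseNew, ReaderHDF5New, HasGhost, and the two helper functions) are hypothetical stand-ins, not the actual Parthenon classes. The base member is given a sentinel value of -1 so the example stays well defined; in the real bug it was simply left uninitialized.

```cpp
#include <memory>

// Hypothetical, simplified stand-ins; not the actual Parthenon code.

// Before: base and derived class each declare a member named hasGhost.
struct ReaderBaseOld {
  // Sentinel for illustration only. In the bug described above this
  // member was uninitialized, so reading it gave an arbitrary value.
  int hasGhost = -1;
  virtual ~ReaderBaseOld() = default;
};

struct ReaderHDF5Old : ReaderBaseOld {
  int hasGhost;  // hides ReaderBaseOld::hasGhost
  // Inside the derived class the name resolves to the derived member,
  // so the base member is never touched.
  explicit ReaderHDF5Old(int ghosts) { hasGhost = ghosts; }
};

// After: no public data member, only a getter on the interface.
struct ReaderBaseNew {
  virtual ~ReaderBaseNew() = default;
  virtual int HasGhost() const = 0;
};

struct ReaderHDF5New : ReaderBaseNew {
  explicit ReaderHDF5New(int ghosts) : has_ghost_(ghosts) {}
  int HasGhost() const override { return has_ghost_; }

 private:
  int has_ghost_;
};

// Mimics code that only holds a pointer to the base class.
int OldValueSeenThroughBase(int ghosts) {
  std::unique_ptr<ReaderBaseOld> reader = std::make_unique<ReaderHDF5Old>(ghosts);
  return reader->hasGhost;  // reads the base member, not the derived one
}

int NewValueSeenThroughBase(int ghosts) {
  std::unique_ptr<ReaderBaseNew> reader = std::make_unique<ReaderHDF5New>(ghosts);
  return reader->HasGhost();  // virtual dispatch reaches the derived value
}
```

In this sketch, OldValueSeenThroughBase(1) returns the sentinel -1 because the derived constructor never writes the base member, while NewValueSeenThroughBase(1) returns 1. Making the getter pure virtual also means a derived reader cannot compile without providing a value.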

PS: With this (hopefully) last PR, we can finally restart/do some analysis with AthenaPK in sync with Parthenon develop again.

PR Checklist

  • Code passes cpplint
  • New features are documented.
  • Adds a test for any bugs fixed. Adds tests for new features.
  • Code is formatted
  • Changes are summarized in CHANGELOG.md
  • Change is breaking (API, behavior, ...)
    • Change is additionally added to CHANGELOG.md in the breaking section
    • PR is marked as breaking
    • Short summary API changes at the top of the PR (plus optionally with an automated update/fix script)
  • CI has been triggered on Darwin for performance regression tests.
  • Docs build
  • (@lanl.gov employees) Update copyright on changed files

@pgrete pgrete added the bug Something isn't working label Jun 12, 2024
@pgrete pgrete requested review from Yurlungur and lroberts36 June 12, 2024 10:31
Comment on lines -248 to -249
PARTHENON_HDF5_CHECK(
H5Sselect_hyperslab(hdl.dataspace, H5S_SELECT_SET, offset, NULL, count, NULL));
Collaborator Author

@lroberts36 found that this line is duplicated two lines below, so I cleaned it up (independent of the bugfix in this PR)

@BenWibking
Collaborator

I don't understand how the CI passed with this bug...?

Also: would it make sense to add an action to run the CI with ASAN enabled to catch uninitialized value bugs in general?

@pdmullen pdmullen self-requested a review June 12, 2024 15:19
Collaborator

@pdmullen pdmullen left a comment

You are a godsend @pgrete. I lost my day to this bug yesterday... I couldn't track it down for the life of me. Confirmed that this fixes a restart issue encountered with another parth downstream code.

Collaborator

@lroberts36 lroberts36 left a comment

LGTM, nice catch!

@lroberts36
Collaborator

> I don't understand how the CI passed with this bug...?

I think that hasGhost = 0 is the correct value for all of the restart tests, so if it was default initialized to zero then things would work even with the bug.

> Also: would it make sense to add an action to run the CI with ASAN enabled to catch uninitialized value bugs in general?

I agree that this would be a good idea.

@lroberts36 lroberts36 enabled auto-merge June 12, 2024 16:37
@lroberts36 lroberts36 merged commit ff02625 into develop Jun 12, 2024
50 checks passed