Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
assert wavefront size (AMReX-Codes#3777)
## Summary Add assert checking wavefront size for HIP build. It can hopefully alert to problems and fail loudly for users trying to run on Navi cards, rather than produce incorrect simulation results (until the underlying issue is fixed). I opened an internal ticket that I hope will get the underlying issue fixed soon. ## Additional background AMReX users reported to AMD incorrect results with Navi cards. I think there is a bug with the compiler's definition of `__AMDGCN_WAVEFRONT_SIZE` when compiling for target gfx10XX, and find it disagrees with `warpSize` that one can check at runtime. For example, compiling for target gfx1030 and running on Navi card (with an up-to-date ROCm): ``` #include <hip/hip_runtime.h> int main () { hipInit(0); int device_id; hipGetDevice(&device_id); hipDeviceProp_t device_prop; hipGetDeviceProperties(&device_prop, device_id); printf("warpSize = %d --- __AMDGCN_WAVEFRONT_SIZE = %d \n", device_prop.warpSize, __AMDGCN_WAVEFRONT_SIZE); return 0; } ``` For Navi card, I got output `warpSize = 32 --- __AMDGCN_WAVEFRONT_SIZE = 64`, but of course expected 32 for both. Strangely, in an actual application compiled for target gfx1030, all the kernel metadata says the wavefront size is 32 (even with `__AMDGCN_WAVEFRONT_SIZE` apparently 64), so when manually overriding the wavefront size to 32 in AMReX built for target gfx1030, the simulation results seemed convergent to gfx9XX results. So maybe that wrong value is not polluting the kernel compilation. ## Checklist The proposed changes: - [ ] fix a bug or incorrect behavior in AMReX - [ ] add new capabilities to AMReX - [ ] changes answers in the test suite to more than roundoff level - [ ] are likely to significantly affect the results of downstream AMReX users - [ ] include documentation in the code and/or rst files, if appropriate
- Loading branch information