From 2ecafcff40132f56eb2b494e1a374684ff97117a Mon Sep 17 00:00:00 2001 From: "Michael E. Rowan" Date: Thu, 22 Feb 2024 18:25:18 -0800 Subject: [PATCH] assert wavefront size (#3777) ## Summary Add assert checking wavefront size for HIP build. It can hopefully alert to problems and fail loudly for users trying to run on Navi cards, rather than produce incorrect simulation results (until the underlying issue is fixed). I opened an internal ticket that I hope will get the underlying issue fixed soon. ## Additional background AMReX users reported to AMD incorrect results with Navi cards. I think there is a bug with the compiler's definition of `__AMDGCN_WAVEFRONT_SIZE` when compiling for target gfx10XX, and find it disagrees with `warpSize` that one can check at runtime. For example, compiling for target gfx1030 and running on Navi card (with an up-to-date ROCm): ``` #include int main () { hipInit(0); int device_id; hipGetDevice(&device_id); hipDeviceProp_t device_prop; hipGetDeviceProperties(&device_prop, device_id); printf("warpSize = %d --- __AMDGCN_WAVEFRONT_SIZE = %d \n", device_prop.warpSize, __AMDGCN_WAVEFRONT_SIZE); return 0; } ``` For Navi card, I got output `warpSize = 32 --- __AMDGCN_WAVEFRONT_SIZE = 64`, but of course expected 32 for both. Strangely, in an actual application compiled for target gfx1030, all the kernel metadata says the wavefront size is 32 (even with `__AMDGCN_WAVEFRONT_SIZE` apparently 64), so when manually overriding the wavefront size to 32 in AMReX built for target gfx1030, the simulation results seemed convergent to gfx9XX results. So maybe that wrong value is not polluting the kernel compilation. ## Checklist The proposed changes: - [ ] fix a bug or incorrect behavior in AMReX - [ ] add new capabilities to AMReX - [ ] changes answers in the test suite to more than roundoff level - [ ] are likely to significantly affect the results of downstream AMReX users - [ ] include documentation in the code and/or rst files, if appropriate --- Src/Base/AMReX_GpuDevice.cpp | 2 ++ 1 file changed, 2 insertions(+) diff --git a/Src/Base/AMReX_GpuDevice.cpp b/Src/Base/AMReX_GpuDevice.cpp index df3625d13ce..2bdd6c87363 100644 --- a/Src/Base/AMReX_GpuDevice.cpp +++ b/Src/Base/AMReX_GpuDevice.cpp @@ -354,6 +354,8 @@ Device::initialize_gpu () AMREX_HIP_SAFE_CALL(hipGetDeviceProperties(&device_prop, device_id)); + AMREX_ALWAYS_ASSERT_WITH_MESSAGE(warp_size == device_prop.warpSize, "Incorrect warp size"); + // check compute capability // AMD devices do not support shared cache banking.