Driver bug triggered in ZeroInitializeWorkgroupMemory #4591

Open
raphlinus opened this issue Oct 27, 2023 · 0 comments
Labels
  • backend: vulkan (Issues with Vulkan)
  • external: driver-bug (A driver is causing the bug, though we may still want to work around it)

raphlinus commented Oct 27, 2023

Description
Running Vello on an AMD 5700 XT triggers a shader miscompilation at the driver level, causing incorrect behavior and ultimately a device-lost error.

Repro steps

git clone -b oh_eighteen https://github.com/DJMcNab/vello.git
cd vello
cargo run -p with_winit

Note: this is PR #398 of linebender/vello. The same thing happens on main, but this branch brings us to wgpu 0.18, and I figured it would be more helpful to work on the most recent versions.

Expected vs observed behavior
The example typically displays a couple of frames, sometimes correctly and sometimes corrupted, then exits with a device-lost error. The expected behavior is to display a tiger test image and performance statistics.

Extra materials
We tracked this down to a very buggy implementation of ZeroInitializeWorkgroupMemory in the AMD driver. The core problem is that it zeroes the workgroup-shared memory and then proceeds to user code without a barrier, so some invocations can enter user code while others are still zeroing. A secondary problem is that the zeroing is extremely inefficient; it appears that every thread zeroes the entire array.
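
To see why the missing barrier is a correctness bug and not merely a performance one, here is a rough CPU analogy in Rust (purely illustrative, not the driver's actual code): each worker zeroes a strided slice of a shared array, and all workers must rendezvous at a barrier before any of them runs user code that assumes the array is zeroed. The step the driver effectively omits corresponds to barrier.wait() below.

```rust
// Rough CPU analogy, not the driver's code: N workers share one array;
// each zeroes its own strided slice, then all meet at a barrier before
// "user code" runs.
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

fn main() {
    const WORKERS: usize = 4;
    const LEN: usize = 64;
    // Pretend the shared array starts with garbage, like uninitialized LDS.
    let shared = Arc::new(Mutex::new(vec![1u32; LEN]));
    let barrier = Arc::new(Barrier::new(WORKERS));

    let handles: Vec<_> = (0..WORKERS)
        .map(|tid| {
            let shared = Arc::clone(&shared);
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                // Zeroing prologue: each worker clears only its strided
                // slice, instead of every worker clearing the whole array.
                {
                    let mut buf = shared.lock().unwrap();
                    for i in (tid..LEN).step_by(WORKERS) {
                        buf[i] = 0;
                    }
                }
                // The step the driver omits: without this rendezvous, a
                // fast worker could enter user code while a slow one is
                // still zeroing.
                barrier.wait();
                // User code may now safely assume the array is all zero.
                assert!(shared.lock().unwrap().iter().all(|&x| x == 0));
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
}
```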

One of the offending shaders is draw_reduce. The post-processed WGSL is attached, as is the SPIR-V output. Note that the SPIR-V contains no zeroing logic, since spv::ZeroInitializeWorkgroupMemoryMode::Native was selected in adapter.rs.
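
For reference, a minimal sketch of how that mode is passed to naga's SPIR-V backend (assuming the naga API as of roughly wgpu 0.18; the exact field and type paths are assumptions, not copied from adapter.rs):

```rust
// Sketch, assuming naga's SPIR-V backend API circa wgpu 0.18.
use naga::back::spv;

fn backend_options() -> spv::Options {
    spv::Options {
        // `Native` tells naga to rely on the driver (via
        // VK_KHR_zero_initialize_workgroup_memory / Vulkan 1.3) to zero
        // workgroup variables, so the emitted SPIR-V carries no zeroing
        // loop of its own; hence none appears in the attached .spv.
        zero_initialize_workgroup_memory: spv::ZeroInitializeWorkgroupMemoryMode::Native,
        ..Default::default()
    }
}
```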

I captured the ISA using the Radeon Developer Panel via select-all and copy (Ctrl-A, Ctrl-C), after choosing inputs that let the example run without crashing long enough to capture a trace. There may be a better way to do this; if so, please let me know. In any case, three things are wrong:

  • There is no s_barrier between the zeroing logic and the user code
  • It appears that all invocations in the workgroup zero the entire array. If this happened at the SPIR-V level, the conflicting writes would constitute a data race and thus undefined behavior; perhaps the behavior is defined at the ISA level, but it is certainly a performance problem if nothing else.
  • Speaking of performance problems, almost a thousand lines of ISA to zero an array is clearly not a good idea. The code is just bad; among other things, it repeatedly zeroes v[4:7] using the v_lshlrev_b64 instruction.

It makes sense to work around the broken driver by disabling ZeroInitializeWorkgroupMemoryMode::Native on affected drivers, and also to escalate the bug to AMD.
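
A sketch of what that workaround could look like (illustrative only; the driver_has_broken_zero_init helper and the plumbing are hypothetical, not the actual wgpu-hal code in adapter.rs):

```rust
// Illustrative sketch of the proposed workaround, not actual wgpu-hal code.
use naga::back::spv::ZeroInitializeWorkgroupMemoryMode;

/// Hypothetical quirk check: true for drivers known to mis-implement
/// native workgroup-memory zeroing (here, the affected AMD driver).
fn driver_has_broken_zero_init(vendor_id: u32, _driver_version: u32) -> bool {
    const VENDOR_ID_AMD: u32 = 0x1002; // AMD's PCI vendor ID
    // Real code would also gate on the specific driver version range.
    vendor_id == VENDOR_ID_AMD
}

fn pick_zero_init_mode(
    vendor_id: u32,
    driver_version: u32,
    native_supported: bool,
) -> ZeroInitializeWorkgroupMemoryMode {
    if native_supported && !driver_has_broken_zero_init(vendor_id, driver_version) {
        ZeroInitializeWorkgroupMemoryMode::Native
    } else {
        // Polyfill makes naga emit the zeroing loop and the barrier
        // itself, sidestepping the driver's broken implementation.
        ZeroInitializeWorkgroupMemoryMode::Polyfill
    }
}
```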

amd_bug_files.zip

Platform
Windows 10. AMD Radeon 5700 XT running driver 2.0.233, API version 1.3.217. This is running on Vulkan, selected via wgpu's PRIMARY backend default. With DX12 selected, the example runs, but with pathologically slow shader compile times.

cwfitzgerald added the external: driver-bug and backend: vulkan labels on Oct 27, 2023