summing data across threads #294
-
Hi everyone, I'm working with Warp and need some advice on correctly accumulating values in a kernel. I pass an array to my kernel that is used to sum calculations performed by each thread. However, I'm struggling to ensure the summation is done correctly, since multiple threads may try to update the same array element simultaneously. Below is a simplified example of what I'm trying to do:

```python
import warp as wp
import numpy as np
import matplotlib.pylab as plt


@wp.kernel
def my_kernel(
    theta: wp.array(dtype=wp.float32),  # type: ignore
    phi: wp.array(dtype=wp.float32),  # type: ignore
    A: wp.array(dtype=wp.float32),  # type: ignore
):
    i, j = wp.tid()
    A[i] += wp.cos(theta[i]) + wp.sin(phi[j])


def sample_code():
    theta = wp.array(np.linspace(0, np.pi, 100).astype(np.float32))
    phi = wp.array(np.linspace(np.pi, 2 * np.pi, 100).astype(np.float32))
    _A = wp.zeros((100,))
    with wp.ScopedDevice('cuda'):
        wp.launch(
            kernel=my_kernel,
            dim=(100, 100),
            inputs=[
                theta,
                phi,
                _A,
            ],
        )
    A = _A.numpy()
    return A


a1 = sample_code()
a2 = sample_code()
plt.plot(a1)
plt.plot(a2)
plt.show()
```

The issue is that when multiple threads update the same element of `A`, the results are incorrect. I suspect this is because one thread may read and write an array element while another thread is updating it, i.e. a race condition. I initially thought `atomic_add` might be the solution, but it doesn't accept `A[i]` as an argument. I'm aware that I could add an extra dimension to `A`, run the kernel so each thread addresses only its own element, and then sum up the results after the kernel finishes. However, I'd prefer to avoid this because it significantly increases memory usage, which is a problem in my actual code (though not in the example above). Does anyone have suggestions on how to handle this accumulation correctly without the memory overhead?
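As a concrete check on what the kernel is supposed to compute, the extra-dimension workaround described above is equivalent to building the full 100×100 table and reducing over one axis. Here is a plain NumPy sketch of that equivalence (no Warp needed; the variable names are just for illustration):

```python
import numpy as np

theta = np.linspace(0, np.pi, 100).astype(np.float32)
phi = np.linspace(np.pi, 2 * np.pi, 100).astype(np.float32)

# The "extra dimension" version: one entry per (i, j) thread pair,
# summed over j afterwards -- race-free, but O(n*m) memory.
table = np.cos(theta)[:, None] + np.sin(phi)[None, :]  # shape (100, 100)
A = table.sum(axis=1)

# The same result without the 100x100 table: the j-sum of sin(phi[j])
# is a single constant added to every cos term.
A_cheap = 100.0 * np.cos(theta) + np.sin(phi).sum()
```

This also gives a known-good reference to compare the kernel's output against.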
Replies: 1 comment
-
Well this is gonna be funny/sad...

I had ChatGPT-4o clean up the wording of my post before submitting it. Afterwards I thought, what the hell, let's see if ChatGPT can help with the actual problem. It told me to use `atomic_add`. It turns out I didn't read the `atomic_add` documentation correctly 🤦. So for my sample code you would just do:

```python
wp.atomic_add(A, i, wp.cos(theta[i]) + wp.sin(phi[j]))
```

I'm going to leave the post up just in case there's other dummies like me out there.