Atomic contention in triplet counting kernel could be alleviated #818
Labels: good first issue, improvement, performance, shared
The triplet counting kernel is our third-hottest kernel in terms of throughput. However, this kernel introduces some atomic contention through the way it pushes data to the output array:
traccc/device/common/include/traccc/seeding/device/impl/count_triplets.ipp
Lines 105 to 115 in dee541f
Although modern GPGPU architectures automatically coalesce atomic accesses to some extent, we might still benefit from first aggregating the atomic additions at block scale (using, e.g., `barrier::blockCount`) and then issuing only a single atomic increment per block.

This relatively simple and well-contained issue should be well suited to developers getting started with traccc or with GPGPU programming in general.