Write-combined CUDA host allocations may yield speedup #775
Labels: cuda, good first issue, help wanted
Writing this down in case any students want a fun and relatively self-contained project on traccc.
CUDA allows users to allocate pinned host memory as write-combined, by passing the `cudaHostAllocWriteCombined` flag to `cudaHostAlloc`. This makes it faster for the host to write to this memory, and can also make it faster to transfer that memory to devices, because the transfer need not perform as many cache coherency checks. The trade-off is that it makes reads on the host extremely slow, so it only suits buffers that the host writes and the device reads.
Documentation on this topic is available at https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1gb65da58f444e7230d3322b6126bb4902.
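As a rough illustration of what such an allocation looks like with the plain CUDA runtime API, here is a minimal sketch. The helper names `alloc_write_combined` and `fill_and_upload` are hypothetical and not part of traccc or vecmem; how this would be wired into traccc's memory resources is left open.

```cuda
#include <cuda_runtime.h>

#include <cstddef>

// Hypothetical helper (not traccc API): allocate pinned, write-combined
// host memory for n floats. On failure *out is left as nullptr.
cudaError_t alloc_write_combined(std::size_t n, float** out) {
    *out = nullptr;
    void* ptr = nullptr;
    cudaError_t err =
        cudaHostAlloc(&ptr, n * sizeof(float), cudaHostAllocWriteCombined);
    if (err == cudaSuccess) {
        *out = static_cast<float*>(ptr);
    }
    return err;
}

// Hypothetical helper: fill the buffer from the host with sequential
// writes only (never read it back on the host), then enqueue an
// asynchronous host-to-device copy on the given stream.
cudaError_t fill_and_upload(float* wc_host, float* device, std::size_t n,
                            cudaStream_t stream) {
    for (std::size_t i = 0; i < n; ++i) {
        wc_host[i] = static_cast<float>(i);
    }
    return cudaMemcpyAsync(device, wc_host, n * sizeof(float),
                           cudaMemcpyHostToDevice, stream);
}
```

Memory obtained this way must be released with `cudaFreeHost`. Any code path that reads the buffer on the host (including a device-to-host copy followed by host-side processing) should keep using a regular pinned or pageable allocation.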
The following is a plan for how to implement this:
Note that traccc is not currently transfer-bound, but it could still be interesting to try.