Write-combined CUDA host allocations may yield speedup #775

stephenswat · 2024-11-18T12:40:59Z

Writing this down in case any students want a fun and relatively self-contained project on traccc.

CUDA allows users to allocate pinned host memory with write-combining enabled (the `cudaHostAllocWriteCombined` flag of `cudaHostAlloc`). This makes it faster for the host to write to this memory. It also speeds up transfers of that memory to devices, because write-combined memory is not snooped for cache coherency during transfers across the PCI Express bus. The trade-off is that reads from this memory on the host are extremely slow.

Documentation on this topic is available at https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1gb65da58f444e7230d3322b6126bb4902.
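The sketch below shows the basic allocation pattern. It assumes a CUDA-capable device and uses only standard CUDA runtime calls (`cudaHostAlloc` with `cudaHostAllocWriteCombined`, `cudaMalloc`, `cudaMemcpy`, `cudaFreeHost`). The helper names `check_cuda` and `upload_via_write_combined` are hypothetical and exist only for illustration. The host fills the buffer with sequential writes and never reads it back, which is the access pattern write-combining is meant for.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: prints the CUDA error and returns false if `err` is not cudaSuccess.
static bool check_cuda(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Hypothetical helper: fills a write-combined pinned host buffer with `n` floats
// and copies it into a newly allocated device buffer, returned through `d_out`.
bool upload_via_write_combined(std::size_t n, float** d_out) {
    assert(d_out != nullptr && n > 0);
    float* h_buf = nullptr;
    if (!check_cuda(cudaHostAlloc(reinterpret_cast<void**>(&h_buf), n * sizeof(float),
                                  cudaHostAllocWriteCombined),
                    "cudaHostAlloc")) {
        return false;
    }
    // Sequential, write-only stores: the access pattern write-combining favors.
    // h_buf is never read on the host, because such reads are very slow.
    for (std::size_t i = 0; i < n; ++i) {
        h_buf[i] = static_cast<float>(i);
    }
    float* d_buf = nullptr;
    bool ok = check_cuda(cudaMalloc(reinterpret_cast<void**>(&d_buf), n * sizeof(float)),
                         "cudaMalloc") &&
              check_cuda(cudaMemcpy(d_buf, h_buf, n * sizeof(float), cudaMemcpyHostToDevice),
                         "cudaMemcpy");
    cudaFreeHost(h_buf);
    if (!ok) {
        cudaFree(d_buf);  // no-op if d_buf is still nullptr
        return false;
    }
    *d_out = d_buf;
    return true;
}
```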

The following is a plan for how to implement this:

1. Implement a write-combined memory resource, or add an option to the existing pinned memory resource.
2. Use the new memory resource for large allocations that the host writes to but never reads from.
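For the first step of the plan, one option is a pinned host memory resource whose `cudaHostAlloc` flags are configurable, which covers both "a new resource" and "an option on the existing one". The sketch below is written against the standard C++17 `std::pmr::memory_resource` interface. The class name `pinned_host_memory_resource` and its `flags` constructor parameter are hypothetical, and whether it plugs directly into traccc's vecmem-based memory resources is an assumption that would need checking.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory_resource>
#include <new>

// Hypothetical pinned host memory resource with a configurable cudaHostAlloc flag.
// With cudaHostAllocWriteCombined, memory obtained here should be written by the
// host but never read by it: host reads from write-combined memory are very slow.
class pinned_host_memory_resource : public std::pmr::memory_resource {
public:
    explicit pinned_host_memory_resource(unsigned int flags = cudaHostAllocWriteCombined)
        : m_flags(flags) {}

private:
    void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        // std::pmr guarantees a power-of-two alignment; check it in debug builds.
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        void* ptr = nullptr;
        // Request at least one byte so a zero-sized request still yields a valid pointer.
        if (cudaHostAlloc(&ptr, bytes == 0 ? 1 : bytes, m_flags) != cudaSuccess) {
            throw std::bad_alloc();
        }
        // Pinned allocations are normally page-aligned, but verify rather than assume.
        if (reinterpret_cast<std::uintptr_t>(ptr) % alignment != 0) {
            cudaFreeHost(ptr);
            throw std::bad_alloc();
        }
        return ptr;
    }

    void do_deallocate(void* ptr, std::size_t /*bytes*/, std::size_t /*alignment*/) override {
        cudaFreeHost(ptr);
    }

    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        // Any pinned allocation can be released with cudaFreeHost, whatever its flags.
        return dynamic_cast<const pinned_host_memory_resource*>(&other) != nullptr;
    }

    unsigned int m_flags;
};
```

For the second step, a host-side staging buffer could then be declared as `std::pmr::vector<float> staging(&mr);` and sized once with `resize` before being filled. Growing such a vector incrementally is best avoided: reallocation copies the old elements, which means reading from write-combined memory.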

Note that traccc is not currently transfer-bound, but it could still be interesting to try.
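One way to try it is a small micro-benchmark that times host-to-device copies from an ordinary pinned buffer and from a write-combined one. The helper `time_h2d_ms` below is hypothetical and uses CUDA events for timing. The buffer size and repetition count are arbitrary, and any difference observed will depend on the hardware and PCIe configuration.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical micro-benchmark: average time in milliseconds of one host-to-device
// copy of `bytes` bytes from a pinned buffer allocated with `flags`
// (e.g. cudaHostAllocDefault or cudaHostAllocWriteCombined). Returns -1 on error.
float time_h2d_ms(std::size_t bytes, unsigned int flags, int reps) {
    assert(bytes > 0 && reps > 0);
    void* h = nullptr;
    void* d = nullptr;
    if (cudaHostAlloc(&h, bytes, flags) != cudaSuccess) {
        return -1.0f;
    }
    if (cudaMalloc(&d, bytes) != cudaSuccess) {
        cudaFreeHost(h);
        return -1.0f;
    }
    std::memset(h, 1, bytes);  // host writes only; the buffer is never read on the host

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Untimed warm-up copy so that one-time setup costs are excluded.
    bool ok = cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    cudaEventRecord(start);
    for (int i = 0; ok && i < reps; ++i) {
        ok = cudaMemcpyAsync(d, h, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    }
    cudaEventRecord(stop);
    float ms = 0.0f;
    ok = ok && cudaEventSynchronize(stop) == cudaSuccess &&
         cudaEventElapsedTime(&ms, start, stop) == cudaSuccess;

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    cudaFreeHost(h);
    return ok ? ms / static_cast<float>(reps) : -1.0f;
}
```

Calling it once with `cudaHostAllocDefault` and once with `cudaHostAllocWriteCombined` for the same size gives a direct comparison of the two allocation modes on a given machine.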

Labels added by stephenswat on Nov 18, 2024: cuda (Changes related to CUDA), good first issue (Good for newcomers), help wanted (Extra attention is needed).
