
Consider using RMM for all device allocations instead of cudaMalloc #3

Open
cjnolet opened this issue Sep 27, 2021 · 1 comment

cjnolet commented Sep 27, 2021

While browsing briefly through the new PCA implementation, I noticed there are still several places where GPU memory is allocated with cudaMalloc (throughout this file, for example). In RAPIDS, we replace all calls to cudaMalloc in our code and use RMM to allocate any and all device memory.

There are a few reasons why this is important (a configuration sketch follows the list):

  1. A user can set a single pool allocation size and apply it to a given device.
  2. Allocations are guaranteed to be aligned to 256-byte boundaries.
  3. Every direct call to cudaMalloc imposes a device-wide synchronization; with a pool allocator configured, this is avoided for the entire device.
  4. Asynchronous streams that are tied to memory allocations can be maintained along with those allocations.
  5. All allocations on each device are guaranteed to use the same managed memory pool. Mixing allocations, some done through RMM and others through cudaMalloc, can be a source of problems.
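For illustration, here is a minimal sketch (not part of the original report) of setting an RMM pool memory resource as the default allocator for the current device; the initial pool size is an assumed placeholder:

```cpp
#include <rmm/mr/device/cuda_memory_resource.hpp>
#include <rmm/mr/device/per_device_resource.hpp>
#include <rmm/mr/device/pool_memory_resource.hpp>

int main() {
  // Upstream resource that actually calls cudaMalloc/cudaFree.
  rmm::mr::cuda_memory_resource cuda_mr;

  // Pool that sub-allocates from one large upstream allocation, so
  // subsequent allocations avoid cudaMalloc's device-wide synchronization.
  // The 1 GiB initial size is an illustrative assumption.
  rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource> pool_mr{
      &cuda_mr, 1u << 30};

  // Make the pool the default resource for RMM allocations on this device.
  rmm::mr::set_current_device_resource(&pool_mr);

  // ... run algorithms; RMM containers now draw from the pool ...
  return 0;
}
```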

Further, RMM provides an RAII C++ API that makes managing these pointers easier and less prone to memory leaks. By simply using rmm::device_uvector and smart pointers instead of raw cudaMalloc, the algorithms in this repository will automatically benefit from the items listed above.
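As a hedged example of what that replacement can look like, the sketch below swaps a raw cudaMalloc/cudaFree pair for rmm::device_uvector; the buffer size and commented-out kernel launch are illustrative assumptions, not code from this repository:

```cpp
#include <cstddef>

#include <rmm/cuda_stream.hpp>
#include <rmm/device_uvector.hpp>

void example(std::size_t n) {
  rmm::cuda_stream stream;  // RAII-managed CUDA stream

  // Before: cudaMalloc(&ptr, n * sizeof(float)); ... cudaFree(ptr);
  // After: the allocation is stream-ordered, drawn from the configured
  // device memory resource, and freed automatically when `buf` goes
  // out of scope.
  rmm::device_uvector<float> buf(n, stream.view());

  // buf.data() yields the raw device pointer for kernel launches, e.g.:
  // my_kernel<<<grid, block, 0, stream.value()>>>(buf.data(), buf.size());

  stream.synchronize();
}  // no explicit cudaFree needed
```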

@wjxiz1992
Collaborator

Thanks for pointing this out, Corey! This has been added to the TODO plan. I've been working on a virtual review for the recent release; I will post an update once I finish the current work.
