This repository has been archived by the owner on Mar 21, 2024. It is now read-only.
CUB 1.5.2
Summary
CUB 1.5.2 enhances cub::CachingDeviceAllocator and improves scan performance for SM5x (Maxwell).
Enhancements
- Improved medium-size scan performance on SM5x (Maxwell).
- Refactored cub::CachingDeviceAllocator:
  - Now spends less time locked.
  - Uses C++11's std::mutex when available.
  - Failure to allocate a block from the runtime will retry once after freeing cached allocations.
  - Now respects max-bin, fixing an issue where blocks in excess of max-bin were still being retained in the free cache.
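
The allocator behaviors listed above can be illustrated with a small host-only model. This is a minimal sketch under stated assumptions, not CUB's implementation: `ToyCachingAllocator`, its members, and the injectable backing functions are hypothetical names, and `std::malloc`/`std::free` stand in for the CUDA runtime calls. It shows a `std::mutex` held only around cache bookkeeping, a single retry after the cache is emptied, and blocks above the largest bin bypassing the free cache.

```cpp
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <map>
#include <mutex>
#include <utility>

// Toy host-side model of a binned caching allocator. NOT CUB's code: every
// name here is hypothetical, and malloc/free stand in for the CUDA runtime.
class ToyCachingAllocator {
public:
    using AllocFn = std::function<void*(std::size_t)>;
    using FreeFn  = std::function<void(void*)>;

    // Bin b holds blocks of growth^b bytes for min_bin <= b <= max_bin.
    // Assumes growth >= 2 and sizes small enough not to overflow.
    ToyCachingAllocator(unsigned growth, unsigned min_bin, unsigned max_bin,
                        AllocFn backing_alloc = [](std::size_t n) { return std::malloc(n); },
                        FreeFn backing_free = [](void* p) { std::free(p); })
        : growth_(growth), min_bin_(min_bin), max_bin_(max_bin),
          alloc_(std::move(backing_alloc)), free_(std::move(backing_free)) {}

    // Releases cached blocks only; blocks still held by callers are not freed.
    ~ToyCachingAllocator() { FreeAllCached(); }

    void* Allocate(std::size_t bytes) {
        std::size_t size = bytes;
        const int bin = BinFor(bytes, size);  // kNoBin if above the largest bin
        if (bin != kNoBin) {
            std::lock_guard<std::mutex> lock(mutex_);  // held only for the lookup
            auto it = cached_.find(bin);
            if (it != cached_.end()) {
                void* p = it->second;
                cached_.erase(it);
                live_[p] = bin;
                return p;
            }
        }
        void* p = alloc_(size);  // backing call is made without the lock
        if (p == nullptr) {      // on failure: empty the cache and retry once
            FreeAllCached();
            p = alloc_(size);
        }
        if (p != nullptr) {
            std::lock_guard<std::mutex> lock(mutex_);
            live_[p] = bin;
        }
        return p;
    }

    void Deallocate(void* p) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            auto it = live_.find(p);
            if (it == live_.end()) return;  // unknown pointer: ignored in this sketch
            const int bin = it->second;
            live_.erase(it);
            if (bin != kNoBin) { cached_.emplace(bin, p); return; }
        }
        free_(p);  // above the largest bin: released immediately, never cached
    }

    void FreeAllCached() {
        std::multimap<int, void*> doomed;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            doomed.swap(cached_);  // detach under the lock, free outside it
        }
        for (auto& kv : doomed) free_(kv.second);
    }

    std::size_t CachedBlocks() {
        std::lock_guard<std::mutex> lock(mutex_);
        return cached_.size();
    }

private:
    static constexpr int kNoBin = -1;

    // Rounds a request up to its bin size; requests above the largest bin
    // keep their exact size and get no bin.
    int BinFor(std::size_t bytes, std::size_t& rounded) const {
        std::size_t size = 1;
        for (unsigned i = 0; i < min_bin_; ++i) size *= growth_;
        unsigned bin = min_bin_;
        while (size < bytes && bin <= max_bin_) { size *= growth_; ++bin; }
        if (bin > max_bin_) { rounded = bytes; return kNoBin; }
        rounded = size;
        return static_cast<int>(bin);
    }

    unsigned growth_, min_bin_, max_bin_;
    AllocFn alloc_;
    FreeFn free_;
    std::mutex mutex_;
    std::multimap<int, void*> cached_;  // bin -> free cached block
    std::map<void*, int> live_;         // block in use -> its bin (or kNoBin)
};
```

With `ToyCachingAllocator alloc(2, 3, 5)` the bins are 8, 16, and 32 bytes. A freed 10-byte request is cached in the 16-byte bin and reused by a later 12-byte request, while a freed 100-byte block goes straight back to the backing store. Making the backing functions injectable is a choice for this sketch only, so that an allocation failure can be simulated.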
Bug Fixes
- Fix for generic-type reduce-by-key cub::WarpScan for SM3x and newer GPUs.