This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

CUB 0.9.3

@brycelelbach brycelelbach released this 19 May 08:23

Summary

CUB 0.9.3 adds histogram algorithms and work management utility descriptors.

New Features

  • cub::DeviceHistogram256.
  • cub::BlockHistogram256.
  • cub::BlockScan algorithm variant BLOCK_SCAN_RAKING_MEMOIZE, which trades more register consumption for less shared memory I/O.
  • cub::GridQueue and cub::GridEvenShare, work management utility descriptors.
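
For readers unfamiliar with the histogram primitives listed above, the following is a minimal host-side sketch of what a 256-bin histogram computes: one counter per possible 8-bit sample value. It is a plain C++ reference for checking results, not CUB code; the function name histogram256 is hypothetical, and the sketch assumes 8-bit unsigned samples and 32-bit counters.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side reference, not part of CUB.
// Counts how often each 8-bit value occurs in `samples`.
// Assumption: samples are unsigned 8-bit values, so there are exactly 256 bins.
inline std::array<std::uint32_t, 256> histogram256(
    const std::vector<std::uint8_t>& samples) {
    std::array<std::uint32_t, 256> bins{};  // value-initialized: all counters are zero
    for (std::uint8_t s : samples) {
        ++bins[s];  // the sample value itself is the bin index
    }
    return bins;
}
```

A reference like this can be useful for validating the output of a device-side or block-side histogram against a small, known input.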

Other Enhancements

  • Updates to cub::BlockRadixRank to use cub::BlockScan, which improves performance on SM3x by using the SHFL (warp shuffle) instruction.
  • Allow types other than built-in types to be used in cub::WarpScan::*Sum methods if they only have operator+ overloaded. Previously, they were also required to support assignment from int(0).
  • Update cub::BlockReduce's BLOCK_REDUCE_WARP_REDUCTIONS algorithm to work even when the block size is not an even multiple of the warp size.
  • Refactoring of cub::DeviceAllocator interface and cub::CachingDeviceAllocator implementation.
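
The relaxed requirement on cub::WarpScan::*Sum types can be illustrated with a small host-side sketch. It shows one way a prefix sum can avoid needing a zero value: the running total is seeded from the first element instead of from int(0). This is an illustration of the idea only, not CUB's implementation; Vec2 and inclusive_sum are hypothetical names introduced for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical user-defined type: it overloads operator+ but cannot be
// assigned from int(0), since no conversion from int exists.
struct Vec2 {
    float x;
    float y;
};

inline Vec2 operator+(const Vec2& a, const Vec2& b) {
    return Vec2{a.x + b.x, a.y + b.y};
}

// Sequential inclusive prefix sum that requires only copy construction and
// operator+ from T. The first output is a copy of the first input, so no
// "zero" value of T is ever constructed.
template <typename T>
std::vector<T> inclusive_sum(const std::vector<T>& in) {
    std::vector<T> out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (i == 0) {
            out.push_back(in[0]);
        } else {
            out.push_back(out[i - 1] + in[i]);
        }
    }
    return out;
}
```

Calling inclusive_sum on a std::vector<Vec2> compiles without Vec2 providing any constructor or assignment from an integer, which is the property the release note describes for the warp-level sum methods.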