This repository has been archived by the owner on Mar 21, 2024. It is now read-only.
CUB 1.6.3
Summary
CUB 1.6.3 improves support for Windows, changes the `cub::BlockLoad`/`cub::BlockStore` interface to take the local data type, and enhances radix sort performance for SM6x (Pascal) GPUs.
Breaking Changes
`cub::BlockLoad` and `cub::BlockStore` are now templated by the local data type, instead of the `Iterator` type. This allows for output iterators having `void` as their `value_type` (e.g. discard iterators).
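
The motivation for this change can be illustrated with a small host-side C++ sketch. It is not CUB code: `ItemTypeFromIterator`, `StoreSketch`, `DiscardIterator`, and `DemoStore` are hypothetical names invented for illustration, and the sketch assumes only the standard library. It shows that an item type derived from an output iterator can be `void`, whereas a store helper templated on the data type works with any iterator that accepts assignment.

```cpp
#include <cassert>
#include <iterator>
#include <type_traits>
#include <vector>

// Hypothetical "old style": derive the item type from the iterator.
template <typename IteratorT>
struct ItemTypeFromIterator
{
    typedef typename std::iterator_traits<IteratorT>::value_type type;
};

// Hypothetical discard-style output iterator; its value_type is void.
struct DiscardIterator
{
    typedef std::output_iterator_tag iterator_category;
    typedef void value_type;
    typedef void difference_type;
    typedef void pointer;
    typedef void reference;

    // Proxy returned by operator*; assigning to it drops the value.
    struct Sink
    {
        template <typename U>
        void operator=(const U&) const {}
    };

    Sink operator*() const { return Sink(); }
    DiscardIterator& operator++() { return *this; }
};

// Deriving the item type from an output iterator yields void, which cannot
// be used to declare a per-thread array of items.
static_assert(std::is_void<ItemTypeFromIterator<DiscardIterator>::type>::value,
              "discard-style iterator has a void value_type");
static_assert(std::is_void<ItemTypeFromIterator<
                  std::back_insert_iterator<std::vector<int> > >::type>::value,
              "std::back_insert_iterator has a void value_type");

// Hypothetical "new style": the caller names the local data type T, and the
// iterator type is only a template parameter of the member function.
template <typename T, int ITEMS>
struct StoreSketch
{
    template <typename OutputIteratorT>
    static void Store(OutputIteratorT out, const T (&items)[ITEMS])
    {
        for (int i = 0; i < ITEMS; ++i)
        {
            *out = items[i];
            ++out;
        }
    }
};

// Stores the same items through two output iterators with void value_type.
inline std::vector<int> DemoStore()
{
    const int items[3] = {1, 2, 3};
    std::vector<int> kept;
    StoreSketch<int, 3>::Store(std::back_inserter(kept), items);  // appended
    StoreSketch<int, 3>::Store(DiscardIterator(), items);         // discarded
    assert(kept.size() == 3);
    return kept;
}
```

In CUB terms, this corresponds to naming the item type as the first template argument of `cub::BlockLoad` and `cub::BlockStore` (for example `int`) and passing the iterator only to `Load` or `Store`.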
Other Enhancements
- Radix sort tuning policies updated for SM6x (Pascal) GPUs, reaching 6.2B 4-byte keys/s on GP100.
- Improved support for Windows (warnings, alignment, etc.).
Bug Fixes
- #74: `cub::WarpReduce` executes reduction operator for out-of-bounds items.
- #72: `cub::InequalityWrapper::operator` should be non-const.
- #71: `cub::KeyValuePair` won't work if `Key` has non-trivial constructor.
- #69: `cub::BlockStore::Store` doesn't compile if `OutputIteratorT::value_type` isn't `T`.
- #68: `cub::TilePrefixCallbackOp::WarpReduce` doesn't permit PTX arch specialization.