This repository has been archived by the owner on Mar 21, 2024. It is now read-only.
CUB 1.17.0
Summary
CUB 1.17.0 is the final minor release of the 1.X series. It provides a variety of bug fixes and miscellaneous enhancements, detailed below.
Known Issues
“Run-to-run” Determinism Broken
Several CUB device algorithms are documented to provide deterministic results (per device) for non-associative reduction operators (e.g. floating-point addition). Unfortunately, the implementations of these algorithms contain performance optimizations that violate this guarantee. The `DeviceReduce::ReduceByKey` and `DeviceScan` algorithms are known to be affected. We’re currently evaluating the scope and impact of correcting this in a future CUB release. See NVIDIA/cub#471 for details.
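To illustrate why floating-point addition counts as a non-associative operator, here is a minimal host-side C++ sketch. It is not CUB code, and the helper names `sum_left` and `sum_right` are hypothetical. It only shows that regrouping the same three operands can change the result, which is why an implementation that varies its grouping between runs can produce different sums.

```cpp
// Hypothetical helpers for illustration only; not part of CUB.
// The volatile temporaries force each intermediate to be rounded to float.

// Groups the operands as (a + b) + c.
float sum_left(float a, float b, float c) {
    volatile float t = a + b;
    return t + c;
}

// Groups the operands as a + (b + c).
float sum_right(float a, float b, float c) {
    volatile float t = b + c;
    return a + t;
}
```

With IEEE 754 single precision, `sum_left(1.0e8f, -1.0e8f, 1.0f)` yields 1 while `sum_right(1.0e8f, -1.0e8f, 1.0f)` yields 0, because `-1.0e8f + 1.0f` rounds back to `-1.0e8f`.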
Bug Fixes
- #444: Fixed `DeviceSelect` to work with discard iterators and mixed input/output types.
- #452: Fixed install issue when `CMAKE_INSTALL_LIBDIR` contained nested directories. Thanks to @robertmaynard for this contribution.
- #462: Fixed bug that produced incorrect results from `DeviceSegmentedSort` on sm_61 and sm_70.
- #464: Fixed `DeviceSelect::Flagged` so that flags are normalized to 0 or 1.
- #468: Fixed overflow issues in `DeviceRadixSort` given `num_items` close to 2^32. Thanks to @canonizer for this contribution.
- #498: Fixed compiler regression in `BlockAdjacentDifference`. Thanks to @MKKnorr for this contribution.
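As a rough illustration of the flag normalization mentioned in #464, the sketch below is a sequential host-side reference for flagged selection. It assumes a scan-based compaction in which each selected item's output slot comes from a prefix sum over the flags. That assumption, and the function name `select_flagged`, are illustrative only and do not describe CUB's actual implementation or the exact cause of the original bug.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference implementation; not CUB's API.
// Flags are normalized to 0 or 1 before they are accumulated, so a flag
// value such as 2 or 255 still advances the output offset by exactly one.
template <typename T, typename FlagT>
std::vector<T> select_flagged(const std::vector<T>& in,
                              const std::vector<FlagT>& flags) {
    const std::size_t n = in.size() < flags.size() ? in.size() : flags.size();

    // Exclusive prefix sum over the normalized flags.
    std::vector<std::size_t> offsets(n);
    std::size_t running = 0;
    for (std::size_t i = 0; i < n; ++i) {
        offsets[i] = running;
        running += (flags[i] != 0) ? 1u : 0u;  // normalize to 0 or 1
    }

    // Scatter each selected item to its computed slot.
    std::vector<T> out(running);
    for (std::size_t i = 0; i < n; ++i) {
        if (flags[i] != 0) {
            out[offsets[i]] = in[i];
        }
    }
    return out;
}
```

If the raw flag values were summed instead, a flag of 2 would shift every later item one slot too far, so normalizing before the scan keeps the offsets dense.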
Other Enhancements
- #445: Remove device-sync in `DeviceSegmentedSort` when launched via CDP.
- #449: Fixed invalid link in documentation. Thanks to @kshitij12345 for this contribution.
- #450: `BlockDiscontinuity`: Replaced recursive-template loop unrolling with `#pragma unroll`. Thanks to @kshitij12345 for this contribution.
- #451: Replaced the deprecated `TexRefInputIterator` implementation with an alias to `TexObjInputIterator`. This fully removes all usages of the deprecated CUDA texture reference APIs from CUB.
- #456: `BlockAdjacentDifference`: Replaced recursive-template loop unrolling with `#pragma unroll`. Thanks to @kshitij12345 for this contribution.
- #466: `cub::DeviceAdjacentDifference` API has been updated to use the new `OffsetT` deduction approach described in #212.
- #470: Fix several doxygen-related warnings. Thanks to @karthikeyann for this contribution.
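For readers unfamiliar with the two unrolling styles named in #450 and #456, the following sketch contrasts them on a simple sum over a fixed-size array. It is a generic illustration, not the code changed in those pull requests. The names `UnrolledSum` and `pragma_sum` are hypothetical, and the preprocessor guard around the pragma is an assumption made so the snippet also compiles cleanly with host compilers that do not recognize `#pragma unroll`.

```cpp
// Style 1: recursive-template unrolling. Each instantiation handles one
// element and recurses until the index I reaches the element count N.
template <int I, int N>
struct UnrolledSum {
    static int apply(const int (&items)[N]) {
        return items[I] + UnrolledSum<I + 1, N>::apply(items);
    }
};

// Terminating specialization: nothing left to add.
template <int N>
struct UnrolledSum<N, N> {
    static int apply(const int (&)[N]) { return 0; }
};

// Style 2: an ordinary loop with a compile-time trip count, with the
// unrolling request expressed as a pragma.
template <int N>
int pragma_sum(const int (&items)[N]) {
    int total = 0;
#if defined(__CUDACC__) || defined(__clang__)
#pragma unroll
#endif
    for (int i = 0; i < N; ++i) {
        total += items[i];
    }
    return total;
}
```

Both forms compute the same value. For example, with `const int items[4] = {1, 2, 3, 4};`, `UnrolledSum<0, 4>::apply(items)` and `pragma_sum(items)` each return 10. The loop form is shorter and avoids one template instantiation per iteration.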