
Commit

Merge branch 'branch-24.02' into fea-bf_ser
wphicks authored Dec 6, 2023
2 parents e975ce2 + b333b74 commit 820f26a
Showing 4 changed files with 124 additions and 71 deletions.
87 changes: 87 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,90 @@
# raft 23.12.00 (6 Dec 2023)

## 🐛 Bug Fixes

- Update actions/labeler to v4 ([#2037](https://github.com/rapidsai/raft/pull/2037)) [@raydouglass](https://github.com/raydouglass)
- pylibraft only depends on numpy at runtime, not build time. ([#2013](https://github.com/rapidsai/raft/pull/2013)) [@bdice](https://github.com/bdice)
- Fixes to update-version.sh ([#1991](https://github.com/rapidsai/raft/pull/1991)) [@raydouglass](https://github.com/raydouglass)
- Adjusting end-to-end start time so it doesn't include stream creation time ([#1989](https://github.com/rapidsai/raft/pull/1989)) [@cjnolet](https://github.com/cjnolet)
- CAGRA graph optimizer: clamp rev_graph_count ([#1987](https://github.com/rapidsai/raft/pull/1987)) [@tfeher](https://github.com/tfeher)
- Catching conversion errors in data_export instead of fully failing ([#1979](https://github.com/rapidsai/raft/pull/1979)) [@cjnolet](https://github.com/cjnolet)
- Fix syncing mechanism in `raft-ann-bench` C++ search ([#1961](https://github.com/rapidsai/raft/pull/1961)) [@divyegala](https://github.com/divyegala)
- Fixing hnswlib in latency mode ([#1959](https://github.com/rapidsai/raft/pull/1959)) [@cjnolet](https://github.com/cjnolet)
- Fix `ucx-py` alpha version update for `raft-dask` ([#1953](https://github.com/rapidsai/raft/pull/1953)) [@divyegala](https://github.com/divyegala)
- Reduce NN Descent test threshold ([#1946](https://github.com/rapidsai/raft/pull/1946)) [@divyegala](https://github.com/divyegala)
- Fixes to new YAML config `raft-bench-ann` ([#1945](https://github.com/rapidsai/raft/pull/1945)) [@divyegala](https://github.com/divyegala)
- Set RNG seeds in NN Descent to diagnose flaky tests ([#1931](https://github.com/rapidsai/raft/pull/1931)) [@divyegala](https://github.com/divyegala)
- Fix FAISS CPU algorithm names in `raft-ann-bench` ([#1916](https://github.com/rapidsai/raft/pull/1916)) [@divyegala](https://github.com/divyegala)
- Increase iterations in NN Descent tests to avoid flakiness ([#1915](https://github.com/rapidsai/raft/pull/1915)) [@divyegala](https://github.com/divyegala)
- Fix filepath in `raft-ann-bench/split_groundtruth` module ([#1911](https://github.com/rapidsai/raft/pull/1911)) [@divyegala](https://github.com/divyegala)
- Remove dynamic entry-points from raft-ann-bench ([#1910](https://github.com/rapidsai/raft/pull/1910)) [@benfred](https://github.com/benfred)
- Remove unnecessary dataset path check in ANN bench ([#1908](https://github.com/rapidsai/raft/pull/1908)) [@tfeher](https://github.com/tfeher)
- Fixing Googletests and re-enabling in CI ([#1904](https://github.com/rapidsai/raft/pull/1904)) [@cjnolet](https://github.com/cjnolet)
- Fix NN Descent overflows ([#1875](https://github.com/rapidsai/raft/pull/1875)) [@divyegala](https://github.com/divyegala)
- Build fix for CUDA 12.2 ([#1870](https://github.com/rapidsai/raft/pull/1870)) [@benfred](https://github.com/benfred)
- [BUG] Fix a bug in NN descent ([#1869](https://github.com/rapidsai/raft/pull/1869)) [@enp1s0](https://github.com/enp1s0)

## 📖 Documentation

- Brute Force Index documentation fix ([#1944](https://github.com/rapidsai/raft/pull/1944)) [@lowener](https://github.com/lowener)
- Add `wiki_all` dataset config and documentation. ([#1918](https://github.com/rapidsai/raft/pull/1918)) [@cjnolet](https://github.com/cjnolet)
- Updates to raft-ann-bench docs ([#1905](https://github.com/rapidsai/raft/pull/1905)) [@cjnolet](https://github.com/cjnolet)
- End-to-end vector search tutorial in docs ([#1776](https://github.com/rapidsai/raft/pull/1776)) [@cjnolet](https://github.com/cjnolet)

## 🚀 New Features

- Adding `dry-run` option to `raft-ann-bench` ([#1970](https://github.com/rapidsai/raft/pull/1970)) [@cjnolet](https://github.com/cjnolet)
- Add ANN bench scripts to generate ground truth ([#1967](https://github.com/rapidsai/raft/pull/1967)) [@tfeher](https://github.com/tfeher)
- CAGRA build + HNSW search ([#1956](https://github.com/rapidsai/raft/pull/1956)) [@divyegala](https://github.com/divyegala)
- Verify conda-cpp-post-build-checks ([#1935](https://github.com/rapidsai/raft/pull/1935)) [@robertmaynard](https://github.com/robertmaynard)
- Make all cuda kernels have hidden visibility ([#1898](https://github.com/rapidsai/raft/pull/1898)) [@robertmaynard](https://github.com/robertmaynard)
- Update rapids-cmake functions to non-deprecated signatures ([#1884](https://github.com/rapidsai/raft/pull/1884)) [@robertmaynard](https://github.com/robertmaynard)
- [FEA] Helpers for identifying contiguous layouts. ([#1861](https://github.com/rapidsai/raft/pull/1861)) [@trivialfis](https://github.com/trivialfis)
- Add `raft::stats::neighborhood_recall` ([#1860](https://github.com/rapidsai/raft/pull/1860)) [@divyegala](https://github.com/divyegala)
- [FEA] Helpers and CodePacker for IVF-PQ ([#1826](https://github.com/rapidsai/raft/pull/1826)) [@tarang-jain](https://github.com/tarang-jain)

## 🛠️ Improvements

- Pinning fmt and spdlog for raft-ann-bench-cpu ([#2018](https://github.com/rapidsai/raft/pull/2018)) [@cjnolet](https://github.com/cjnolet)
- Build concurrency for nightly and merge triggers ([#2011](https://github.com/rapidsai/raft/pull/2011)) [@bdice](https://github.com/bdice)
- Using `EXPORT_SET` in `rapids_find_package_root` ([#2006](https://github.com/rapidsai/raft/pull/2006)) [@cjnolet](https://github.com/cjnolet)
- Remove static checks for serialization size ([#1997](https://github.com/rapidsai/raft/pull/1997)) [@cjnolet](https://github.com/cjnolet)
- Skipping bad json parse ([#1990](https://github.com/rapidsai/raft/pull/1990)) [@cjnolet](https://github.com/cjnolet)
- Update select-k heuristic ([#1985](https://github.com/rapidsai/raft/pull/1985)) [@benfred](https://github.com/benfred)
- ANN bench: use different offset for each thread ([#1981](https://github.com/rapidsai/raft/pull/1981)) [@tfeher](https://github.com/tfeher)
- Allow `raft-ann-bench/run` to continue after encountering bad YAML configs ([#1980](https://github.com/rapidsai/raft/pull/1980)) [@divyegala](https://github.com/divyegala)
- Add build and search params to `raft-ann-bench.data_export` CSVs ([#1971](https://github.com/rapidsai/raft/pull/1971)) [@divyegala](https://github.com/divyegala)
- Use new `rapids-dask-dependency` metapackage for managing dask versions ([#1968](https://github.com/rapidsai/raft/pull/1968)) [@galipremsagar](https://github.com/galipremsagar)
- Remove unused header ([#1960](https://github.com/rapidsai/raft/pull/1960)) [@wphicks](https://github.com/wphicks)
- Adding pool back in and fixing cagra benchmark params ([#1951](https://github.com/rapidsai/raft/pull/1951)) [@cjnolet](https://github.com/cjnolet)
- Add constraints to `hnswlib` in `raft-bench-ann` ([#1949](https://github.com/rapidsai/raft/pull/1949)) [@divyegala](https://github.com/divyegala)
- Add support for iterating over batches in bfknn ([#1947](https://github.com/rapidsai/raft/pull/1947)) [@benfred](https://github.com/benfred)
- Fix ANN bench latency ([#1940](https://github.com/rapidsai/raft/pull/1940)) [@tfeher](https://github.com/tfeher)
- Add YAML config files to run parameter sweeps for ANN benchmarks ([#1929](https://github.com/rapidsai/raft/pull/1929)) [@divyegala](https://github.com/divyegala)
- Relax ucx pinning ([#1927](https://github.com/rapidsai/raft/pull/1927)) [@vyasr](https://github.com/vyasr)
- Try using contiguous rank to fix cuda_visible_devices ([#1926](https://github.com/rapidsai/raft/pull/1926)) [@VibhuJawa](https://github.com/VibhuJawa)
- Unpin `dask` and `distributed` for `23.12` development ([#1925](https://github.com/rapidsai/raft/pull/1925)) [@galipremsagar](https://github.com/galipremsagar)
- Adding `throughput` and `latency` modes to `raft-ann-bench` ([#1920](https://github.com/rapidsai/raft/pull/1920)) [@cjnolet](https://github.com/cjnolet)
- Providing `aarch64` yaml environment files ([#1914](https://github.com/rapidsai/raft/pull/1914)) [@cjnolet](https://github.com/cjnolet)
- CAGRA ANN bench: parse build options for IVF-PQ build algo ([#1912](https://github.com/rapidsai/raft/pull/1912)) [@tfeher](https://github.com/tfeher)
- Fix python script location in ANN bench description ([#1906](https://github.com/rapidsai/raft/pull/1906)) [@tfeher](https://github.com/tfeher)
- Refactor install/build guide. ([#1899](https://github.com/rapidsai/raft/pull/1899)) [@cjnolet](https://github.com/cjnolet)
- Check return values of raft-ann-bench subprocess calls ([#1897](https://github.com/rapidsai/raft/pull/1897)) [@benfred](https://github.com/benfred)
- ANN bench options to specify CAGRA graph and dataset locations ([#1896](https://github.com/rapidsai/raft/pull/1896)) [@cjnolet](https://github.com/cjnolet)
- Add check-json to pre-commit linters, and fix invalid ann-bench JSON config ([#1894](https://github.com/rapidsai/raft/pull/1894)) [@benfred](https://github.com/benfred)
- Use branch-23.12 workflows. ([#1886](https://github.com/rapidsai/raft/pull/1886)) [@bdice](https://github.com/bdice)
- Setup Consistent Nightly Versions for Pip and Conda ([#1880](https://github.com/rapidsai/raft/pull/1880)) [@divyegala](https://github.com/divyegala)
- Fix and improve one-block radix select ([#1878](https://github.com/rapidsai/raft/pull/1878)) [@yong-wang](https://github.com/yong-wang)
- [FEA] Improvements on bitset class ([#1877](https://github.com/rapidsai/raft/pull/1877)) [@lowener](https://github.com/lowener)
- Branch 23.12 merge 23.10 ([#1873](https://github.com/rapidsai/raft/pull/1873)) [@AyodeAwe](https://github.com/AyodeAwe)
- Branch 23.12 merge 23.10 ([#1868](https://github.com/rapidsai/raft/pull/1868)) [@cjnolet](https://github.com/cjnolet)
- Replace `raft::random` calls to not use deprecated API ([#1867](https://github.com/rapidsai/raft/pull/1867)) [@lowener](https://github.com/lowener)
- raft: Build CUDA 12.0 ARM conda packages. ([#1853](https://github.com/rapidsai/raft/pull/1853)) [@bdice](https://github.com/bdice)
- Documentation for raft ANN benchmark containers. ([#1833](https://github.com/rapidsai/raft/pull/1833)) [@dantegd](https://github.com/dantegd)
- [FEA] Support vector deletion in ANN IVF ([#1831](https://github.com/rapidsai/raft/pull/1831)) [@lowener](https://github.com/lowener)
- Provide a raft::copy overload for mdspan-to-mdspan copies ([#1818](https://github.com/rapidsai/raft/pull/1818)) [@wphicks](https://github.com/wphicks)
- Adding FAISS cpu to `raft-ann-bench` ([#1814](https://github.com/rapidsai/raft/pull/1814)) [@cjnolet](https://github.com/cjnolet)

# raft 23.10.00 (11 Oct 2023)

## 🚨 Breaking Changes
90 changes: 28 additions & 62 deletions cpp/include/raft/core/device_resources_manager.hpp
@@ -254,12 +254,6 @@ struct device_resources_manager {
   // Container for underlying device resources to be re-used across host
   // threads for each device
   std::vector<resource_components> per_device_components_;
-  // Container for device_resources objects shared among threads. The index
-  // of the outer vector is the thread id of the thread requesting resources
-  // modulo the total number of resources managed by this object. The inner
-  // vector contains all resources associated with that id across devices
-  // in any order.
-  std::vector<std::vector<raft::device_resources>> resources_{};
 
   // Return a lock for accessing shared data
   [[nodiscard]] auto get_lock() const { return std::unique_lock{manager_mutex_}; }
@@ -271,72 +265,44 @@ struct device_resources_manager {
   // all host threads.
   auto const& get_device_resources_(int device_id)
   {
-    // Each thread maintains an independent list of devices it has
-    // accessed. If it has not marked a device as initialized, it
-    // acquires a lock to initialize it exactly once. This means that each
-    // thread will lock once for a particular device and not proceed until
-    // some thread has actually generated the corresponding device
-    // components
-    thread_local auto initialized_devices = std::vector<int>{};
-    auto res_iter = decltype(std::end(resources_[0])){};
-    if (std::find(std::begin(initialized_devices), std::end(initialized_devices), device_id) ==
-        std::end(initialized_devices)) {
+    thread_local auto thread_resources = std::vector<std::optional<raft::device_resources>>([]() {
+      auto result = 0;
+      RAFT_CUDA_TRY(cudaGetDeviceCount(&result));
+      RAFT_EXPECTS(result != 0, "No CUDA devices found");
+      return result;
+    }());
+    if (!thread_resources[device_id]) {
       // Only lock if we have not previously accessed this device on this
       // thread
       auto lock = get_lock();
-      initialized_devices.push_back(device_id);
       // If we are building components, do not allow any further changes to
       // resource parameters.
       params_finalized_ = true;
 
-      if (resources_.empty()) {
-        // We will potentially need as many device_resources objects as there are combinations of
-        // streams and pools on a given device.
-        resources_.resize(std::max(params_.stream_count.value_or(1), std::size_t{1}) *
-                          std::max(params_.pool_count, std::size_t{1}));
-      }
-
-      auto res_idx = get_thread_id() % resources_.size();
-      // Check to see if we have constructed device_resources for the
-      // requested device at the index assigned to this thread
-      res_iter = std::find_if(std::begin(resources_[res_idx]),
-                              std::end(resources_[res_idx]),
-                              [device_id](auto&& res) { return res.get_device() == device_id; });
+      // Even if we have not yet built device_resources for the current
+      // device, we may have already built the underlying components, since
+      // multiple device_resources may point to the same components.
+      auto component_iter = std::find_if(
+        std::begin(per_device_components_),
+        std::end(per_device_components_),
+        [device_id](auto&& components) { return components.get_device_id() == device_id; });
 
-      if (res_iter == std::end(resources_[res_idx])) {
-        // Even if we have not yet built device_resources for the current
-        // device, we may have already built the underlying components, since
-        // multiple device_resources may point to the same components.
-        auto component_iter = std::find_if(
-          std::begin(per_device_components_),
-          std::end(per_device_components_),
-          [device_id](auto&& components) { return components.get_device_id() == device_id; });
-        if (component_iter == std::end(per_device_components_)) {
-          // Build components for this device if we have not yet done so on
-          // another thread
-          per_device_components_.emplace_back(device_id, params_);
-          component_iter = std::prev(std::end(per_device_components_));
-        }
-        auto scoped_device = device_setter(device_id);
-        // Build the device_resources object for this thread out of shared
-        // components
-        resources_[res_idx].emplace_back(component_iter->get_stream(),
-                                         component_iter->get_pool(),
-                                         component_iter->get_workspace_memory_resource(),
-                                         component_iter->get_workspace_allocation_limit());
-        res_iter = std::prev(std::end(resources_[res_idx]));
+      if (component_iter == std::end(per_device_components_)) {
+        // Build components for this device if we have not yet done so on
+        // another thread
+        per_device_components_.emplace_back(device_id, params_);
+        component_iter = std::prev(std::end(per_device_components_));
       }
-    } else {
-      auto res_idx = get_thread_id() % resources_.size();
-      // If we have previously accessed this device on this thread, we do not
-      // need to lock. We know that this thread already initialized the
-      // resources it requires for this device if no other thread had already done so, so we simply
-      // retrieve the previously-generated resources.
-      res_iter = std::find_if(std::begin(resources_[res_idx]),
-                              std::end(resources_[res_idx]),
-                              [device_id](auto&& res) { return res.get_device() == device_id; });
+      auto scoped_device = device_setter(device_id);
+      // Build the device_resources object for this thread out of shared
+      // components
+      thread_resources[device_id].emplace(component_iter->get_stream(),
+                                          component_iter->get_pool(),
+                                          component_iter->get_workspace_memory_resource(),
+                                          component_iter->get_workspace_allocation_limit());
     }
-    return *res_iter;
+
+    return thread_resources[device_id].value();
   }
 
   // Thread-safe setter for the number of streams