[Feat] Add Support for Index merge in CAGRA #618

Open
Wants to merge 9 commits into base: branch-25.02


@rhdong (Member) commented Jan 27, 2025

No description provided.

@rhdong requested review from @cjnolet and @achirkin on January 27, 2025 07:04
@rhdong requested review from a team as code owners on January 27, 2025 07:04
```cpp
// Diff context (excerpt)
auto merged_index =
  cagra::build(handle, params, raft::make_const_mdspan(device_updated_dataset_view));

if (static_cast<std::size_t>(stride) == dim) {
```
@rhdong (Member Author):

Hi @cjnolet @achirkin, I know this code looks odd, but without it the dataset is modified after calling `cagra::detail::search_main_core`, which makes the test fail. I don't understand how the dataset format, matrix ownership, and `cagra::search` interact behind the scenes. Could you comment on this? Many thanks!

Contributor:

To me, it looks like nothing owns `host_updated_dataset` or `device_updated_dataset` beyond the scope of this function, so the data gets destroyed unless the owning `update_dataset` call in the `if` branch here is taken.
So I think you should call `update_dataset` unconditionally here.

@rhdong (Member Author):


Thank you, very helpful!

@rhdong (Member Author) commented Jan 28, 2025:

Hi @achirkin, I found the reason: `update_dataset` is actually called inside `cagra::build`. The issue I mentioned happens because a non-owning dataset is returned when `stride == dim` (see here). So I unintentionally did the right thing. 😅
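To make the ownership issue concrete, here is a minimal standalone C++ sketch. It does not use cuVS: `dataset_handle`, `padded_stride`, and `make_dataset` are hypothetical names, and the 16-byte row alignment is an assumption chosen for illustration. It mirrors the behavior described above: when the padded stride equals `dim`, the result is only a view of the caller's memory; otherwise the rows are copied into a padded buffer that the handle owns.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical dataset handle (not the cuVS type): either owns a padded copy
// of the rows or merely points at caller-owned memory.
struct dataset_handle {
  std::shared_ptr<std::vector<float>> owned;  // null when non-owning
  const float* data = nullptr;                // always points at row 0
  std::size_t n_rows = 0;
  std::size_t dim = 0;
  std::size_t stride = 0;  // elements between the starts of consecutive rows

  bool is_owning() const { return owned != nullptr; }
};

// Assumption for this sketch: each row must start on a 16-byte boundary.
std::size_t padded_stride(std::size_t dim, std::size_t elem_size) {
  assert(elem_size > 0 && elem_size <= 16);
  const std::size_t align_elems = 16 / elem_size;  // 4 when elem_size == sizeof(float)
  return (dim + align_elems - 1) / align_elems * align_elems;
}

// If no padding is needed (stride == dim), return a cheap non-owning view:
// `src` must then outlive every index built on the handle. Otherwise copy the
// rows into a zero-padded buffer owned by the handle.
dataset_handle make_dataset(const float* src, std::size_t n_rows, std::size_t dim) {
  dataset_handle h;
  h.n_rows = n_rows;
  h.dim = dim;
  h.stride = padded_stride(dim, sizeof(float));
  if (h.stride == dim) {
    h.data = src;  // non-owning
    return h;
  }
  h.owned = std::make_shared<std::vector<float>>(n_rows * h.stride, 0.0f);
  for (std::size_t r = 0; r < n_rows; ++r)
    for (std::size_t c = 0; c < dim; ++c)
      (*h.owned)[r * h.stride + c] = src[r * dim + c];
  h.data = h.owned->data();
  return h;
}
```

If `src` is a buffer local to the merge function, the non-owning branch leaves any index built on it with a dangling pointer once the function returns, while the padded branch happens to be safe. That asymmetry matches the `stride == dim` symptom, and it is why the reviewer suggested always handing the index an owning copy.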

@rhdong added the feature request and non-breaking labels on Jan 27, 2025
```cpp
// Diff context (excerpt)
}

// Allocate the new dataset on device
auto device_updated_dataset =
```
@rhdong (Member Author):

I found an API that fits the case where device memory is insufficient: `cuvs::neighbors::nn_descent::has_enough_device_memory`. I will use it in the next commit.
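The fallback described above might take roughly the following shape. This is a hypothetical, self-contained C++ sketch: it does not call `cuvs::neighbors::nn_descent::has_enough_device_memory` (whose signature is not shown in this thread), and `merged_dataset_bytes`, `choose_placement`, and the 0.8 headroom factor are illustrative assumptions. The free-memory figure would come from a real query such as the CUDA runtime's `cudaMemGetInfo`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class placement { device, host };

// Hypothetical helper: bytes needed to hold every source index's rows
// stacked into one dense row-major dataset of `dim` elements per row.
template <typename T>
std::size_t merged_dataset_bytes(const std::vector<std::size_t>& rows_per_index,
                                 std::size_t dim) {
  std::size_t total_rows = 0;
  for (std::size_t r : rows_per_index) total_rows += r;
  return total_rows * dim * sizeof(T);
}

// Hypothetical policy: allocate the merged dataset on device only if it fits
// within a fraction (`headroom`) of the reported free device memory, leaving
// room for the graph build's own workspace; otherwise fall back to host.
inline placement choose_placement(std::size_t required_bytes,
                                  std::size_t free_device_bytes,
                                  double headroom = 0.8) {
  assert(headroom > 0.0 && headroom <= 1.0);
  const auto budget =
    static_cast<std::size_t>(static_cast<double>(free_device_bytes) * headroom);
  return required_bytes <= budget ? placement::device : placement::host;
}
```

For example, merging indices with 1000 and 500 rows of 128 `float`s needs 768,000 bytes, which fits on device with 1,000,000 free bytes but falls back to host with 900,000. The headroom factor is a design knob, not a value taken from cuVS.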

```cpp
// Diff context (excerpt)
 *
 * @return A new CAGRA index containing the merged indices, graph, and dataset.
 */
auto merge(raft::resources const& res,
```
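As a rough illustration of the first step of a merge like this, here is a hypothetical host-side C++ sketch that stacks each input index's dataset into one contiguous row-major buffer. The diff context earlier in this thread appears to rebuild the merged index by calling `cagra::build` over such a combined dataset. `host_dataset` and `concatenate` are illustrative names only; the PR itself appears to work with RAFT mdspan views on device.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for one index's dataset: row-major, `dim` columns.
struct host_dataset {
  std::vector<float> values;
  std::size_t n_rows = 0;
  std::size_t dim = 0;
};

// Stack the datasets of several indices into one contiguous buffer.
// Rows of the k-th input start at offset (total rows of inputs 0..k-1), so a
// row's position in the merged dataset is its original row ID plus that offset.
host_dataset concatenate(const std::vector<host_dataset>& parts) {
  if (parts.empty()) throw std::invalid_argument("nothing to merge");
  host_dataset out;
  out.dim = parts.front().dim;
  for (const auto& p : parts) {
    if (p.dim != out.dim) throw std::invalid_argument("dimension mismatch");
    if (p.values.size() != p.n_rows * p.dim) throw std::invalid_argument("bad shape");
    out.values.insert(out.values.end(), p.values.begin(), p.values.end());
    out.n_rows += p.n_rows;
  }
  return out;
}
```

Rejecting mismatched dimensions up front keeps the rebuild step simple: every input can be copied verbatim, and a graph build then runs once over the combined rows.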
@rhdong (Member Author):

Hey @chatman, I'm working on cagra::merge. Could you review the API design when you have a moment? Any suggestions would be greatly appreciated. Thanks!

@rhdong (Member Author) commented Jan 30, 2025:

Hi @achirkin @cjnolet, could you take a look at the API design? Any feedback is appreciated. Thanks a lot!

Labels: CMake, cpp, feature request, non-breaking
Projects: Status: In Progress
2 participants