Iteratively build graph index #612
This PR is related to #610.
/ok to test
cbf708b to bce7efa
0100b9c to ce17775
Conflicts with branch-25.02 have been fixed.
@@ -104,7 +108,8 @@ struct index_params : cuvs::neighbors::index_params {
    */
   std::variant<std::monostate,
                graph_build_params::ivf_pq_params,
-               graph_build_params::nn_descent_params>
+               graph_build_params::nn_descent_params,
We have explicit docs for each of these arguments. I understand the iterative search params are experimental to start, but can we at least add them to the docs?
Thanks, I will add comments about iterative_search_params here.
Thanks Akira for this PR, it is very nice to see this feature in cuvs. The PR looks great overall, I just have a few small suggestions.
@@ -72,6 +72,10 @@ struct ivf_pq_params {
 };

 using nn_descent_params = cuvs::neighbors::nn_descent::index_params;

+// **** Experimental ****
+using iterative_search_params = cuvs::neighbors::search_params;
As I understand, `cuvs::neighbors::search_params` is being re-used here as an empty tag for the `std::variant` index parameters below.
While this is a perfectly valid approach, I wonder whether it would look confusing to a user and whether it would cause us any trouble in the future (if at some point we decide to add members to `cuvs::neighbors::search_params`).
An alternative would be to just replace it with `struct iterative_search_params {}`.
What do you think, @cjnolet?
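To make the trade-off concrete, here is a minimal, self-contained sketch of the alternative suggested above: a dedicated empty tag struct used as one alternative of a `std::variant`. The structs, their members, and the `select_build_algo` helper are hypothetical stand-ins for illustration only; they are not the cuVS types or API.

```cpp
#include <string>
#include <variant>

// Hypothetical stand-ins for the real parameter structs (members are made up).
struct ivf_pq_params { int n_lists = 1024; };
struct nn_descent_params { int max_iterations = 20; };

// Dedicated empty tag type: it carries no data and only selects the algorithm.
// Unlike an alias of a shared search_params type, it cannot silently gain
// members if that shared type changes later.
struct iterative_search_params {};

using graph_build_params_t = std::variant<std::monostate,
                                          ivf_pq_params,
                                          nn_descent_params,
                                          iterative_search_params>;

// Hypothetical dispatcher: reports which build path the variant selects.
std::string select_build_algo(const graph_build_params_t& params)
{
  if (std::holds_alternative<ivf_pq_params>(params)) { return "ivf_pq"; }
  if (std::holds_alternative<nn_descent_params>(params)) { return "nn_descent"; }
  if (std::holds_alternative<iterative_search_params>(params)) { return "iterative_search"; }
  return "default";  // std::monostate: no explicit choice was made
}
```

A caller would opt in with something like `graph_build_params_t p = iterative_search_params{};`. The behaviour is the same whether the tag is an alias or its own struct; the difference is only in how the type reads to users and how it evolves.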
Thanks, the changes look very good and clean to me
// Determine initial graph size.
uint64_t final_graph_size   = (uint64_t)dataset.extent(0);
uint64_t initial_graph_size = (final_graph_size + 1) / 2;
while (initial_graph_size > graph_degree * 64) {
A nitpick, but perhaps we could add a `RAFT_EXPECTS(graph_degree > 0);` assertion at the top of this function to make sure it doesn't cause an infinite loop if an invalid (zero) graph_degree is set?
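The concern can be illustrated with a small standalone sketch. Only the loop condition appears in the diff; the loop body below (halving, rounding up) and the helper name `initial_graph_size_for` are assumptions made for illustration, and a plain exception stands in for `RAFT_EXPECTS`. Under that assumed body, the size never drops below 1, so with `graph_degree == 0` the condition `initial_graph_size > 0` would stay true forever.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helper mirroring the snippet under review.
std::uint64_t initial_graph_size_for(std::uint64_t final_graph_size, std::uint64_t graph_degree)
{
  // Guard suggested in the review (stand-in for RAFT_EXPECTS(graph_degree > 0)).
  if (graph_degree == 0) { throw std::invalid_argument("graph_degree must be > 0"); }

  std::uint64_t initial_graph_size = (final_graph_size + 1) / 2;
  while (initial_graph_size > graph_degree * 64) {
    // ASSUMPTION: the elided loop body halves the size, rounding up.
    initial_graph_size = (initial_graph_size + 1) / 2;
  }
  return initial_graph_size;
}
```

With these assumptions, a dataset of 1,000,000 vectors and `graph_degree = 64` starts from an initial graph of 3907 nodes, the first halved size that is not larger than 64 * 64 = 4096.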
Thanks Akira for the updates, the PR looks good to me.
/merge
This PR is about how CAGRA's search() and optimize() can be used to iteratively create and improve a graph index.

Currently, IVFPQ and NND are used to create the initial kNN graph, which is then optimized to create the CAGRA search graph. So, for example, if you want to support a new data type in CAGRA, you need to create an initial kNN graph with that data type, and IVFPQ or NND must also support that new data type. This is a bit of a hassle.

This PR is one solution to that problem. With the functionality in this PR, once CAGRA search supports the new data type, it can be used to create a graph index with it.

Authors:
- Akira Naruse (https://github.com/anaruse)
- Corey J. Nolet (https://github.com/cjnolet)

Approvers:
- Artem M. Chirkin (https://github.com/achirkin)
- Tamas Bela Feher (https://github.com/tfeher)

URL: rapidsai#612
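The idea of growing a graph index by alternating search and optimize steps can be sketched on the CPU as follows. This is a toy under stated assumptions, not the cuVS implementation: `toy_search` is an exact brute-force search over 1-D points standing in for CAGRA's graph-based search(), `toy_optimize` is a no-op standing in for optimize(), and the doubling schedule with one final full pass is an assumption about how the sizes progress. All names here are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

using Graph = std::vector<std::vector<std::size_t>>;

// Stand-in for search(): exact k nearest neighbors of point `query` among the
// first `indexed` points, excluding the query itself. The real search would
// traverse the current graph instead of scanning every point.
std::vector<std::size_t> toy_search(const std::vector<float>& data,
                                    std::size_t indexed,
                                    std::size_t query,
                                    std::size_t k)
{
  std::vector<std::pair<float, std::size_t>> cand;
  for (std::size_t j = 0; j < indexed; ++j) {
    if (j != query) { cand.emplace_back(std::fabs(data[j] - data[query]), j); }
  }
  const std::size_t keep = std::min(k, cand.size());
  std::partial_sort(cand.begin(), cand.begin() + static_cast<std::ptrdiff_t>(keep), cand.end());
  std::vector<std::size_t> out;
  for (std::size_t i = 0; i < keep; ++i) { out.push_back(cand[i].second); }
  return out;
}

// Stand-in for optimize(): returns the kNN graph unchanged.
Graph toy_optimize(Graph knn) { return knn; }

// Grow the index: search the currently indexed subset with up to twice as many
// queries, turn the results into the next graph, and repeat.
Graph build_iteratively(const std::vector<float>& data, std::size_t initial_size, std::size_t k)
{
  if (initial_size == 0) { throw std::invalid_argument("initial_size must be > 0"); }
  const std::size_t n = data.size();
  std::size_t indexed = std::min(initial_size, n);
  Graph graph;
  bool last_pass = false;
  while (true) {
    const std::size_t n_queries = std::min(2 * indexed, n);
    Graph knn(n_queries);
    for (std::size_t q = 0; q < n_queries; ++q) { knn[q] = toy_search(data, indexed, q, k); }
    graph   = toy_optimize(std::move(knn));
    indexed = n_queries;
    if (last_pass) { break; }
    last_pass = (indexed == n);  // ASSUMPTION: one extra pass once all points are indexed
  }
  return graph;
}
```

The point of the sketch is the dependency structure: each round needs only a search over the data type at hand, so no separate kNN builder for that type is required.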
This PR adds the binary Hamming distance support to CAGRA.

dependency: #612

TODO:
- [x] Add Hamming distance dist_op for CAGRA search
  - Add `DistanceType::BinaryHamming` (because `HammingUnexpanded` is not a bitwise operation)
- [x] Support graph build
  - Want to use the iterative graph build method that uses the CAGRA search for graph build (anaruse@5a80659) because otherwise the binary Hamming distance support for IVFPQ or NN Descent is additionally required.
- [x] Add CI test
  - GT creation
  - Add test cases
- [x] Test on some benchmark datasets
- [x] Add `preprocessing::quantize::binary`

Authors:
- tsuki (https://github.com/enp1s0)
- Akira Naruse (https://github.com/anaruse)
- Corey J. Nolet (https://github.com/cjnolet)

Approvers:
- Tamas Bela Feher (https://github.com/tfeher)
- Corey J. Nolet (https://github.com/cjnolet)

URL: #610