-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce CAGRA test runtime #602
Reduce CAGRA test runtime #602
Conversation
@tfeher @achirkin, I'd appreciate your thoughts on the parameter reductions here. We don't want to lose anything that is important for coverage, and @cjnolet suggested you would be good reviewers to ensure we retain good coverage. If we can be even more aggressive in our parameter trimming, that is also useful input! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @bdice, for the PR! The long CAGRA test times has long been a problem and I totally agree we have to do something with it.
I have nothing against splitting the CAGRA test into multiple executables.
However, I'm a bit hesitating to reduce the test cases via the generate_inputs()
. Most of these were set during development for various (legit, but sometimes forgotten) reasons.
I think, instead of reducing the base set of tests, we can go couple other ways and have a greater impact on test times without significantly reducing the coverage:
- We can drastically reduce the number of cases in the
AnnCagraAddNodesTest
/AnnCagraFilterTest
, while keeping the baseAnnCagraTest
as is. For example, we can remove all variants changing the build algorithm and refine ratios for the two suites as these parameters has no effect on them. We can copy-paste thegenerate_inputs()
togenerate_inputs_limited()
and manually reduce the set by more than 50% of cases. Or we can simply filter out every second variant from the vector (knowing we've already tested the same combination in the base test). - A more involved refactoring (perhaps for a follow-up): I believe we can share some of the state between the tests rather than generating everything from scratch. E.g. generate a blob of data big enough only once and slice/reuse it for all tests in a suite. Or keep an index between the cases varying only search parameters.
cpp/test/neighbors/ann_cagra.cuh
Outdated
inputs2 = | ||
raft::util::itertools::product<AnnCagraInputs>({100}, | ||
{1000}, | ||
{1, 5, 8, 64, 137, 256, 619, 1024}, // dim |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we test a few edge cases where the dimensionality is 2ⁿ
or 2ⁿ ± 1
. These are important to check whether we have any problems with padding the data during build and search.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured that was the case. There were a few places that varied dim
over a wide range of values. I kept different dim
values in each test so that the full range would be covered. Perhaps we can cover the full range in only one set of inputs instead, and use a small range for the other cases?
cpp/test/neighbors/ann_cagra.cuh
Outdated
@@ -876,14 +876,15 @@ class AnnCagraFilterTest : public ::testing::TestWithParam<AnnCagraInputs> { | |||
inline std::vector<AnnCagraInputs> generate_inputs() | |||
{ | |||
// TODO(tfeher): test MULTI_CTA kernel with search_width > 1 to allow multiple CTA per queries | |||
// Varying dim, k, graph_build_algo, search_algo, max_queries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love the extra annotations on what are we varying. Thanks!
@achirkin Your comments make sense to me. Would you be willing to push those changes to this PR? |
@bdice Thank you for your proposal. I am preparing the changes, and I will push them soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running the AddNode and Filtering tests on a subset of input makes a large difference.
I have also applied a refactoring on the basic parameter combinations. This also gave a smaller extra improvement. @achirkin please have a look, if you prefer to keep some of the combinations then let me know.
Testing on H100, this runs 3.7x faster than the previously.
std::vector<AnnCagraInputs> inputs = raft::util::itertools::product<AnnCagraInputs>( | ||
{100}, | ||
{1000}, | ||
{1, 8, 17}, | ||
{1, 16}, // k |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dim and build algo combinations are tested below, therefor we focus on dim and search algo and max_query parameter value here.
auto inputs2 = raft::util::itertools::product<AnnCagraInputs>( | ||
{1, 100}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Above we tested with different max_query
parameter, here we test with batch size 1 and 100.
{1, 3, 5, 7, 8, 17, 64, 128, 137, 192, 256, 512, 619, 1024}, // dim | ||
{10}, | ||
{10000}, | ||
{192, 1024}, // dim |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When testing AddNode, I assumed it is fine to limit to a smaller set of dimensions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tfeher Thanks so much! Your changes look fine to me! It looks like ann_cagra/test_half_uint32_t.cu
might have been missed in the refactor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please apply these changes to ann_cagra/test_half_uint32_t.cu
as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, nevermind! I see there are no AddNode / Filtering tests in that file. My apologies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @tfeher, for the updates! Nice to see such a big speedup. Did you have a chance to check the difference in terms of the total number of test cases out of curiosity?
I'd personally prefer to keep the original generate_inputs()
unchanged and see how other improvements reduce the test time - just to be on the safe side. But the changes look reasonable, so I don't see a point in holding up this PR.
@tfeher @achirkin I merged in the upstream to resolve merge conflicts from #595. I tried to reflect that PR's intent (adding a bunch of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/merge |
Currently, running `NEIGHBORS_ANN_CAGRA_TEST` takes: [0.96 hours on CUDA 11.8, V100 (x86)](https://github.com/rapidsai/cuvs/actions/runs/12913409417/job/36012418022?pr=596#step:8:1718) [1.59 hours on CUDA 12.5, V100 (x86)](https://github.com/rapidsai/cuvs/actions/runs/12913409417/job/36012418329?pr=596#step:8:492) [0.28 hours on CUDA 12.0, A100 (ARM)](https://github.com/rapidsai/cuvs/actions/runs/12913409417/job/36012418741?pr=596#step:8:1729) Individual tests should be able to complete in less than an hour. Ideally, less than 10 minutes. This PR proposes some changes to CAGRA tests: - Each CAGRA type is now its own test executable (e.g. `NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST`) - Some parameter combinations were trimmed by ~50% Authors: - Bradley Dice (https://github.com/bdice) - Tamas Bela Feher (https://github.com/tfeher) - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Artem M. Chirkin (https://github.com/achirkin) - Divye Gala (https://github.com/divyegala) URL: rapidsai#602
Currently, running
NEIGHBORS_ANN_CAGRA_TEST
takes:0.96 hours on CUDA 11.8, V100 (x86)
1.59 hours on CUDA 12.5, V100 (x86)
0.28 hours on CUDA 12.0, A100 (ARM)
Individual tests should be able to complete in less than an hour. Ideally, less than 10 minutes.
This PR proposes some changes to CAGRA tests:
NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST
)