Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce CAGRA test runtime #602

Merged
merged 7 commits into from
Jan 27, 2025

Conversation

bdice
Copy link
Contributor

@bdice bdice commented Jan 22, 2025

Currently, running NEIGHBORS_ANN_CAGRA_TEST takes:
0.96 hours on CUDA 11.8, V100 (x86)
1.59 hours on CUDA 12.5, V100 (x86)
0.28 hours on CUDA 12.0, A100 (ARM)

Individual tests should be able to complete in less than an hour. Ideally, less than 10 minutes.

This PR proposes some changes to CAGRA tests:

  • Each CAGRA type is now its own test executable (e.g. NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST)
  • Some parameter combinations were trimmed by ~50%

@bdice bdice requested review from a team as code owners January 22, 2025 21:55
@bdice bdice added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Jan 22, 2025
@bdice bdice requested review from tfeher and achirkin January 23, 2025 00:05
@bdice
Copy link
Contributor Author

bdice commented Jan 23, 2025

@tfeher @achirkin, I'd appreciate your thoughts on the parameter reductions here. We don't want to lose anything that is important for coverage, and @cjnolet suggested you would be good reviewers to ensure we retain good coverage.

If we can be even more aggressive in our parameter trimming, that is also useful input!

@bdice bdice mentioned this pull request Jan 23, 2025
Copy link
Contributor

@achirkin achirkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @bdice, for the PR! The long CAGRA test times has long been a problem and I totally agree we have to do something with it.

I have nothing against splitting the CAGRA test into multiple executables.

However, I'm a bit hesitating to reduce the test cases via the generate_inputs(). Most of these were set during development for various (legit, but sometimes forgotten) reasons.

I think, instead of reducing the base set of tests, we can go couple other ways and have a greater impact on test times without significantly reducing the coverage:

  1. We can drastically reduce the number of cases in the AnnCagraAddNodesTest / AnnCagraFilterTest, while keeping the base AnnCagraTest as is. For example, we can remove all variants changing the build algorithm and refine ratios for the two suites as these parameters has no effect on them. We can copy-paste the generate_inputs() to generate_inputs_limited() and manually reduce the set by more than 50% of cases. Or we can simply filter out every second variant from the vector (knowing we've already tested the same combination in the base test).
  2. A more involved refactoring (perhaps for a follow-up): I believe we can share some of the state between the tests rather than generating everything from scratch. E.g. generate a blob of data big enough only once and slice/reuse it for all tests in a suite. Or keep an index between the cases varying only search parameters.

inputs2 =
raft::util::itertools::product<AnnCagraInputs>({100},
{1000},
{1, 5, 8, 64, 137, 256, 619, 1024}, // dim
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we test a few edge cases where the dimensionality is 2ⁿ or 2ⁿ ± 1. These are important to check whether we have any problems with padding the data during build and search.

Copy link
Contributor Author

@bdice bdice Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured that was the case. There were a few places that varied dim over a wide range of values. I kept different dim values in each test so that the full range would be covered. Perhaps we can cover the full range in only one set of inputs instead, and use a small range for the other cases?

cpp/test/neighbors/ann_cagra.cuh Outdated Show resolved Hide resolved
@@ -876,14 +876,15 @@ class AnnCagraFilterTest : public ::testing::TestWithParam<AnnCagraInputs> {
inline std::vector<AnnCagraInputs> generate_inputs()
{
// TODO(tfeher): test MULTI_CTA kernel with search_width > 1 to allow multiple CTA per queries
// Varying dim, k, graph_build_algo, search_algo, max_queries
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love the extra annotations on what are we varying. Thanks!

@bdice
Copy link
Contributor Author

bdice commented Jan 23, 2025

@achirkin Your comments make sense to me. Would you be willing to push those changes to this PR?

@tfeher
Copy link
Contributor

tfeher commented Jan 23, 2025

@bdice Thank you for your proposal. I am preparing the changes, and I will push them soon.

Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running the AddNode and Filtering tests on a subset of input makes a large difference.

I have also applied a refactoring on the basic parameter combinations. This also gave a smaller extra improvement. @achirkin please have a look, if you prefer to keep some of the combinations then let me know.

Testing on H100, this runs 3.7x faster than the previously.

std::vector<AnnCagraInputs> inputs = raft::util::itertools::product<AnnCagraInputs>(
{100},
{1000},
{1, 8, 17},
{1, 16}, // k
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dim and build algo combinations are tested below, therefor we focus on dim and search algo and max_query parameter value here.

auto inputs2 = raft::util::itertools::product<AnnCagraInputs>(
{1, 100},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above we tested with different max_query parameter, here we test with batch size 1 and 100.

{1, 3, 5, 7, 8, 17, 64, 128, 137, 192, 256, 512, 619, 1024}, // dim
{10},
{10000},
{192, 1024}, // dim
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When testing AddNode, I assumed it is fine to limit to a smaller set of dimensions.

Copy link
Contributor Author

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tfeher Thanks so much! Your changes look fine to me! It looks like ann_cagra/test_half_uint32_t.cu might have been missed in the refactor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please apply these changes to ann_cagra/test_half_uint32_t.cu as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, nevermind! I see there are no AddNode / Filtering tests in that file. My apologies.

Copy link
Contributor

@achirkin achirkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @tfeher, for the updates! Nice to see such a big speedup. Did you have a chance to check the difference in terms of the total number of test cases out of curiosity?

I'd personally prefer to keep the original generate_inputs() unchanged and see how other improvements reduce the test time - just to be on the safe side. But the changes look reasonable, so I don't see a point in holding up this PR.

@bdice
Copy link
Contributor Author

bdice commented Jan 24, 2025

@tfeher @achirkin I merged in the upstream to resolve merge conflicts from #595. I tried to reflect that PR's intent (adding a bunch of InnerProduct tests) but I would like your eyes one more time to make sure that I did what you expected w.r.t. this PR and #595. I'll also request a build codeowner review.

@bdice bdice changed the title Proposal to reduce CAGRA test runtime. Reduce CAGRA test runtime. Jan 24, 2025
@bdice bdice changed the title Reduce CAGRA test runtime. Reduce CAGRA test runtime Jan 24, 2025
Copy link
Contributor

@achirkin achirkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

cpp/test/neighbors/ann_cagra.cuh Outdated Show resolved Hide resolved
@achirkin
Copy link
Contributor

/merge

@rapids-bot rapids-bot bot merged commit 5527cdf into rapidsai:branch-25.02 Jan 27, 2025
61 checks passed
@bdice
Copy link
Contributor Author

bdice commented Jan 27, 2025

Thanks so much @achirkin @tfeher!

narangvivek10 pushed a commit to SearchScale/cuvs that referenced this pull request Jan 27, 2025
Currently, running `NEIGHBORS_ANN_CAGRA_TEST` takes:
[0.96 hours on CUDA 11.8, V100 (x86)](https://github.com/rapidsai/cuvs/actions/runs/12913409417/job/36012418022?pr=596#step:8:1718)
[1.59 hours on CUDA 12.5, V100 (x86)](https://github.com/rapidsai/cuvs/actions/runs/12913409417/job/36012418329?pr=596#step:8:492)
[0.28 hours on CUDA 12.0, A100 (ARM)](https://github.com/rapidsai/cuvs/actions/runs/12913409417/job/36012418741?pr=596#step:8:1729)

Individual tests should be able to complete in less than an hour. Ideally, less than 10 minutes.

This PR proposes some changes to CAGRA tests:
- Each CAGRA type is now its own test executable (e.g. `NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST`)
- Some parameter combinations were trimmed by ~50%

Authors:
  - Bradley Dice (https://github.com/bdice)
  - Tamas Bela Feher (https://github.com/tfeher)
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Artem M. Chirkin (https://github.com/achirkin)
  - Divye Gala (https://github.com/divyegala)

URL: rapidsai#602
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake cpp improvement Improves an existing functionality non-breaking Introduces a non-breaking change
Projects
Development

Successfully merging this pull request may close these issues.

4 participants