Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add throughput metrics for REDUCTION_BENCH/REDUCTION_NVBENCH benchmarks #16126

Merged
merged 7 commits into from
Jul 2, 2024

Conversation

jihoonson
Copy link
Contributor

Description

This PR addresses #13735 for reduction benchmarks. There are 3 new utils added.

Here are snippets of reduction benchmarks after this change.

$ cpp/build/benchmarks/REDUCTION_BENCH
...
-----------------------------------------------------------------------------------------------------------------
Benchmark                                                       Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------------------------
Reduction/bool_all/10000/manual_time                        10257 ns        26845 ns        68185 bytes_per_second=929.907M/s items_per_second=975.078M/s
Reduction/bool_all/100000/manual_time                       11000 ns        27454 ns        63634 bytes_per_second=8.46642G/s items_per_second=9.09075G/s
Reduction/bool_all/1000000/manual_time                      12671 ns        28658 ns        55261 bytes_per_second=73.5018G/s items_per_second=78.922G/s
...

$ cpp/build/benchmarks/REDUCTION_NVBENCH
...
## rank_scan

### [0] NVIDIA RTX A5500

|        T        | null_probability | data_size | Samples |  CPU Time  | Noise  |  GPU Time  | Noise |  Elem/s  | GlobalMem BW |  BWUtil   |
|-----------------|------------------|-----------|---------|------------|--------|------------|-------|----------|--------------|-----------|
|             I32 |                0 |     10000 |  16992x |  33.544 us | 14.95% |  29.446 us | 5.58% |  82.321M |   5.596 TB/s |   728.54% |
|             I32 |              0.1 |     10000 |  16512x |  34.358 us | 13.66% |  30.292 us | 2.87% |  80.020M |   5.286 TB/s |   688.17% |
|             I32 |              0.5 |     10000 |  16736x |  34.058 us | 14.31% |  29.890 us | 3.40% |  81.097M |   5.430 TB/s |   706.89% |
...

Note that, when the data type is a 1-byte-width type in the google benchmark result summary, bytes_per_second appears to be smaller than items_per_second. This is because the former is a multiple of 1000 whereas the latter is a multiple of 1024. They are in fact the same number.

Implementation-wise, these are what I'm not sure if I made a best decision.

  • Each of new utils above is declared and defined in different files. I did this because I could not find a good place to have them all, and they seem to belong to different utilities. Please let me know if there is a better place for them.
  • All the new utils are defined in the global namespace since other util functions seem to have been defined in the same way. Please let me know if this is not the convention.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

…s or GlobalMem BW for nvbench, for reduction benchmarks
@jihoonson jihoonson requested a review from a team as a code owner June 28, 2024 17:35
@jihoonson jihoonson requested review from harrism and davidwendt June 28, 2024 17:35
Copy link

copy-pr-bot bot commented Jun 28, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added libcudf Affects libcudf (C++/CUDA) code. CMake CMake build issue labels Jun 28, 2024
@davidwendt
Copy link
Contributor

Thanks for working on this @jihoonson
This looks good so far.
You should update the copyrights of all the new files to be just 2024 (no range).

@jihoonson
Copy link
Contributor Author

Thanks @davidwendt, I will fix the copyrights. Do you think it's a good idea to fix them in the reduction benchmark files as well in this PR? I can do that if so.

@davidwendt
Copy link
Contributor

davidwendt commented Jun 28, 2024

Thanks @davidwendt, I will fix the copyrights. Do you think it's a good idea to fix them in the reduction benchmark files as well in this PR? I can do that if so.

Any existing file you change should have 2024 added/updated to its range if it does not already have it.
Looks like for the changed files you have here, just change the 2023 to 2024.
For new files just 2024 (no range) is required.

@davidwendt davidwendt added 3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jun 28, 2024
@jihoonson
Copy link
Contributor Author

Thanks @davidwendt. Fixed the copyrights as suggested.

cpp/benchmarks/common/table_utilities.hpp Outdated Show resolved Hide resolved
cpp/benchmarks/common/table_utilities.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@karthikeyann karthikeyann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. 👍
Minor updates with new function argument usage.


// The benchmark takes a column and produces one scalar.
set_items_processed(state, column_size + 1);
set_bytes_processed(state, estimate_size(std::move(values)) + cudf::size_of(output_dtype));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
set_bytes_processed(state, estimate_size(std::move(values)) + cudf::size_of(output_dtype));
set_bytes_processed(state, estimate_size(*values) + cudf::size_of(output_dtype));

similarly at other places.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think values->view() is more clear but leave it up to you if you'd rather use *values

Copy link
Contributor Author

@jihoonson jihoonson Jun 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @karthikeyann, thanks for the review. I just want to better understand your comment. Your seem to be suggesting to pass a column_view instead of moving the column. This has been done in 40804e2. Or, are you suggesting to use the *?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just saw David's comment above. I also find values->view() more explicit and clear, so would like to keep this pattern unless you feel strongly about it.

cpp/benchmarks/reduction/scan.cpp Outdated Show resolved Hide resolved
@davidwendt
Copy link
Contributor

/ok to test

@davidwendt
Copy link
Contributor

/ok to test

@jihoonson
Copy link
Contributor Author

@davidwendt @karthikeyann thanks for the review! This PR seems to have passed all checks. What will be the next step?

Copy link
Member

@harrism harrism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one!

@davidwendt
Copy link
Contributor

/ok to test

@jihoonson
Copy link
Contributor Author

Hmm I'm not sure why the job pr / wheel-tests-cudf / 12.2.2, 3.9, amd64, ubuntu22.04, v100 (push) failed. I don't seem to see any error there.

@davidwendt
Copy link
Contributor

Hmm I'm not sure why the job pr / wheel-tests-cudf / 12.2.2, 3.9, amd64, ubuntu22.04, v100 (push) failed. I don't seem to see any error there.

Looks like something got stuck. I kicked off a re-run.

@jihoonson
Copy link
Contributor Author

Hmm I'm not sure why the job pr / wheel-tests-cudf / 12.2.2, 3.9, amd64, ubuntu22.04, v100 (push) failed. I don't seem to see any error there.

Looks like something got stuck. I kicked off a re-run.

Thanks! It's all green now 🙂

@harrism
Copy link
Member

harrism commented Jul 2, 2024

/merge

@rapids-bot rapids-bot bot merged commit 25febbc into rapidsai:branch-24.08 Jul 2, 2024
76 of 77 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team CMake CMake build issue improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants