[FEA] Add `bytes_per_second` to all libcudf benchmarks #13735
Comments
Hi, I want to start contributing to cudf and would like to work on this issue. Should I create one PR per benchmark or put them all in one PR?
Thanks for looking into this. Smaller PRs are easier to review, so I would prefer one PR per benchmark.
Due to missing parentheses in the APPLY_BOOLEAN_MASK benchmark, the number of written bytes was not multiplied by the number of iterations of this benchmark. This patch relates to rapidsai#13735.
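The effect of the missing parentheses can be illustrated with a small sketch. The function names and the exact form of the expression are assumptions made for illustration; the commit message only states that the written bytes were not multiplied by the iteration count.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not taken from the cudf sources.

// Assumed buggy form: only the read bytes are scaled by the iteration count,
// because multiplication binds tighter than addition.
inline std::int64_t total_bytes_buggy(std::int64_t iterations,
                                      std::int64_t bytes_read,
                                      std::int64_t bytes_written)
{
  assert(iterations >= 0);
  return iterations * bytes_read + bytes_written;
}

// Fixed form: the parentheses make the iteration count apply to both terms.
inline std::int64_t total_bytes_fixed(std::int64_t iterations,
                                      std::int64_t bytes_read,
                                      std::int64_t bytes_written)
{
  assert(iterations >= 0);
  return iterations * (bytes_read + bytes_written);
}
```

With 10 iterations, 100 bytes read and 50 bytes written per iteration, the first form yields 1050 and the second yields 1500. The value that would be passed to `SetBytesProcessed()` is the second one.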
To add `bytes_per_second`, a call to `SetBytesProcessed()` with the number of written and read bytes is added to the benchmark. This patch relates to rapidsai#13735.
…13937) Due to missing parentheses in APPLY_BOOLEAN_MASK benchmark, the number of written bytes was not multiplied by the number of iterations of this benchmark. This patch relates to #13735. Authors: - Martin Marenz (https://github.com/Blonck) Approvers: - Nghia Truong (https://github.com/ttnghia) - David Wendt (https://github.com/davidwendt) URL: #13937
This patch relates to rapidsai#13735.
To add `bytes_per_second`, a call to `.SetBytesProcessed()` with the number of written and read bytes is added to the benchmark. This patch relates to #13735. Authors: - Martin Marenz (https://github.com/Blonck) Approvers: - Karthikeyan (https://github.com/karthikeyann) - Yunsong Wang (https://github.com/PointKernel) URL: #13938
Adds `bytes_per_second` to `COPY_IF_ELSE_BENCH`. This patch relates to rapidsai#13735.
Adds `bytes_per_second` to the `PARTITION_BENCH` benchmark. This patch relates to rapidsai#13735.
In the past, the `HASHING_NVBENCH` benchmark treated the `nulls` parameter as a boolean. Any value other than 0.0 resulted in a null probability of 100% for the generated data. Now, the `nulls` parameter directly determines the null probability. For instance, a value of 0.1 will generate 10% of the data as null. Moreover, setting `nulls` to 0.0 produces data without a null bitmask. Additionally, `bytes_per_second` is added to the benchmark. This patch relates to rapidsai#13735.
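The change in how `nulls` is interpreted can be sketched as two small functions. Both function names are hypothetical, and the old behaviour for `nulls == 0.0` (no nulls generated) is an assumption, since the commit message only describes the non-zero case.

```cpp
#include <cassert>
#include <optional>

// Old interpretation as described in the commit message: `nulls` acts as a
// boolean, so any non-zero value means every row is null.
// Assumption: 0.0 meant "no nulls" before the change as well.
inline double null_probability_old(double nulls)
{
  assert(nulls >= 0.0 && nulls <= 1.0);
  return nulls != 0.0 ? 1.0 : 0.0;
}

// New interpretation: `nulls` is the null probability itself.
// An empty optional stands for "generate the column without a null bitmask".
inline std::optional<double> null_probability_new(double nulls)
{
  assert(nulls >= 0.0 && nulls <= 1.0);
  if (nulls == 0.0) { return std::nullopt; }
  return nulls;
}
```

Under the old reading, `nulls = 0.1` produced an all-null column. Under the new reading, it produces roughly 10% nulls, and `nulls = 0.0` drops the bitmask entirely.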
To calculate the number of bytes written and read by the benchmark, a few helper functions are introduced that calculate the payload size of a column, a table, and the results of a groupby. This patch relates to rapidsai#13735.
Adds `bytes_per_second` to the `COPY_IF_ELSE_BENCH` benchmark. This patch relates to #13735. Authors: - Martin Marenz (https://github.com/Blonck) Approvers: - Bradley Dice (https://github.com/bdice) - David Wendt (https://github.com/davidwendt) URL: #13960
Adds `bytes_per_second` to the `PARTITION_BENCH` benchmark. This patch relates to #13735. Authors: - Martin Marenz (https://github.com/Blonck) - Mark Harris (https://github.com/harrism) Approvers: - David Wendt (https://github.com/davidwendt) - Mark Harris (https://github.com/harrism) URL: #13965
This patch adds memory statistics for the GROUPBY_NVBENCH benchmarks. For this purpose, helper functions are introduced to compute the payload size for:
- Column
- Table
- Groupby execution results

This patch relates to rapidsai#13735.
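A rough idea of what such payload-size helpers compute is sketched below. This is not the code from the patch: `column_desc`, `column_payload_size`, and `table_payload_size` are hypothetical names. The sketch assumes fixed-width columns and a null mask of one bit per row, and it ignores any padding of the mask.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of a fixed-width column (not a cudf type).
struct column_desc {
  std::int64_t num_rows;      // number of rows in the column
  std::int64_t element_size;  // size of one element in bytes
  bool nullable;              // whether a null mask is present
};

// Payload of one column: the data buffer plus, if present, the null mask.
// Assumption: the mask uses one bit per row and padding is ignored.
inline std::int64_t column_payload_size(const column_desc& col)
{
  assert(col.num_rows >= 0 && col.element_size > 0);
  const std::int64_t data_bytes = col.num_rows * col.element_size;
  const std::int64_t mask_bytes = col.nullable ? (col.num_rows + 7) / 8 : 0;
  return data_bytes + mask_bytes;
}

// Payload of a table: the sum over its columns.
inline std::int64_t table_payload_size(const std::vector<column_desc>& table)
{
  std::int64_t total = 0;
  for (const auto& col : table) { total += column_payload_size(col); }
  return total;
}
```

For example, a nullable 4-byte column with 1000 rows contributes 4000 data bytes plus 125 mask bytes. The sum of input and output payloads is the kind of number a benchmark would then report as bytes read and written.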
In the past, the HASHING_NVBENCH benchmark treated the `nulls` parameter as a boolean. Any value other than 0.0 resulted in a null probability of 1.0. Now, the `nulls` parameter directly determines the null probability. For instance, a value of 0.1 will generate 10% of the data as null. Moreover, setting nulls to 0.0 produces data without a null bitmask. Additionally, `bytes_per_second` is added to the benchmark. This patch relates to #13735. Authors: - Martin Marenz (https://github.com/Blonck) - Yunsong Wang (https://github.com/PointKernel) Approvers: - Yunsong Wang (https://github.com/PointKernel) - David Wendt (https://github.com/davidwendt) URL: #13967
#14172) This patch relates to #13735. Benchmark: [benchmark_distinct_count.txt](https://github.com/rapidsai/cudf/files/12700496/benchmark_distinct_count.txt) Authors: - Martin Marenz (https://github.com/Blonck) - Mark Harris (https://github.com/harrism) Approvers: - David Wendt (https://github.com/davidwendt) - Karthikeyan (https://github.com/karthikeyann) - Mark Harris (https://github.com/harrism) URL: #14172
This patch relates to #13735. Benchmark: [transpose_benchmark.txt](https://github.com/rapidsai/cudf/files/12699834/transpose_benchmark.txt) Authors: - Martin Marenz (https://github.com/Blonck) - Mark Harris (https://github.com/harrism) Approvers: - Mark Harris (https://github.com/harrism) - Yunsong Wang (https://github.com/PointKernel) URL: #14170
Adds `bytes_per_second` to `SHIFT_BENCH`. This patch relates to #13735. Authors: - Martin Marenz (https://github.com/Blonck) Approvers: - Karthikeyan (https://github.com/karthikeyann) - Nghia Truong (https://github.com/ttnghia) - Mark Harris (https://github.com/harrism) URL: #13950
It is a good idea to have some metrics for data size. I wonder whether it would be even better if we have separate metrics for input and output data. For example, we could add custom metrics like |
…ks (#16126)

This PR addresses #13735 for reduction benchmarks. There are 3 new utils added.

- `int64_t estimate_size(cudf::table_view)` returns a size estimate for the given table. #13984 was a previous attempt to add a similar utility, but this implementation uses `cudf::row_bit_count()` as suggested in #13984 (comment) instead of manually estimating the size.
- `void set_items_processed(State& state, int64_t items_processed_per_iteration)` is a thin wrapper of `State.SetItemsProcessed()`. This wrapper takes `items_processed_per_iteration` as a parameter instead of `total_items_processed`. This could be useful to avoid repeating `State.iterations() * items_processed_per_iteration` in each benchmark class.
- `void set_throughputs(nvbench::state& state)` is added as a workaround for NVIDIA/nvbench#175. We sometimes want to set throughput statistics after `state.exec()` calls, especially when it is hard to estimate the result size upfront.

Here are snippets of reduction benchmarks after this change.

```
$ cpp/build/benchmarks/REDUCTION_BENCH
...
-----------------------------------------------------------------------------------------------------------------
Benchmark                                  Time        CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------------------------
Reduction/bool_all/10000/manual_time      10257 ns   26845 ns   68185 bytes_per_second=929.907M/s items_per_second=975.078M/s
Reduction/bool_all/100000/manual_time     11000 ns   27454 ns   63634 bytes_per_second=8.46642G/s items_per_second=9.09075G/s
Reduction/bool_all/1000000/manual_time    12671 ns   28658 ns   55261 bytes_per_second=73.5018G/s items_per_second=78.922G/s
...
$ cpp/build/benchmarks/REDUCTION_NVBENCH
...
## rank_scan

### [0] NVIDIA RTX A5500

|        T        | null_probability | data_size | Samples |  CPU Time  | Noise  |  GPU Time  | Noise |  Elem/s  | GlobalMem BW |  BWUtil   |
|-----------------|------------------|-----------|---------|------------|--------|------------|-------|----------|--------------|-----------|
|             I32 |                0 |     10000 |  16992x |  33.544 us | 14.95% |  29.446 us | 5.58% |  82.321M |   5.596 TB/s |   728.54% |
|             I32 |              0.1 |     10000 |  16512x |  34.358 us | 13.66% |  30.292 us | 2.87% |  80.020M |   5.286 TB/s |   688.17% |
|             I32 |              0.5 |     10000 |  16736x |  34.058 us | 14.31% |  29.890 us | 3.40% |  81.097M |   5.430 TB/s |   706.89% |
...
```

Note that, when the data type is a 1-byte-width type in the google benchmark result summary, `bytes_per_second` appears to be smaller than `items_per_second`. This is because the former is scaled in multiples of 1024 whereas the latter is scaled in multiples of 1000 (for example, 929.907 × 1024² ≈ 975.078 × 1000²). They are in fact the same number.

Implementation-wise, these are the decisions I am not sure about.

- Each of the new utils above is declared and defined in a different file. I did this because I could not find a good place to have them all, and they seem to belong to different utilities. Please let me know if there is a better place for them.
- All the new utils are defined in the global namespace since other util functions seem to have been defined in the same way. Please let me know if this is not the convention.

Authors:
- Jihoon Son (https://github.com/jihoonson)

Approvers:
- Mark Harris (https://github.com/harrism)
- David Wendt (https://github.com/davidwendt)

URL: #16126
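The apparent mismatch between `bytes_per_second` and `items_per_second` for 1-byte types can be checked with a short calculation. The sketch assumes that the `bytes_per_second` column is printed with binary prefixes (powers of 1024) and the `items_per_second` column with decimal prefixes (powers of 1000), which is what the figures in the snippet above are consistent with. The function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Convert a value printed with a binary prefix (1024^k) into the same
// quantity printed with a decimal prefix (1000^k).
// `prefix_power` is 2 for M and 3 for G.
inline double binary_to_decimal_prefix(double value, int prefix_power)
{
  assert(prefix_power >= 0);
  return value * std::pow(1024.0, prefix_power) / std::pow(1000.0, prefix_power);
}
```

Applied to the first row, `binary_to_decimal_prefix(929.907, 2)` gives about 975.078, and for the second row `binary_to_decimal_prefix(8.46642, 3)` gives about 9.09075, matching the `items_per_second` column in both cases.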
Thank you @jihoonson for your suggestion. I agree that we need better and more granular throughput metrics. I would like to start with an input+output bytes per second metric that is consistently implemented in all benchmarks. From there we can add the separate input/output and bytes/items metrics as you suggest.
@GregoryKimball sounds good 🙂 As my first PR has been merged, I will continue working on the remaining benchmarks.
I have started thinking that perhaps we should require all benchmarks to report such metrics. One way to do it is to add new benchmark fixtures that would explode at the end of the benchmark if those metrics are not set. Then we could update every benchmark to use the new fixtures. @GregoryKimball do you think this is a good idea?
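One possible shape for such an enforcing fixture is sketched below. It is only an illustration of the idea in the comment above: `metrics_guard` and its members are hypothetical and do not correspond to any existing cudf, Google Benchmark, or NVBench class.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>

// Hypothetical guard that a benchmark fixture could own.
class metrics_guard {
 public:
  // Record the bytes processed by the benchmark.
  void set_bytes_processed(std::int64_t bytes)
  {
    assert(bytes >= 0);
    bytes_ = bytes;
  }

  // Intended to be called at the end of the benchmark body.
  // Throws if the benchmark never reported its bytes processed.
  std::int64_t finish() const
  {
    if (!bytes_.has_value()) {
      throw std::logic_error("benchmark did not report bytes processed");
    }
    return *bytes_;
  }

 private:
  std::optional<std::int64_t> bytes_;
};
```

The check lives in an explicit `finish()` call rather than in a destructor because throwing from a destructor would terminate the program instead of reporting a failure for one benchmark.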
Thank you @jihoonson for this suggestion. We might go this route at some point, but for now I think converting the rest of the benchmarks from gbench to nvbench and adding the missing metrics is higher priority.
Many libcudf benchmarks report the `bytes_per_second` processed as part of the output data. This value is useful for comparing benchmarks because it normalizes the increasing execution time from processing more data. Also, `bytes_per_second` and `real_time_s` together let us compute `bytes_processed` values, which serve as a useful Y-axis.

As of the end of 23.12 development, many benchmarks still do not report `bytes_per_second` in the output data. Here is a figure summarizing the metrics reported by the benchmark suite.

APPLY_BOOLEAN_MASK
BINARYOP
COPY_IF_ELSE
GROUPBY
GROUPBY_NV
HASHING
JOIN
JOIN_NV
QUANTILES
REDUCTION
REPLACE
SEARCH
SEARCH_NV
SET_OPS_NV
SHIFT
SORT
SORT_NV
STREAM_COMPACTION_NV
TRANSPOSE
Note: For this tracking list, cuIO benchmarks are omitted because "encoded file size" serves a similar purpose.
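The relation between `bytes_per_second`, `real_time_s`, and `bytes_processed` mentioned in the issue description is a simple product. The sketch below uses a hypothetical function name and assumes that both inputs refer to the same measurement interval, so whether the result is per iteration or a total depends on how `real_time_s` is reported.

```cpp
#include <cassert>

// bytes_processed = bytes_per_second * real_time_s
// Assumption: both values describe the same measurement interval.
inline double bytes_processed(double bytes_per_second, double real_time_s)
{
  assert(bytes_per_second >= 0.0 && real_time_s >= 0.0);
  return bytes_per_second * real_time_s;
}
```

For instance, a throughput of 2e9 bytes per second over 0.5 seconds corresponds to 1e9 bytes processed.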