[Benchmarking] Conbench reports regressions and errors on latest PRs #45393

raulcd · 2025-01-30T12:27:16Z

Describe the bug, including details regarding any error messages, version, and platform.

There seems to be an issue with benchmarks. Conbench is reporting errors and regressions on PRs lately.
TPCH-03 seems to have either a regression or a failure (https://conbench.ursa.dev/benchmark-results/06797e6f89c07c4680005b7275de1f32/). See graph:

I saw the conbench reports errors and regressions, starting on this PR:

GH-45334: [C++][Acero] Fix swiss join overflow issues in row offset calculation for fixed length and null masks #45336

But I am unsure if it's related to that change.

Component(s)

Benchmarking

raulcd · 2025-01-30T12:30:28Z

cc @pitrou @zanmato1984 @assignUser

pitrou · 2025-01-31T09:58:38Z

@raulcd Is this on one specific benchmarking machine or all of them?

pitrou · 2025-01-31T09:59:11Z

Also, can you point to the errors (or at least one of them)?

zanmato1984 · 2025-02-01T15:20:42Z

Sorry I'm on vacation and unable to get on to it now. From what I see in the benchmark report, it's possible that #45336 is related (TPCH Q3 has heavy joins and aggregations, both in the code path that PR touched). And it's also possible that the new answer is correct (meaning the old passes are false positives) - if I'm interpreting the the report right:

             'old vs new\n'
             '            o_shippriority\n'
             '- old[1, ]         2330369\n'
             '+ new[1, ]               0\n'
             '- old[2, ]         3466209\n'
             '+ new[2, ]               0\n'
             '- old[3, ]          497024\n'
             '+ new[3, ]               0\n'
             '- old[4, ]         1190661\n'
             '+ new[4, ]               0\n'
             '- old[5, ]         2439265\n'
             '+ new[5, ]               0\n'
             '- old[6, ]         5112866\n'
             '+ new[6, ]               0\n'
             '- old[7, ]         5526083\n'
             '+ new[7, ]               0\n'
             '- old[8, ]         2630758\n'
             '+ new[8, ]               0\n'
             '- old[9, ]          997703\n'
             '+ new[9, ]               0\n'
             '- old[10, ]        2307653\n'
             '+ new[10, ]              0\n'

Note column o_shippriority should be all 0 in a valid TPCH Q3 result. But this is only wild guess for now.

I think I can look into it in about two weeks. But during my absence, it'll be nice if some people familiar with these benchmarks (R? - I see they are all ran in R) can help to confirm my assumption ("the old result of o_shippriority being non-zero/o_orderkey/l_orderkey"). Thanks.

zanmato1984 · 2025-02-02T06:56:09Z

OK, I managed to learn some of our R benchmarking code and re-interpret the benchmark result. This is indeed a regression/bug.

The comparison of the benchmark is done via https://github.com/voltrondata-labs/arrowbench/blob/deacdebc64bb5c04f8976138c45db96710e56e77/R/bm-tpc-h.R#L107C24-L107C38:

all_equal_out <- waldo::compare(result, answer, tolerance = 0.01)

And according to the waldo doc (https://waldo.r-lib.org/), the "old" in the report is actually the first argument (result) and the "new" the second (answer). So apparently the bug turned the all-zero l_shippriority column into some arbitrary non-zero values.

I'll definitely look into it once I'm back. Thanks.

raulcd added the Type: bug label Jan 30, 2025

github-actions bot added the Component: Benchmarking label Jan 30, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Benchmarking] Conbench reports regressions and errors on latest PRs #45393

[Benchmarking] Conbench reports regressions and errors on latest PRs #45393

raulcd commented Jan 30, 2025 •

edited

Loading

raulcd commented Jan 30, 2025

pitrou commented Jan 31, 2025

pitrou commented Jan 31, 2025

zanmato1984 commented Feb 1, 2025 •

edited

Loading

zanmato1984 commented Feb 2, 2025

[Benchmarking] Conbench reports regressions and errors on latest PRs #45393

[Benchmarking] Conbench reports regressions and errors on latest PRs #45393

Comments

raulcd commented Jan 30, 2025 • edited Loading

Describe the bug, including details regarding any error messages, version, and platform.

Component(s)

raulcd commented Jan 30, 2025

pitrou commented Jan 31, 2025

pitrou commented Jan 31, 2025

zanmato1984 commented Feb 1, 2025 • edited Loading

zanmato1984 commented Feb 2, 2025

raulcd commented Jan 30, 2025 •

edited

Loading

zanmato1984 commented Feb 1, 2025 •

edited

Loading