Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Benchmarking] Conbench reports regressions and errors on latest PRs #45393

Open
raulcd opened this issue Jan 30, 2025 · 5 comments
Open

[Benchmarking] Conbench reports regressions and errors on latest PRs #45393

raulcd opened this issue Jan 30, 2025 · 5 comments

Comments

@raulcd
Copy link
Member

raulcd commented Jan 30, 2025

Describe the bug, including details regarding any error messages, version, and platform.

There seems to be an issue with benchmarks. Conbench is reporting errors and regressions on PRs lately.
TPCH-03 seems to have either a regression or a failure (https://conbench.ursa.dev/benchmark-results/06797e6f89c07c4680005b7275de1f32/). See graph:

Image

I saw the conbench reports errors and regressions, starting on this PR:

But I am unsure if it's related to that change.

Component(s)

Benchmarking

@raulcd
Copy link
Member Author

raulcd commented Jan 30, 2025

cc @pitrou @zanmato1984 @assignUser

@pitrou
Copy link
Member

pitrou commented Jan 31, 2025

@raulcd Is this on one specific benchmarking machine or all of them?

@pitrou
Copy link
Member

pitrou commented Jan 31, 2025

Also, can you point to the errors (or at least one of them)?

@zanmato1984
Copy link
Contributor

zanmato1984 commented Feb 1, 2025

Sorry I'm on vacation and unable to get on to it now. From what I see in the benchmark report, it's possible that #45336 is related (TPCH Q3 has heavy joins and aggregations, both in the code path that PR touched). And it's also possible that the new answer is correct (meaning the old passes are false positives) - if I'm interpreting the the report right:

             'old vs new\n'
             '            o_shippriority\n'
             '- old[1, ]         2330369\n'
             '+ new[1, ]               0\n'
             '- old[2, ]         3466209\n'
             '+ new[2, ]               0\n'
             '- old[3, ]          497024\n'
             '+ new[3, ]               0\n'
             '- old[4, ]         1190661\n'
             '+ new[4, ]               0\n'
             '- old[5, ]         2439265\n'
             '+ new[5, ]               0\n'
             '- old[6, ]         5112866\n'
             '+ new[6, ]               0\n'
             '- old[7, ]         5526083\n'
             '+ new[7, ]               0\n'
             '- old[8, ]         2630758\n'
             '+ new[8, ]               0\n'
             '- old[9, ]          997703\n'
             '+ new[9, ]               0\n'
             '- old[10, ]        2307653\n'
             '+ new[10, ]              0\n'

Note column o_shippriority should be all 0 in a valid TPCH Q3 result. But this is only wild guess for now.

I think I can look into it in about two weeks. But during my absence, it'll be nice if some people familiar with these benchmarks (R? - I see they are all ran in R) can help to confirm my assumption ("the old result of o_shippriority being non-zero/o_orderkey/l_orderkey"). Thanks.

@zanmato1984
Copy link
Contributor

OK, I managed to learn some of our R benchmarking code and re-interpret the benchmark result. This is indeed a regression/bug.

The comparison of the benchmark is done via https://github.com/voltrondata-labs/arrowbench/blob/deacdebc64bb5c04f8976138c45db96710e56e77/R/bm-tpc-h.R#L107C24-L107C38:

all_equal_out <- waldo::compare(result, answer, tolerance = 0.01)

And according to the waldo doc (https://waldo.r-lib.org/), the "old" in the report is actually the first argument (result) and the "new" the second (answer). So apparently the bug turned the all-zero l_shippriority column into some arbitrary non-zero values.

I'll definitely look into it once I'm back. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants