Reduce the overall runtime of the benchmarks without increasing variability #372

Open · mdboom opened this issue Dec 12, 2024 · 1 comment

mdboom (Contributor) commented Dec 12, 2024

I did a bit of analysis based on the Faster CPython team's current benchmarking results. From this it's clear that many of the benchmarks run more times than they need to in order to get a consistent result. Reducing the number of processes spawned for some of them would recover a large fraction of the overall runtime. In most cases that saving is doubled, because you need to benchmark both a head and a base commit.

The easy part is to add processes= to the Runner constructor of individual benchmarks based on this analysis (a sketch follows below). But we also want to continuously confirm that the analysis remains correct. Obviously, if a benchmark (or one of its dependencies) changes, that invalidates the analysis, but that happens fairly infrequently and we try to revalidate the benchmarks when it does. It's more likely that a change in the Python runtime makes a benchmark more or less stable, so we need to detect that automatically.
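A minimal sketch of that first step, assuming a pyperf-style benchmark script. The workload and processes=5 are hypothetical stand-ins: the actual number would come from the analysis (pyperf's default is 20 worker processes):

```python
import pyperf

def bench_workload(loops):
    # Hypothetical workload; a real benchmark would do real work here.
    t0 = pyperf.perf_counter()
    for _ in range(loops):
        sum(range(1000))
    return pyperf.perf_counter() - t0

if __name__ == "__main__":
    # processes=5 caps the number of worker processes spawned,
    # overriding pyperf's default for a benchmark that the
    # analysis shows converges with fewer runs.
    runner = pyperf.Runner(processes=5)
    runner.bench_time_func("bench_workload", bench_workload)
```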

Therefore, I propose:

  • Adding a new message for when a benchmark runs more times than necessary (a sketch of one possible detection heuristic follows this list):
    Benchmark ran more times than was necessary to obtain a consistent result. Consider passing processes=N to the Runner constructor.
  • Adding a message for when it runs too few times. This could probably piggyback on the existing warning about a high standard deviation, just with a slightly different calculation for when it is displayed. We can't really determine how many additional loops are needed, so the existing advice there ("Try to rerun the benchmark with more runs, values and/or loops") wouldn't change.
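For the first message, here is a minimal sketch of one way the "too many runs" check could work, assuming we look for the smallest prefix of the per-process values whose mean already agrees with the mean over all of them. The tolerance, the threshold, and the timings are all hypothetical; pyperf may well prefer a different statistic:

```python
import statistics

def runs_needed(values, rel_tol=0.01):
    """Smallest number of leading runs whose mean is within
    rel_tol of the mean over all runs."""
    full_mean = statistics.mean(values)
    for n in range(2, len(values) + 1):
        if abs(statistics.mean(values[:n]) - full_mean) <= rel_tol * full_mean:
            return n
    return len(values)

# Hypothetical per-process mean timings (seconds) from 20 worker processes.
values = [0.101, 0.099, 0.100, 0.102, 0.100] * 4

needed = runs_needed(values)
if needed < len(values) // 2:  # ran at least twice as long as necessary
    print(
        "Benchmark ran more times than was necessary to obtain a "
        f"consistent result. Consider passing processes={needed} "
        "to the Runner constructor."
    )
```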

Does this make sense to others, particularly @vstinner, who worked on the dynamic loop determination in the past? (That work is related, but not the same: everything here concerns the outermost process-spawning loop.)

mdboom self-assigned this Dec 12, 2024
vstinner (Member) commented:

> We can't really determine how many additional loops are needed

Something like https://cran.r-project.org/web/packages/changepoint/index.html (in the R language) may help with such a task. I failed to do something similar in Python.
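One possible Python analogue is the ruptures package, which implements change-point detection. A minimal sketch on synthetic per-run timings; the penalty pen=5 is an untuned guess, and nothing here has been validated against real benchmark data:

```python
import numpy as np
import ruptures as rpt  # pip install ruptures

# Synthetic per-run timings: a benchmark that shifts from ~0.10s
# to ~0.12s per run after run 30 (e.g. a runtime change landed).
rng = np.random.default_rng(0)
signal = np.concatenate([
    rng.normal(0.10, 0.002, 30),
    rng.normal(0.12, 0.002, 30),
])

# PELT finds the run indices where the distribution changes;
# the last index returned is always len(signal).
algo = rpt.Pelt(model="rbf").fit(signal)
print(algo.predict(pen=5))  # e.g. [30, 60]
```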
