Reduce the overall runtime of the benchmarks without increasing variability #372

Open · mdboom opened this issue Dec 12, 2024 · 1 comment

mdboom (Contributor) commented Dec 12, 2024

I did a bit of analysis based on the Faster CPython team's current benchmarking results. From this it's clear that many of the benchmarks run more times than they need to in order to get a consistent result. Reducing the number of processes spawned for some of them would recover a large fraction of the overall runtime. In most cases that saving is doubled, because you need to benchmark both a head and a base commit.

The easy part is to add processes= to the Runner constructor of individual benchmarks based on this analysis (a sketch follows below). But we also want to continuously confirm that the analysis remains correct. Obviously, if a benchmark (or one of its dependencies) changes, that invalidates the analysis, but that happens fairly infrequently and we try to revalidate the benchmarks when it does. It's more likely that a change in the Python runtime makes a benchmark more or less stable, so we need to detect that automatically.
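A minimal sketch of that first step, assuming a pyperf-style benchmark script. The workload and processes=5 are hypothetical stand-ins: the actual number would come from the analysis (pyperf's default is 20 worker processes):

```python
import pyperf

def bench_workload(loops):
    # Hypothetical workload; a real benchmark would do real work here.
    t0 = pyperf.perf_counter()
    for _ in range(loops):
        sum(range(1000))
    return pyperf.perf_counter() - t0

if __name__ == "__main__":
    # processes=5 caps the number of worker processes spawned,
    # overriding pyperf's default for a benchmark that the
    # analysis shows converges with fewer runs.
    runner = pyperf.Runner(processes=5)
    runner.bench_time_func("bench_workload", bench_workload)
```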

Therefore, I propose:

  • Adding a new message for when a benchmark runs more times than necessary (a sketch of one possible detection heuristic follows this list):
    Benchmark ran more times than was necessary to obtain a consistent result. Consider passing processes=N to the Runner constructor.
  • Adding a message for when it runs too few times. This could probably piggyback on the existing warning about a high standard deviation, just with a slightly different calculation for when it is displayed. We can't really determine how many additional loops are needed, so the existing advice there ("Try to rerun the benchmark with more runs, values and/or loops") wouldn't change.
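For the first message, here is a minimal sketch of one way the "too many runs" check could work, assuming we look for the smallest prefix of the per-process values whose mean already agrees with the mean over all of them. The tolerance, the threshold, and the timings are all hypothetical; pyperf may well prefer a different statistic:

```python
import statistics

def runs_needed(values, rel_tol=0.01):
    """Smallest number of leading runs whose mean is within
    rel_tol of the mean over all runs."""
    full_mean = statistics.mean(values)
    for n in range(2, len(values) + 1):
        if abs(statistics.mean(values[:n]) - full_mean) <= rel_tol * full_mean:
            return n
    return len(values)

# Hypothetical per-process mean timings (seconds) from 20 worker processes.
values = [0.101, 0.099, 0.100, 0.102, 0.100] * 4

needed = runs_needed(values)
if needed < len(values) // 2:  # ran at least twice as long as necessary
    print(
        "Benchmark ran more times than was necessary to obtain a "
        f"consistent result. Consider passing processes={needed} "
        "to the Runner constructor."
    )
```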

Does this make sense to others, particularly @vstinner, who worked on the dynamic loop determination in the past? (That work is related, but not the same: everything here concerns the outermost process-spawning loop.)

mdboom self-assigned this Dec 12, 2024
vstinner (Member) commented:

> We can't really determine how many additional loops are needed

Something like https://cran.r-project.org/web/packages/changepoint/index.html (in the R language) may help with such a task. I failed to do something similar in Python.
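One possible Python analogue is the ruptures package, which implements change-point detection. A minimal sketch on synthetic per-run timings; the penalty pen=5 is an untuned guess, and nothing here has been validated against real benchmark data:

```python
import numpy as np
import ruptures as rpt  # pip install ruptures

# Synthetic per-run timings: a benchmark that shifts from ~0.10s
# to ~0.12s per run after run 30 (e.g. a runtime change landed).
rng = np.random.default_rng(0)
signal = np.concatenate([
    rng.normal(0.10, 0.002, 30),
    rng.normal(0.12, 0.002, 30),
])

# PELT finds the run indices where the distribution changes;
# the last index returned is always len(signal).
algo = rpt.Pelt(model="rbf").fit(signal)
print(algo.predict(pen=5))  # e.g. [30, 60]
```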
