Continuous Benchmarking #4687
Comments
Thanks @jangorecki. This story needs to be broken down into tasks in GitHub so that deliverables can be planned; otherwise it is hard to track progress. Ideally the framework should enable anyone to add new regression tests easily, so once the framework is in place, others can add tests as the need arises.
I think a simple single-threaded vs multithreaded test should be in scope (maybe 1 thread vs 2, 4, or 8 threads; not sure what machine we'd be testing on).
Hey! If you are still interested in CB, I would like to mention {touchstone}, which runs benchmarks and posts the results as a comment on the PR after every push! We are getting ready to submit to rOpenSci and CRAN.
Continuous benchmarking is one goal of the NSF POSE project that @DorisAmoakohene is working on.
touchstone development has stalled a bit (more or less because it's working for the projects it is used in), but I have tentative plans to add an option to export results to conbench, which is active and being used by high-profile projects such as Apache Arrow and Meta's Velox. The data export is done via JSON, so it would be very easy to activate once a conbench server has been set up for dt (and I added the export function 😁)
@Anirban166 please see animint/animint2#101 (comment) for an explanation of how to create a fine-grained token with permission to write to a particular repo in the context of a GitHub Action for performance testing (the idea is to write performance test plots, which show up in PRs, to the gh-pages branch of Rdatatable/performance_test_PR_comment_plots).
With regard to the issue at hand, I've been working on identifying performance regressions from pull requests (taking into account historical regressions) via a GitHub Action that I created.
@Anirban166 You might already be aware of this, but a common issue with this type of workflow (do something and comment the result on the PR) is that the Commenting on a PR requires the I really wish there was a separate
Ditto! I'm aware, and that's exactly what I've been telling @tdhock over the past few weeks as well. (In fact, I brought this up to him even before I started on the action.) I wish that too :( At present, I'll be running the workflows on my repository until we get better ideas on that (and since the tests are to be updated from time to time, it does require some manual configuration). One way is for the user to create another branch (apart from their fork of
As already mentioned in multiple issues and over email/slack, we need automated tests that can track performance regressions.
This issue is meant to define scope.
A related, useful project is planned in conbench. Once it is working, I think we should use it. Unfortunately, that does not seem likely to happen anytime soon, or even in the more distant future.
Anyway, keeping the scope minimal should make it easier to move to conbench later on.
Other related work includes my old project macrobenchmarking and the recent draft PR #4517.
Scope
Dimensions by which we will track timings
benchmark.Rraw
Dimensions that for now I propose to not include in scope
datatable.optimize option
Challenges
Store timings
In the current infrastructure we do not have any process that appends artifacts (timings, in the context of CB). Each CB run has to store its results somewhere and re-use them later on.
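As a rough illustration of what "appending artifacts" could look like, here is a minimal sketch that stores each timing as one line in a JSON Lines file and reads the history back on a later run. The file name, the record fields (`git_sha`, `query`, `seconds`) and both helper functions are hypothetical, not part of any existing data.table tooling.

```python
import json
import tempfile
from pathlib import Path


def append_timing(store: Path, record: dict) -> None:
    """Append one timing record as a single JSON line (the file is created if missing)."""
    with store.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def load_timings(store: Path) -> list[dict]:
    """Read back every record stored by earlier runs; an absent file means no history."""
    if not store.exists():
        return []
    with store.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Hypothetical usage: two CB runs appending to the same store.
store = Path(tempfile.mkdtemp()) / "timings.jsonl"
append_timing(store, {"git_sha": "aaaa111", "query": "DT[10L]", "seconds": 0.012})
append_timing(store, {"git_sha": "bbbb222", "query": "DT[10L]", "seconds": 0.015})
history = load_timings(store)
print(len(history), "records; latest sha:", history[-1]["git_sha"])
```

An append-only text file keeps the storage question separate from the benchmarking code, and JSON records should be straightforward to re-export if the project moves to conbench later.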
Signalling a regression
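One possible rule, sketched here purely as an assumption, is to compare a new timing against the median of the stored history and flag it when it is slower by more than some tolerance. The function name and the 25% default tolerance are hypothetical; the right threshold would depend on how noisy the benchmark machine is.

```python
from statistics import median


def is_regression(history: list[float], new_time: float, tolerance: float = 0.25) -> bool:
    """Return True when new_time exceeds the historical median by more than `tolerance` (a fraction)."""
    if not history:
        return False  # nothing to compare against yet
    baseline = median(history)
    return new_time > baseline * (1.0 + tolerance)


# Hypothetical timings (in seconds) from earlier runs of one test case.
past = [4.0, 4.2, 3.9, 4.1]
print(is_regression(past, 4.3))  # within tolerance of the median
print(is_regression(past, 6.0))  # well above the median
```

Using the median rather than the last run makes the signal less sensitive to a single noisy measurement.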
Environment
To reduce number of false regression signals we need to use private dedicated infrastructure.
Having a dedicated machine may not be feasible, so we need a mechanism for signalling to Jenkins (or another orchestration process) that a particular machine is in use in exclusive mode.
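A very simple way to signal exclusive use is a lock file that is created atomically, so that a second job trying to start on the same machine fails fast. The sketch below assumes every job on the machine agrees to check the same (hypothetical) lock path; it does not integrate with Jenkins itself, and a job that is killed abruptly would leave a stale lock behind.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def exclusive_machine(lock_path: str):
    """Hold lock_path for the duration of the block; raise FileExistsError if already held."""
    # O_CREAT | O_EXCL makes creation atomic: it fails if the file already exists.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, str(os.getpid()).encode())  # record the owner for debugging
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


# Hypothetical usage: a second run is refused while the first holds the lock.
lock = os.path.join(tempfile.mkdtemp(), "cb.lock")
with exclusive_machine(lock):
    try:
        with exclusive_machine(lock):
            second_run_started = True
    except FileExistsError:
        second_run_started = False
print("second run started while locked:", second_run_started)
```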
Pipeline
In the most likely case of not having a dedicated machine, CB may end up queued for a long while (up to multiple days). Therefore it makes sense to have it in a separate pipeline rather than in our data.table GLCI. Such a CB pipeline could be scheduled to run daily or weekly instead of running on each commit.
Versioning
data.table project? or a separate project
inst/tests/benchmark.Rraw
benchmark() that is meant to be used like test(), and benchmark.data.table() to be used like test.data.table()
.ci/
Example test cases
[[ on a list column by group: [[ by group takes forever (24 hours +) with v1.13.0 vs 4 seconds with v1.12.8 #4646
DT[10L], DT[, 3L]: Selecting from data.table by row is very slow #3735
.SD for many columns: add timing test for many .SD cols #3797
setDT in a loop: setDT could be much simpler #4476
DT[, uniqueN(a), by=b], should stress new throttle feature: throttle threads for iterated small data tasks #4484