Optimize uploading ballots for recreating prior elections #776
Description
This significantly reduces the time it takes to upload ballots in bulk. All benchmarks were run with the Burlington 2009 mayoral election, which has 8,980 votes. Unless otherwise stated, each test used a batch size of 100; with the original request format, increasing the batch size above 100 would exceed the request size limit.
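To make the batch-size numbers concrete, here is a minimal Python sketch of splitting the ballots into fixed-size batches, one batch per upload request. The PR doesn't show its upload code, so `batches` and `request_count` are hypothetical helpers, and integers stand in for real ballots.

```python
import math
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def request_count(n_ballots: int, batch_size: int) -> int:
    """Number of upload requests needed when each request carries one batch."""
    return math.ceil(n_ballots / batch_size)


if __name__ == "__main__":
    ballots = list(range(8980))  # stand-in for the 8,980 parsed ballots
    for size in (50, 100, 700):
        chunks = list(batches(ballots, size))
        print(f"batch size {size}: {request_count(len(ballots), size)} requests, "
              f"last batch has {len(chunks[-1])} ballots")
```

At a batch size of 100 the 8,980 ballots need 90 requests; at 50 they need 180, and at 700 only 13. Any fixed per-request overhead is therefore paid roughly seven times less often at 700 than at 100.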
Prior optimizations
I already pushed a change that bypasses the ballot queue system. That design is important when processing votes in real time and ensuring each vote is counted exactly once, but it matters much less when analyzing prior elections. This reduced the upload time from several minutes to 77 seconds.
Optimizations in Pull Request
Baseline - 77s
Backend no-op - 6.7s - This gives an estimate for how much of
w/ reduced logging - 69.791s - The previous logging was very verbose and slowed down the response time.
w/ bulk database update - 34.935s - Executes the bulk upload in a single database query rather than looping with one query per row.
w/ batch size of 50 - 58.495s - Decreasing the batch size makes the upload significantly slower (which justified optimizing the batch size in the other direction).
w/ batch size of 700 - 25s - Refactored the request format to allow larger batch sizes.
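The bulk database update replaces a per-row loop with a multi-row statement. Below is a minimal sketch of the difference. It assumes SQLite through Python's standard `sqlite3` module (the PR doesn't name its database or ORM) and a hypothetical `ballots` table; both function names are illustrative.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical schema: the PR does not describe the real tables.
SCHEMA = ("CREATE TABLE IF NOT EXISTS ballots ("
          "id INTEGER PRIMARY KEY, election_id INTEGER NOT NULL, rankings TEXT NOT NULL)")

Row = Tuple[int, str]  # (election_id, rankings)


def insert_row_by_row(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """Slow path: one INSERT statement per ballot."""
    for election_id, rankings in rows:
        conn.execute(
            "INSERT INTO ballots (election_id, rankings) VALUES (?, ?)",
            (election_id, rankings),
        )
    conn.commit()


def insert_bulk(conn: sqlite3.Connection, rows: Iterable[Row],
                max_rows_per_statement: int = 400) -> None:
    """Fast path: one multi-row INSERT per chunk of rows.

    Each row binds 2 parameters, so 400 rows stay under the 999-parameter
    limit of older SQLite builds; a 100-ballot batch is a single statement.
    """
    rows = list(rows)
    for start in range(0, len(rows), max_rows_per_statement):
        chunk = rows[start:start + max_rows_per_statement]
        placeholders = ", ".join(["(?, ?)"] * len(chunk))
        params: List[object] = [value for row in chunk for value in row]
        conn.execute(
            f"INSERT INTO ballots (election_id, rankings) VALUES {placeholders}",
            params,
        )
    conn.commit()
```

The saving comes mainly from issuing far fewer statements (and, with a networked database, far fewer round trips). Most drivers and ORMs offer their own bulk-insert API that achieves the same effect.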
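The PR doesn't spell out the new request format. The sketch below is only a hypothetical illustration of why a more compact encoding lets more ballots fit under the same request size limit: candidate names are sent once per request, and each ballot becomes a list of indices. Both formats and all function names are assumptions.

```python
import json
from typing import List

Ballot = List[str]  # candidate names in rank order


def verbose_payload(ballots: List[Ballot]) -> str:
    """Hypothetical original format: every ranking spells out rank and name."""
    return json.dumps({"ballots": [
        {"rankings": [{"rank": i + 1, "candidate": name} for i, name in enumerate(b)]}
        for b in ballots
    ]})


def compact_payload(ballots: List[Ballot]) -> str:
    """Hypothetical compact format: names sent once, ballots as index lists."""
    candidates = sorted({name for b in ballots for name in b})
    index = {name: i for i, name in enumerate(candidates)}
    return json.dumps(
        {"candidates": candidates,
         "ballots": [[index[name] for name in b] for b in ballots]},
        separators=(",", ":"),  # drop the default spaces after , and :
    )


def decode_compact(payload: str) -> List[Ballot]:
    """Server side: map the indices back to candidate names."""
    data = json.loads(payload)
    names = data["candidates"]
    return [[names[i] for i in b] for b in data["ballots"]]


if __name__ == "__main__":
    sample = [["Candidate A", "Candidate B", "Candidate C"],
              ["Candidate B", "Candidate A"]] * 350
    print(len(verbose_payload(sample)), len(compact_payload(sample)))
```

Because the compact payload grows by only a few bytes per ranking, a batch of several hundred ballots can fit where the verbose format could only fit a much smaller one.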
Screenshots / Videos (frontend only)
The video was even faster than my benchmarks. I probably should have averaged over more iterations to get a more reliable measurement.
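Averaging over several runs is straightforward to script. The following minimal sketch is a generic timing helper and not part of this PR; `benchmark` is a hypothetical name. It reports mean, median, standard deviation, and the fastest run, so one slow or fast outlier doesn't dominate the result.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], iterations: int = 5) -> Dict[str, float]:
    """Run fn several times and summarize wall-clock durations in seconds."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(durations),
        "median": statistics.median(durations),
        # stdev needs at least two samples
        "stdev": statistics.stdev(durations) if len(durations) > 1 else 0.0,
        "min": min(durations),
    }
```

For example, `benchmark(upload_all_ballots, iterations=10)` would time a hypothetical `upload_all_ballots` function ten times. The median is usually the most robust single number to report.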
2025-02-05.22-12-22.mp4