Improve cost-efficient scaling #249

xoxys · 2025-01-14T07:59:55Z

@anbraten @xoxys Sorry for the late reply.

I’m not very familiar with Go, so although I’d like to contribute code, I’m not really able to. Instead, I’ll jot down a solution design for optimizing costs on Hetzner or any other VPS provider that bills in increments shorter than one month, in the hope that someone will be interested in implementing it.

First, I’d like to introduce two essential configuration options: one is the billing cycle (e.g., WOODPECKER_PROVIDER_BILLING_CYCLE), and the other is the deletion/release window (e.g., WOODPECKER_PROVIDER_[DELETION or RELEASE]_WINDOW), i.e., the final part of each billing cycle during which an idle server may be deleted. The latter must be smaller than the billing cycle.

When a runner is dynamically created, the autoscaler records the Unix timestamp at which the server was created (let’s temporarily call it CREATION_TIME), and then runs the CI tasks. Once the tasks are finished and it’s time to check whether the resources should be reclaimed, the following steps occur:

1. Determine how long the VPS has been running.
   Subtract the server’s creation timestamp (CREATION_TIME) from the current Unix timestamp. This gives the VPS’s runtime in seconds. Let’s call it RUNNING_TIME.
2. Determine the VPS’s position in the current billing cycle.
   Take Hetzner as an example: the billing cycle is one hour, i.e., PROVIDER_BILLING_CYCLE = 1h (3600s). Repeatedly subtract PROVIDER_BILLING_CYCLE from RUNNING_TIME until the remainder is less than PROVIDER_BILLING_CYCLE (equivalently, compute RUNNING_TIME modulo PROVIDER_BILLING_CYCLE). That remainder is the VPS’s position within the current billing cycle, which we’ll call CURRENT_LOCATION. For example, a VPS that has been running for 5400s is at CURRENT_LOCATION = 1800s.
3. Determine the boundary between the IDLE WINDOW and the RELEASE WINDOW.
   Subtract PROVIDER_RELEASE_WINDOW from PROVIDER_BILLING_CYCLE. The result is the boundary between the IDLE WINDOW (the start of the cycle) and the RELEASE WINDOW (the end of the cycle), which we’ll call DIVIDING_POINT.
4. Check which side of that boundary the VPS is currently on by comparing CURRENT_LOCATION and DIVIDING_POINT:
   - If CURRENT_LOCATION < DIVIDING_POINT, the VPS is in the IDLE WINDOW. Proceed to step 5.
   - If CURRENT_LOCATION >= DIVIDING_POINT, the VPS is in the RELEASE WINDOW. Proceed to step 8.
5. Since the VPS is in the IDLE WINDOW, deleting it now would waste the part of the billing cycle that has already been paid for, so keep the VPS. Use DIVIDING_POINT - CURRENT_LOCATION to calculate how much time remains before the next RELEASE WINDOW begins (call this DISTANCE_TO_NEXT_RELEASE_WINDOW), and enter a waiting state. Essentially, it’s like running `sleep DISTANCE_TO_NEXT_RELEASE_WINDOW` on a Linux system.
   - If no new tasks come in during this period, proceed to step 6.
   - If a new task is assigned, proceed to step 7.
6. Since no new tasks arrived and the RELEASE WINDOW has been reached, reclaim (delete) the VPS resources.
7. Since a new task was assigned, clear the waiting state and run the task normally. Once it finishes, return to step 1 and recalculate.
8. Now the system is in the RELEASE WINDOW, and no tasks were assigned…

WAIT! At this point, I realized we might need to handle the case where the time it takes to destroy the server still counts toward the billing cycle: if deletion starts too close to the end of a cycle, it may not finish before the next cycle begins, and that cycle would be billed as well. Therefore, I want to introduce an optional configuration option for skipping the current cycle (WOODPECKER_PROVIDER_SKIP_CYCLE). It defines a SKIP WINDOW at the very end of each billing cycle and must be smaller than PROVIDER_RELEASE_WINDOW.

...and no tasks were assigned, check whether the VPS is within the SKIP WINDOW, similar to how we checked for the RELEASE WINDOW before (steps 3 and 4: calculate the dividing point, compare, etc.).
- If it’s in the SKIP WINDOW, go to step 9.
- Otherwise, go to step 10.

(If this option is 0, just skip straight to step 10.)

9. Since the server is in the SKIP WINDOW, cancel the reclamation and move on to the next billing cycle: calculate the time until the next RELEASE WINDOW, which here is (PROVIDER_BILLING_CYCLE - CURRENT_LOCATION) + DIVIDING_POINT, start a timer, wait for any new tasks, and so on (as in steps 5 to 7).
10. Since the server is not in the SKIP WINDOW (or the SKIP WINDOW is disabled), reclaim the VPS resources. THE END.

If you visualize it as a picture, it roughly looks like this:

I hope those reading this can understand my idea and be inspired to implement it. If there are any questions, feel free to ask. I’ll respond as soon as possible (at least it won’t take me another eight months this time, trust me bro :)).

P.S. Oh, maybe we should also check whether a new task has come in right before actually reclaiming the VPS resources.

Originally posted by @OrvilleQ in #73 (comment)


xoxys · 2025-01-14T08:01:52Z

I have moved this to a new issue. Many thanks for the effort and the detailed description!

xoxys added the enhancement (Enhance existing feature) label on Jan 14, 2025
