KEP-4563: EvictionRequest API (fka Evacuation) #4565

atiratree · 2024-03-28T15:05:26Z

One-line PR description: introduce EvictionRequest API to allow managed graceful pod removal

Issue link: EvictionRequest API #4563

Other comments: part of the functionality has been split from KEP-4212: Declarative Node Maintenance #4213 into this KEP

sftim · 2024-03-28T16:12:00Z

keps/sig-apps/4563-evacuation-api/README.md

+We will introduce a new term called evacuation. This is a contract between the evacuation instigator,
+the evacuee, and the evacuator. The contract is enforced by the API and an evacuation controller.
+We can think of evacuation as a managed and safer alternative to eviction.


Watch out for the risk of confusing end users.

We already have preemption and eviction and people confuse the two. Or three, because there are two kinds of eviction. And there's disruption in the mix.

Do we want to rename Scheduling, Preemption and Eviction to Scheduling, Preemption, Evacuation and Eviction?

Good idea, I added a mention of which kind of eviction I mean here.

Do we want to rename Scheduling, Preemption and Eviction to Scheduling, Preemption, Evacuation and Eviction?

Yes, I think we want to add a new concept there and generally update the docs once we have an alpha.

I'm with Tim here.

Preemption vs Eviction is already quite confusing. And TBH, I couldn't fully understand what the "evacuation" is supposed to solve by reading the summary or motivation.

From Goals:

Changes to the eviction API to support the evacuation process.

If this is already going to be part of the Eviction API, maybe it should be named as a form of eviction. Something like "cooperative eviction" or "eviction with ack" or something along those lines?

I'm all for framing it as another type of eviction; we already have two, so the extra cognitive load for users is not so much a problem.

@alculquicondor I have updated the summary and goals, I hope it makes more sense now.

I think the name should make the most sense to the person creating the Evacuation (Evacuation Instigator ATM). So CooperativeEviction or EvictionWithAck is a bit misleading IMO, because from that person's perspective there is no additional step required of them. Only the evacuators and the evacuation controller implement the cooperative evacuation process, but this is hidden from the normal user.

My suggestions:

GracefulEviction (might confuse people if it is associated with graceful pod termination, which it is not)

SafeEviction (*safer than the API-initiated one for some pods)

Or just call it Eviction? And tell people to use it instead of the Eviction API endpoint? This might be a bit confusing (at least in the beginning)

We can bikeshed names for the API kind; I'd throw a few of my own into the hat:

EvictionRequest

PodEvictionRequest

I have renamed the API to EvictionRequest to make the term recognizable. A minor disadvantage is that we have to clarify what type of eviction we mean if we say evict (API-initiated eviction, or EvictionRequest).

The rest of the renames are as follows:

Evacuation (noun) -> EvictionRequest / Eviction Process
evacuation (verb) -> request an eviction / terminate / evict / process eviction
Evacuator -> Interceptor
Evacuee -> Pod
Evacuator Class -> Interceptor Class
Evacuation Instigator -> Eviction Requester
Evacuation Controller -> Eviction Request Controller
ActiveEvacuatorClass -> ActiveInterceptorClass
ActiveEvacuatorCompleted -> ActiveInterceptorCompleted
EvacuationProgressTimestamp -> ProgressTimestamp
ExpectedEvacuationFinishTime -> ExpectedInterceptorFinishTime
EvacuationCancellationPolicy -> EvictionRequestCancellationPolicy
FailedEvictionCounter -> FailedAPIEvictionCounter

sftim · 2024-03-28T16:17:34Z

keps/sig-apps/4563-evacuation-api/README.md

+<!--
+What other approaches did you consider, and why did you rule them out? These do
+not need to be as detailed as the proposal, but should include enough
+information to express the idea and why it was not acceptable.
+-->


We have a number of examples of having a SomethingRequest or SomethingClaim API that then causes a something (certificate signing, node provisioning, etc).
Think of TokenRequest (a subresource for a ServiceAccount), or CertificateSigningRequest.

I would confidently and strongly prefer to have an EvictionRequest or PodEvictionRequest API, rather than an Evacuation API kind.

It's easy to teach that we have evictions and that an EvictionRequest is asking for one to happen; it's hard to teach the difference between an eviction and an evacuation.

As a side effect, this makes the feature gate easier to name (eg PodEvictionRequests).

As you have mentioned, we already have different kinds of eviction. So I think it would be good to use a completely new term to distinguish it from the others.

Also, Evacuation does not always result in eviction (and PDB consultation). It depends on the controller/workload. For some workloads like DaemonSets and static pods, API eviction has never worked before. This could also be very confusing if we name it the same way.

I think Evacuation fits this better because

The name is shorter. If we go with EvacuationRequest then the evacuation will become just an abstract term and less recognizable.

It seems it will have quite a lot of functionality included (state synchronization between multiple instigators and multiple evacuators, state of the evacuee and evacuation). TokenRequest and CertificateSigningRequest are simpler and not involved in a complex process.

I suggested EvictionRequest so that we don't have to have a section with the (too long) title: Scheduling, Preemption, Evacuation and Eviction. Not EvacuationRequest.

Adding another term doesn't scale so well: it means helping n people understand the difference between evacuation and eviction. It's a scaling challenge where n is not only large, it probably includes quite a few Kubernetes maintainers.

As for CertificateSigningRequest being simple: I don't buy it. There are three controllers, custom signers, an integration with trust bundles, the horrors of ASN.1 and X.509… trust me, it's complicated enough.

I understand that it will be confusing for people, but that will happen regardless of what term we will use.

My main issue is that evacuation does not directly translate to eviction. Therefore, I think it would be preferable to choose a new term (not necessarily evacuation).

I would like to get additional opinions from people about this. And we will definitely have to come back to this in the API review.

Should be resolved now: #4565 (comment)

keps/sig-apps/4563-evacuation-api/README.md

atiratree · 2024-04-09T17:15:51Z

I have updated the KEP to include support for multiple evacuators
Evacuators can now advertise which pods they are able to evacuate, even before the evacuation. The advantage of this approach is that we can trigger an eviction immediately, without a delay (previously known as acceptDeadlineSeconds), if we do not find an evacuator. I have added a bunch of restrictions to ensure the API cannot be misused.
Clarified how the Evacuation objects should be named and how the admission should work in general (also for pods). This will ensure a 1:1 mapping between pods and Evacuation.
Removed the ability to add a full reference of the evacuator because it would be a hassle to synchronize, considering the evacuator leader election and multiple evacuators in play.
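
To illustrate the "no evacuator, evict immediately" behaviour described above, here is a minimal Python sketch. It is not the KEP's implementation: the `Pod` class, the `advertised_evacuators` field, the `next_action` helper and the returned strings are all hypothetical names chosen for illustration, and picking the first advertised evacuator is an assumption about ordering.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pod:
    name: str
    # Hypothetical field: evacuator classes advertised for this pod
    # ahead of time, before any Evacuation exists.
    advertised_evacuators: List[str] = field(default_factory=list)


def next_action(pod: Pod) -> str:
    """Decide what happens when an Evacuation is created for the pod."""
    if pod.advertised_evacuators:
        # Hand over to an advertised evacuator (taking the first one
        # is an assumption made for this sketch).
        return "evacuate-with:" + pod.advertised_evacuators[0]
    # Nobody advertised for this pod, so there is nothing to wait for:
    # fall back to a plain eviction right away.
    return "evict"
```

Because the advertisement is known up front, the fallback branch does not need a waiting period before it decides to evict.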

sftim · 2024-04-09T18:21:41Z

keps/sig-apps/4563-evacuation-api/README.md

+
+Example evacuation triggers:
+- Node maintenance controller: node maintenance triggered by an admin.
+- Descheduler: descheduling triggered by a descheduling rule.


If the descheduler requests an eviction, what thing is being evacuated?

(the node maintenance and cluster autoscaler examples are easier: you're evacuating an entire node)

A single pod or multiple pods. The descheduler can use it as a new mechanism instead of eviction.

OK, so people typically think of “evacuate” as a near synonym of “drain” - you drain a node, you evacuate a rack or zone full of servers. Saying that you can evacuate a Pod might make people think its containers all get stopped, or just confuse readers. We do need to smooth out how we can teach this.

It seems it can be used in both scenarios https://www.merriam-webster.com/grammar/can-you-evacuate-people.
Evacuation of containers doesn't make sense because they are tied to the pod lifecycle. But, I guess it could be confusing if we do not make it explicitly clear what we are targeting.

Thing is, Kubernetes users typically - before we accept this KEP - use “evacuate” as a synonym for drain.

I'm (still) fine with the API idea, and still concerned about the naming.

Just to +1 the potential confusion of the term "evacuation".
Is it okay to have a glossary of terms for "evacuation", "eviction", and "drain" (or any other potentially confusing terms) added somewhere in this KEP?

I can include it in the KEP. And yes, we are going to change the name to something more suitable.

"To evacuate a person" implies "get them out of trouble, to safety" as opposed to "to empty" (as in vacuum). It's not ENTIRELY wrong in this context, but it's not entirely right either.

I will change the name to EvictionRequest. It was originally chosen to distinguish it from eviction, but there is value in making it familiar to existing concepts.

The API is renamed to EvictionRequest now, see #4565 (comment) for more details

keps/sig-apps/4563-evacuation-api/README.md

constraints

indicate that the pod Evacuation is complete if the pod `restartPolicy` allows it.

- move ${POD_UID}-${POD_NAME_PREFIX} format to alternatives

…tions

- better describe Evacuator responsibilities
- add new PDB issues
- add a follow-up to the beta graduation to track evacuations in pods

keps/sig-apps/4563-eviction-request-api/kep.yaml

- Update the followups.
- Add proposal summary.
- Add practical use cases to the proposal summary and to the followups.
  - Deployment Pod Surge Example
  - HorizontalPodAutoscaler Pod Surge Example
  - Descheduling and Downscaling

All renames for a reference:

Evacuation (noun) -> EvictionRequest / Eviction Process
evacuation (verb) -> request an eviction / terminate / evict / process eviction
Evacuator -> Interceptor
Evacuee -> Pod
Evacuator Class -> Interceptor Class
Evacuation Instigator -> Eviction Requester
Evacuation Controller -> Eviction Request Controller
ActiveEvacuatorClass -> ActiveInterceptorClass
ActiveEvacuatorCompleted -> ActiveInterceptorCompleted
EvacuationProgressTimestamp -> ProgressTimestamp
ExpectedEvacuationFinishTime -> ExpectedInterceptorFinishTime
EvacuationCancellationPolicy -> EvictionRequestCancellationPolicy
FailedEvictionCounter -> FailedAPIEvictionCounter

atiratree · 2024-12-03T20:57:50Z

KEP Update; notable changes:

Evacuation has been renamed to EvictionRequest and Evacuator to Interceptor.
The EvictionRequest is considered complete when the pod terminates (phase Succeeded or Failed). The pod does not need to be deleted if the restartPolicy allows it.
Improved proposal summary + many small improvements.
The EvictionRequest name should be equal to the pod UID.
Increased constraints for pod interceptor annotations.
Updated issues.
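
As a rough illustration of the completion and naming rules listed above, here is a small Python sketch. It is only a model of those two rules, not code from the KEP: the `Pod` dataclass and both helper functions are hypothetical, while the phase names `Succeeded` and `Failed` are the standard pod phases.

```python
from dataclasses import dataclass

# Pod phases in which the pod has terminated.
TERMINAL_PHASES = frozenset({"Succeeded", "Failed"})


@dataclass
class Pod:
    uid: str
    phase: str  # e.g. "Pending", "Running", "Succeeded", "Failed"


def eviction_request_name(pod: Pod) -> str:
    """The EvictionRequest is named after the pod UID (1:1 mapping)."""
    return pod.uid


def is_eviction_request_complete(pod: Pod) -> bool:
    """Complete once the pod has terminated; deletion is not required."""
    return pod.phase in TERMINAL_PHASES
```

Note that the check looks only at the phase, so a terminated pod that still exists as an object already counts as complete.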

Added practical use cases/example scenarios:

atiratree · 2024-12-03T21:37:54Z

I am out of time for now, but I have to admit I am worried about how this will play out. Coordinating between N arbitrary actors via the API is not something we really do anywhere else (AFAIK "classic" DRA was the one place and that is being removed).

Can we pin down some very concrete examples of what REAL evacuators might really do, and how the lifecycle works, especially WRT cancellation of an eviction? You can lose the details and just tell a story. This is significantly complicated and I don't feel able to defend it right now.

@thockin, in the workloads space there are cross-cutting concern issues where such coordination is necessary (PTAL at examples above).

For example, in point 3 of the Motivation section we see the need to coordinate the ReplicaSet controller (most likely also the Deployment controller), the HPA and the descheduler (or similar component) just on the interceptor side. And this does not take into account other custom interceptors that users may want to implement.
In the EvictionRequest Cancellation Examples section, we can also see the need for coordination between eviction requesters.

Could this be positioned like "readiness gates"? That sort of feels similar.

Such deletion gates could work in a system with well-meaning and diligent actors, but this is difficult to achieve in practice:

One of the main goals is not just to help application developers and application-oriented components implement their use cases, but also to help cluster administrators detect misbehaving actors and have a well-defined path to achieve pod termination. This is why we require a proof of work (status update): it prevents people from using it as just another PDB that blocks cluster upgrades.
As above, it is difficult to achieve coordination between multiple actors through the use of gates.
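
To make the proof-of-work idea a bit more concrete, here is a minimal Python sketch of how a stalled interceptor could be detected. It is an illustration under assumptions, not the KEP's logic: it assumes the latest status update is visible as a timestamp (the renames mention a ProgressTimestamp field), and the function name, the `max_silence` parameter and the 10-minute window are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def interceptor_is_stalled(progress_timestamp: datetime,
                           now: datetime,
                           max_silence: timedelta) -> bool:
    """True if no status update has been seen for longer than max_silence."""
    return now - progress_timestamp > max_silence


# Usage: an interceptor that last reported progress 11 minutes ago is
# treated as stalled under a (hypothetical) 10-minute window.
last_update = datetime(2024, 12, 3, 12, 0, tzinfo=timezone.utc)
stalled = interceptor_is_stalled(
    last_update,
    last_update + timedelta(minutes=11),
    timedelta(minutes=10),
)
```

A gate that is merely present gives an administrator nothing comparable to check, which is the contrast drawn above.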

soltysh

I've read through the initial requirements and assumptions, but I haven't yet read carefully through the Eviction protocol and the contract described in the latter part of this document.

soltysh · 2024-12-09T15:04:35Z