Support scaling down a particular node with graceful termination #5187
Conversation
Welcome @liuxintong!
All changes in this PR have been verified in the following AKS cluster
The autoscaler on the AKS side was turned off intentionally, and a test
By the way, here was the main container image spec defined in
Once the
The next step was checking the container logs periodically. In the first round, the node group was scaled up from 3 to 4, although it had no unschedulable pods. This is by design, because we need a new surge node in order to scale down the tagged node.
In the second round, the node with
In the next round, after the scale-down cooldown period, the node
In addition, after deploying the test container image, all pods in
@x13n - I see you are doing some refactoring; please let me know if you have any concerns about this feature before I rebase my PR.
Hi @liuxintong , thanks for sending the change! I'll try to do a proper review next week. I'm indeed moving quite a lot of scale down logic around, so it will be better to wait until then before merging this PR. One high level comment I have for now is that maybe it makes sense to split this into 2 PRs? Optional enforcement of min size is a feature in itself and doesn't interfere with my changes, so maybe it could be done first.
Thanks @x13n! Splitting into 2 pull requests also makes sense to me. I'll do that in a new PR. Please let me know once your scale down optimization is done, so that I can implement the new feature based on your changes.
@x13n is the conflicting refactor now complete?
Yes it is! The work here shouldn't be blocked on anything now.
/retest
@liuxintong: Cannot trigger testing until a trusted user reviews the PR and leaves an `/ok-to-test` message.
/ok-to-test
@liuxintong: Cannot trigger testing until a trusted user reviews the PR and leaves an `/ok-to-test` message.
/test all
@liuxintong: Cannot trigger testing until a trusted user reviews the PR and leaves an `/ok-to-test` message.
In addition to unit tests, this PR has been verified in an Azure Kubernetes cluster. The following example shows how CA scales down a node that hosts the cluster-autoscaler pod. @x13n / @MarcPow / @feiskyer, this PR is ready for review; please take a look, thanks!
/assign
I started to review the code and you can see a bunch of my comments on some of the files, but I actually started to doubt this is something we should be adding to Cluster Autoscaler. This isn't really about autoscaling, it is about automatic node repairs. If I understand correctly, the use case here is to manually tag certain broken nodes for removal. This looks like something that could already be achieved by cordoning/draining the node with kubectl, followed by VM removal via the cloud provider API. The min size enforcement will then kick in, if necessary. WDYT?
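For context, a rough client-go sketch of the manual path described above. It covers only the cordon step; the kubeconfig loading, function names, and node name are illustrative assumptions, and a real workflow would also drain the pods and delete the VM through the cloud provider API.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// cordon marks a node unschedulable, the programmatic equivalent of
// `kubectl cordon <node>`. Draining the pods and removing the VM via the
// cloud provider API would follow as separate steps (not shown).
func cordon(ctx context.Context, cs kubernetes.Interface, nodeName string) error {
	node, err := cs.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	node.Spec.Unschedulable = true
	_, err = cs.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}

func main() {
	// Assumes a local kubeconfig; in-cluster config would work the same way.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(config)
	if err := cordon(context.Background(), cs, "broken-node-1"); err != nil {
		fmt.Println("cordon failed:", err)
	}
}
```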
cluster-autoscaler/FAQ.md (Outdated)
@@ -32,6 +32,7 @@ this document:
 * [How can I see all the events from Cluster Autoscaler?](#how-can-i-see-all-events-from-cluster-autoscaler)
 * [How can I scale my cluster to just 1 node?](#how-can-i-scale-my-cluster-to-just-1-node)
 * [How can I scale a node group to 0?](#how-can-i-scale-a-node-group-to-0)
+* [How can I request Clsuter Autoscaler to scale down a particular node?](#how-can-i-request-clsuter-autoscaler-to-scale-down-a-particular-node)
Typo (here and in other places): clsuter->cluster.
Fixed all typos.
cluster-autoscaler/FAQ.md (Outdated)
```
kubectl annotate node <nodename> cluster-autoscaler.kubernetes.io/scale-down-requested=30
kubectl annotate node <nodename> cluster-autoscaler.kubernetes.io/scale-down-requested-
```
I'd add a note that while one can remove the annotation, it doesn't guarantee the node won't be removed. If Cluster Autoscaler already started draining the node, removing the annotation will have no effect. I think this is an important caveat to document.
Good point. Added the disclaimer.
cluster-autoscaler/FAQ.md (Outdated)
Starting with CA 1.26.0, nodes will be evicted by CA if it has the annotation requesting scale-down.
* The annotation key is `cluster-autoscaler.kubernetes.io/scale-down-requested`.
* The annotation value is a number representing the max graceful termination seconds for pods hosted on the node.
Would it be possible to rename the annotation so that the meaning of the value doesn't require reading the FAQ? I was thinking about something along the lines of `cluster-autoscaler.kubernetes.io/enforced-scale-down-graceful-termination-seconds`, but this is a bit long, hope you have a better idea :)
Yeah, that makes sense. I renamed it to `cluster-autoscaler.kubernetes.io/force-scale-down-with-grace-period-minutes`, but I'm not sure if this is a better name.
Btw, I've moved from annotations to taints.
cluster-autoscaler/core/scale_up.go (Outdated)
 continue
 }
 if len(groupsWithNodes[ng]) == 0 {
 groupsWithNodes[ng] = make([]*apiv1.Node, 0)
This is unnecessary, `append(nil, node)` will already return a single-element slice.
Yes, you are right.
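Minimal, self-contained Go illustration of that point; the map and node names are made up for the example and are not the PR's actual types.

```go
package main

import "fmt"

func main() {
	// append accepts a nil slice as its first argument and allocates the
	// backing array on demand, so pre-initializing the map entry with
	// make([]T, 0) is redundant.
	groups := map[string][]string{}
	groups["ng-1"] = append(groups["ng-1"], "node-1") // groups["ng-1"] was nil here
	fmt.Println(len(groups["ng-1"]))                  // prints 1
}
```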
cluster-autoscaler/core/scale_up.go (Outdated)
@@ -486,7 +509,7 @@ func ScaleUpToNodeGroupMinSize(context *context.AutoscalingContext, processors *
 continue
 }
-newNodeCount := ng.MinSize() - targetSize
+newNodeCount := ng.MinSize() + scaleDownRequestedCount - targetSize
I'm a bit worried this will cause nodes to be removed right after being added. Consider the following scenario:
- Annotation gets added to a node n1
- Scale down starts to consider n1 for deletion
- This code triggers a scale up and creates node n2
- Annotation gets added to a node n3
- Scale down considers n2 instead of n3 for deletion, because it is empty
- This code triggers a scale up and creates n4
- Scale down removes n1 (unneeded long enough)
- Scale down removes n2 (unneeded long enough)
- This code has to create another replacement for n3
If there's no special handling of annotated nodes here, the scale up to min is purely reactive, which would cause the node count to sometimes go below min, but then recover.
Another problematic scenario:
- Node n1 gets annotated
- Node n1 starts getting drained
- This code triggers creation of a new node n2
- Pods evicted from n1 are recreated and manage to schedule on other existing nodes in the cluster
- Node n2 becomes ready, but is empty and scale down has to delete it
Thanks for thinking through all possible scenarios.
Here is the logic we have added to avoid node churn caused by the force-scale-down nodes:
- The scale-up will be triggered only if the pods on the force-scale-down node cannot be rescheduled on existing nodes.
- The scale-down will be triggered only if the node is still unneeded after rescheduling all pods from the force-scale-down nodes.
- If the scale-down candidates include multiple nodes, the force-scale-down node has higher priority (see the sketch below).
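A rough sketch of that last point, prioritizing force-scale-down nodes among scale-down candidates; the taint key is the one mentioned earlier in this conversation, while the helper names and sorting approach are assumptions rather than the PR's actual code.

```go
package candidates

import (
	"sort"

	apiv1 "k8s.io/api/core/v1"
)

const forceScaleDownTaint = "cluster-autoscaler.kubernetes.io/force-scale-down-with-grace-period-minutes"

func hasForceScaleDownTaint(n *apiv1.Node) bool {
	for _, t := range n.Spec.Taints {
		if t.Key == forceScaleDownTaint {
			return true
		}
	}
	return false
}

// prioritize moves force-scale-down nodes to the front of the candidate list
// while keeping the relative order of the remaining candidates.
func prioritize(nodes []*apiv1.Node) {
	sort.SliceStable(nodes, func(i, j int) bool {
		return hasForceScaleDownTaint(nodes[i]) && !hasForceScaleDownTaint(nodes[j])
	})
}
```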
@@ -224,6 +225,19 @@ func evictPod(ctx *acontext.AutoscalingContext, podToEvict *apiv1.Pod, isDaemonS
 }
 }
 if utils.HasScaleDownRequestedAnnotation(node) {
The logic to calculate `maxTermination` becomes quite complicated with this change, please extract it to a separate function.
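To make that suggestion concrete, a hypothetical shape for such a helper; the function name, parameters, and per-node override are illustrative assumptions and do not mirror the PR's actual implementation.

```go
package drain

import apiv1 "k8s.io/api/core/v1"

// maxGracefulTerminationSeconds picks the effective grace period for evicting
// a pod: the configured default, optionally overridden for the node, and
// always capped by the pod's own terminationGracePeriodSeconds if that is lower.
func maxGracefulTerminationSeconds(defaultSec int64, nodeOverrideSec *int64, pod *apiv1.Pod) int64 {
	maxTermination := defaultSec
	if nodeOverrideSec != nil {
		maxTermination = *nodeOverrideSec
	}
	if pod.Spec.TerminationGracePeriodSeconds != nil && *pod.Spec.TerminationGracePeriodSeconds < maxTermination {
		maxTermination = *pod.Spec.TerminationGracePeriodSeconds
	}
	return maxTermination
}
```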
Agreed. I've moved all draining related logic to `k8s.io/autoscaler/cluster-autoscaler/simulator/drainability/rules/forcescaledown`.
Btw, the drainability rules are a nice refactoring from the past year.
@@ -135,6 +131,16 @@ func (c *Checker) unremovableReasonAndNodeUtilization(context *context.Autoscali
 return simulator.NotAutoscaled, nil
 }
 utilInfo, err := utilization.Calculate(nodeInfo, context.IgnoreDaemonSetsUtilization, context.IgnoreMirrorPodsUtilization, context.CloudProvider.GPULabel(), timestamp)
nit: Why move this?
I've refactored this part in the new iteration. Only a few changes remain in this file.
Given my reasoning above and lack of activity here, I'm going to close this PR. Please reopen if you disagree. /close
@x13n: Closed this PR.
Reopening as I'm currently working on it. |
@liuxintong: Reopened this PR.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: liuxintong.
Needs approval from an approver in each of these files.
Hi @liuxintong ! Can you clarify what is the benefit of having this logic in autoscaler? IIUC the user will observe more or less the same behavior by annotating the pod as if they just did `kubectl drain`.
@x13n - I was going to reply to your previous comments, but I didn't get any time until today. Apologies for the delay.
Hi @liuxintong ! Can you clarify what is the benefit of having this logic in autoscaler?
Technically, we could develop a new cluster controller to meet all the business needs, but it would duplicate a lot of the logic we already have here, and we would also need to resolve scale-down conflicts between the new controller and cluster-autoscaler. CA already has a mature implementation in this area (scale-down simulation, node draining rules, cloud provider integration, etc.), and I'd like to leverage it to achieve the goal of scaling down a specific node.
Another motivation is that new node provisioning takes longer on Windows than on Linux, and we want to reduce the pod pending time to minimize the impact on services. When we need to scale down a node, we can scale up a new node at the same time. Then the pods can be moved from the old node to the new node as quickly as possible.
IIUC the user will observe more or less the same behavior by annotating the pod as if they just did `kubectl drain`.
You are right if we only have a few clusters to operate. However, we manage thousands of clusters, and this simple problem gets complicated at that scale.
For the 2 options you mentioned, I think the main differences are as follows:
- `kubectl annotate` / `kubectl taint`: it returns immediately, and we can guarantee success with the logic in CA.
- `kubectl drain`: we need to track the execution externally, and the node draining might fail (no guarantee).
In addition, all related code paths are controlled by the new flag `--force-scale-down-enabled`, whose default value is `false`. If others don't need it, they won't notice any difference.
@MarcPow also shared more context in Issue #5109. Please let us know if you have any additional concerns. Thank you!
-// than the configured min size. The source of truth for the current node group
-// size is the TargetSize queried directly from cloud providers. Returns
+// than the required min size, which is calculated based on the node group min
+// size configuration and the number of force-scale-down tainted nodes. Returns
 // appropriate status or error if an unexpected error occurred.
 func (o *ScaleUpOrchestrator) ScaleUpToNodeGroupMinSize(
@kisieland / @BigDarkClown - Thanks for reviewing PR #5663. I'm fixing Issue #5624 here, please take another look.
FYI: @mwielgus
I still don't see the benefit over just `kubectl drain`.
If the only difference between `kubectl drain` and `kubectl annotate` is the need to wait for actuation, then perhaps kubernetes/enhancements#4212 is going to address this use case better? I think once this KEP lands, Cluster Autoscaler should start relying on it as well.
That KEP provides a better authorization story; we can allow users (and controllers) to request node drains, and we can allow a controller to trigger draining even without allowing it to write to a node; labelling nodes can break expectations around workload isolation. @liuxintong if you're willing to contribute to defining that KEP, I think it provides a good way forward. The work you've done on this PR can help ensure that the cluster autoscaler is ready for the arrival of declarative node drains.
PR needs rebase.
I'm closing this one due to inactivity. Looks like long term we can depend on declarative node maintenance for this use case. /close
@x13n: Closed this PR.
Which component this PR applies to?
cluster-autoscaler
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR introduces a new feature to support scaling down a particular node. As described in #5109, some nodes can become non-functional for reasons specific to the cloud provider. With this change, users can tag a node explicitly by running `kubectl annotate node <nodename> cluster-autoscaler.kubernetes.io/scale-down-requested=true`. Cluster Autoscaler will perform all required safety checks, evict the hosted pods, and delete the node.
This PR also introduces a feature to scale up the cluster if the current node group size is smaller than the configured min size. This feature is disabled by default; users need to pass the bool flag `scale-up-to-meet-node-group-min-size-enabled` to enable it.
The first feature depends slightly on the second. For example, suppose the min size of a node group is 3 and it has exactly 3 nodes at the moment. If we tag a node for scale-down, nothing happens, because the node group is already at its min size and the cluster would only be scaled up if it had unschedulable pods. Therefore, we need this change to take the `scale-down-requested` annotation into account and scale the node group up a bit when needed.
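To illustrate that interaction, a minimal sketch of the min-size top-up; only the formula mirrors the PR's `newNodeCount := ng.MinSize() + scaleDownRequestedCount - targetSize` change, and the function and variable names are made up.

```go
package main

import "fmt"

// requiredNewNodes returns how many nodes must be added so that removing the
// tagged (scale-down-requested) nodes does not drop the group below its min size.
func requiredNewNodes(minSize, targetSize, scaleDownRequested int) int {
	n := minSize + scaleDownRequested - targetSize
	if n < 0 {
		return 0
	}
	return n
}

func main() {
	// Example from the description: min size 3, currently 3 nodes, 1 node
	// tagged for scale-down => 1 surge node is added before the tagged node
	// is drained and deleted.
	fmt.Println(requiredNewNodes(3, 3, 1)) // prints 1
}
```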
Which issue(s) this PR fixes:
Fixes #5109
Special notes for your reviewer:
Besides unit tests, this PR has been fully validated in an Azure Kubernetes cluster. I will share more details about my experiments in a separate comment on this PR.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: