
cluster-autoscaler CAPI scale up when current number of nodes is less than minSize #5267

Closed
MaxFedotov opened this issue Oct 21, 2022 · 15 comments
Labels
area/cluster-autoscaler kind/bug Categorizes issue or PR as related to a bug.

Comments

@MaxFedotov
Contributor

MaxFedotov commented Oct 21, 2022

Which component are you using?: Cluster Autoscaler with Cluster API

What version of the component are you using?:

Component version: v1.25.0

What k8s version are you using (kubectl version)?: 1.23.7

kubectl version Output
$ kubectl version 
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.7", GitCommit:"42c05a547468804b2053ecf60a3bd15560362fc2", GitTreeState:"clean", BuildDate:"2022-05-24T12:30:55Z", GoVersion:"go1.17.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.7", GitCommit:"42c05a547468804b2053ecf60a3bd15560362fc2", GitTreeState:"clean", BuildDate:"2022-05-24T12:24:41Z", GoVersion:"go1.17.10", Compiler:"gc", Platform:"linux/amd64"}

What environment is this in?: clusterapi controlled cluster

What did you expect to happen?:
We are using cluster-autoscaler with the Cluster API ClusterClass and Managed Topologies feature. According to the documentation, the user needs to omit the replicas field in spec.topology.workers.machineDeployments[] in order to avoid a conflict between the cluster-autoscaler and the topology controller. When the replicas field is empty, a single node is provisioned for each MachineDeployment by default.

Our intention was to use cluster-autoscaler to set a default minimum number of replicas for MachineDeployments, so we added

  annotations:
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "3"
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "3"

annotations and expected 2 additional nodes to be provisioned.
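
For context, here is roughly how those annotations sit in our Cluster topology; the class and name values below are illustrative, and replicas is intentionally omitted per the documentation:

    spec:
      topology:
        workers:
          machineDeployments:
          - class: default-worker   # illustrative class name
            name: md-0              # illustrative name
            # replicas intentionally omitted so the autoscaler owns the node count
            metadata:
              annotations:
                cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "3"
                cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "3"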

What happened instead?:

According to this code

// Ensure the node group would have the capacity to scale
if scalableResource.MaxSize()-scalableResource.MinSize() < 1 {
return nil, nil
}

cluster-autoscaler for CAPI skips a MachineDeployment where maxSize = minSize, even if the current number of replicas is less than the specified minSize.

This seems a bit counterintuitive, because the current size of the MachineDeployment is less than the minSize specified by the user, so there is definitely capacity to scale until minSize is reached.
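
For illustration, a sketch of the kind of check we expected instead, in the same shape as the snippet above; this is not the actual autoscaler code, and the Replicas() accessor used here is hypothetical:

// Illustrative sketch only, not the upstream implementation: skip the node
// group only when there is no min/max headroom AND it is already at or above
// minSize. Replicas() is a hypothetical accessor for the current replica count.
currentSize, err := scalableResource.Replicas()
if err != nil {
	return nil, err
}
if scalableResource.MaxSize()-scalableResource.MinSize() < 1 &&
	currentSize >= scalableResource.MinSize() {
	return nil, nil
}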

@MaxFedotov MaxFedotov added the kind/bug Categorizes issue or PR as related to a bug. label Oct 21, 2022
@MaxFedotov
Contributor Author

@elmiko, can I please ask you to take a look, as one of the CAPI cluster-autoscaler maintainers?

@elmiko
Contributor

elmiko commented Oct 26, 2022

@MaxFedotov apologies for the delay, i am traveling at kubecon this week.

i have a couple thoughts about this:

  1. this is known behavior in the autoscaler: if your node group is below the minimum, or above the maximum, the autoscaler will not attempt to reach the minimum or maximum, respectively.
  2. there are currently some strange interactions between ClusterClass and the autoscaler with regard to how the replicas field should be set. in this scenario, the replicas in the MachineDeployment should be "3", but because ClusterClass needs to have replicas empty to make the autoscaling portions work, this is problematic. cc @sbueringer, who knows a little more about this, and there is a related issue in cluster-api: MachineDeployment.spec.replicas defaulting should take into account autoscaler min/max size if defined (kubernetes-sigs/cluster-api#7293)

in this specific case though, i think you could make the replicas 3 in your ClusterClass and it should work because you are using min/max of 3 as well. since the ClusterClass controller will want to keep the replicas at 3 and the autoscaler will never attempt to grow or shrink, it should be fine. but this only works because you've got this specific MachineDeployment locked at 3 nodes.
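
for illustration, roughly what that could look like on the Cluster topology (the names here are illustrative):

    spec:
      topology:
        workers:
          machineDeployments:
          - class: default-worker
            name: md-0
            replicas: 3   # matches the min/max annotations, so the topology controller and the autoscaler never disagree
            metadata:
              annotations:
                cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "3"
                cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "3"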

this is definitely an issue we need to solve with respect to the way ClusterClass works.

@MaxFedotov
Contributor Author

@elmiko Thanks a lot for the explanations!

in this specific case though, i think you could make the replicas 3 in your ClusterClass and it should work because you are using min/max of 3 as well. since the ClusterClass controller will want to keep the replicas at 3 and the autoscaler will never attempt to grow or shrink, it should be fine.

The thing is that in this case I won't be able to use the autoscaler with this nodegroup in the future :(

My initial intention was to offload all nodegroup size management to cluster-autoscaler, but it seems that is impossible for now, because I need to perform the initial nodegroup scale-up myself so that it reaches a specific starting size.

@elmiko
Contributor

elmiko commented Oct 27, 2022

The thing is that in this case I won't be able to use the autoscaler with this nodegroup in the future :(

yeah, true =(

My initial intention was to offload all nodegroup size management to cluster-autoscaler, but it seems that is impossible for now, because I need to perform the initial nodegroup scale-up myself so that it reaches a specific starting size.

it sounds like it won't work for your case currently, imo we need to solve the issue in the cluster-api community around how this works with ClusterClasses.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 25, 2023
@elmiko
Contributor

elmiko commented Jan 25, 2023

i think there is some interesting discussion happening related to this in kubernetes-sigs/cluster-api#7293 (comment)

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 25, 2023
@MaxFedotov
Contributor Author

MaxFedotov commented Mar 1, 2023

Hi @elmiko! I want to come back to this issue now that kubernetes-sigs/cluster-api#7990 has been merged.
If I understand correctly, it is now possible to add a new machineDeployment to a managed Cluster and specify cluster-autoscaler annotations in the spec like:

    workers:
      machineDeployments:
      - class: default-worker
        metadata:
          annotations:
            cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "4"
            cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "2"
        name: md-1

and the defaulting webhook will set the replicas field of the created machineDeployment to the value specified in the cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size annotation.

But if I then update this annotation, for example set cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "3" in the Cluster spec, the defaulting webhook will not update the replicas field, due to this check:

https://github.com/kubernetes-sigs/cluster-api/blob/6fede8c8cbc83e108aa6a5bc4507ed3aae096105/api/v1beta1/machinedeployment_webhook.go#L307-L311

So for now, if I want to increase the min (or, in this case, current) number of nodes in an already created nodegroup, I have to update the cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size annotation and also remove the replicas field from the corresponding machineDeployment.
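
For example (resource names are illustrative, and this assumes the defaulting webhook re-derives replicas from the annotation once the field is cleared):

# 1. Raise the min-size annotation on the machineDeployments entry in
#    spec.topology.workers of the Cluster.
kubectl edit cluster my-cluster

# 2. Clear spec.replicas on the generated MachineDeployment so the defaulting
#    webhook recomputes it from the new annotation.
kubectl patch machinedeployment my-cluster-md-1-abcde --type=json \
  -p='[{"op": "remove", "path": "/spec/replicas"}]'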

Is there any way (or maybe some plans) to allow users to control the minimum number of nodes in a nodegroup with only the cluster-autoscaler annotations?

Thanks!

@elmiko
Contributor

elmiko commented Mar 1, 2023

So for now, if I want to increase the min (or, in this case, current) number of nodes in an already created nodegroup, I have to update the cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size annotation and also remove the replicas field from the corresponding machineDeployment.

yeah, i can see how this would be frustrating and possibly brittle. imo, it seems like we might want to propose logic on the cluster-api controllers that could automatically update the replicas if it sees the minimum annotation increased. i'm not sure how the community would feel about that approach, but it's the first thing that came to mind. i have a feeling that reaching across the api boundaries between autoscaler and cluster-api might be confusing if we couple too tightly.

Is there any way (or maybe some plans) to allow users to control the minimum number of nodes in a nodegroup with only the cluster-autoscaler annotations?

i haven't heard anything like that, but sounds like an interesting idea.

@MaxFedotov
Contributor Author

The interesting thing is that this check

func calculateMachineDeploymentReplicas(ctx context.Context, oldMD *MachineDeployment, newMD *MachineDeployment, dryRun bool) (int32, error) {
	// If replicas is already set => Keep the current value.
	if newMD.Spec.Replicas != nil {
		return *newMD.Spec.Replicas, nil
	}

completely prevents the ability to use cluster-autoscaler annotations when using managed Cluster topologies and the machineDeployment is already created. If machineDeployments are managed by the topology controller and workers.machineDeployments[].metadata.annotations are updated, the topology controller generates a Patch which includes only the updated fields, so a user has no way to set newMD.Spec.Replicas to a nil value.

Maybe it is possible to modify this check and exclude machineDeployments created by the topology controller from it?
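
For illustration, one hedged way such an exclusion could look; this is not actual cluster-api code, and the topology-owned label key used below is an assumption on my side:

// Illustrative sketch only: keep the existing value only for MachineDeployments
// that are NOT managed by the topology controller. The label key below is an
// assumption, not a confirmed cluster-api constant.
if newMD.Spec.Replicas != nil {
	if _, topologyOwned := newMD.Labels["topology.cluster.x-k8s.io/owned"]; !topologyOwned {
		return *newMD.Spec.Replicas, nil
	}
}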

@elmiko
Contributor

elmiko commented Mar 1, 2023

Maybe it is possible to modify this check and exclude machineDeployments created by the topology controller from it?

that seems like one way we could fix this, i had also suggested the possibility of using some sort of "managed-by" annotation to let the cluster-api parts know when the replicas will be controlled by some other controller. i'm not sure which is the best path forward, but i think we will need to bring this up with the cluster-api community.

also, it might be worth opening an issue on the cluster-api repo about the behavior you've found. that way we can get more attention in the cluster-api community and hopefully get a discussion going there. wdyt?

@MaxFedotov
Contributor Author

Yep, that would be the best place, thank you! Created kubernetes-sigs/cluster-api#8217 :)

@elmiko
Contributor

elmiko commented May 17, 2023

@MaxFedotov given that this is a known behavior of the cluster-autoscaler (see the FAQ entry "My cluster is below minimum / above maximum number of nodes, but CA did not fix that! Why?"), what do you think about closing this issue in favor of kubernetes-sigs/cluster-api#8217?

@elmiko
Contributor

elmiko commented Dec 18, 2023

do we still need this issue given that we have a flag to control this behavior now?

see #5195

i think we should close this, but will wait for a response from @MaxFedotov
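
for reference, a rough sketch of enabling that flag on the cluster-autoscaler Deployment, assuming the flag in question is --enforce-node-group-min-size (the surrounding container spec is illustrative, not a complete manifest):

    containers:
    - name: cluster-autoscaler
      image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.26.0  # illustrative tag
      command:
      - ./cluster-autoscaler
      - --cloud-provider=clusterapi
      - --enforce-node-group-min-size=true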

@sbueringer
Member

sbueringer commented Dec 19, 2023

Interesting. Didn't know this flag existed. Also sounds reasonable to me to close this issue

(Also: This was merged a year ago? :D)

@MaxFedotov
Contributor Author

Wow, I missed this flag too :) Will give it a try in our staging env and close the issue
