
cluster-autoscaler CAPI scale up when current number of nodes is less than minSize #5267

Closed
MaxFedotov opened this issue Oct 21, 2022 · 15 comments
Labels
area/cluster-autoscaler kind/bug Categorizes issue or PR as related to a bug.

Comments

@MaxFedotov
Contributor

MaxFedotov commented Oct 21, 2022

Which component are you using?: Cluster Autoscaler with Cluster API

What version of the component are you using?:

Component version: v1.25.0

What k8s version are you using (kubectl version)?: 1.23.7

kubectl version Output
$ kubectl version 
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.7", GitCommit:"42c05a547468804b2053ecf60a3bd15560362fc2", GitTreeState:"clean", BuildDate:"2022-05-24T12:30:55Z", GoVersion:"go1.17.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.7", GitCommit:"42c05a547468804b2053ecf60a3bd15560362fc2", GitTreeState:"clean", BuildDate:"2022-05-24T12:24:41Z", GoVersion:"go1.17.10", Compiler:"gc", Platform:"linux/amd64"}

What environment is this in?: clusterapi controlled cluster

What did you expect to happen?:
We are using cluster-autoscaler with the Cluster API ClusterClass and Managed Topologies feature. According to the documentation, the user needs to omit the replicas field in spec.topology.workers.machineDeployments[] in order to avoid a conflict between the cluster-autoscaler and the topology controller. When the replicas field is empty, a single node is provisioned for each MachineDeployment by default.

Our intention was to use cluster-autoscaler to set a default minimum number of replicas for MachineDeployments, so we added

  annotations:
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "3"
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "3"

annotations and expected 2 additional nodes to be provisioned.
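
For context, here is roughly how those annotations sit in our Cluster topology; the class and name values below are illustrative, and replicas is intentionally omitted per the documentation:

    spec:
      topology:
        workers:
          machineDeployments:
          - class: default-worker   # illustrative class name
            name: md-0              # illustrative name
            # replicas intentionally omitted so the autoscaler owns the node count
            metadata:
              annotations:
                cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "3"
                cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "3"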

What happened instead?:

According to this code

// Ensure the node group would have the capacity to scale
if scalableResource.MaxSize()-scalableResource.MinSize() < 1 {
return nil, nil
}

cluster-autoscaler for CAPI skips a MachineDeployment where maxSize = minSize, even if the current number of replicas is less than the specified minSize.

This seems a bit counterintuitive, because the current size of the MachineDeployment is less than the minSize specified by the user, so there is definitely capacity to scale until minSize is reached.
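
For illustration, a sketch of the kind of check we expected instead, in the same shape as the snippet above; this is not the actual autoscaler code, and the Replicas() accessor used here is hypothetical:

// Illustrative sketch only, not the upstream implementation: skip the node
// group only when there is no min/max headroom AND it is already at or above
// minSize. Replicas() is a hypothetical accessor for the current replica count.
currentSize, err := scalableResource.Replicas()
if err != nil {
	return nil, err
}
if scalableResource.MaxSize()-scalableResource.MinSize() < 1 &&
	currentSize >= scalableResource.MinSize() {
	return nil, nil
}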

@MaxFedotov MaxFedotov added the kind/bug Categorizes issue or PR as related to a bug. label Oct 21, 2022
@MaxFedotov
Contributor Author

@elmiko, can I please ask you to take a look, as one of the CAPI cluster-autoscaler maintainers?

@elmiko
Contributor

elmiko commented Oct 26, 2022

@MaxFedotov apologies for the delay, i am traveling at kubecon this week.

i have a couple thoughts about this:

  1. this is known behavior in the autoscaler: if your node group is below the minimum, or above the maximum, the autoscaler will not attempt to reach the minimum or maximum, respectively.
  2. there are currently some strange interactions between ClusterClass and the autoscaler with regard to how the replicas field should be set. in this scenario, the replicas in the MachineDeployment should be "3", but because ClusterClass needs to have replicas empty to make the autoscaling portions work, this is problematic. cc @sbueringer, who knows a little more about this, and there is a related issue in cluster-api: MachineDeployment.spec.replicas defaulting should take into account autoscaler min/max size if defined (kubernetes-sigs/cluster-api#7293)

in this specific case though, i think you could make the replicas 3 in your ClusterClass and it should work because you are using min/max of 3 as well. since the ClusterClass controller will want to keep the replicas at 3 and the autoscaler will never attempt to grow or shrink, it should be fine. but this only works because you've got this specific MachineDeployment locked at 3 nodes.
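
for illustration, roughly what that could look like on the Cluster topology (the names here are illustrative):

    spec:
      topology:
        workers:
          machineDeployments:
          - class: default-worker
            name: md-0
            replicas: 3   # matches the min/max annotations, so the topology controller and the autoscaler never disagree
            metadata:
              annotations:
                cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "3"
                cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "3"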

this is definitely an issue we need to solve with respect to the way ClusterClass works.

@MaxFedotov
Contributor Author

@elmiko Thanks a lot for the explanations!

in this specific case though, i think you could make the replicas 3 in your ClusterClass and it should work because you are using min/max of 3 as well. since the ClusterClass controller will want to keep the replicas at 3 and the autoscaler will never attempt to grow or shrink, it should be fine.

The thing is that in this case I won't be able to use the autoscaler with this nodegroup in the future :(

My initial intention was to offload all nodegroup size management to cluster-autoscaler, but it seems that is impossible for now, because I need to perform the initial nodegroup scale-up myself so that it reaches a specific starting size.

@elmiko
Contributor

elmiko commented Oct 27, 2022

The thing is that in this case I won't be able to use the autoscaler with this nodegroup in the future :(

yeah, true =(

My initial intention was to offload all nodegroup size management to cluster-autoscaler, but it seems that is impossible for now, because I need to perform the initial nodegroup scale-up myself so that it reaches a specific starting size.

it sounds like it won't work for your case currently, imo we need to solve the issue in the cluster-api community around how this works with ClusterClasses.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 25, 2023
@elmiko
Contributor

elmiko commented Jan 25, 2023

i think there is some interesting discussion happening related to this in kubernetes-sigs/cluster-api#7293 (comment)

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 25, 2023
@MaxFedotov
Contributor Author

MaxFedotov commented Mar 1, 2023

Hi @elmiko! I want to come back to this issue now that kubernetes-sigs/cluster-api#7990 has been merged.
If I understand correctly, it is now possible to add a new machineDeployment to a managed Cluster and specify cluster-autoscaler annotations in the spec like:

    workers:
      machineDeployments:
      - class: default-worker
        metadata:
          annotations:
            cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "4"
            cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "2"
        name: md-1

and the defaulting webhook will set the replicas field of the created machineDeployment to the value specified in the cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size annotation.

But if I then update this annotation, for example set cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "3" in the Cluster spec, the defaulting webhook will not update the replicas field, due to this check:

https://github.com/kubernetes-sigs/cluster-api/blob/6fede8c8cbc83e108aa6a5bc4507ed3aae096105/api/v1beta1/machinedeployment_webhook.go#L307-L311

So for now, if I want to increase the min (or, in this case, current) number of nodes in an already created nodegroup, I have to update the cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size annotation and also remove the replicas field from the corresponding machineDeployment.
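
For example (resource names are illustrative, and this assumes the defaulting webhook re-derives replicas from the annotation once the field is cleared):

# 1. Raise the min-size annotation on the machineDeployments entry in
#    spec.topology.workers of the Cluster.
kubectl edit cluster my-cluster

# 2. Clear spec.replicas on the generated MachineDeployment so the defaulting
#    webhook recomputes it from the new annotation.
kubectl patch machinedeployment my-cluster-md-1-abcde --type=json \
  -p='[{"op": "remove", "path": "/spec/replicas"}]'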

Is there any way (or maybe some plans) to allow users to control the minimum number of nodes in a nodegroup with only the cluster-autoscaler annotations?

Thanks!

@elmiko
Contributor

elmiko commented Mar 1, 2023

So for now, if I want to increase the min (or, in this case, current) number of nodes in an already created nodegroup, I have to update the cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size annotation and also remove the replicas field from the corresponding machineDeployment.

yeah, i can see how this would be frustrating and possibly brittle. imo, it seems like we might want to propose logic on the cluster-api controllers that could automatically update the replicas if it sees the minimum annotation increased. i'm not sure how the community would feel about that approach, but it's the first thing that came to mind. i have a feeling that reaching across the api boundaries between autoscaler and cluster-api might be confusing if we couple too tightly.

Is there any way (or maybe some plans) to allow users to control the minimum number of nodes in a nodegroup with only the cluster-autoscaler annotations?

i haven't heard anything like that, but sounds like an interesting idea.

@MaxFedotov
Contributor Author

The interesting thing is that this check

func calculateMachineDeploymentReplicas(ctx context.Context, oldMD *MachineDeployment, newMD *MachineDeployment, dryRun bool) (int32, error) {
	// If replicas is already set => Keep the current value.
	if newMD.Spec.Replicas != nil {
		return *newMD.Spec.Replicas, nil
	}

completely prevents the ability to use cluster-autoscaler annotations when using managed Cluster topologies and the machineDeployment is already created. If machineDeployments are managed by the topology controller and workers.machineDeployments[].metadata.annotations are updated, the topology controller generates a Patch which includes only the updated fields, so a user has no way to set newMD.Spec.Replicas to a nil value.

Maybe it is possible to modify this check and exclude machineDeployments created by the topology controller from it?
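
For illustration, one hedged way such an exclusion could look; this is not actual cluster-api code, and the topology-owned label key used below is an assumption on my side:

// Illustrative sketch only: keep the existing value only for MachineDeployments
// that are NOT managed by the topology controller. The label key below is an
// assumption, not a confirmed cluster-api constant.
if newMD.Spec.Replicas != nil {
	if _, topologyOwned := newMD.Labels["topology.cluster.x-k8s.io/owned"]; !topologyOwned {
		return *newMD.Spec.Replicas, nil
	}
}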

@elmiko
Contributor

elmiko commented Mar 1, 2023

Maybe it is possible to modify this check and exclude machineDeployments created by the topology controller from it?

that seems like one way we could fix this, i had also suggested the possibility of using some sort of "managed-by" annotation to let the cluster-api parts know when the replicas will be controlled by some other controller. i'm not sure which is the best path forward, but i think we will need to bring this up with the cluster-api community.

also, it might be worth opening an issue on the cluster-api repo about the behavior you've found. that way we can get more attention in the cluster-api community and hopefully get a discussion going there. wdyt?

@MaxFedotov
Contributor Author

Yep, that would be the best place, thank you! Created kubernetes-sigs/cluster-api#8217 :)

@elmiko
Contributor

elmiko commented May 17, 2023

@MaxFedotov given that this is a known behavior of the cluster-autoscaler (see the FAQ entry "My cluster is below minimum / above maximum number of nodes, but CA did not fix that! Why?"), what do you think about closing this issue in favor of kubernetes-sigs/cluster-api#8217?

@elmiko
Contributor

elmiko commented Dec 18, 2023

do we still need this issue given that we have a flag to control this behavior now?

see #5195

i think we should close this, but will wait for a response from @MaxFedotov
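
for reference, a rough sketch of enabling that flag on the cluster-autoscaler Deployment, assuming the flag in question is --enforce-node-group-min-size (the surrounding container spec is illustrative, not a complete manifest):

    containers:
    - name: cluster-autoscaler
      image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.26.0  # illustrative tag
      command:
      - ./cluster-autoscaler
      - --cloud-provider=clusterapi
      - --enforce-node-group-min-size=true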

@sbueringer
Member

sbueringer commented Dec 19, 2023

Interesting. Didn't know this flag existed. Also sounds reasonable to me to close this issue

(Also: This was merged a year ago? :D)

@MaxFedotov
Contributor Author

Wow, I missed this flag too :) Will give it a try in our staging env and close the issue
