Not able to upgrade AKS cluster using terraform module - Minor version of node pool version 27 is bigger than control plane version #465
Comments
We at @swisspost have exactly the same issue. I analyzed it with some of my colleagues (@hichem-belhocine will ping you, @zioproto, via MS Teams). We have no auto-upgrade mechanism in place on the AKS side (Auto Upgrade Type = Disabled). Instead, we specify the exact patch version for both:
With TF module version 6.8.0, a plan for the upgrade from 1.25 to 1.26 looked like this:
Everything went smoothly. Then we upgraded to module version 7.5.0. The upgrade is now triggered via another TF provider (azapi), and the control plane variable is ignored via a
A plan to 1.27 now looks like this (with module version 7.5.0):
And of course this fails, because you cannot update the node pool to a newer version than the control plane:
@dunefro I understand you are using
I see the problem: after PR #336, the cluster update is handled like this (Lines 603 to 616 in 3851478),
so the Terraform plan will attempt to update only the node pools, because of Lines 520 to 526 in 3851478.
@dunefro, to unblock your work you could set the following values:
This will first update the control plane to the latest 1.27 patch version. Once terraform apply completes, you can update your values to:
This completes the upgrade by also updating the node pools. @dunefro, please confirm whether this unblocks your work and you can upgrade successfully.
@lonegunmanb @nellyk, for the long-term solution I see 2 options:
Please, everyone, give feedback on which option is best for you.
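Here is a minimal sketch of the two-step workaround described above. The exact values from the original comment were not preserved, so the version strings below are illustrative assumptions. The sketch also assumes the module's `kubernetes_version` (control plane) and `orchestrator_version` (default node pool) inputs, and it omits every other required input.

```hcl
module "aks" {
  source  = "Azure/aks/azurerm"
  version = "7.5.0"

  # Other required inputs (resource group, prefix, networking, ...) are omitted.

  # First apply: move only the control plane to the new minor version.
  # A minor-only value such as "1.27" is assumed to resolve to the latest patch.
  kubernetes_version   = "1.27"
  orchestrator_version = "1.26" # illustrative: keep the node pool's current version here

  # Second apply, after the first one succeeds: bump the node pool as well.
  # orchestrator_version = "1.27"
}
```

Splitting the change across two applies follows the ordering AKS enforces, as shown by the error in the issue title: a node pool's minor version may not be newer than the control plane's.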
TODO: test whether the following code (Lines 603 to 616 in 3851478)
will also upgrade the system node pool to
I am not a fan of upgrading everything with the azapi provider when we hard-pin versions and disable auto-patching. The azapi approach has several caveats, the main one being its non-idempotent behavior. I was happy with the approach available in 6.8. Maybe we need two things:
The challenge is that we cannot have a conditional
We would have to create 2 independent
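The constraint mentioned above presumably concerns Terraform's `lifecycle` block. Its arguments, including `ignore_changes`, accept only literal values, so a variable cannot switch them on or off. Below is a rough sketch of the two-resource workaround. It uses a hypothetical `manage_version_with_azapi` toggle, elides most cluster arguments, and is not the module's actual code.

```hcl
variable "manage_version_with_azapi" {
  description = "Hypothetical toggle: let an azapi update own the Kubernetes version."
  type        = bool
  default     = false
}

variable "kubernetes_version" {
  type = string
}

# Variant used when Terraform manages the control plane version directly.
resource "azurerm_kubernetes_cluster" "main" {
  count = var.manage_version_with_azapi ? 0 : 1

  # ... all other cluster arguments, duplicated in both variants ...
  kubernetes_version = var.kubernetes_version
}

# Variant used when version changes are applied outside this resource.
resource "azurerm_kubernetes_cluster" "main_version_ignored" {
  count = var.manage_version_with_azapi ? 1 : 0

  # ... all other cluster arguments, duplicated in both variants ...
  kubernetes_version = var.kubernetes_version

  lifecycle {
    # Must be a static list; it cannot reference var.manage_version_with_azapi.
    ignore_changes = [kubernetes_version]
  }
}
```

The cost is a duplicated resource definition. In addition, flipping the toggle on an existing cluster would plan a destroy and re-create unless the state is moved first (for example with terraform state mv).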
Next step:
@mkilchhofer I am not able to reproduce the non-idempotent behavior. I created PR #501, where I extend one of our current CI tests to run the upgrades. As soon as I change the value of
I can do some more testing, but I am on vacation for a week (⛷️🎿🏔️). I will be back in the office on February 5th.
Yes, this works. Thanks @zioproto
Hello @zioproto. From a user perspective, I think both ways of working (WoWs) make sense, since you typically upgrade the control plane before the workers. From a usability point of view, I prefer being able to set both versions to the version I want and having the module handle the upgrade order, as it once did in 6.8.0. However, there is currently no documentation stating that the module dropped support for seamlessly upgrading both the control plane and the worker node pools, and this causes the confusion. I also think it makes sense to be able to upgrade some node pools to a newer version while others stay one version behind. For those of us using the module, documenting this change and the intended upgrade procedure after moving from 6.8 to 7.5 would be enough.
Edit: For those of us who use PRs to peer-review each other's changes, this means two changes, one for the control plane and a second for the node pools, which is a bit more of a hassle.
@mkilchhofer friendly ping about this pending issue :) thanks
@mkilchhofer friendly ping about this issue :)
I cannot test it anymore. We no longer use the module and manage the required TF resources ourselves. /cc: @zioproto
Can anyone else test it?
I tested the entire setup again, with the current version being
Somehow I am not able to detect any drift when upgrading the Kubernetes version. If I try to use
I have also tried upgrading to AKS module version
One more odd thing I have observed in the drift is the node count of the AKS node pool. If I do not specify the node count, but instead specify the
It would be great if we could get some help in the official documentation on
It should cover various scenarios, both where node pools are managed by the cluster autoscaler and where they are not.
Would this example, which we already test in CI, help demonstrate how to use node pools? https://github.com/Azure/terraform-azurerm-aks/tree/main/examples/multiple_node_pools
@lonegunmanb, could you please double-check this with me? In the resource in terraform-azurerm-aks/extra_node_pool.tf (Lines 6 to 28 in 2c364a6)
we mark terraform-azurerm-aks/variables.tf (Lines 944 to 949 in 2c364a6)
But in the example we have terraform-azurerm-aks/examples/multiple_node_pools/main.tf (Lines 34 to 44 in 2c364a6)
When the cluster autoscaler is enabled,
For the system node pool, the variable is called
@dunefro, what happens if you set to
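For comparison with the multiple_node_pools example, a sketch of an extra node pool managed by the cluster autoscaler might look like the following. The field names assume the module's `node_pools` input in the 7.x series (targeting AzureRM provider 3.x). Leaving `node_count` as null so that the autoscaler owns the count is an assumption to verify, not a confirmed fix for the drift discussed here.

```hcl
module "aks" {
  source  = "Azure/aks/azurerm"
  version = "7.5.0"

  # Other required inputs are omitted.

  node_pools = {
    user = {
      name                = "user"            # illustrative pool name
      vm_size             = "Standard_D2s_v3" # illustrative VM size
      enable_auto_scaling = true
      min_count           = 1
      max_count           = 3
      node_count          = null # leave the node count to the cluster autoscaler
    }
  }
}
```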
@dunefro, to try to reproduce your issue I tested the example in this repo.
However, I see no Terraform state drift, and I can run "terraform apply" over and over again without any change being detected. Could you please provide a code example that shows Terraform trying to force the node count to 0? Thanks
@zioproto Yes, this gets solved when I set the
Is there an existing issue for this?
Terraform Version
1.6.2
Module Version
7.4.0
AzureRM Provider Version
3.69.0
Affected Resource(s)/Data Source(s)
module.aks.azurerm_kubernetes_cluster.main
Terraform Configuration Files
tfvars variables values
Debug Output/Panic Output
Expected Behaviour
Ideally, the control plane should be updated first and then the node pool. To work around this issue, I currently have to update the control plane from the portal first and then update the node pool from Terraform.
Actual Behaviour
Terraform shows no diff for updating the control plane, and the node pool upgrade fails with a version incompatibility error.
Steps to Reproduce
No response
Important Factoids
No response
References
No response