How to update VMSS resource, and preserve the current instance count? (in ARM template) #70

Open
johnib opened this issue May 25, 2020 · 8 comments

@johnib

johnib commented May 25, 2020

Hello,

The context is ongoing VMSS provisioning/updating using ARM templates.
We use ARM templates to manage all our ARM resources, in an Infrastructure-as-Code (IaC) manner.

With IaC, there are frequent ARM deployments with incremental changes to our resource definitions.
In practice, I might have several ARM deployments a day that usually do not change anything in the VMSS definition, but do change other resources.

In addition, on top of my VMSS clusters I have configured auto-scale rules that add and remove instances on an ongoing basis.

The following scenario becomes problematic:

  1. VMSS is provisioned with 5 instances by default, i.e. in the ARM template the sku.capacity is 5 by default.
  2. After several days, this VMSS grows because the auto-scale engine keeps adding instances while the VMSS is under load.
  3. The next release that triggers an ARM template deployment resets the instance count to 5, because sku.capacity in the code is not updated every time auto-scale changes the instance count (of course.. right? :) )
  4. I added logic to my ARM template that, after the initial provisioning of a specific VMSS cluster, sets sku.capacity to null instead of the default value 5. However, ARM rejects this with:

Required parameter 'sku.capacity' is missing (null).

I'm looking for a way to tell the VMSS RP to ignore sku.capacity and preserve the existing instance count, without knowing it beforehand.

Please advise, thank you!

@ktmaul

ktmaul commented Jul 2, 2020

I would also like an answer to this question. I am deploying to a scale set which is managed by an external tool and I need to be able to update the base image without removing machines in use.

@george-moussa

Any updates here?

@johnib
Author

johnib commented Jul 29, 2020

Solution (workaround)

  1. While Azure does not accept a null sku.capacity, it does accept a null sku object, which leaves the existing sku definition (including the instance count) untouched.

  2. How do you end up with a single ARM template that initializes the VMSS cluster with 5 nodes if the cluster does not exist, and keeps the instance count as is if it does exist? Continue reading.

    1. Unfortunately, calling the reference function on a nonexistent resource fails the deployment, so there is no way of determining whether a resource exists. Until there is, we need to do something clever if we really want to achieve the above behavior.

    2. Most of the below is just a mind-melting assembly of ingredients. The ingredients (the resource group tags, the union trick, the helper function below) belong to Bennie. Thank you, Benjamin.

  3. The idea is to keep a state that holds the names of the VMSS clusters that were provisioned previously.

    1. Where is the state kept? As tags on the resource group being deployed.

    2. What does the state look like? A JSON object whose keys are the existing VMSS cluster name(s):

      {
        "vmss01": "<dont care>",
        "vmss02": "<dont care>"
      }
  4. Assume we have a main template that accepts a list of objects, each describing a VMSS that needs to be created. The template loops through the list and creates (or updates) each of the VMSS clusters. It then has an additional step that updates the resource group tags with the names of all VMSS clusters in that list. Here's a simplified version of the ARM parameter file:

    {
      "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
      "contentVersion": "1.0.0.0",
      "parameters": {
        "vmssClusters": {
          "value": [
            {
              "vmssName": "vmss01",
              "nodeCount": 5,
              "sku": "Standard_D2_v3",
              "autoScaleSettings": {
                "minInstanceCount": 5,
                "maxInstanceCount": 100,
                "defaultInstanceCount": 10
              }
            },
            {
              "vmssName": "vmss02",
              "nodeCount": 5,
              "sku": "Standard_D4_v3",
              "autoScaleSettings": {
                "minInstanceCount": 5,
                "maxInstanceCount": 100,
                "defaultInstanceCount": 10
              }
            }
          ]
        }
      }
    }
  5. Let's also assume the main template uses a nested (inner) template that holds the actual provisioning logic of a single VMSS cluster. The nested template accepts a capacity parameter that describes the requested node count for the VMSS. If capacity = -1, the nested template keeps the sku object null; otherwise it assigns it a value, including sku.capacity = vmssClusters[].nodeCount. The important part is how to make the main template pass vmssClusters[].nodeCount to the nested template the first time a specific VMSS cluster is created, and pass -1 on subsequent executions.

    1. The idea is to keep a state that holds the names of the VMSS clusters that were previously created successfully.

    2. Where is the state kept? As tags on the resource group being deployed.

    3. What does the state look like? A JSON object whose keys are the existing VMSS cluster name(s):

      {
        "vmss01": "<dont care>",
        "vmss02": "<dont care>"
      }
    4. Here's an initial version of the main template. It covers the -1 logic, given that the resource group tags hold the state:

      {
        "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {
          "vmssClusters": {
            "type": "array"
          }
        },
        "variables": {
          // If the resource group has no tags at all, `resourceGroup().tags` fails the deployment. This `union` handles that case.
          "resourceGroupTags": "[union(json('{\"tags\":{}}'), resourceGroup()).tags]"
        },
        "resources": [
          {
            "copy": {
              "name": "VmssLoop",
              "count": "[length(parameters('vmssClusters'))]"
            },
            "apiVersion": "2019-10-01",
            "name": "[concat('vmss-', parameters('vmssClusters')[copyIndex()].vmssName)]",
            "type": "Microsoft.Resources/deployments",
            "properties": {
              "mode": "Incremental",
              "parameters": {
                "capacity": {
                  "value": "[if(contains(variables('resourceGroupTags'), parameters('vmssClusters')[copyIndex()].vmssName), -1, parameters('vmssClusters')[copyIndex()].nodeCount)]"
                }
              },
              "templateLink": {
                "uri": "/VMSS/azuredeploy.json",
                "contentVersion": "1.0.0.0"
              }
            }
          }
        ],
        "outputs": {}
      }
    5. Given that VmssLoop succeeds, how do we store the state? We add the following step, which writes the resource group tags:

      {
        "dependsOn": [
          "VmssLoop"
        ],
        "type": "Microsoft.Resources/tags",
        "name": "default",
        "apiVersion": "2019-10-01",
        "properties": {
          "tags": "[variables('newResourceGroupTagsIfDeploymentSucceeds')]"
        }
      }
    6. The only thing left is to define the variable newResourceGroupTagsIfDeploymentSucceeds, which should hold the following value:

      {
        "vmss01": "<dont care>",
        "vmss02": "<dont care>"
      }
      1. This by itself requires some sophistication, as we need to generate this object dynamically, no matter how long the vmssClusters list is.

      2. Of course, we need to use Copy; however, Copy returns an array, not an object.

      3. We came up with a helper function that converts an array of objects (returned by Copy) into a single key-value object:

          "variables": {
            "copy": [
              {
                // returns: [{"vmss01": "_"}, {"vmss02": "_"}]
                "name": "defaults",
                "count": "[length(parameters('vmssClusters'))]",
                "input": {
                  "[parameters('vmssClusters')[copyIndex('defaults')].vmssName]": "_"
                }
              }
            ],
            "defaultTags": {
              "tags": "[helper.convertArrayToObject(variables('defaults'))]" // returns: {"vmss01": "_", "vmss02": "_"}
            },
            "resourceGroupTags": "[union(json('{\"tags\":{}}'), resourceGroup()).tags]",
            "newResourceGroupTagsIfDeploymentSucceeds": "[union(variables('defaultTags'), resourceGroup()).tags]"
          }
      4. What does the implementation of helper.convertArrayToObject look like?

          "functions": [
            {
              "namespace": "helper",
              "members": {
                "convertArrayToObject": {
                  "parameters": [
                    {
                      "name": "inArray",
                      "type": "array"
                    }
                  ],
                  "output": {
                    "type": "object",
                    "value": "[json(concat('{', replace(replace(replace(replace(string(parameters('inArray')), '[', ''), ']',''), '{', ''), '}', ''), '}'))]"
                  }
                }
              }
            }
          ],
    7. The final version of the main template:

      {
        "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {
          "vmssClusters": {
            "type": "array"
          }
        },
        "functions": [
          {
            "namespace": "helper",
            "members": {
              "convertArrayToObject": {
                "parameters": [
                  {
                    "name": "inArray",
                    "type": "array"
                  }
                ],
                "output": {
                  "type": "object",
                  "value": "[json(concat('{', replace(replace(replace(replace(string(parameters('inArray')), '[', ''), ']',''), '{', ''), '}', ''), '}'))]"
                }
              }
            }
          }
        ],
        "variables": {
          "copy": [
            {
              "name": "defaults",
              "count": "[length(parameters('vmssClusters'))]",
              "input": {
                "[parameters('vmssClusters')[copyIndex('defaults')].vmssName]": "_"
              }
            }
          ],
          "defaultTags": {
            "tags": "[helper.convertArrayToObject(variables('defaults'))]"
          },
          "resourceGroupTags": "[union(json('{\"tags\":{}}'), resourceGroup()).tags]",
          "newResourceGroupTagsIfDeploymentSucceeds": "[union(variables('defaultTags'), resourceGroup()).tags]"
        },
        "resources": [
          {
            "copy": {
              "name": "VmssLoop",
              "count": "[length(parameters('vmssClusters'))]"
            },
            "apiVersion": "2019-10-01",
            "name": "[concat('vmss-', parameters('vmssClusters')[copyIndex()].vmssName)]",
            "type": "Microsoft.Resources/deployments",
            "properties": {
              "mode": "Incremental",
              "parameters": {
                "capacity": {
                  "value": "[if(contains(variables('resourceGroupTags'), parameters('vmssClusters')[copyIndex()].vmssName), -1, parameters('vmssClusters')[copyIndex()].nodeCount)]"
                }
              },
              "templateLink": {
                "uri": "/VMSS/azuredeploy.json",
                "contentVersion": "1.0.0.0"
              }
            }
          },
          {
            "dependsOn": [
              "VmssLoop"
            ],
            "type": "Microsoft.Resources/tags",
            "name": "default",
            "apiVersion": "2019-10-01",
            "properties": {
              "tags": "[variables('newResourceGroupTagsIfDeploymentSucceeds')]"
            }
          }
        ],
        "outputs": {}
      }
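
The nested template (/VMSS/azuredeploy.json) is not shown in this post. As a rough sketch of how step 5 could look, the fragment below turns the -1 hint into a null sku. The parameter names vmssName and vmSize are hypothetical (the simplified main template above only passes capacity), the apiVersion is an assumption, and the rest of the scale set definition is omitted, so this is not deployable as is:

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    // Hypothetical parameters, not part of the main template shown above.
    "vmssName": { "type": "string" },
    "vmSize": { "type": "string" },
    // -1 means "the cluster already exists, leave the sku untouched".
    "capacity": { "type": "int" }
  },
  "variables": {
    // Only used when capacity is not -1 (first provisioning).
    "skuObject": {
      "name": "[parameters('vmSize')]",
      "tier": "Standard",
      "capacity": "[parameters('capacity')]"
    }
  },
  "resources": [
    {
      "type": "Microsoft.Compute/virtualMachineScaleSets",
      "apiVersion": "2021-07-01",
      "name": "[parameters('vmssName')]",
      "location": "[resourceGroup().location]",
      // The -1 never reaches Azure: it only selects between a null sku and a real one.
      "sku": "[if(equals(parameters('capacity'), -1), json('null'), variables('skuObject'))]",
      "properties": {
        // upgradePolicy, virtualMachineProfile, etc. omitted in this sketch
      }
    }
  ]
}
```

With this shape, the first deployment of a cluster sends the full sku (including capacity), and every later deployment sends a null sku, so the instance count chosen by auto-scale is preserved.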

Notes

  1. Don't keep critical information in the state's dictionary values; use only the dictionary keys. The reason is that the union function does not specify how it handles items that are duplicated across both objects (which value ends up in the unioned output).

  2. The VMSS sku object also includes the VM SKU, which the solution above does not support changing.

  3. As a heavy ARM template user, I'd be happy to throw away the above dirty workaround in favor of native support in the ARM template engine for checking whether a resource exists. With the number of VMSS clusters I maintain, I have no choice but to use this workaround in the meantime.

@sachip-msft

sachip-msft commented Jul 21, 2021

The workaround does not work: the sku object does not accept -1 as a value for capacity.
The error below occurs on an existing SF cluster when the value is set to -1 in a subsequent deployment. However, you can achieve this behaviour by setting the sku object to null with json('null'), which skips changing the instance count for that particular scale set.
This was tested on version 2021-07-01.
Status Message: Error converting value -1 to type 'System.Nullable`1[System.UInt32]'. Path 'sku.capacity', line 1, position 61. (Code:BadRequest)

@johnib
Author

johnib commented Jul 22, 2021

I assure you this is what we have been running in production (~500 VMSS clusters) ever since this post was published.

You're invited to publish your template here so we can figure it out.

The -1 is not assigned as the final value; it is used as a hint for the inner template to decide on the SKU object (you assign null to the SKU object as a whole).

@xinyi-joffre

We are also using the workaround now, but it would be super helpful if the VMSS team could natively support not setting capacity (especially when autoscale is also in place). One possible implementation is to support an initialCapacity parameter that is required for a new VMSS but ignored for existing clusters, which may have autoscale or a manually set capacity.

@xinyi-joffre

@johnib, when we tried the solution above to remove the sku property, we noticed that updated scripts in the custom script extension and new environment variables don't run on new VMs or re-imaged VMs.

Do you know if that's a known side effect of the solution above?

@johnib
Author

johnib commented Sep 3, 2022

Show me where you configure the environment variables and scripts. It doesn't make sense to me, although it's not part of my use case.
