add scaler migration to subnet scarcity phase 2 (#1655)
* add scaler migration to phase 2

Signed-off-by: Evan Baker <[email protected]>

* clarify ceiling function notation

Signed-off-by: Evan Baker <[email protected]>

rbtr authored Oct 17, 2022
1 parent 96c113a commit c92e29b
Showing 3 changed files with 60 additions and 0 deletions.
13 changes: 13 additions & 0 deletions docs/feature/subnet-scarcity/phase-2/2-scalingmath.md
@@ -52,6 +52,8 @@ $$
Request = B \times \lceil mf + \frac{U}{B} \rceil
$$

> Note: $\lceil ... \rceil$ is the ceiling function.

where $U$ is the number of Assigned (Used) IPs on the Node, $B$ is the Batch size, and $mf$ is the Minimum Free Fraction, as discussed in the [Background](../proposal.md#background).

The "Required" IP Count is forward looking without affecting the correctness of the Request: it represents the target quantity of IP addresses that CNS *will Assign to Pods* at some instant in time. This may include scheduled Pods that do not *currently* have Assigned IPs because there are insufficient Available IPs in the Pool.
@@ -74,3 +76,14 @@ $$
As shown, if the demand is for $25$ IPs, and the Batch is $16$, and the Min Free is $8$ (half of the Batch), then the Request must be $48$. $32$ is too few, as $32-25=7 < 8$.

This algorithm will significantly improve the time-to-pod-ready for large changes in the quantity of scheduled Pods on a Node, because CNS computes the final Requested IP Count in a single step instead of iterating until it converges.
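
As a minimal sketch of the formula above (not the CNS implementation), the Request can be computed directly in Go. The function name `requestIPCount` and its parameter names are hypothetical and chosen only for illustration; the guard for a non-positive Batch is likewise an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// requestIPCount evaluates Request = B * ceil(mf + U/B).
//   used            -> U, the number of Assigned (Used) IPs on the Node
//   batch           -> B, the Batch size
//   minFreeFraction -> mf, the Minimum Free Fraction
// The name and signature are illustrative assumptions, not a CNS API.
func requestIPCount(used, batch int, minFreeFraction float64) int {
	if batch <= 0 {
		// Assumed guard: a non-positive Batch has no meaningful Request.
		return 0
	}
	batches := math.Ceil(minFreeFraction + float64(used)/float64(batch))
	return batch * int(batches)
}

func main() {
	// The worked example from the text: demand 25, Batch 16, Min Free 8 (mf = 0.5).
	fmt.Println(requestIPCount(25, 16, 0.5)) // prints 48
}
```

With these inputs the result is $48$ in one evaluation, matching the worked example, with no intermediate Requests.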


### Including PrimaryIPs

The IPAM Pool scaling operates only on NC SecondaryIPs. However, CNS is allocated an additional `PrimaryIP` for every NC as a prerequisite of that NC's existence. Therefore, to align the **real allocated** IP Count to the Batch size, CNS should deduct those PrimaryIPs from its Requested (Secondary) IP Count.

This makes the RequestedIPCount:

$$
RequestedIPCount = B \times \lceil mf + \frac{U}{B} \rceil - PrimaryIPCount
$$
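
The PrimaryIP adjustment can be sketched the same way. This is an illustration under stated assumptions: `requestedSecondaryIPCount` and its parameters are hypothetical names, and the function applies the formula as written, without clamping.

```go
package main

import (
	"fmt"
	"math"
)

// requestedSecondaryIPCount evaluates
//   RequestedIPCount = B * ceil(mf + U/B) - PrimaryIPCount
// so that Secondary + Primary IPs together are a multiple of the Batch.
// The name and signature are illustrative assumptions, not a CNS API.
func requestedSecondaryIPCount(used, batch int, minFreeFraction float64, primaryIPCount int) int {
	batches := math.Ceil(minFreeFraction + float64(used)/float64(batch))
	return batch*int(batches) - primaryIPCount
}

func main() {
	// Same inputs as the earlier worked example, with one NC (one PrimaryIP).
	secondary := requestedSecondaryIPCount(25, 16, 0.5, 1)
	fmt.Println(secondary)     // prints 47
	fmt.Println(secondary + 1) // prints 48, a multiple of the Batch
}
```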
42 changes: 42 additions & 0 deletions docs/feature/subnet-scarcity/phase-2/3-subnetscaler.md
@@ -0,0 +1,42 @@
## Migrating the Scaler properties to the ClusterSubnet CRD [[Phase 2 Design]](../proposal.md#2-3-scaler-properties-move-to-the-clustersubnet-crd)
Currently, the [`v1alpha/NodeNetworkConfig` contains the Scaler inputs](https://github.com/Azure/azure-container-networking/blob/eae2389f888468e3b863cb28045ba613a5562360/crd/nodenetworkconfig/api/v1alpha/nodenetworkconfig.go#L66-L72) which CNS will use to scale the local IPAM pool:

```yaml
...
status:
  scaler:
    batchSize: X
    releaseThresholdPercent: X
    requestThresholdPercent: X
    maxIPCount: X
```
Since the Scaler values are dependent on the state of the Subnet, the Scaler object will be moved to the ClusterSubnet CRD and optimized.
### ClusterSubnet Scaler
The ClusterSubnet `Status.Scaler` definition will be:
```yaml
...
status:
  scaler:
    batch: X # equal to batchSize
    buffer: X # equal to requestThresholdPercent
```

Additionally, the `Spec` of the ClusterSubnet will accept `Scaler` values to be used as runtime overrides. DNC-RC will read and validate the `Spec`, then write the values back out to the `Status` if present.
```yaml
...
spec:
  scaler:
    <...>
```
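
The override flow described above (read the `Spec`, validate it, write it to the `Status` if present) might look roughly like the following Go sketch. Everything here is an assumption made for illustration: the `Scaler` type, the `resolveStatusScaler` helper, and especially the validation rules, which the design does not specify.

```go
package main

import (
	"errors"
	"fmt"
)

// Scaler is a hypothetical Go shape for the ClusterSubnet scaler fields.
type Scaler struct {
	Batch  int // equal to the NNC batchSize
	Buffer int // equal to the NNC requestThresholdPercent
}

// validate applies assumed sanity checks; the real DNC-RC rules are not
// stated in the design.
func (s Scaler) validate() error {
	if s.Batch <= 0 {
		return errors.New("batch must be positive")
	}
	if s.Buffer < 0 || s.Buffer > 100 {
		return errors.New("buffer must be a percentage in [0, 100]")
	}
	return nil
}

// resolveStatusScaler returns the Scaler to write to Status: the Spec
// override when it is present and valid, otherwise the computed value.
func resolveStatusScaler(specOverride *Scaler, computed Scaler) (Scaler, error) {
	if specOverride == nil {
		return computed, nil // no override present
	}
	if err := specOverride.validate(); err != nil {
		return computed, fmt.Errorf("rejecting spec override: %w", err)
	}
	return *specOverride, nil
}

func main() {
	computed := Scaler{Batch: 16, Buffer: 50}
	status, err := resolveStatusScaler(&Scaler{Batch: 32, Buffer: 25}, computed)
	fmt.Println(status, err)
}
```

Treating the override as a nullable pointer is one way to model "if present"; a per-field override would be an equally plausible reading of the design.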



Note:
- The `scaler.maxIPCount` will not be migrated, as the maxIPCount is a property of the Node and not the Subnet.
- The `scaler.releaseThresholdPercent` will not be migrated, as it is redundant. The `buffer` (and, in fact, the `requestThresholdPercent`) implies a `releaseThresholdPercent`, so one does not need to be specified explicitly. The [IPAM Scaling Math](../phase-2/2-scalingmath.md) incorporates only a single threshold value and fully describes the behavior of the system.

#### Migration
When the Scaler is added to the ClusterSubnet CRD definition, DNC-RC will begin replicating the `batch` and `buffer` properties from the NodeNetworkConfig, keeping both up to date.

CNS, which already watches the ClusterSubnet CRD for known Subnets, will prefer the Scaler properties from that object, and will fall back to the NNC Scaler properties if they are not present in the ClusterSubnet.
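
The preference order during migration can be sketched in Go as follows. The type and function names (`nncScaler`, `clusterSubnetScaler`, `effectiveScaler`) are hypothetical, and modelling "not present" as a nil pointer is an assumption; only the field mapping (`batch` from `batchSize`, `buffer` from `requestThresholdPercent`) comes from the design.

```go
package main

import "fmt"

// nncScaler mirrors the NodeNetworkConfig Status.Scaler fields listed in
// the design; the Go shape is an illustrative assumption.
type nncScaler struct {
	BatchSize               int64
	ReleaseThresholdPercent int64
	RequestThresholdPercent int64
	MaxIPCount              int64
}

// clusterSubnetScaler mirrors the proposed ClusterSubnet Status.Scaler.
type clusterSubnetScaler struct {
	Batch  int64 // equal to batchSize
	Buffer int64 // equal to requestThresholdPercent
}

// effectiveScaler prefers the ClusterSubnet Scaler and falls back to the
// NNC Scaler when the ClusterSubnet does not carry one (nil here).
func effectiveScaler(cs *clusterSubnetScaler, nnc nncScaler) clusterSubnetScaler {
	if cs != nil {
		return *cs
	}
	return clusterSubnetScaler{
		Batch:  nnc.BatchSize,
		Buffer: nnc.RequestThresholdPercent,
	}
}

func main() {
	nnc := nncScaler{BatchSize: 16, ReleaseThresholdPercent: 150, RequestThresholdPercent: 50, MaxIPCount: 250}

	// ClusterSubnet Scaler absent: fall back to the NNC values.
	fmt.Println(effectiveScaler(nil, nnc))

	// ClusterSubnet Scaler present: it takes priority.
	fmt.Println(effectiveScaler(&clusterSubnetScaler{Batch: 32, Buffer: 25}, nnc))
}
```

The numeric values in `main` are placeholders for the example, not defaults taken from the project.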
5 changes: 5 additions & 0 deletions docs/feature/subnet-scarcity/proposal.md
@@ -82,6 +82,11 @@ $$

where $U$ is the number of Assigned (Used) IPs on the Node.

CNS will include the NC Primary IP(s) as IPs that it has been allocated, and will subtract them from its real Requested IP Count such that the _total_ number of IPs allocated to CNS is a multiple of the Batch.

#### [[2-3]](phase-2/3-subnetscaler.md) Scaler properties move to the ClusterSubnet CRD
The Scaler properties from the v1alpha/NodeNetworkConfig `Status.Scaler` definition are moved to the ClusterSubnet CRD. CNS will prefer the Scaler from this CRD when it is available, and fall back to the NNC Scaler otherwise. The `.Spec` field of the CRD may serve as an "overrides" location for runtime reconfiguration.

### Phase 3
#### [[3-1]](phase-3/1-watchpods.md) CNS watches Pods
