nodeinit: Azure IPAM - remove azure-vnet.json, flush ebtables, ip neighbors

This aims to address cilium#14233 ("Pods scheduled before Cilium don't get state
cleaned up") until a better solution can be implemented for AKS, ideally once
Azure supports deploying Cilium directly.

Currently, when Cilium is deployed with Azure IPAM enabled, connectivity for all
existing workload Pods is interrupted until each Pod is re-scheduled.

Signed-off-by: Timo Beckers <[email protected]>
ti-mo authored and gandro committed Jan 11, 2021
1 parent 868eb5d commit 9ae10c5
Showing 1 changed file with 37 additions and 1 deletion.
@@ -105,7 +105,9 @@ spec:
 - name: CHECKPOINT_PATH
 value: /tmp/node-init.cilium.io
 # STARTUP_SCRIPT is the script run on node bootstrap. Node
-# bootstrapping can be customized in this script.
+# bootstrapping can be customized in this script. This script is invoked
+# using nsenter, so it runs in the host's network and mount namespace using
+# the host's userland tools!
 - name: STARTUP_SCRIPT
 value: |
 #!/bin/bash
@@ -222,6 +224,40 @@ spec:
 fi
 {{- end }}
 
+# AKS: If azure-vnet is installed on the node, and (still) configured in bridge mode,
+# configure it as 'transparent' to be consistent with Cilium's CNI chaining config.
+# If the azure-vnet CNI config is not removed, kubelet will execute CNI CHECK commands
+# against it every 5 seconds and write 'bridge' to its state file, causing inconsistent
+# behaviour when Pods are removed.
+if [ -f /etc/cni/net.d/10-azure.conflist ]; then
+
+echo "azure-vnet configured in bridge mode. Changing to 'transparent'..."
+sed -i 's/"mode":\s*"bridge"/"mode":"transparent"/g' /etc/cni/net.d/10-azure.conflist
+
+{{- if .Values.azure.enabled }}
+# In Azure IPAM mode, also remove the azure-vnet state file, otherwise ebtables rules get
+# restored by the azure-vnet CNI plugin on every CNI CHECK, which can cause connectivity
+# issues in Cilium-managed Pods. Since azure-vnet is no longer called on scheduling events,
+# this file can be removed.
+rm -f /var/run/azure-vnet.json
+
+# This breaks connectivity for existing workload Pods when Cilium is scheduled, but we need
+# to flush these to prevent Cilium-managed Pod IPs conflicting with Pod IPs previously allocated
+# by azure-vnet. These ebtables DNAT rules contain fixed MACs that are no longer bound on the node,
+# causing packets for these Pods to be redirected back out to the gateway, where they are dropped.
+echo 'Flushing ebtables pre/postrouting rules in nat table.. (disconnecting non-Cilium Pods!)'
+ebtables -t nat -F PREROUTING || true
+ebtables -t nat -F POSTROUTING || true
+
+# ip-masq-agent periodically injects PERM neigh entries towards the gateway
+# for all other k8s nodes in the cluster. These are safe to flush, as ARP can
+# resolve these nodes as usual. PERM entries will be automatically restored later.
+echo 'Deleting all permanent neighbour entries on azure0...'
+ip neigh show dev azure0 nud permanent | cut -d' ' -f1 | xargs -r -n1 ip neigh del dev azure0 to
+{{- end }}
+
+fi
+
 {{- if .Values.nodeinit.revertReconfigureKubelet }}
 rm -f /tmp/node-deinit.cilium.io
 {{- end }}
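The `sed` expression in the startup script rewrites each `"mode": "bridge"` pair in the azure-vnet conflist to `"mode":"transparent"`, allowing any whitespace after the colon. The following is a minimal sketch for trying that expression off-node. It assumes GNU `sed` (`\s` in a basic regular expression is a GNU extension), and both the helper name `to_transparent` and the one-line conflist fragment are made up for illustration; the script itself edits `/etc/cni/net.d/10-azure.conflist` in place with `sed -i` rather than filtering stdin.

```shell
#!/bin/bash
# Sketch only: applies the STARTUP_SCRIPT substitution to stdin instead of
# editing /etc/cni/net.d/10-azure.conflist in place. Assumes GNU sed, where
# \s is accepted in a basic regular expression.
to_transparent() {
  # Same expression as the script: "mode", optional whitespace, "bridge".
  sed 's/"mode":\s*"bridge"/"mode":"transparent"/g'
}

# Hypothetical, trimmed azure-vnet plugin entry; not the real AKS conflist.
sample='{ "type": "azure-vnet", "mode": "bridge", "bridge": "azure0" }'

# Only the "mode" value changes; the "bridge" key is left alone.
printf '%s\n' "${sample}" | to_transparent
# prints: { "type": "azure-vnet", "mode":"transparent", "bridge": "azure0" }
```

Because the pattern only matches the value `"bridge"`, running the substitution on an already-converted file leaves it unchanged. Keys such as `"bridge": "azure0"` are not affected either, since the pattern requires the `"mode":` key in front of the value.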
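The neighbour cleanup is a three-stage pipeline: `ip neigh show dev azure0 nud permanent` lists only the PERMANENT entries on `azure0`, `cut -d' ' -f1` keeps the first field of each line (the neighbour's IP address), and `xargs -r -n1` runs one `ip neigh del dev azure0 to <IP>` per address. Below is a dry-run sketch that prints the commands instead of executing them. It assumes each listed line looks like `<IP> lladdr <MAC> PERMANENT` and that GNU `xargs` is used, whose `-r` (`--no-run-if-empty`) option skips the command when there is no input; the function names, addresses and MAC are hypothetical.

```shell
#!/bin/bash
# Dry-run sketch of the STARTUP_SCRIPT neighbour cleanup. Nothing is deleted:
# `echo` is prepended so the ip commands are printed instead of executed.

# Hypothetical stand-in for `ip neigh show dev azure0 nud permanent`; the
# addresses and MAC are made up.
list_permanent_neighbors() {
  printf '%s\n' \
    '10.240.0.4 lladdr 12:34:56:78:9a:bc PERMANENT' \
    '10.240.0.35 lladdr 12:34:56:78:9a:bc PERMANENT'
}

# Same cut/xargs stages as the script. Assumes GNU xargs: -r skips the
# command entirely when there is no input, -n1 passes one address per call.
dry_run_neigh_cleanup() {
  cut -d' ' -f1 | xargs -r -n1 echo ip neigh del dev azure0 to
}

list_permanent_neighbors | dry_run_neigh_cleanup
# prints:
# ip neigh del dev azure0 to 10.240.0.4
# ip neigh del dev azure0 to 10.240.0.35
```

Piping an empty list into `dry_run_neigh_cleanup` prints nothing, which is why the script passes `-r`: without it, GNU `xargs` would still run `ip neigh del dev azure0 to` once with no address.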
