fix(bug): Ensure windows agent stability using hubble/legacy helm values (#1128)

# Description

This PR fixes stability problems in the Retina Windows agent. Four causes
were identified, and each commit resolves one of them.

1. Invalid rendering of the namespace helm value (1st commit)
```
matmerr@matmerr-cloud-dev: ~/go/src/github.com/Azure/telescope
[06:56:29 PM][matmerr-aks-pktmon-11][matmerr/enable-ama]$ k logs -f retina-agent-win-7f7kb
Starting Retina Agent
starting Retina daemon with legacy control plane v0.0.17
2024/12/02 18:56:22 metricsInterval is deprecated, please use metricsIntervalDuration instead
init client-go
KUBECONFIG set, using kubeconfig:  C:\hpc\kubeconfig
Error: starting daemon: creating controller-runtime manager: error loading config file "C:\hpc\kubeconfig": yaml: invalid map key: map[interface {}]interface {}{".Values.namespace":interface {}(nil)}
```

2. The operator is enabled by default, which causes RBAC errors for the
Windows agents (2nd commit)

```
ts=2024-12-10T16:58:48.634Z level=info caller=hnsstats/hnsstats_windows.go:212 msg="Start hnsstats plugin..."
W1210 16:58:49.990792    7108 reflector.go:547] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:232: failed to list *v1alpha1.MetricsConfiguration: metricsconfigurations.retina.sh is forbidden: User "system:serviceaccount:kube-system:retina-agent" cannot list resource "metricsconfigurations" in API group "retina.sh" at the cluster scope
```

3. With telemetry enabled, the agent panics if Application Insights is
not defined. Users can change the config map as desired, but the
default values should not cause the agent to crash (3rd commit)

4. The `kubeconfig` file cannot be found with the legacy chart values.
Running `setkubeconfigpath.ps1` is required during container
setup (4th commit).

Update:
It was later found that the missing `kubeconfig` error occurs only on
redeploy, when the initially deployed Retina predates this change
(#1118). A follow-up GitHub issue was
created: #1138

```
beegii@bignamboi:~/src/retina$ k logs retina-agent-win-4tl7m -n kube-system
Starting Retina Agent
starting Retina daemon with legacy control plane v0.0.17
2024/12/11 18:40:15 metricsInterval is deprecated, please use metricsIntervalDuration instead
init client-go
KUBECONFIG set, using kubeconfig:  C:\hpc\kubeconfig
Error: starting daemon: creating controller-runtime manager: CreateFile C:\hpc\kubeconfig: The system cannot find the file specified.
```
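
The first error is what a YAML loader reports when a Helm placeholder reaches it unrendered: the double braces are read as one flow mapping nested inside another, so a map ends up being used as a map key. A minimal before/after sketch, assuming the kubeconfig template is read as plain YAML without passing through Helm (inferred from the error message, not confirmed here); the indentation is illustrative:

```yaml
# Before: loaded without Helm rendering, "{{ ... }}" is parsed as a flow
# mapping whose only key is another flow mapping ({.Values.namespace: null}),
# which the loader rejects with "invalid map key".
contexts:
  - name: azure-retina-windows@kubernetes
    context:
      cluster: kubernetes
      namespace: {{ .Values.namespace }}
---
# After: a literal value parses as an ordinary string.
contexts:
  - name: azure-retina-windows@kubernetes
    context:
      cluster: kubernetes
      namespace: kube-system
```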

## Related Issue

#1122

## Checklist

- [x] I have read the [contributing
documentation](https://retina.sh/docs/contributing).
- [x] I signed and signed-off the commits (`git commit -S -s ...`). See
[this
documentation](https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification)
on signing commits.
- [x] I have correctly attributed the author(s) of the code.
- [x] I have tested the changes locally.
- [x] I have followed the project's style guidelines.
- [x] I have updated the documentation, if necessary.
- [x] I have added tests, if applicable.

## Screenshots (if applicable) or Testing Completed

The image corresponding to each commit was built and tested on the cluster
to confirm that each fix works.


![image](https://github.com/user-attachments/assets/dde7fe23-22ff-49bf-8c96-2c1a42c96f9d)

## Additional Notes

The first three problems occurred when deploying Retina using the
hubble path; the last occurred when deploying Retina using the
legacy path.

---

Please refer to the [CONTRIBUTING.md](../CONTRIBUTING.md) file for more
information on how to contribute to this project.
BeegiiK authored Jan 3, 2025
1 parent 29331f0 commit cd219ca
Showing 5 changed files with 12 additions and 6 deletions.
@@ -132,7 +132,7 @@ data:
 metricsInterval: {{ .Values.metricsInterval }}
 metricsIntervalDuration: {{ .Values.metricsIntervalDuration }}
 enableTelemetry: {{ .Values.enableTelemetry }}
-enablePodLevel: {{ .Values.enablePodLevel }}
+enablePodLevel: false
 remoteContext: {{ .Values.remoteContext }}
 bypassLookupIPOfInterest: {{ .Values.bypassLookupIPOfInterest }}
 {{- end}}
2 changes: 1 addition & 1 deletion deploy/hubble/manifests/controller/helm/retina/values.yaml
@@ -90,7 +90,7 @@ logLevel: info
 enabledPlugin_linux: '["linuxutil","packetforward","packetparser","dns", "dropreason"]'
 enabledPlugin_win: '["hnsstats"]'

-enableTelemetry: true
+enableTelemetry: false

 # Interval, in duration, to scrape/publish metrics.
 metricsIntervalDuration: "10s"
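
With telemetry now off by default, turning it back on becomes an explicit choice. A minimal sketch of a values override for the hubble chart; the file name `my-values.yaml` is hypothetical, and the key is assumed to sit at the top level, as it appears in `values.yaml` above:

```yaml
# my-values.yaml (hypothetical override file)
# Re-enable telemetry only if Application Insights is defined for the agent;
# per the PR description, the agent panics otherwise.
enableTelemetry: true
```

Such a file would be passed to Helm with `-f my-values.yaml` at install or upgrade time.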
@@ -203,7 +203,13 @@ spec:
 containerPort: {{ .Values.retinaPort }}
 workingDir: $env:CONTAINER_SANDBOX_MOUNT_POINT
 command:
-- controller.exe --config ./retina/config.yaml
+- powershell.exe
+- -command
+{{- if semverCompare ">=1.28" .Capabilities.KubeVersion.GitVersion }}
+- $env:CONTAINER_SANDBOX_MOUNT_POINT/controller.exe --config ./retina/config.yaml
+{{- else }}
+- .\setkubeconfigpath.ps1; ./controller.exe --config ./retina/config.yaml --kubeconfig ./kubeconfig
+{{- end }}
 env:
 - name: POD_NAME
 valueFrom:
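
To make the version branch in this hunk concrete, here is roughly what the `command` list renders to in each case. This is a hand-rendered sketch based on the template lines above, not captured `helm template` output, and the indentation is illustrative:

```yaml
# Cluster version satisfies ">=1.28": the controller is started directly
# from the container sandbox mount point.
command:
  - powershell.exe
  - -command
  - $env:CONTAINER_SANDBOX_MOUNT_POINT/controller.exe --config ./retina/config.yaml
---
# Older clusters: setkubeconfigpath.ps1 runs first, then the controller is
# pointed at the resulting ./kubeconfig file.
command:
  - powershell.exe
  - -command
  - .\setkubeconfigpath.ps1; ./controller.exe --config ./retina/config.yaml --kubeconfig ./kubeconfig
```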
2 changes: 1 addition & 1 deletion windows/kubeconfigtemplate.yaml
@@ -9,7 +9,7 @@ contexts:
 - name: azure-retina-windows@kubernetes
 context:
 cluster: kubernetes
-namespace: {{ .Values.namespace }}
+namespace: kube-system
 user: azure-retina-windows
 current-context: azure-retina-windows@kubernetes
 users:
4 changes: 2 additions & 2 deletions windows/manifests/windows.yaml
@@ -4,7 +4,7 @@ metadata:
 labels:
 app: retina
 name: retina-win
-namespace: {{ .Values.namespace }}
+namespace: kube-system
 annotations:
 prometheus.io/port: "10093"
 prometheus.io/scrape: "true"
@@ -62,7 +62,7 @@ apiVersion: v1
 kind: ConfigMap
 metadata:
 name: retina-config-win
-namespace: {{ .Values.namespace }}
+namespace: kube-system
 data:
 config.yaml: |-
 apiServer:
