Commit
fix(bug): Ensure windows agent stability using hubble/legacy helm values (#1128)

# Description

This PR fixes the stability of the Retina Windows agent. Four causes were identified, and each commit resolves one of them.

1. Invalid rendering of the `namespace` helm value (1st commit)

   ```
   matmerr@matmerr-cloud-dev: ~/go/src/github.com/Azure/telescope
   [06:56:29 PM][matmerr-aks-pktmon-11][matmerr/enable-ama]$ k logs -f retina-agent-win-7f7kb
   Starting Retina Agent
   starting Retina daemon with legacy control plane v0.0.17
   2024/12/02 18:56:22 metricsInterval is deprecated, please use metricsIntervalDuration instead
   init client-go
   KUBECONFIG set, using kubeconfig: C:\hpc\kubeconfig
   Error: starting daemon: creating controller-runtime manager: error loading config file "C:\hpc\kubeconfig": yaml: invalid map key: map[interface {}]interface {}{".Values.namespace":interface {}(nil)}
   ```

2. The operator is enabled by default, which causes RBAC issues for the Windows agents (2nd commit)

   ```
   ts=2024-12-10T16:58:48.634Z level=info caller=hnsstats/hnsstats_windows.go:212 msg="Start hnsstats plugin..."
   W1210 16:58:49.990792 7108 reflector.go:547] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:232: failed to list *v1alpha1.MetricsConfiguration: metricsconfigurations.retina.sh is forbidden: User "system:serviceaccount:kube-system:retina-agent" cannot list resource "metricsconfigurations" in API group "retina.sh" at the cluster scope
   ```

3. With telemetry enabled, the agent panics if Application Insights is not defined. Users can change the config map as desired, but the default values should not cause the agent to crash (3rd commit)

4. The `kubeconfig` file cannot be found with the legacy chart values. Executing `setkubeconfigpath.ps1` was required for the container setup (4th commit).

   Update: It was later found that the missing `kubeconfig` error only occurs on redeploy when the initial Retina deployment predates this change (#1118). A GitHub issue was later created: #1138

   ```
   beegii@bignamboi:~/src/retina$ k logs retina-agent-win-4tl7m -n kube-system
   Starting Retina Agent
   starting Retina daemon with legacy control plane v0.0.17
   2024/12/11 18:40:15 metricsInterval is deprecated, please use metricsIntervalDuration instead
   init client-go
   KUBECONFIG set, using kubeconfig: C:\hpc\kubeconfig
   Error: starting daemon: creating controller-runtime manager: CreateFile C:\hpc\kubeconfig: The system cannot find the file specified.
   ```

## Related Issue

#1122

## Checklist

- [x] I have read the [contributing documentation](https://retina.sh/docs/contributing).
- [x] I signed and signed-off the commits (`git commit -S -s ...`). See [this documentation](https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification) on signing commits.
- [x] I have correctly attributed the author(s) of the code.
- [x] I have tested the changes locally.
- [x] I have followed the project's style guidelines.
- [x] I have updated the documentation, if necessary.
- [x] I have added tests, if applicable.

## Screenshots (if applicable) or Testing Completed

The image corresponding to each commit was built and tested on the cluster to confirm that each fix works.

![image](https://github.com/user-attachments/assets/dde7fe23-22ff-49bf-8c96-2c1a42c96f9d)

## Additional Notes

The first three problems were experienced when deploying Retina using the hubble path; the last issue was experienced when deploying Retina using the legacy path.

---

Please refer to the [CONTRIBUTING.md](../CONTRIBUTING.md) file for more information on how to contribute to this project.