Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add kubelet instance configuration to configure CRI socket for each node #3042

Open
HirazawaUi opened this issue Apr 5, 2024 · 19 comments
Open
Labels
area/feature-gates area/kubelet kind/design Categorizes issue or PR as related to design. kind/feature Categorizes issue or PR as related to a new feature. priority/backlog Higher priority than priority/awaiting-more-evidence.
Milestone

Comments

@HirazawaUi
Copy link
Contributor

HirazawaUi commented Apr 5, 2024

Is this a BUG REPORT or FEATURE REQUEST?

FEATURE REQUEST

Anything else we need to know?

This issue wants to remove container runtime interface (CRI) socket annotations from Node objects in Kubernetes and no longer added --container-runtime-endpoint args in kubeadm-flags.env ( --container-runtime-endpoint args deprecated in kubelet), this annotation and args are used to specify the CRI socket endpoint on each node that the kubelet uses to communicate with the container runtime.

Add kubelet-instance-config.yml locally to store container runtime interface (CRI) sockets, and in kubeadm init, join and upgrade, override the global kubelet configuration through kubelet-instance-config.yml.

More detailed design at: https://hackmd.io/@r2Yq-PVoRcK2OaGypnwPMg/rJFChACCp/edit

Tasks

Preview Give feedback
No tasks being tracked yet.

edit by neolit123:

1.32 alpha

1.33 alpha

  • k/k PRs:
    • TODO remove the --container-runtime flag from /var/lib/kubelet/kubeadm-flags.env
    • TODO remove the annotation if the FG is enabled on upgrade
@HirazawaUi
Copy link
Contributor Author

/cc @neolit123 @pacoxu

@HirazawaUi HirazawaUi changed the title FEATURE REQUEST: Add kubelet instance config FEATURE REQUEST: Add kubelet instance configuration to configure CRI socket for each node Apr 5, 2024
@neolit123
Copy link
Member

@HirazawaUi we need a KEP for this, ket's see what @pacoxu can comment about the previous work done on this.

@pacoxu can you please post links to your previous KEP work.?
also didn't we already have a tracking issue here in k/kubeadm?

maybe we can close KEP PRs and let @HirazawaUi to take over.

@neolit123 neolit123 added priority/backlog Higher priority than priority/awaiting-more-evidence. kind/feature Categorizes issue or PR as related to a new feature. kind/design Categorizes issue or PR as related to design. area/kubelet labels Apr 5, 2024
@neolit123 neolit123 added this to the v1.31 milestone Apr 5, 2024
@neolit123
Copy link
Member

also are you willing to work on this @HirazawaUi
there are a lot of things to update around this change across multiple releases. and i don't think me or anyone else will have a lot of time for it.

  • KEP design how is it going to work exactly, do we need a feature gate
  • k8s.io docs update
  • k/k code update
  • do we need a e2e test in k/kubeadm

too many questions

@HirazawaUi
Copy link
Contributor Author

also are you willing to work on this @HirazawaUi there are a lot of things to update around this change across multiple releases. and i don't think me or anyone else will have a lot of time for it.

  • KEP design how is it going to work exactly, do we need a feature gate
  • k8s.io docs update
  • k/k code update
  • do we need a e2e test in k/kubeadm

too many questions

Yes, I'll try to finish it.

@HirazawaUi
Copy link
Contributor Author

@pacoxu What do you think about this? I would love to hear your comments :)

@pacoxu
Copy link
Member

pacoxu commented Apr 22, 2024

The KEP

My current proposal would be something like kubernetes/enhancements#3930 (comment)

i meant a generic patches approach where a local config file overrides the global config downloaded from the cm. not the --patches feature.

this local file must be stored somewhere. perhaps in the same dir as config.yaml, but called config-instance.yaml. IMO it has a number of tricky aspects that need to be covered in the design doc for init, join, upgrade.

Above is @neolit123 your proposal in comment here https://github.com/kubernetes/enhancements/pull/3930/files#r1177682829.

I prefer this solution.

BTW, /var/lib/kubelet/kubeadm-flags.env will only have --container-runtime-endpoint later. I would like to make this cri-socket-annotation and runtime endpoint configuration to config-instance.yaml under /var/lib/kubelet.

@HirazawaUi
Copy link
Contributor Author

Thanks, I have missed this comment, I will resurrect kep as soon as possible.

@neolit123 neolit123 changed the title FEATURE REQUEST: Add kubelet instance configuration to configure CRI socket for each node Add kubelet instance configuration to configure CRI socket for each node Jun 7, 2024
@neolit123
Copy link
Member

note if this ever becomes part of kubeadm, we can close this ticket:
#1924

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 1, 2024
@neolit123
Copy link
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 1, 2024
@neolit123
Copy link
Member

@HirazawaUi i added some TODOs a tracking in the issue description here.
please keep it updated with links - ....

@HirazawaUi
Copy link
Contributor Author

@HirazawaUi i added some TODOs a tracking in the issue description here. please keep it updated with links - ....

OK, thank you so much for your help.

@neolit123 neolit123 modified the milestones: v1.32, v1.33 Nov 27, 2024
@neolit123
Copy link
Member

neolit123 commented Nov 28, 2024

@HirazawaUi @pacoxu

i think i found one problem wih the feature gate.
https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/phases/upgrade/postupgrade.go#L138

here we read the value from the env file, but we also need to migrate users away from using the flag --container-runtime for the kubelet. so on upgrade we need to mutate the env file to no longer have the --container-runtime flag. the problem here is that if the kubelet receives the flag --container-runtime then it will take precedence over the value passed from the kubelet config.yaml and the config yaml will be ignored.

this is actually something we might have to backport as a fix for 1.32. in the alpha.

we have this utility for writing the env file.
https://github.com/kubernetes/kubernetes/blob/95d71c464a6a5f81ae36bddc86972b3b2a1c40a5/cmd/kubeadm/app/phases/kubelet/flags.go#L65

as far as i can tell we are not doing this env file mutation on upgrade, but please correct me if i am wrong.

@pacoxu
Copy link
Member

pacoxu commented Nov 28, 2024

Probably, we can keep it as is in v1.32, as the value in env file and instance file are the same.

  • v1.33+: mutate the env file during upgrade
  • v1.34 make it beta?

@neolit123
Copy link
Member

i think i agree to not backport the fix for this alpha feature.
i guess the problem is that as long as --container-runtime is in the env file it means the feature gate doesn't work.
for new clusters however init/join would not have the --container-runtime flag in the env file i think @HirazawaUi to confirm.

@HirazawaUi
Copy link
Contributor Author

here we read the value from the env file, but we also need to migrate users away from using the flag --container-runtime for the kubelet. so on upgrade we need to mutate the env file to no longer have the --container-runtime flag. the problem here is that if the kubelet receives the flag --container-runtime then it will take precedence over the value passed from the kubelet config.yaml and the config yaml will be ignored.

Yes, that is indeed a problem.

for new clusters however init/join would not have the --container-runtime flag in the env file i think @HirazawaUi to confirm.

That's right.

Given that we have never mentioned this (modifying an existing env file) in the KEP, I agree with @pacoxu's view that we should keep it as is in 1.32 and move it to beta in 1.34, WDYT?

I will update the KEP later to add this description and revise it according to the implemented PR (I had planned to do this last week, but it was delayed due to being reinfected with COVID-19).

@neolit123
Copy link
Member

1.32 and move it to beta in 1.34, WDYT?

1.34 SGTM, but this issue is something we can fix in 1.33 as well and still keep it alpha.
we can then check in the e2e test the flag is correctly removed.

I will update the KEP later to add this description and revise it according to the implemented PR (I had planned to do this last week, but it was delayed due to being reinfected with COVID-19).

ok, take care of your self.

@neolit123
Copy link
Member

neolit123 commented Dec 6, 2024

1.32 and move it to beta in 1.34, WDYT?

1.34 SGTM, but this issue is something we can fix in 1.33 as well and still keep it alpha. we can then check in the e2e test the flag is correctly removed.

@HirazawaUi
something else that we are missing:

  • if the FG is enabled before upgrade the annotation must be removed from the Node objects of all nodes.
  • it should be a specific cleanup present in the post-upgrade area of kubeadm until we graduate the FG to GA.

kubernetes/kubernetes#128031
did not have it.

i added these 2 things as TODO in the OP here.

@HirazawaUi
Copy link
Contributor Author

i added these 2 things as TODO in the OP here.

All right.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/feature-gates area/kubelet kind/design Categorizes issue or PR as related to design. kind/feature Categorizes issue or PR as related to a new feature. priority/backlog Higher priority than priority/awaiting-more-evidence.
Projects
None yet
Development

No branches or pull requests

5 participants