update containerd to v2.0.2 #3828
Conversation
/retest
I plan on doing some more local testing, but this looks good to me and CI is happy. /lgtm
```go
}
var snapshotter string
switch configVersion {
case 2: // Introduced in containerd v1.3. Still supported in containerd v2.
```
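A sketch of how a switch like the one in the diff might select the CRI plugin table per config schema version; the version-3 plugin ID (`io.containerd.cri.v1.images`) is an assumption based on containerd's CRI config-versions doc (where the CRI plugin is split into separate images and runtime plugins), not kind's actual code:

```go
package main

import "fmt"

// criPluginID returns the TOML table name under [plugins] that holds
// CRI image settings (such as the snapshotter) for a given containerd
// config schema version. The version-3 ID is an assumption based on
// containerd's CRI config docs, not taken from kind's code.
func criPluginID(configVersion int) (string, error) {
	switch configVersion {
	case 2: // Introduced in containerd v1.3. Still supported in containerd v2.
		return "io.containerd.grpc.v1.cri", nil
	case 3: // Introduced in containerd v2.0; CRI split into images/runtime plugins.
		return "io.containerd.cri.v1.images", nil
	default:
		return "", fmt.Errorf("unsupported containerd config version: %d", configVersion)
	}
}

func main() {
	for _, v := range []int{2, 3} {
		id, err := criPluginID(v)
		if err != nil {
			panic(err)
		}
		fmt.Printf("version %d -> plugins.%q\n", v, id)
	}
}
```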
we probably need to allow user-supplied config patches to similarly target 2 vs 3 separately, though that doesn't need to be done in this PR
do we depend on the version of the config?
kind/pkg/cluster/internal/create/actions/config/config.go, lines 119 to 146 at 90d9439:
```go
if len(ctx.Config.ContainerdConfigPatches) > 0 || len(ctx.Config.ContainerdConfigPatchesJSON6902) > 0 {
	fns := make([]func() error, len(kubeNodes))
	for i, node := range kubeNodes {
		node := node // capture loop variable
		fns[i] = func() error {
			// read and patch the config
			const containerdConfigPath = "/etc/containerd/config.toml"
			var buff bytes.Buffer
			if err := node.Command("cat", containerdConfigPath).SetStdout(&buff).Run(); err != nil {
				return errors.Wrap(err, "failed to read containerd config from node")
			}
			patched, err := patch.TOML(buff.String(), ctx.Config.ContainerdConfigPatches, ctx.Config.ContainerdConfigPatchesJSON6902)
			if err != nil {
				return errors.Wrap(err, "failed to patch containerd config")
			}
			if err := nodeutils.WriteFile(node, containerdConfigPath, patched); err != nil {
				return errors.Wrap(err, "failed to write patched containerd config")
			}
			// restart containerd now that we've re-configured it
			// skip if containerd is not running
			if err := node.Command("bash", "-c", `! pgrep --exact containerd || systemctl restart containerd`).Run(); err != nil {
				return errors.Wrap(err, "failed to restart containerd after patching config")
			}
			return nil
		}
	}
	if err := errors.UntilErrorConcurrent(fns); err != nil {
		return err
```
do we depend on the version of the config?
The user's patches will not work as intended: they will wind up merging in keys that are not valid for v3, because all the keys have new namespacing again.
(and if they simply switch to v3 keys, it will not work with older kind images)
Ah, so you're referring to the "existing" user patches. I think it's a matter of how you see it, then ... for me those are containerd-specific patches, and yes, users need to adapt them to containerd 2.0 unless there is some containerd feature that allows migrating between formats.
for me those are containerd-specific patches, and yes, users need to adapt them to containerd 2.0 unless there is some containerd feature that allows migrating between formats
But then their config won't work for older kind releases/images.
The way we do this with kubeadm config patches is that you can have multiple patches targeting different kubeadm API versions. So far that's been irrelevant here because for nearly all of our history we've only had the version 2 containerd config format.
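A hypothetical sketch of version-targeted containerd patches in that kubeadm style: route each user patch by the config schema `version` it declares. The helper names, the regex-based detection, and the default-to-2 behavior are all assumptions for illustration, not kind's API:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

var versionRe = regexp.MustCompile(`(?m)^\s*version\s*=\s*(\d+)\s*$`)

// patchVersion extracts the declared containerd config schema version
// from a user-supplied TOML patch, defaulting to 2 when absent
// (an assumed default for this sketch).
func patchVersion(patch string) int {
	if m := versionRe.FindStringSubmatch(patch); m != nil {
		v, _ := strconv.Atoi(m[1]) // regex guarantees digits
		return v
	}
	return 2
}

// selectPatches keeps only the patches targeting the node's config version,
// so v2 patches and v3 patches can coexist in one cluster config.
func selectPatches(patches []string, nodeVersion int) []string {
	var out []string
	for _, p := range patches {
		if patchVersion(p) == nodeVersion {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	patches := []string{
		"version = 2\n[plugins.\"io.containerd.grpc.v1.cri\"]\n  sandbox_image = \"example/pause:3.9\"",
		"version = 3\n[plugins.\"io.containerd.cri.v1.images\"]\n  snapshotter = \"overlayfs\"",
	}
	fmt.Println(len(selectPatches(patches, 2)), len(selectPatches(patches, 3)))
}
```

With a scheme like this, a user's existing v2 patches would keep working on older kind images while new v3 patches target newer ones.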
But actually, that is for a future PR: because in this PR we're still using a v2.0 config (images/base/files/etc/containerd/config.toml), this part of the diff seems to be technically unnecessary for now.
I see, I got it now: so we keep using the containerd version 2 config format and let containerd deal with the translation to version 3 internally: https://github.com/containerd/containerd/blob/main/docs/cri/config.md#config-versions
Yeah, before we switch our config to v3 we should give users a way to deal with it smoothly, but we're not actually switching to v3 in this PR, even though this PR adds partial support for v3.
/lgtm I'm positively surprised the diff is so small ... IIRC we already had issues with
I don't think this addresses all of the issues, and since CRI only has v1 and not v1alpha2 in containerd v2, we break Kubernetes < 1.26 (which is probably OK, but worth noting: we have not actively broken an old k8s version for some time now).
CRI v1 seems to have been introduced in Kubernetes 1.20: kubernetes/kubernetes@9fcede9
v1 was added in 1.20 kubernetes/kubernetes#96387
ACK, definitely no problem with that. We will still need to make sure to note this for the release, and we should consider whether this is sufficient excuse to drop other < 1.20 bits from the project. We've been very lenient about dropping support outright so far. And re: other parts: #3828 (comment)
At some point < 1.19 will be widely unusable because of cgroups v2 anyhow.
Fix issue 3768 Signed-off-by: Akihiro Suda <[email protected]>
/retest
/lgtm Thanks
/lgtm Note: we have automated base image builds after merge, so most of the CI here is still using the current base image. After this merges we will send another PR to pick up the image; it's possible we'll find issues and revert. This is fine, just pointing it out in case. Sometimes we preemptively test by temporarily adding a commit using a locally built & pushed image, but only to test. We don't need to do that here, but we will get more signal when we adopt a built image.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: AkihiroSuda, BenTheElder The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Thanks for working on this and the upstream bug fixes, as always!
filed #3848 after the build job completed so we can keep this moving along
I did find an issue: #3848 (comment). We can mitigate, but the underlying cause may be worth investigating. Previously we did not wait for containerd to be ready because it was fully booted in ~0.3s even when cross-architecture, so we never had any issue; now that takes approximately 1s, so image pulls fail because the socket is not available yet. Mitigating that would be pretty trivial (we can either retry image pulls, which we should consider anyhow, watch for containerd to be ready before doing any pulls, or both), but that seems like a pretty big regression that will impact startup time at runtime as well (both initially and anytime we need to restart). Not sure if that's something the containerd project would track or just accept. cc @samuelkarp
(These times are v1.7.24 vs v2.0.2, arm64 builds on an amd64 GCE machine, more details in #3848 (comment))
Update: this is probably not worth discussing upstream, because 2.0.2 is still < 0.07s with amd64 on an amd64 host. It is, however, consistently longer than 1.7.24. Something with arm64 qemu must be even more pathological.
Moving further discussion to #3848 |
Fix #3768