Use --local=false on amd64, to workaround containerd image loading #3805

Closed

Conversation

dgl
Contributor

@dgl dgl commented Dec 3, 2024

This is a potential but not pretty workaround for #3795.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 3, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dgl
Once this PR has been reviewed and has the lgtm label, please assign aojea for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Dec 3, 2024
@k8s-ci-robot
Contributor

Hi @dgl. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Dec 3, 2024
@aojea
Contributor

aojea commented Dec 3, 2024

/assign @BenTheElder

var opt string
// TODO: Hack to workaround #3795.
if runtime.GOARCH == "amd64" {
	opt = "--local=false"
Member

can you elaborate a bit on what this is intended to do and why only amd64?

it will be a bit before I can dig into this

Contributor Author

@dgl dgl Dec 5, 2024

I dug further into the underlying problem on the issue thread, which should hopefully answer how to fix this properly in containerd or docker (I do believe it is a bug in one of them).

The reason for the arm64 / amd64 split is confusing and what makes this a hack:

  • This happens with packages that mix amd64 and 386, which is why the change targets amd64 for --local=false.
  • On the version of containerd that kind currently uses, --all-platforms does not work together with --local=false.

So the bug has been seen on amd64, whereas people using arm64 do want to mix platforms and therefore still need --all-platforms. It's also possible for arm64 to mix v8 or v9, but I think that's far less common and unlikely to break anyone.

Clearly this should be fixed in containerd (or docker, if it should actually include more blobs in its images), but given a kind release is tied to a containerd release I think it would be safe to include this hack until kind is on a containerd with a fix.

Contributor

I don't much like the idea of us carrying patches to compensate for known bugs.

Member

I don't think we should do this only on a specific arch. Couldn't people mix platforms on other host arches too?

"but given a kind release is tied to a containerd release I think it would be safe to include this hack until kind is on a containerd with a fix."

... that's not strictly true, users ignore our docs and use node images across kind releases all the time, so we need to at least warn that some minimum version will be required because we require this new flag that isn't present in images from prior releases.

Member

As a general rule we actually permit using node images across releases, and we notify users when a release has a compatibility change that requires images built after a certain release or something like that. That hasn't happened recently, either.
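As an illustration of the kind of warning discussed here, the sketch below compares a node image's containerd version against a minimum and returns a message when the image is too old. Everything in it is hypothetical: the helper names, the message wording, and the v2.0.0 minimum used in main are assumptions for the example, not values taken from kind.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version holds the major and minor components of a containerd version.
type version struct{ major, minor int }

// parseVersion parses strings such as "v1.7.18" or "2.0.2".
func parseVersion(s string) (version, error) {
	parts := strings.SplitN(strings.TrimPrefix(s, "v"), ".", 3)
	if len(parts) < 2 {
		return version{}, fmt.Errorf("unexpected version %q", s)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return version{}, fmt.Errorf("bad major in %q: %w", s, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return version{}, fmt.Errorf("bad minor in %q: %w", s, err)
	}
	return version{major, minor}, nil
}

// olderThan reports whether v is older than floor (major, then minor).
func (v version) olderThan(floor version) bool {
	if v.major != floor.major {
		return v.major < floor.major
	}
	return v.minor < floor.minor
}

// importFlagWarning returns a warning when the node image's containerd is
// older than the assumed minimum, and an empty string otherwise.
func importFlagWarning(nodeVersion, minVersion string) (string, error) {
	node, err := parseVersion(nodeVersion)
	if err != nil {
		return "", err
	}
	floor, err := parseVersion(minVersion)
	if err != nil {
		return "", err
	}
	if node.olderThan(floor) {
		return fmt.Sprintf("node image has containerd %s; image loading needs %s or newer",
			nodeVersion, minVersion), nil
	}
	return "", nil
}

func main() {
	// The v2.0.0 minimum is an assumption for this example only.
	warning, err := importFlagWarning("v1.7.18", "v2.0.0")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if warning != "" {
		fmt.Println("WARNING:", warning)
	}
}
```

Only the major and minor components are compared, which keeps the check simple but ignores patch-level differences.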

@BenTheElder
Member

see also: #3828 (comment) (containerd 2.0 and setting --local=true ...)

@BenTheElder
Member

So if I followed correctly, after we finish adopting containerd 2.x, we will be using local=false on all platforms, with some upstream bugfixes. I think we should probably prefer that?

@BenTheElder
Member

At HEAD we're using containerd 2.0.2; can you test whether that image solves your issue? containerd 2.x should have this behavior by default, and I think that's the best path forward.

Thanks for digging into this and sending the PR!
