Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libpod: always use PostConfigureNetNS option #18468

Closed
wants to merge 1 commit into from

Conversation

Luap99
Copy link
Member

@Luap99 Luap99 commented May 4, 2023

Maintaining two code paths for network setup has been difficult, I had
to deal with many bugs because of that. Often the PostConfigureNetNS was
not as tested. Overall the code has quite a bit of complexity because of
this option. Just look at this commit how much simpler the code now
looks.

The main advantage of this is that we no longer have to test everything
twice, i.e. with userns and without one.

The downside is that mount and netns setup is no longer parallel but I
think this is fine, it didn't seem to make a measurable difference.

To preserve compatibility in case of an downgrade we keep the
PostConfigureNetNS bool and set it always to true.

This turned out to be much more complicated that thought due to our
spaghetti code. The restart=always case is special because we reuse the
netns. But the slirp4netns and rootlessport process are bound to the
conmon process so they have to be restarted. Given the the network is
now setup in completeNetworkSetup() which is always called by init()
which is called in handleRestartPolicy(). Therefore we can get rid of
the special rootless setup function to restart slirp and rootlessport.
Instead we let one single network setup function take care of that which
is used in all cases.

[NO NEW TESTS NEEDED] Existing test should all pass.

Fixes #17965

Does this PR introduce a user-facing change?

None

@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 4, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 4, 2023
@Luap99 Luap99 force-pushed the PostConfigureNetNS branch from c96d2fa to e7e62aa Compare May 4, 2023 17:48
// PostConfigureNetNS needed when a user namespace is created by an OCI runtime
// if the network namespace is created before the user namespace it will be
// owned by the wrong user namespace.
PostConfigureNetNS bool `json:"postConfigureNetNS"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth leaving this in the struct and setting it unconditionally to true, to preserve compatibility in a downgrade case? Ensuring the bool is still in the DB and set to true guarantees containers created in 4.6 still behave the same if downgraded to 4.5

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much do we care about that? I get the point and will add it back. The reason I am asking because I pay zero attention on this when reviewing PRs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have not cared about downgrade in the past.

Maintaining two code paths for network setup has been difficult, I had
to deal with many bugs because of that. Often the PostConfigureNetNS was
not as tested. Overall the code has quite a bit of complexity because of
this option. Just look at this commit how much simpler the code now
looks.

The main advantage of this is that we no longer have to test everything
twice, i.e. with userns and without one.

The downside is that mount and netns setup is no longer parallel but I
think this is fine, it didn't seem to make a measurable difference.

To preserve compatibility in case of an downgrade we keep the
PostConfigureNetNS bool and set it always to true.

This turned out to be much more complicated that thought due to our
spaghetti code. The restart=always case is special because we reuse the
netns. But the slirp4netns and rootlessport process are bound to the
conmon process so they have to be restarted. Given the the network is
now setup in completeNetworkSetup() which is always called by init()
which is called in handleRestartPolicy(). Therefore we can get rid of
the special rootless setup function to restart slirp and rootlessport.
Instead we let one single network setup function take care of that
which is used in all cases.

[NO NEW TESTS NEEDED] Existing test should all pass.

Fixes containers#17965

Signed-off-by: Paul Holzinger <[email protected]>
@Luap99
Copy link
Member Author

Luap99 commented May 9, 2023

I am pretty sure this is blocked on containers/crun#1210

@adrianreber
Copy link
Collaborator

We talked about this in another ticket, but just wanted to mention this also here.

From a checkpoint/restore point of view this seems to be problematic. The whole checkpoint/restore setup has been developed with Podman setting up the network namespace before giving control to runc/crun.

The problem with configuring the network namespace later is that during restore the process will start to run immediately after restore. If the network namespace is not configured this can either fail immediately if established TCP connections are restored (not possible without an IP address) or the restored process will bind to the wrong network device and not function correctly later. I see that the latest version of this PR has the restore case excluded. Just FYI.

@Luap99
Copy link
Member Author

Luap99 commented May 10, 2023

Yes I am aware and special case restore to setup the netns in advance like it does at the moment. This PR is still desirable, just looking at the diff it is a good improvement.

And of course as said this is just not working correctly with user namespaces. The proper ordering must be oci runtime creates ns -> podman configures netns -> criu restores process and starts it. So this is something the oci runtimes have to support.

@github-actions
Copy link

A friendly reminder that this PR had no activity for 30 days.

Luap99 added a commit to Luap99/libpod that referenced this pull request Jun 21, 2023
My PR[1] to remove PostConfigureNetNS is blocked on other things I want
to split this change out. It reduces the complexity when generating
/etc/hosts and /etc/resolv.conf as now we always write this file after
we setup the network. this means we can get the actual ip from the netns
which is important.

[NO NEW TESTS NEEDED] This is just a rework.

[1] containers#18468

Signed-off-by: Paul Holzinger <[email protected]>
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 23, 2023
@openshift-merge-robot
Copy link
Collaborator

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@github-actions github-actions bot removed the stale-pr label Jun 24, 2023
@rhatdan
Copy link
Member

rhatdan commented Jul 28, 2023

@Luap99 any update on this?

@Luap99
Copy link
Member Author

Luap99 commented Jul 31, 2023

it is still blocked on crun containers/crun#1210

I am going to close this for now

@Luap99 Luap99 closed this Jul 31, 2023
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Oct 30, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 30, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note-none
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RFC] libpod: remove PostConfigureNetNS option
5 participants