-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libpod: always use PostConfigureNetNS option #18468
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Luap99 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
// PostConfigureNetNS needed when a user namespace is created by an OCI runtime | ||
// if the network namespace is created before the user namespace it will be | ||
// owned by the wrong user namespace. | ||
PostConfigureNetNS bool `json:"postConfigureNetNS"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth leaving this in the struct and setting it unconditionally to true, to preserve compatibility in a downgrade case? Ensuring the bool is still in the DB and set to true guarantees containers created in 4.6 still behave the same if downgraded to 4.5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How much do we care about that? I get the point and will add it back. The reason I am asking because I pay zero attention on this when reviewing PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have not cared about downgrade in the past.
7d53fbb
to
9b23c40
Compare
Maintaining two code paths for network setup has been difficult, I had to deal with many bugs because of that. Often the PostConfigureNetNS was not as tested. Overall the code has quite a bit of complexity because of this option. Just look at this commit how much simpler the code now looks. The main advantage of this is that we no longer have to test everything twice, i.e. with userns and without one. The downside is that mount and netns setup is no longer parallel but I think this is fine, it didn't seem to make a measurable difference. To preserve compatibility in case of an downgrade we keep the PostConfigureNetNS bool and set it always to true. This turned out to be much more complicated that thought due to our spaghetti code. The restart=always case is special because we reuse the netns. But the slirp4netns and rootlessport process are bound to the conmon process so they have to be restarted. Given the the network is now setup in completeNetworkSetup() which is always called by init() which is called in handleRestartPolicy(). Therefore we can get rid of the special rootless setup function to restart slirp and rootlessport. Instead we let one single network setup function take care of that which is used in all cases. [NO NEW TESTS NEEDED] Existing test should all pass. Fixes containers#17965 Signed-off-by: Paul Holzinger <[email protected]>
I am pretty sure this is blocked on containers/crun#1210 |
We talked about this in another ticket, but just wanted to mention this also here. From a checkpoint/restore point of view this seems to be problematic. The whole checkpoint/restore setup has been developed with Podman setting up the network namespace before giving control to runc/crun. The problem with configuring the network namespace later is that during restore the process will start to run immediately after restore. If the network namespace is not configured this can either fail immediately if established TCP connections are restored (not possible without an IP address) or the restored process will bind to the wrong network device and not function correctly later. I see that the latest version of this PR has the restore case excluded. Just FYI. |
Yes I am aware and special case restore to setup the netns in advance like it does at the moment. This PR is still desirable, just looking at the diff it is a good improvement. And of course as said this is just not working correctly with user namespaces. The proper ordering must be oci runtime creates ns -> podman configures netns -> criu restores process and starts it. So this is something the oci runtimes have to support. |
A friendly reminder that this PR had no activity for 30 days. |
My PR[1] to remove PostConfigureNetNS is blocked on other things I want to split this change out. It reduces the complexity when generating /etc/hosts and /etc/resolv.conf as now we always write this file after we setup the network. this means we can get the actual ip from the netns which is important. [NO NEW TESTS NEEDED] This is just a rework. [1] containers#18468 Signed-off-by: Paul Holzinger <[email protected]>
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@Luap99 any update on this? |
it is still blocked on crun containers/crun#1210 I am going to close this for now |
Maintaining two code paths for network setup has been difficult, I had
to deal with many bugs because of that. Often the PostConfigureNetNS was
not as tested. Overall the code has quite a bit of complexity because of
this option. Just look at this commit how much simpler the code now
looks.
The main advantage of this is that we no longer have to test everything
twice, i.e. with userns and without one.
The downside is that mount and netns setup is no longer parallel but I
think this is fine, it didn't seem to make a measurable difference.
To preserve compatibility in case of an downgrade we keep the
PostConfigureNetNS bool and set it always to true.
This turned out to be much more complicated that thought due to our
spaghetti code. The restart=always case is special because we reuse the
netns. But the slirp4netns and rootlessport process are bound to the
conmon process so they have to be restarted. Given the the network is
now setup in completeNetworkSetup() which is always called by init()
which is called in handleRestartPolicy(). Therefore we can get rid of
the special rootless setup function to restart slirp and rootlessport.
Instead we let one single network setup function take care of that which
is used in all cases.
[NO NEW TESTS NEEDED] Existing test should all pass.
Fixes #17965
Does this PR introduce a user-facing change?