Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass network firewall driver in use to the runtime for checkpoint/restore #24799

Closed
danishprakash opened this issue Dec 9, 2024 · 9 comments
Closed
Labels
network Networking related issue or feature stale-issue

Comments

@danishprakash
Copy link
Contributor

Netavark supports nftables out of the box and it has also been set as default conditionally at least for fedora 41. At the same time, criu supports --network-lock to allow users to set the firewall driver of choice but defaults to iptables to perform network [un]locking. But which firewall driver is being used isn't being passed onto criu currently and so, if you build netavark with nftables on a system that doesn't have iptables installed, upon trying to checkpoint/restore a container, criu tries to use iptables utils and fails.

--network-lock support has been added to crun, and I'm testing the same changes for runc. What remains is to pass the firewall driver to the runtime from podman, @Luap99 on IRC mentioned that it's possible to get this info but it might not be straightforward, can we discussion possible solutions?


The issue was caught during [open]QA when we started building netavark with nftables by default in Tumbleweed and removed our iptables dependency from podman in the process.

@Luap99
Copy link
Member

Luap99 commented Dec 9, 2024

I not sure how checkpoint restore comes into play here. The netns inside does not contain any of our firewall rules so there is no reason whatsoever for criu to even attempt to store them. Same for the interfaces.
The entire netns is managed by podman not the oci runtime so they should never checkpoint the netns to begin with as podman will correctly restore a netns with the same ip address.

I filled containers/crun#1210 a while ago for that and AFAIK runc already does not checkpoint the netns so there it should just work.
With crun it already works for normal containers just not the ones using a user namespace.


Logically speaking I don't think it makes sense for podman to pass this info down as we do not want the netns to be checkpointed.
Second what is the exact error you are seeing? IMO criu should not hard fail if iptables is not installed which is what I also wrote on another bug where I think the same issue was brought up: https://issues.redhat.com/browse/RHEL-58354?

@Luap99 Luap99 added the network Networking related issue or feature label Dec 9, 2024
@danishprakash
Copy link
Contributor Author

I not sure how checkpoint restore comes into play here. The netns inside does not contain any of our firewall rules so there is no reason whatsoever for criu to even attempt to store them.

I believe this is an issue because criu tries to do the "right" thing and drops all packets for a checkpointed container, so it might very well be that criu isn't checkpointing the netns but still attempting to drop packets for a checkpionted container.

IMO criu should not hard fail if iptables is not installed which is what I also wrote on another bug where I think the same issue was brought up: issues.redhat.com/browse/RHEL-58354?

FWIW, criu allows "skipping" [un]locking the network if you pass --network-lock skip opt. But as things stand right now, the default is iptables and criu tries to use certain utils that aren't installed in case iptables is not installed, which might very well be the case for a lot of distributions.

Either criu could simply default to skipping network locking or podman could implicitly instruct the runtime to "skip" by passing --network-lock skip which could then be propagated to criu. I don't know if users would feel the need for network locking, and to even specify the method for that.

@Luap99
Copy link
Member

Luap99 commented Dec 9, 2024

Ah I see I was assuming it is calling this in the netns but it actually tries to create the rule on the host namesapce!

The issue is there is no easy way for podman to know the firewall driver, we do have /usr/libexec/podman/netavark version which prints the compiled default and then we know the containers.conf option when set so we could use that.
However a netavark firewall driver could also be none or firewalld in which case it no longer maps clearly (I guess none could be mapped to skip) but for firewalld we again would not know if it is configured for nftables or iptables necessarily although I guess nftables would most likely.

We can certainly do that if deemed necessary but I rather avoid that.

Either criu could simply default to skipping network locking or podman could implicitly instruct the runtime to "skip" by passing --network-lock skip which could then be propagated to criu. I don't know if users would feel the need for network locking, and to even specify the method for that.

IMO it likely would be best if criu itself does not hard depend on iptables default. Either let distro packages set the default or have some logic in criu to do if no iptables then nftables (that is if no explicit option was passed). Maybe it is even time for criu to default to nftables.

But yeah I have no idea about user expectation here and how important such rule is. If we start setting skip from podman it means there is a chance for regressions for existing users wanting this.

Also https://criu.org/TCP_connection#Checkpoint_and_restore_TCP_connection

In case the target process lives in NET namespace the connection locking happens the other way. Instead of per-connection iptables rules the "network-lock"/"network-unlock" action scripts are called so that the user could isolate the whole netns from network. Typically this is done by downing the respective veth pair end.

I don't really understand that part. We are using a network namespace so that this mean it should not lock via iptables/nftables in that case?

cc @adrianreber @rst0git Any opinions/suggestions here?

@adrianreber
Copy link
Collaborator

Network locking and containers is a complicated discussion if it is necessary or not. It is necessary for migrating TCP connections at least.

For Fedora we will switch CRIU to the nftables backend so that CRIU will do the right thing. I am not convinced about the new crun option. Does runc have it?

I am not convinced this needs to be an option passed down through the stack. It is kind of the work a distribution does. Making sure the packages work together correctly.

@danishprakash
Copy link
Contributor Author

Either let distro packages set the default or have some logic in criu to do if no iptables then nftables (that is if no explicit option was passed). Maybe it is even time for criu to default to nftables.

Is it possible to do both? I agree with letting the distros setting defaults for packages and having them play well together. At the same time, it'd be nice for criu to have a simple heuristic as mentioned by Paul where it tries the available backends in order and if nothing is available, skip locking altogether?

I am not convinced about the new crun option. Does runc have it?

I'm testing the runc changes currently.

Copy link

github-actions bot commented Jan 9, 2025

A friendly reminder that this issue had no activity for 30 days.

@danishprakash
Copy link
Contributor Author

@adrianreber I'm probably missing something but aren't we using iptables as the default network lock method in criu unless until specified otherwise? We're also building our criu package with nftables support but I don't think that translates to criu simply using nftables as default. When podman tries to c/r, and given that we don't pass the network lock method down the stack presently, criu tends to fall back to iptables.

https://github.com/checkpoint-restore/criu/blob/7eaf43368d67b7c7641ec2f8126db99fb7232d7b/criu/include/cr_options.h#L73

@adrianreber
Copy link
Collaborator

@danishprakash you are right.

Take a look at what I have been doing with CentOS Stream 10 RPMS:

https://gitlab.com/redhat/centos-stream/rpms/criu/-/blob/c10s/network.lock.nftables.patch?ref_type=heads

My plan is to provide an option to select the backend during build time so that it is not necessary to patch the source code.

There have been a couple of important changes in CRIU upstream to correctly work with nftables. Please use them for your package also.

If you have any more questions please reach out to me via email or via a CRIU github issue.

@danishprakash
Copy link
Contributor Author

Thanks for sharing that link. We could carry the same patch for now, but having a build time option, perhaps a makefile arg, would be quite helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
network Networking related issue or feature stale-issue
Projects
None yet
Development

No branches or pull requests

3 participants