Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
libpod: enable reflink copying for checkpoint tar files.
This eliminates the overhead of handling rootfs diffs. Without these changes, on my personal machine (T480 [email protected], Fedora 41 with BTRFS) the below test takes 29s (checkpoint) and 18s (restore) to complete; with these changes, it is 2.9s (checkpoint) and 2.0s (restore). There are essentially two changes: * set TarOptions.AlignBlockFile for creating snapshot and rootfs-diff tar files. This is necessary for BTRFS to share data blocks when using copy_file_range ("reflink"). * (un)pack the rootfs diff through the /diff layer, rather than the /merged mountpoint, by simply replace "/merged" with "/diff" in the path string. Without this change, the source is on a overlayfs mount, and the destination on BTRFS, and copy_file_range would fail with EXDEV. Replacing the path suffix was done as a hack. There are two ways to do this more cleanly: 1) extend the podman internal APIs so it offers a principled manner to access the layer paths directly. 2) implement cross-FS copy_file_range() in the Linux' overlayfs: overlayfs could dereference the files to their layers, check if the resulting source and destination are on the same filesystem, and then delegate to that filesystems' copy_file_range implementation. This needs extending the VFS definition, so a filesystem can signal that it knows how to do cross FS-type copy_file_range. Currently, this is forbidden through a check in generic_copy_file_checks() in fs/read_write.c. Approach 2) would speed up all of podman's tar handling in one fell swoop, while 1) would also work with older kernel versions, but would require updating all relevant callsites in podman. Timing for checkpointing 5Gb of rootfs-diffs: Command: (cd ../containers/podman/ && go build -tags "selinux seccomp" ./cmd/podman) && sudo podman container rm -f -t0 restored && echo 1 && sudo ~/vc/containers/podman/podman run -dt -p 8080:80/tcp docker.io/library/httpd && echo 2 && sudo ~/vc/containers/podman/podman --log-level=debug container cp files.sh $(sudo ~/vc/containers/podman/podman ps -l --format "{{.ID}}"):files.sh && echo 3 && sudo ~/vc/containers/podman/podman container exec -it -l /bin/sh -x files.sh && echo 4 && sudo /usr/bin/time ~/vc/containers/podman/podman --log-level=debug container checkpoint --compress=none -e foo.tar -l && echo 5 && sudo /usr/bin/time ~/vc/containers/podman/podman --log-level=debug container restore -i foo.tar --name restored files.sh: $ cat files.sh #!/bin/sh dd if=/dev/urandom of=test bs=1M count=100 echo hoi >> test for i in $(seq 1 50); do cp --reflink=never test test$i & done wait rm /usr/bin/znew .. [INFO]:Jan 17 19:44:36.137 - processed getdiff (0 byte) in 423.148002ms: 0.000000 gb/s [INFO]:Jan 17 19:44:36.137 - processed usr/local/apache2/test32 (104857604 byte) in 97.048µs: 1006.267555 gb/s [INFO]:Jan 17 19:44:36.224 - processed usr/local/apache2/test34 (104857604 byte) in 86.024792ms: 1.135211 gb/s [INFO]:Jan 17 19:44:36.224 - processed usr/local/apache2/test49 (104857604 byte) in 83.593µs: 1168.234825 gb/s ... [INFO]:Jan 17 19:44:37.666 - processed checkpoint/tty-info.img (186 byte) in 563.93µs: 0.000307 gb/s [INFO]:Jan 17 19:44:37.667 - processed checkpoint/utsns-12.img (34 byte) in 517.281µs: 0.000061 gb/s [INFO]:Jan 17 19:44:37.670 - processed rootfs-diff.tar (5347956224 byte) in 2.52236ms: 1974.608049 gb/s .. 0.44user 1.66system 0:02.93elapsed 71%CPU (0avgtext+0avgdata 51736maxresident)k 2400inputs+34248outputs (1major+14586minor)pagefaults 0swaps Restore: ... [INFO]:Jan 17 19:44:38.817 - processed checkpoint/utsns-12.img (34 byte) in 278.176µs: 0.000114 gb/s [INFO]:Jan 17 19:44:38.821 - processed rootfs-diff.tar (5347956224 byte) in 2.431477ms: 2048.414342 gb/s ... [DEBUG]:Jan 17 19:44:38.833 - Mounted container "78f739866462dd6a69e596fcd43bcd45409db7176410930a208de4ae862a439f" at "/var/lib/containers/storage/overlay/27da5a8326fef0ca4887eb55297ff7fd6998f1e4b01d9cb2d630e0a4b9938359/merged" ... [INFO]:Jan 17 19:44:39.124 - processed usr/local/apache2/test32 (104857604 byte) in 643.656µs: 151.721189 gb/s ... [INFO]:Jan 17 19:44:39.175 - processed rootfsdiff (5347956224 byte) in 51.784872ms: 96.180065 gb/s ... 0.33user 0.26system 0:02.01elapsed 29%CPU (0avgtext+0avgdata 48336maxresident)k 728inputs+3296outputs (0major+7097minor)pagefaults 0swaps
- Loading branch information