Skip to content

Enables streaming of images to and from CRIU during checkpoint/restore with low overhead

Notifications You must be signed in to change notification settings

SquareandCompass/criu-image-streamer

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

36 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Tests

CRIU Image Streamer

criu-image-streamer enables streaming of images to and from CRIU during checkpoint/restore with low overhead.

It enables use of remote storage (e.g., S3, GCS) without buffering in local storage, speeding up operations considerably. Fast checkpointing makes Google's preemptible VM and Amazon Spot VM offerings more attractive: with streaming, CRIU can checkpoint and evacuate even large processes within the tight eviction deadlines (~30secs).

criu-image-streamer includes the following high-level features:

  • Extensible: UNIX pipes are used for image transfers allowing integration in various workloads and environments. One can build fast data pipelines to performing compression, encryption, and remote storage access.

  • Image sharding: When capturing a CRIU image, the image stream can be split into multiple output shards. This helps maximizing the network throughput for remote uploads and CPU utilization for compression/encryption.

  • Shard load balancing: When capturing a CRIU image, the throughput of each output shard is independently optimized. If a shard exhibits poor performance (e.g., by hitting a slow disk), traffic is directed to other shards. This is useful for reducing checkpoint tail latency when using many shards.

  • External file embedding: Files that are not CRIU specific can be included in the image. This can be used, for example, to incorporate a file system tarball along with the CRIU image.

  • Low checkpoint overhead: To maximize speed, we modified CRIU to send pipes over its UNIX socket connection to transfer data. This allows the use of the splice() system call for moving data pipe-to-pipe giving us a zero-copy implementation. We measured 0.1 CPUsec/GB of CPU usage and 3 MB of resident memory when capturing a 10 GB application on standard server hardware of 2020.

  • Moderate restore overhead: We measured 1.4 CPUsec/GB of CPU usage and 3 MB of resident memory. In the future, we could switch to a zero-copy implementation to greatly improve performance.

  • Reliable: criu-image-streamer is written in Rust, avoiding common classes of bugs often encountered when using other low-level languages.

Usage

Note: criu-image-streamer requires CRIU version 3.15 or newer.

The CLI interface of criu-image-streamer is the following:

criu-image-streamer [OPTIONS] --images-dir <images-dir> <SUBCOMMAND>

OPTIONS:
    -D, --images-dir <images-dir>           Images directory where the CRIU UNIX socket is created during
                                            streaming operations.
    -s, --shard-fds <shard-fds>...          File descriptors of shards. Multiple fds may be passed as a comma
                                            separated list. Defaults to 0 or 1 depending on the operation.
    -e, --ext-file-fds <ext-file-fds>...    External files to incorporate/extract in/from the image. Format is
                                            filename:fd where filename corresponds to the name of the file, fd
                                            corresponds to the pipe sending or receiving the file content.
                                            Multiple external files may be passed as a comma separated list.
    -p, --progress-fd <progress-fd>         File descriptor where to report progress. Defaults to 2.

    --tcp-listen-remap <ports>...           When serving the image, remap on the fly the TCP listen socket
                                            ports. Format is old_port:new_port. May only be used with the
                                            serve operation. Multiple tcp port remaps may be passed as a comma
                                            separated list.
SUBCOMMANDS:
    capture    Capture a CRIU image
    serve      Serve a captured CRIU image to CRIU
    extract    Extract a captured CRIU image to the specified images_dir

During the capture or serve operations, a UNIX socket is created into the specified images-dir where CRIU can connect to and perform a checkpoint, or restore operation, respectively. That socket is then used to exchange pipes for data transfers. The images directory is not used for storing data when streaming images to and from CRIU. Rather, shards passed via shard-fds, are used to store and retrieve the image data.

The image data flow of each operation is the following:

  • Capture: CRIU → criu-image-streamer → shards
  • Serve: shards → criu-image-streamer → CRIU
  • Extract: shards → criu-image-streamer → images-dir

Example 1: On-the-fly compression to local storage

In this example, we show how to checkpoint/restore an application and compress/decompress its image on-the-fly with the lz4 compressor.

Checkpoint

sleep 10 & # The app to be checkpointed
APP_PID=$!

criu-image-streamer --images-dir /tmp capture | lz4 -f - /tmp/img.lz4 &
criu dump --images-dir /tmp --stream --shell-job --tree $APP_PID

Restore

lz4 -d /tmp/img.lz4 - | criu-image-streamer --images-dir /tmp serve &
criu restore --images-dir /tmp --stream --shell-job

Example 2: Extracting an image to local storage

Extracting a previously captured image to disk can be useful for inspection. Using the extract command extracts the image to disk instead of waiting for CRIU to consume it from memory.

lz4 -d /tmp/img.lz4 - | criu-image-streamer --images-dir output_dir extract

Example 3: Multi-shard upload to the S3 remote storage

When compressing and uploading to S3, parallelism is beneficial both to leverage multiple CPUs for compression, and multiple streams for maximizing network throughput. Parallelism can be achieved by splitting the image stream into multiple shards using the --shard-fds option.

Checkpoint

sleep 10 & # The app to be checkpointed
APP_PID=$!

# The 'exec N>' syntax opens a new file descriptor in bash (not sh, not zsh).
exec 10> >(lz4 - - | aws s3 cp - s3://bucket/img-1.lz4)
exec 11> >(lz4 - - | aws s3 cp - s3://bucket/img-2.lz4)
exec 12> >(lz4 - - | aws s3 cp - s3://bucket/img-3.lz4)

criu-image-streamer --images-dir /tmp --shard-fds 10,11,12 capture &
criu dump --images-dir /tmp --stream --shell-job --tree $APP_PID

Restore

exec 10< <(aws s3 cp s3://bucket/img-1.lz4 - | lz4 -d - -)
exec 11< <(aws s3 cp s3://bucket/img-2.lz4 - | lz4 -d - -)
exec 12< <(aws s3 cp s3://bucket/img-3.lz4 - | lz4 -d - -)

criu-image-streamer --shard-fds 10,11,12 --images-dir /tmp serve &
criu restore --images-dir /tmp --stream --shell-job

Example 4: Incorporating a tarball into the image

Often, we wish to capture the file system along side the CRIU process image. criu-image-streamer can weave in external files via the --ext-file-fds option. In this example, We use tar to archive /scratch and include the tarball into our final image.

Checkpoint

mkdir -p /scratch/app
echo "app data to preserve" > /scratch/app/data

sleep 10 & # The app to be checkpointed
APP_PID=$!

# The 'exec N>' syntax opens a new file descriptor in bash (not sh, not zsh).
exec 20< <(tar -C / -vcpSf - /scratch/app)

criu-image-streamer --images-dir /tmp --ext-file-fds fs.tar:20 capture | lz4 -f - /tmp/img.lz4 &
criu dump --images-dir /tmp --stream --shell-job --tree $APP_PID

Restore

rm -f /scratch/app/data

exec 20> >(tar -C / -vxf - --no-overwrite-dir)

lz4 -d /tmp/img.lz4 - | criu-image-streamer --images-dir /tmp --ext-file-fds fs.tar:20 serve &
criu restore --images-dir /tmp --stream --shell-job

cat /scratch/app/data

Important correctness consideration: We are missing synchronization details in this simplified example. For correctness, we should do the following:

  • On checkpoint, we should start tarring the file system AFTER the application has stopped. Otherwise, we risk a data race leading to data loss.

  • On restore, we should only start CRIU after tar has finished restoring the file system. Otherwise, we risk having CRIU try to access files that are not yet present.

Synchronization

criu-image-streamer emits the following into the progress pipe, helpful for synchronizing operations:

  • During capture it emits the following messages:

    • socket-init\n to report that the UNIX socket is ready for CRIU to connect. At this point, CRIU is safe to be launched for dump.
    • checkpoint-start\n to report that the checkpoint has started. The application is now guaranteed to be in a stopped state. Starting tarring the file system is appropriate.
    • JSON formatted statistics defined below.
  • During restore:

    • JSON formatted statistics defined below.
    • socket-init\n to report that the UNIX socket is ready for CRIU to connect. At this point, CRIU is safe to be launched for restore.

Transfer speed statistics

The progress pipe emits statistics related to shards with the following JSON format. These statistics are helpful to compute transfer speeds. The JSON blob is emitted as a single \n terminated line.

{
  "shards": [
    {
      "size": u64, // Total size of shard in bytes
      "transfer_duration_millis": u128, // Total time to transfer data
    },
    ...
  ]
}

Installation

Build

The Rust toolchain must be installed as a prerequisite. Run make, or use cargo build --release to build the project.

Deploy

Copy the built binary to the destination host. It requires no library except libc. Change kernel settings to the following for optimal performance.

echo 0 > /proc/sys/fs/pipe-user-pages-soft
echo 0 > /proc/sys/fs/pipe-user-pages-hard
echo $((4*1024*1024)) > /proc/sys/fs/pipe-max-size

Note that during checkpointing, pages in the pipe buffers are not consuming memory. Because CRIU uses vmsplice() and criu-image-streamer uses splice(), the content in the pipes are pointing directly to the application memory.

Tests

We provide a test suite located in tests/. You may run it with cargo test -- --test-threads=1, or make test.

To run integration tests, run the CRIU test suite with --stream. For example, run: sudo ./test/zdtm.py run -f h -a --stream in the CRIU project directory.

Limitations

  • Incremental checkpoints are not supported.
  • CLI options must be passed before the capture/serve/extract subcommand.
  • Shards must be UNIX pipes. For regular files support, cat or pv (faster) may be used as a pipe adapter.
  • Using an older Linux kernel can lead to memory corruption. We tested version 4.14.67 from the stable tree, and have seen memory corruption. We tested version 4.14.121 and seen no issues. 4.15.0-1037 is problematic. It appears that this kernel bug fix is the remedy. Run cargo test splice to test if criu-image-streamer is affected by the bug on your platform.

Acknowledgments

License

criu-image-streamer is licensed under the Apache 2.0 license.

About

Enables streaming of images to and from CRIU during checkpoint/restore with low overhead

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Rust 98.3%
  • Makefile 1.2%
  • Shell 0.5%