Added docs
shuhaowu committed Mar 26, 2022
1 parent c71d228 commit c43bcda
Showing 3 changed files with 190 additions and 111 deletions.
116 changes: 5 additions & 111 deletions README.md
@@ -12,7 +12,6 @@ This is a custom image builder for the Raspberry Pi 4. Some features:
Raspberry Pi, so I had to resort to this method.
- The default customization in this repo is made for ROS2 with `PREEMPT_RT`
applied.
- More information about how this works is below.
- With two stages of setup scripts, executing in lock step both inside and
outside the chroot, we can cross compile code (via something like CMake
toolchain) on the host and copy it into the chroot for making the final
@@ -63,6 +62,8 @@ Thus, you'll need a Linux machine with root and the following tools installed:
`cut`, `grep`, `parted`, `pv`, `rsync`, `truncate`, `wget`, `systemd-nspawn`,
and `qemu-aarch64-static`.

You will also need `python3`.

To build the `focal-rt-ros2` image, you'll also need: `zip`.

For Ubuntu, you can simply run:
@@ -82,114 +83,7 @@ You can then `dd` this to a SD card.

You can see a demo of this in [CI](https://github.com/shuhaowu/ros-realtime-rpi4-image/actions). CI builds quite slowly. On my computer this whole process only takes a few minutes.

### Debugging build failures if you changed stuff

_This is covered in [How it works](#how-it-works)._

How it works
------------

_Note: the best way to understand the details is to start reading from
`builder/main.sh` and tracing every function call. Most functions are defined
in the order they are called in `builder/core.sh` and thus the code should read
somewhat linearly._

The builder defined in `builder/` is relatively generic and can theoretically
be used for a number of situations where a vendor provides you with a `.img`
file and you need to customize it. It follows these major steps:

1. Read a custom `vars.sh` to populate all the variables needed to build.
2. Download the (Ubuntu) image from the vendor.
3. Extract the image via a custom bash function defined in `vars.sh`, which
handles different compression algorithms.
4. Increase the size of the image and resize the file system, as the vendor
image may be too small to install a large number of packages.
5. Set up a loop device.
6. Mount the file systems in the loop device so they're accessible by the host.
7. Copy things like resolv.conf and qemu-user-static into the mounted FS.
8. Copy the rootfs overlay defined via `ROOTFS_OVERLAY` into the mounted FS.
- If you have setup scripts running inside the chroot, or other files
supporting the setup, put them here so they are copied in.
9. Run the _host-side phase1 setup script_ (`setup_script_phase1_outside_chroot`).
- Variables `export`ed by `main.sh` and `vars.sh` are usable in this script,
as well as in the other user-defined scripts below.
- For the RT setup, the RT kernel is downloaded from GitHub in this step.
10. Run the _chroot-side phase1 setup script_ (`setup_script_phase1_inside_chroot`).
This runs the script inside the mounted FS via a chroot in the target
architecture via qemu-user-static.
- This is where you install things like the ROS2 packages.
11. Run the _host-side phase2 setup script_ (`setup_script_phase2_outside_chroot`).
- If you want to cross compile, this is your chance, as the rootfs is readily
available and mounted. In CMake, you can set your `CMAKE_FIND_ROOT_PATH` and
`CMAKE_SYSROOT` to the path to the mounted FS.
12. Run the _chroot-side phase2 setup script_ (`setup_script_phase2_inside_chroot`).
This also runs the script inside the mounted FS.
- Usually this is when you remove any setup files you copied into the chroot
and perform some cleanup. `resolv.conf` and other files set up by the builder
are cleaned up by the builder itself, though.
13. Clean up the chroot by removing resolv.conf and qemu-user-static.
14. Unmount everything and get rid of the loop device.

To customize this process, look at the `vars.sh` file(s) in this repo. There
should be extensive comments there.

In the future, it may be better to figure out how Canonical generates their
official Ubuntu images. However, when I looked, I couldn't find how they build
their images. It is likely that a number of steps here will be needed anyway,
so the existing structure is not that bad of an idea.

### How interrupt and resume works

Each step defined in the code is resumable (see `main.sh`). If a step fails,
the builder does not clean up, giving the developer a chance to debug and
possibly fix things manually before continuing. My workflow is as follows:

1. Edit the build scripts. Run them.
2. Encounter an error.
3. Manually go into the chroot via systemd-nspawn.
4. Figure out the right commands to run and change the script.
5. Change `cache/session.txt` to make sure I can resume from the right spot (by removing and adding steps into the file, see below).
6. Run the builder again (just `make`, or `builder/main.sh ...`).

While this is not perfect, it still saves a lot of time, as you don't always
have to restart from the beginning when encountering an error. The way this
works is via two files saved in `cache`:

- `cache/session.txt`: contains a list of steps executed, one per line. You can
freely change this file if you know what you're doing to selectively
execute/skip steps when working with this system. See `builder/main.sh` for
the steps (`run_step <step_name>`).
- If this file is removed, then the builder will restart from scratch.
- `cache/session-loop-device.txt`: This saves the loop device the chroot is
mounted to, since the loop devices may be different each time we run this.

There's also a `PAUSE_AFTER` variable that can be set in `vars.sh` to instruct
the builder to stop after a particular step. This allows you to do some
interactive experimentation, which also speeds things up.

To get into the chroot to experiment, run the command:

```
$ sudo systemd-nspawn -D /tmp/rpi4-image-build/ bash # the path is whatever CHROOT_PATH is
```

### Interrupt and resume if you change the setup scripts for inside the chroot

Sometimes you will change the scripts running inside the chroot and then resume.
You'll find this doesn't work, because the script you're changing is not copied
into the chroot. To get around this problem and resume, simply delete the
step `copy_files_to_chroot` in `cache/session.txt` and rerun the builder. The
builder will then copy all the files into the chroot again and continue from
where it failed.

### How to reset your host system if something horribly goes wrong

- Try running `./scripts/cleanup.sh`
- Try running the commands in `umount_everything` manually (see
`builder/core.sh`).
- You can do this by adding every step in `builder/main.sh` to
`cache/session.txt` except the unmount-everything step.
- TODO: I should create a simpler command to run only this step.
- If that doesn't work, try restarting your computer :(.
Customization guide
-------------------

See [`docs/BuilderDesignAndUsageGuide.md`](docs/BuilderDesignAndUsageGuide.md).
178 changes: 178 additions & 0 deletions docs/BuilderDesignAndUsageGuide.md
@@ -0,0 +1,178 @@
Image builder design and usage guide
====================================

Before reading this, please make sure you read the README of the project.

Overview
--------

The image builder's main workflow consists of the following few steps (see
`Builder.build()` in the code for more up-to-date details):

- Download and extract the vendor image.
- Mount the image via loop devices locally.
- Set up a "container" via `systemd-nspawn`. For images with foreign
architecture (such as aarch64), qemu-user-static is copied to the mounts.
- Copy configuration files to the image.
- Run any setup scripts both outside the container and inside the container to
customize the image (such as installing packages, or cross compiling).
- Clean up the temporary files copied into the container and unmount everything.

This process is generic for most single board computers (SBCs) where the vendor
provides a flashable image. The image builder allows one to configure this
process by writing a set of well-defined [build profile files](#build-profile-format)
without having to worry about book-keeping tasks like mounting the image. In
principle, this should allow the builder to build images not just for the
Raspberry Pi, but for other SBCs as well. Further, the design allows build
profiles to be overlaid on top of each other, so a common base profile can be
reused across multiple image types. For example, we can use a common real-time
Ubuntu base to build images with different ROS2 distros without duplicating
much code.

The builder also has time-saving features such as the ability to [pause and
resume](#pause-and-resume) at each step. For example, if one of the custom
setup scripts fails, the builder will not clean up right away.
This allows the developer to go into the container and experiment. In my
experience, this drastically cuts down on the development time for this kind of
image-building work. See [Tips and tricks](#tips-and-tricks) for how to debug
and work on this setup in a time-efficient manner.

Build profile format
--------------------------

A build profile is a directory with the following structure:

- `config.ini`: Specifies the [build config](#configini).
- `scripts/`
- `extract-image`: This script extracts the downloaded image. It's not
easy to infer generically how to extract an image, so this is parameterized
as a script.
- `loop-device-setup`: This script performs setup on the host against the
loop device after the builder has set it up. Common operations here are to
`fsck` the rootfs file system and resize it to the maximum allocated size (as
per `config.ini`) via `resize2fs`.
- `phase1-host`: Optional. This script runs on the host machine immediately
after the image is mounted and ready to go.
- `phase1-target`: Optional. This script runs in the image via
`systemd-nspawn` after the `phase1-host`. For example, you can install
packages in the image by calling `apt`.
- `phase2-host`: Optional. This script runs on the host after
`phase1-target`. An example usage of this would be to perform cross
compilation.
- `phase2-target`: Optional. This script runs in the image via
`systemd-nspawn` after `phase2-host`.
- `rootfs/`: Any files and directories within this directory will be copied to
the root of the image being built. Files are copied before any of the
phase1/phase2 scripts are executed.

A full example of this directory can be seen in [`focal-rt`](../focal-rt).
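The layout above can be checked mechanically before a build starts. The following is a minimal sketch, not the builder's actual code: `inspect_profile` is a hypothetical helper, and it assumes `config.ini`, `scripts/extract-image`, and `scripts/loop-device-setup` are required because the list above does not mark them as optional.

```python
import tempfile
from pathlib import Path

# Optional phase scripts, in the order the builder runs them.
PHASE_SCRIPTS = ["phase1-host", "phase1-target", "phase2-host", "phase2-target"]
# Assumption: entries not marked "Optional" in the layout above are required.
REQUIRED = ["config.ini", "scripts/extract-image", "scripts/loop-device-setup"]


def inspect_profile(profile_dir):
    """Hypothetical helper: validate a profile directory and report what it provides.

    Returns (phase scripts present, in run order; whether rootfs/ exists).
    """
    root = Path(profile_dir)
    missing = [rel for rel in REQUIRED if not (root / rel).is_file()]
    if missing:
        raise FileNotFoundError(f"{root}: missing {', '.join(missing)}")
    present = [name for name in PHASE_SCRIPTS if (root / "scripts" / name).is_file()]
    return present, (root / "rootfs").is_dir()


if __name__ == "__main__":
    # Build a throwaway profile that has only one optional script.
    with tempfile.TemporaryDirectory() as tmp:
        profile = Path(tmp) / "my-profile"
        (profile / "scripts").mkdir(parents=True)
        for rel in REQUIRED + ["scripts/phase1-target"]:
            (profile / rel).touch()
        print(inspect_profile(profile))  # (['phase1-target'], False)
```

Failing early on a missing required file is cheaper than discovering it after the image has already been downloaded and mounted.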

### `config.ini`

This file specifies the build profile under the section `[build]`. This section
contains the following variables:

- `debug`: boolean. If true, verbose logging will be turned on.
- `image_url`: string. The URL of the image to be downloaded by the builder.
- `image_mounts`: string. A comma-separated list of the partitions' mount
points. The builder mounts them in reverse order, because the `/` mount point
is usually the last partition and needs to be mounted first.
- `image_size`: string. This is passed to `truncate --size=<image_size>
<path to .img file>` and is needed because sometimes the vendor image is too
small.
- `output_filename`: string. The output image file name (not path, just the
file name).

There's another section in this file, `[env]`, which lists environment
variables that will be passed to [all scripts](#phase-1-and-phase-2-scripts)
called during the build process.

### Phase 1 and Phase 2 scripts

As noted above, there are 4 scripts that are called. These scripts are called
with the environment variables specified in `config.ini` and a few additional
variables:

- `CHROOT_PATH`: the path of the mounted image seen from the host.
- `OUT_DIR`: the directory of the output file.
- `CACHE_DIR`: a place to put cached data.
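How the builder actually assembles this environment is not spelled out here, so the following is only a sketch under stated assumptions: `script_env` and `run_host_script` are hypothetical names, the host's own environment is inherited (so `PATH` still works), and the builder-provided variables are applied after the `[env]` values so a profile cannot accidentally overwrite them. Only host-side scripts are shown; target-side scripts run through `systemd-nspawn` instead.

```python
import os
import subprocess
import sys


def script_env(config_env, chroot_path, out_dir, cache_dir, base=None):
    """Hypothetical: environment for a phase script ([env] values plus builder variables)."""
    env = dict(os.environ if base is None else base)  # assumption: inherit the host env
    env.update(config_env)  # variables from the [env] section of config.ini
    # Builder-provided variables go last so a profile cannot clobber them (a design choice).
    env.update({"CHROOT_PATH": chroot_path, "OUT_DIR": out_dir, "CACHE_DIR": cache_dir})
    return env


def run_host_script(argv, env):
    """Run a host-side phase script; raises CalledProcessError if it exits non-zero."""
    return subprocess.run(argv, env=env, check=True, capture_output=True, text=True)


if __name__ == "__main__":
    env = script_env({"ROS_DISTRO": "galactic"}, "/tmp/rpi4-image-build", "out", "cache")
    # Stand-in for a real phase1-host script: print one of the injected variables.
    probe = [sys.executable, "-c", "import os; print(os.environ['CHROOT_PATH'])"]
    print(run_host_script(probe, env).stdout.strip())  # /tmp/rpi4-image-build
```

Raising on a non-zero exit lets the caller stop the build without cleaning up, which fits the behaviour described in [Pause and resume](#pause-and-resume).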

Typically, the scripts may be structured as follows:

1. `phase1-host`: downloads additional dependencies from the internet and
copies them into the image via `$CHROOT_PATH`.
2. `phase1-target`: installs dependencies within the image.
3. `phase2-host`: performs cross compilation with `$CHROOT_PATH` as the sysroot.
4. `phase2-target`: performs any additional setup needed after the
cross-compiled code has been installed.
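For step 3, one way to point CMake at the mounted image is to pass `CMAKE_SYSROOT` and `CMAKE_FIND_ROOT_PATH` on the command line. This sketch only builds the `cmake` argument list and does not run it; `cmake_cross_args` is a hypothetical helper, and the `aarch64-linux-gnu-gcc`/`aarch64-linux-gnu-g++` compiler names assume Ubuntu's aarch64 cross-toolchain packages are installed on the host. A real profile might use a CMake toolchain file instead.

```python
import os
import shlex


def cmake_cross_args(src_dir, build_dir, sysroot=None,
                     c_compiler="aarch64-linux-gnu-gcc",
                     cxx_compiler="aarch64-linux-gnu-g++"):
    """Hypothetical: cmake configure command that cross-compiles against the mounted image."""
    sysroot = sysroot or os.environ["CHROOT_PATH"]  # set for phase scripts by the builder
    defs = {
        "CMAKE_SYSTEM_NAME": "Linux",
        "CMAKE_SYSTEM_PROCESSOR": "aarch64",
        "CMAKE_C_COMPILER": c_compiler,
        "CMAKE_CXX_COMPILER": cxx_compiler,
        "CMAKE_SYSROOT": sysroot,
        "CMAKE_FIND_ROOT_PATH": sysroot,
        # Run build tools from the host, but find headers/libraries/packages only in the image.
        "CMAKE_FIND_ROOT_PATH_MODE_PROGRAM": "NEVER",
        "CMAKE_FIND_ROOT_PATH_MODE_LIBRARY": "ONLY",
        "CMAKE_FIND_ROOT_PATH_MODE_INCLUDE": "ONLY",
        "CMAKE_FIND_ROOT_PATH_MODE_PACKAGE": "ONLY",
    }
    return ["cmake", "-S", src_dir, "-B", build_dir] + [f"-D{k}={v}" for k, v in defs.items()]


if __name__ == "__main__":
    # Print the command so it can be pasted into a phase2-host script.
    print(shlex.join(cmake_cross_args("src", "build-arm64", sysroot="/tmp/rpi4-image-build")))
```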

### Overlaying multiple build profiles

The builder is designed for layering build profiles. This works by passing the
builder a list of profile directories. The builder will then:

- Merge the variables inside `config.ini`, where variables from later profiles
override the ones from earlier ones.
- Copy `rootfs` files from each build profile in sequential order. Later
profiles can override files from earlier ones.
- Run the phase1/phase2 scripts in sequential order. For example, when running
the phase1 host scripts, the builder will run the script from the first
profile, then the second, then the third, and so on.
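The three rules above can be sketched as follows. This is an illustration only: the function names are hypothetical, configs are assumed to be already parsed into `{section: {key: value}}` dicts, and `shutil.copytree` stands in for whatever copy mechanism the builder really uses (it preserves file modes but not ownership, which is one reason a tool like `rsync` is often preferred for root filesystems).

```python
import shutil
from pathlib import Path


def merge_configs(configs):
    """Merge parsed config.ini dicts; later profiles override earlier ones key by key."""
    merged = {}
    for cfg in configs:
        for section, values in cfg.items():
            merged.setdefault(section, {}).update(values)
    return merged


def copy_rootfs_overlays(profile_dirs, chroot_path):
    """Copy each profile's rootfs/ into the image in order; later files overwrite earlier ones."""
    for profile in profile_dirs:
        src = Path(profile) / "rootfs"
        if src.is_dir():
            shutil.copytree(src, chroot_path, symlinks=True, dirs_exist_ok=True)


def phase_scripts(profile_dirs, phase):
    """Paths of one phase's scripts (e.g. "phase1-host") in profile order; missing ones are skipped."""
    candidates = (Path(profile) / "scripts" / phase for profile in profile_dirs)
    return [path for path in candidates if path.is_file()]


if __name__ == "__main__":
    base = {"build": {"image_size": "4G", "debug": "false"}, "env": {"A": "1"}}
    ros = {"build": {"debug": "true"}, "env": {"B": "2"}}
    print(merge_configs([base, ros]))
    # {'build': {'image_size': '4G', 'debug': 'true'}, 'env': {'A': '1', 'B': '2'}}
```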

Pause and resume
----------------

The builder can pause after each step defined in the code (see
`Builder.build()`). If a step fails, the builder won't clean up anything,
giving the developer a chance to debug and fix things manually before
continuing. This works with two files:

- `cache/session.txt`: contains a list of steps executed, one per line. You can
freely change this file if you know what you're doing to selectively
execute/skip steps when working with this system.
- If this file is removed, then the builder will restart from scratch.
- `cache/session-loop-device.txt`: This saves the loop device the chroot is
mounted to, since the loop devices may be different each time we run this.
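The bookkeeping behind `cache/session.txt` can be sketched in a few lines. This is not the builder's code: `run_steps` is hypothetical, steps are modelled as `(name, callable)` pairs, and `pause_after` mirrors the commented-out `pause_after` option in `focal-rt/config.ini`. A step is recorded only after it succeeds, so a failing step reruns on the next invocation, and deleting the file restarts from scratch.

```python
from pathlib import Path


def run_steps(steps, session_file, pause_after=None):
    """Hypothetical resumable runner; returns the step it paused after, or None when done."""
    session = Path(session_file)
    done = set()
    if session.exists():
        done = {line.strip() for line in session.read_text().splitlines() if line.strip()}
    for name, func in steps:
        if name in done:
            continue  # executed in an earlier run; skip it
        func()  # if this raises, nothing is cleaned up and the step is not recorded
        with session.open("a") as fh:
            fh.write(name + "\n")  # one step name per line
        if name == pause_after:
            return name  # rerunning resumes with the next step
    return None
```

For example, removing one step's line from the file (such as `copy_files_to_chroot`, as the README suggests) makes only that step run again on the next invocation, while the steps still listed are skipped.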

Tips and tricks
---------------

If you have ever used tools like Ansible or Packer, you will know that fixing
a simple typo often means waiting for the entire script to run again from the
beginning, taking up valuable time. With the ability to pause and resume, the
workflow I take with this repo is as follows:

1. Edit the build scripts. Run them.
2. Encounter an error.
3. Manually go into the chroot via systemd-nspawn.
4. Figure out the right commands to run and change the script.
5. Change `cache/session.txt` to make sure I can resume from the right spot (by
removing or adding steps in the file; see [Pause and resume](#pause-and-resume)).
6. Run the builder again.
7. When all the problems are worked out, run the builder from the beginning to
ensure it works for a fresh build.

While this is not a perfect process, it can save a lot of time.

To get into the chroot to experiment, run the command:

```
$ sudo systemd-nspawn -D /tmp/rpi4-image-build/ bash # the path is whatever CHROOT_PATH is
```

At some point, we can refactor the command above into the builder directly.

### How to reset your host system if something horribly goes wrong

THIS IS OUTDATED and the functionality needs to be restored.

- Try running `./scripts/cleanup.sh`
- Try running the commands in `umount_everything` manually (see
`builder/core.sh`).
- You can do this by adding every step in `builder/main.sh` to
`cache/session.txt` except the unmount-everything step.
- TODO: I should create a simpler command to run only this step.
- If that doesn't work, try restarting your computer :(.
7 changes: 7 additions & 0 deletions focal-rt/config.ini
@@ -17,9 +17,16 @@ image_size = 4G
# The filename of the output image
output_filename = ubuntu-20.04.4-rt-arm64+raspi.img

# TODO: this shouldn't really be a part of the build configuration, because it
# is more like a host-level configuration. Instead of putting it here, it should
# be passed in as an argument to the builder. This refactor should be done when
# build.py becomes a real command-line utility.
#
# The host path to the qemu-user-static binary required for the above image
qemu_user_static_path = /usr/bin/qemu-aarch64-static

# TODO: same as qemu_user_static_path. This should be specified via
# command-line arguments as opposed to build configuration.
# Uncomment this if you want to pause the builder after a particular stage to
# debug/experiment.
# pause_after = cleanup_chroot