Builder refactor with Python and the ability to layer builds #23

Merged
merged 5 commits into from
Mar 26, 2022
3 changes: 3 additions & 0 deletions .gitignore
@@ -1,2 +1,5 @@
cache/
out/

__pycache__
*.pyc
4 changes: 3 additions & 1 deletion Makefile
@@ -1,7 +1,9 @@
.PHONY: focal-rt-ros2 clean

# TODO: eventually the build.py should be a command line script that takes
# arguments
focal-rt-ros2:
sudo builder/main.sh focal-rt-ros2/vars.sh
sudo python3 build.py

clean:
sudo rm -rf out cache
117 changes: 5 additions & 112 deletions README.md
@@ -12,7 +12,6 @@ This is a custom image builder for the Raspberry Pi 4. Some features:
Raspberry Pi, so I had to resort to this method.
- The default customization in this repo is made for ROS2 with `PREEMPT_RT`
applied.
- More information about how this works is below.
- With two stages of setup scripts, executing in lock step both inside and
outside the chroot, we can cross compile code (via something like CMake
toolchain) on the host and copy it into the chroot for making the final
@@ -39,7 +38,6 @@ This is a custom image builder for the Raspberry Pi 4. Some features:

### Todos

- [ ] Setup /etc/security/limits.conf (or maybe limits.conf.d)
- [ ] Optionally configure isolcpus and nohz_full for the kernel.
- [ ] Fix the issue with `LINUX_RT_VERSION` and `LINUX_RT_VERSION_ACTUALLY` (see `vars.sh`).
- [ ] Use a sha256 checksum to ensure downloaded image and kernel are "secure".
@@ -64,6 +62,8 @@ Thus, you'll need a Linux machine with root and the following tools installed:
`cut`, `grep`, `parted`, `pv`, `rsync`, `truncate`, `wget`, `systemd-nspawn`,
and `qemu-aarch64-static`.

You will also need `python3`.

To build the `focal-rt-ros2` image, you'll also need: `zip`.

For Ubuntu, you can simply run:
@@ -83,114 +83,7 @@ You can then `dd` this to a SD card.

You can see a demo of this in [CI](https://github.com/shuhaowu/ros-realtime-rpi4-image/actions). CI builds quite slowly. On my computer this whole process only takes a few minutes.

### Debugging build failures if you changed stuff

_This is covered in [How it works](#how-it-works)._

How it works
------------

_Note: the best way to understand the details is to start reading from
`builder/main.sh` and tracing every function call. Most functions are defined
in the order they are called in `builder/core.sh` and thus the code should read
somewhat linearly._

The builder defined in `builder/` is relatively generic and can theoretically
be used for a number of situations where a vendor provides you with a `.img`
file and you need to customize it. It follows these major steps:

1. Read a custom `vars.sh` to populate all the variables needed to build.
2. Download the (Ubuntu) image from the vendor.
3. Extract the image via a custom bash function defined in `vars.sh` to handle
different compression algorithms.
4. Increase the size of the image and resize the file system, as the vendor
image may be too small to install a large number of packages.
5. Set up a loop device.
6. Mount the file systems in the loop device so they're accessible by the host.
7. Copy things like resolv.conf and qemu-user-static into the mounted FS.
8. Copy the rootfs overlay defined via `ROOTFS_OVERLAY` into the mounted FS.
- If you have setup scripts running inside the chroot, or other files
supporting the setup, put it here and copy it in.
9. Run the _host-side phase1 setup script_ (`setup_script_phase1_outside_chroot`).
- Variables `export`ed by `main.sh` and `vars.sh` are usable in this script,
as well as the other user-defined scripts below.
- For the RT setup, the RT kernel is downloaded from GitHub in this step.
10. Run the _chroot-side phase1 setup script_ (`setup_script_phase1_inside_chroot`).
This runs the script inside the mounted FS via a chroot in the target
architecture via qemu-user-static.
- This is where you install things like the ROS2 packages.
11. Run the _host-side phase2 setup script_ (`setup_script_phase2_outside_chroot`).
- If you want to cross compile, this is your chance, as the rootfs is readily
available and mounted. In CMake, you can set your `CMAKE_FIND_ROOT_PATH` and
`CMAKE_SYSROOT` to the path to the mounted FS.
12. Run the _chroot-side phase2 setup script_ (`setup_script_phase2_inside_chroot`).
This also runs the script inside the mounted FS.
- Usually this is when you remove any setup files you copied into the chroot
and perform some cleanup. resolv.conf and other files set up by the builder
will be cleaned up by the builder, though.
13. Clean up the chroot by removing resolv.conf and qemu-user-static.
14. Unmount everything and get rid of the loop device.
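
The fourteen steps above can be sketched as an ordered list of named steps. This is only an illustration and not the builder's actual code: apart from the four `setup_script_*` names, `copy_files_to_chroot` and `umount_everything`, which appear in this README, every step name is a hypothetical placeholder, and mapping `copy_files_to_chroot` to step 8 is an assumption. Each step only records its own name instead of touching the system.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]

# Ordered step names. Names marked "hypothetical" are placeholders chosen
# for this sketch; the others are mentioned in the README text.
STEP_NAMES = [
    "read_vars",                           # 1  (hypothetical)
    "download_image",                      # 2  (hypothetical)
    "extract_image",                       # 3  (hypothetical)
    "resize_image",                        # 4  (hypothetical)
    "setup_loop_device",                   # 5  (hypothetical)
    "mount_filesystems",                   # 6  (hypothetical)
    "prepare_chroot",                      # 7  (hypothetical)
    "copy_files_to_chroot",                # 8  (assumed to be this step)
    "setup_script_phase1_outside_chroot",  # 9
    "setup_script_phase1_inside_chroot",   # 10
    "setup_script_phase2_outside_chroot",  # 11
    "setup_script_phase2_inside_chroot",   # 12
    "cleanup_chroot",                      # 13 (hypothetical)
    "umount_everything",                   # 14
]


def make_pipeline(log: List[str]) -> List[Step]:
    """Build the ordered step list; each step only appends its name to log."""
    return [(name, lambda name=name: log.append(name)) for name in STEP_NAMES]


def run_pipeline(steps: List[Step]) -> None:
    """Run every step in order; an exception stops the run at that step."""
    for _name, action in steps:
        action()
```

Keeping the order in a single list makes the host-side and chroot-side phases easy to see at a glance: each outside-chroot script runs immediately before its inside-chroot counterpart.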

To customize this process, look at the `vars.sh` file(s) in this repo. There
should be extensive comments there.

In the future, it may be better to figure out how Canonical generates its
official Ubuntu images. However, I couldn't find how those images are built
when I looked. It is likely that a number of the steps here would be needed
anyway, so the existing structure is not that bad of an idea.

### How interrupt and resume works

Each step defined in the code is resumable (see `main.sh`). If a step fails,
the builder doesn't clean everything up, which gives the developer a chance to
debug and possibly fix things manually before continuing. The workflow I take
is as follows:

1. Edit the build scripts. Run them.
2. Encounter an error.
3. Manually go into the chroot via systemd-nspawn.
4. Figure out the right commands to run and change the script.
5. Change `cache/session.txt` to make sure I can resume from the right spot (by removing and adding steps into the file, see below).
6. Run the builder again (just `make`, or `builder/main.sh ...`).

While this is not perfect, it still saves a lot of time, as you don't always
have to restart from the beginning when encountering an error. The way this
works is via two files saved in `cache`:

- `cache/session.txt`: contains a list of steps executed, one per line. You can
freely change this file if you know what you're doing to selectively
execute/skip steps when working with this system. See `builder/main.sh` for
the steps (`run_step <step_name>`).
- If this file is removed, then the builder will restart from scratch.
- `cache/session-loop-device.txt`: This saves the loop device the chroot is
mounted to, since the loop devices may be different each time we run this.

There's also a `PAUSE_AFTER` variable that can be set in `vars.sh` to instruct
the builder to stop after a particular step. This allows you to do some
interactive experimentation, which also speeds things up.
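
A minimal sketch of this resume mechanism, written in Python rather than the original shell and not taken from the builder itself: the function names `load_done` and `run_steps` and the `pause_after` parameter are assumptions made for illustration. A step is appended to the session file only after it succeeds, so a failure leaves the file pointing at the last completed step.

```python
import os
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[], None]]


def load_done(session_path: str) -> List[str]:
    """Return the step names recorded in the session file, one per line."""
    if not os.path.exists(session_path):
        return []  # no session file means starting from scratch
    with open(session_path) as f:
        return [line.strip() for line in f if line.strip()]


def run_steps(steps: List[Step], session_path: str,
              pause_after: Optional[str] = None) -> List[str]:
    """Run steps not yet recorded; return the names executed in this run."""
    done = set(load_done(session_path))
    executed = []
    for name, action in steps:
        if name not in done:
            action()  # an exception here leaves the session file untouched
            with open(session_path, "a") as f:
                f.write(name + "\n")
            executed.append(name)
        if name == pause_after:
            break  # stop here so the chroot can be inspected by hand
    return executed
```

Because skipping is decided purely by what the file contains, removing a line from it forces that step to run again on the next invocation.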

To get into the chroot to experiment, run the command:

```
$ sudo systemd-nspawn -D /tmp/rpi4-image-build/ bash # the path is whatever CHROOT_PATH is
```

### Interrupt and resume if you change the setup scripts for inside the chroot

Sometimes you will change the scripts running inside the chroot and then resume.
You'll find this doesn't work, because the script you're changing is not copied
into the chroot. To get around this problem and resume, simply delete the
step `copy_files_to_chroot` in `cache/session.txt` and rerun the builder. The
builder will then copy all the files into the chroot again and continue from
where it failed.
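
Editing the session file by hand works fine; for repeated use, the same edit could be scripted. The helper below is a hypothetical sketch (the name `forget_step` is not part of the builder) and assumes the one-step-per-line format described above.

```python
from pathlib import Path


def forget_step(session_path: str, step: str) -> bool:
    """Remove every line naming `step`; return True if anything was removed."""
    path = Path(session_path)
    lines = path.read_text().splitlines()
    kept = [line for line in lines if line.strip() != step]
    # Rewrite the file with one step per line, preserving the original order.
    path.write_text("".join(line + "\n" for line in kept))
    return len(kept) != len(lines)
```

For example, `forget_step("cache/session.txt", "copy_files_to_chroot")` would make the next run copy the files into the chroot again before continuing.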

### How to reset your host system if something horribly goes wrong

- Try running `./scripts/cleanup.sh`
- Try running the commands in `umount_everything` manually (see
`builder/core.sh`).
- You can do this by adding every step in `builder/main.sh` into
`cache/session.txt` except the umount everything step.
- TODO: I should create a simpler command to run this step only.
- If that doesn't work, try restarting your computer :(.
Customization guide
-------------------

See [`docs/BuilderDesignAndUsageGuide.md`](docs/BuilderDesignAndUsageGuide.md).
14 changes: 14 additions & 0 deletions build.py
@@ -0,0 +1,14 @@
#!/usr/bin/env python3
import logging

from image_builder.builder import Builder

logging.basicConfig(format="[%(asctime)s][%(levelname)s] %(message)s", datefmt="%Y-%m-%d %H:%M:%S", level=logging.DEBUG)

b = Builder([
"focal-rt",
"focal-rt-galactic",
])

b.build()
Review comment (Member, Author):

I want to convert this script into a proper command-line app. However, that is not urgent and will take a bit more time. I would like to get this merged first so we can iterate on the other things that need to be done.
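
One possible shape for that command-line app, offered only as a sketch under assumptions and not as the author's plan: the positional `layers` argument, the `--log-level` flag and the `parse_args`/`main` names are all hypothetical. The `Builder([...]).build()` call and the logging format are taken from `build.py` above.

```python
import argparse
import logging
from typing import List, Optional

# Defaults mirror the layer list hard-coded in build.py.
DEFAULT_LAYERS = ["focal-rt", "focal-rt-galactic"]


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the layer names and log level (both options are hypothetical)."""
    parser = argparse.ArgumentParser(description="Build a layered image.")
    parser.add_argument("layers", nargs="*", default=DEFAULT_LAYERS,
                        help="layers to build, applied in the order given")
    parser.add_argument("--log-level", default="DEBUG",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR"])
    return parser.parse_args(argv)


def main(argv: Optional[List[str]] = None) -> None:
    args = parse_args(argv)
    logging.basicConfig(
        format="[%(asctime)s][%(levelname)s] %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
        level=getattr(logging, args.log_level),
    )
    # Imported here so the argument parsing can be exercised on its own.
    from image_builder.builder import Builder
    Builder(args.layers).build()
```

With something like this in place, the Makefile target could pass the layer names explicitly, which would address the TODO about `build.py` taking arguments.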


142 changes: 0 additions & 142 deletions builder/core.sh

This file was deleted.

68 changes: 0 additions & 68 deletions builder/main.sh

This file was deleted.
