
Right way to boot from vanilla ZFS pool placed on entire disk? #2028

Closed
seletskiy opened this issue Jan 5, 2014 · 17 comments
Labels
Component: GRUB GRUB integration Type: Documentation Indicates a requested change to the documentation

Comments

@seletskiy
Contributor

Not an issue here, but a question.

Let's consider we have ZFS pool zroot consisting of one vdev, which is an entire HDD:

# zpool status
  pool: zroot
 state: ONLINE
  scan: none requested
config:

    NAME                         STATE     READ WRITE CKSUM
    zroot                        ONLINE       0     0     0
      ata-QEMU_HARDDISK_QM00001  ONLINE       0     0     0

# ls -al /dev/disk/by-id/ata-QEMU_HARDDISK_QM00001
lrwxrwxrwx 1 root root 9 Jan  5 18:37 /dev/disk/by-id/ata-QEMU_HARDDISK_QM00001 -> ../../sda

All OS files, including the /boot directory, are located on that pool, and now we need to install a bootloader (GRUB) on that single HDD; nothing special. After some investigation I figured out how to use GRUB in that configuration, but it is somewhat buggy (zfsonlinux/grub#5).

Partition table of /dev/sda is shown below:

# parted
(parted) print
GNU Parted 3.1
Using /dev/sda
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  4286MB  4284MB               zfs
 9      4286MB  4294MB  8389kB

As I understand it, GRUB needs a partition to install to. I found instructions for installing GRUB on GPT (http://www.wensley.org.uk/gpt), so after creating the pool we could install it onto partition 1 or 9. That seems extremely wrong to me, though, because the entries listed above are not ordinary partitions but special entries ZFS uses to store pool data. So if I install GRUB onto either 1 or 9, its data can easily be overwritten by ZFS.

However, there is some "free" space at the beginning and the end of the device (each about 1 MB in size), which can be used for GRUB. So I created a new 1 MB partition at the very beginning of the disk and installed GRUB onto it, like this:

(parted) print                                                            
Model: ATA QEMU HARDDISK (scsi)
Disk /dev/sda: 4295MB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system  Name  Flags
 2      17.4kB  1049kB  1031kB                     bios_grub
 1      1049kB  4286MB  4284MB               zfs
 9      4286MB  4294MB  8389kB

(parted) quit

# grub-install /dev/sda

This makes GRUB recognize a partition to install to, and it successfully reads the kernel from ZFS at boot time.
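The partitioning step above can be sketched with parted's scripting mode (a minimal sketch, assuming /dev/sda already carries the pool's GPT; the sector bounds mirror the print output above, and the partition number parted assigns may differ):

```shell
# Carve a ~1 MB partition out of the gap between the GPT metadata
# and the first ZFS partition, then mark it for GRUB embedding.
parted --script /dev/sda mkpart grub 34s 2047s
parted --script /dev/sda set 2 bios_grub on   # "2" assumes parted numbered it 2

# grub-install finds the bios_grub partition and embeds core.img there.
grub-install /dev/sda
```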

All the installation guides I can find (even the Ubuntu one) propose partitioning the disk with a DOS table containing a boot partition on an ext filesystem and a second partition holding the pool itself (BTW, will performance drop in that case, when the pool is created from a partition rather than an entire device?). I think this is ugly, because if we need a mirror or raid pool, we have to divide every disk into equally sized partitions, then join all the boot partitions (ext) into an mdraid device (the ugly part), then join all the second partitions into a ZFS pool.

I have worked out that it is possible to boot entirely from a GPT disk and a ZFS pool, but I'm not sure how to install GRUB properly in that case. So, my questions are: what is the right way to boot from a ZFS pool consisting of entire devices rather than partitions? Is booting from GPT as I showed above reliable? Are there performance trade-offs when the pool is created from a partition instead of an entire device? Is it even worth it, or should I follow the numerous guides and be "happy" with them?

Thanks for your attention, and sorry for the long read.

@aarcane

aarcane commented Jan 5, 2014

Read the rlaager ZFS guide for more details on the setup you're using:

https://github.com/rlaager/zfs/wiki/HOWTO-install-Ubuntu-to-a-Native-ZFS-Root-Filesystem

@seletskiy
Contributor Author

@aarcane: Thanks, I hadn't seen that guide. It's almost exactly how I did this. Anyway, what's the preferred way to use ZFS: a separate boot partition, or installing GRUB into spare space in the GPT table?

@aarcane

aarcane commented Jan 8, 2014

There really is no single preferred way; it comes down to personal preference. I prefer the GPT method, others prefer the /boot partition method. Neither is better or worse.



@seletskiy
Contributor Author

@aarcane: So there are no possible performance penalties or other drawbacks to building a ZFS pool on top of a DOS partition?

@aarcane

aarcane commented Jan 8, 2014

Each has minor impacts, but they're more about convenience than anything else. When using partitions, you have to remember to set the Linux noop disk scheduler manually. When using no /boot partition, you can't update the grubenv file, which means no automatic fallback and no "boot to previous OS" support. Neither constitutes a performance penalty.
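Setting the noop scheduler by hand can be sketched as follows (a minimal sketch; sda is an assumption, and on newer blk-mq kernels the equivalent value is `none`):

```shell
# One-off: switch the scheduler for the whole disk backing the pool.
echo noop > /sys/block/sda/queue/scheduler

# Verify: the active scheduler is shown in square brackets.
cat /sys/block/sda/queue/scheduler
```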

Some people believe that using ZFS on Linux without a /boot partition is more likely to fail on update, but I've never seen any evidence, or heard any reports, that it's more likely than any other ZFS boot problem. The main problem with booting from ZFS is the initramfs, which is problematic regardless of where /boot is.

@FransUrbo
Contributor

The main problem with zfsboot is the initramfs, which is problematic regardless of where /boot is.

My understanding of this (I've done all the possible variants, including encrypted /boot and/or root) is that using GRUB on GPT is a hassle...

I'm using /boot on an external USB stick (which isn't mounted at all, other than manually when I need to), but everything else now on ZFS.

My point with this was to put the encryption keys on it and then 'hide' it once the system has booted :). I haven't really gotten that far yet, but still.

@aarcane

aarcane commented Jan 8, 2014

Slightly off topic: how did you achieve an encrypted /boot? That's a goal of mine!

@FransUrbo
Contributor

@aarcane Well, I am using the ZFS-Crypto patch by ZFSRogue, and the zfs-initramfs package from the ZoL Debian GNU/Linux Wheezy repository takes care of copying the relevant modules, keys etc. to the initrd. The problem was for the boot process to find the key. That, unfortunately, needs to be in an unencrypted place at the moment (which to some degree makes an encrypted /boot pointless, I guess).

Last week I did however boot on an encrypted version of my root fs (using my USB stick as storage for the initrd which had the crypto key).

Using dm-crypt should be easier, because it apparently has an option to 'embed' the key somehow/somewhere; I only took a cursory glance at that while I was trying to figure out how to do an encrypted /boot.

@aarcane

aarcane commented Jan 8, 2014

Oh. Not impressive. I thought you had mastered the elusive "nothing unencrypted except GRUB" ideal.

@FransUrbo
Contributor

Oh. Not impressive. I thought you had mastered the elusive "nothing unencrypted except grub" ideal

Ah, no. Sorry. That was the ultimate goal of course, but I could find no way to embed the key.

But as I said, this should be possible with dm-crypt...

@seletskiy
Contributor Author

@aarcane: Thanks for the answer! I'm using ZFS on more or less production servers, so there's simply no need for automatic fallback, because all system updates are performed manually. BTW, what is the "initramfs issue" you're talking about?

@FransUrbo: I'm even afraid to imagine booting dozens of servers from USB sticks.

@aarcane

aarcane commented Jan 9, 2014

On every kernel or ZFS upgrade, DKMS triggers a purge and rebuild, including purging modules built with old versions for old kernels, and also an initramfs update. If the DKMS install fails in any way, or the initramfs fails in any way (such as a newer kernel than ZFS was built against being installed, something broken in the compiler, or any number of other small but manageable failure modes), you're left with a system that is both unbootable and without any working recovery mechanism.

There is currently no way to tell DKMS and the initramfs hooks not to touch older kernel versions, not to purge old modules, etc. So if this breaks for one kernel, it's likely to break for all kernels. It's a known problem, and the devs are aware of it, but there's currently no known best-practices solution other than to manually make sure dkms install has run on each relevant system, and each initramfs has been built by hand before reboot with all appropriate versions of the relevant tools. If I see any update for dkms, initramfs*, zfs, spl, or linux-image, I make certain to verify the relevant versions, check DKMS (dkms status), rebuild the initramfs (update-initramfs -c -k all), and update GRUB (update-grub). All this just to be sure my system has the best possible chance of coming up after any upgrade.

You have to do all of these checks whether you're using a /boot partition or not. All the update-grub step does is make sure you haven't failed to add the new initramfs and kernel to the GRUB entries, which should happen automatically, but I always run it after update-initramfs just to be safe.
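The checklist above boils down to three commands on a Debian/Ubuntu-style system (a sketch of the routine described, not an official procedure):

```shell
# 1. Confirm the spl/zfs modules are in the "installed" state
#    for every kernel you might boot.
dkms status

# 2. Rebuild the initramfs for all installed kernels from scratch.
update-initramfs -c -k all

# 3. Regenerate grub.cfg so every kernel/initramfs pair gets an entry.
update-grub
```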

@seletskiy
Contributor Author

@aarcane: Ah, that kind of issue. I'm using Arch, and usually there is no such problem, because there is always a recent stable zfs package in the repo built against the latest available kernel. Anyway, the system can be recovered by booting from a live CD and downgrading the kernel and zfs packages.

@maverick1601

I am using the exact same setup without a /boot partition, i.e. a single root pool with a filesystem for my Ubuntu root fs. However, I put /boot/grub in a separate filesystem, so I have access to it across all clones of my root filesystem. The downside of this approach is that update-grub doesn't recognize other clones of your root, so it messes up your grub.cfg. However, one can easily roll back and manage entries manually, or completely disable the update-grub hook.

The main advantage of this approach is that your system always stays bootable, because you load the kernel and initramfs from cloned snapshots of your root filesystem. So in case something goes wrong when updating, I simply go back to an older clone. Apart from that, I always keep an early non-UI (non-X) clone for emergency fallback. The only disadvantage for me is the increased space consumption, but it's up to you how many snapshots and clones you want to keep.

For every major distribution upgrade, I would always want to boot into a cloned root fs first. Be sure not to put /var and the like into separate filesystems, as is often suggested.

This approach is adapted from Solaris boot environments, where beadm managed those for you. Works pretty well.
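A minimal sketch of such a boot environment (the dataset name zroot/ROOT/default and the snapshot/clone names are assumptions, not from the thread):

```shell
# Snapshot the running root filesystem before an upgrade...
zfs snapshot zroot/ROOT/default@pre-upgrade

# ...and turn the snapshot into a writable clone to fall back to.
zfs clone zroot/ROOT/default@pre-upgrade zroot/ROOT/fallback

# Keep the clone unmounted until it is actually selected at boot.
zfs set canmount=noauto zroot/ROOT/fallback
zfs set mountpoint=/ zroot/ROOT/fallback
```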

@aarcane

aarcane commented Jan 10, 2014

For every major distribution upgrade, I would always want to boot into a cloned root fs first.

If I could update a clone, then boot into the updated clone once the update completes, like Solaris did, that would be ideal.

Be sure, to not put /var and the like into separate filesystems, as often suggested.

Parts of /var belong on /, other parts belong on their own filesystem. I have observed that the rlaager guide sets some good defaults for an easy and functional start.

This approach is adapted from Solaris boot environments, where beadm managed those for you. Works pretty well.

There was at one point a fairly popular Solaris-based distro that used apt for the package manager and had beadm actually integrated into the dpkg back end somehow. If we could find that package, I don't imagine it would be difficult for someone in the know to port the changes to Debian and Ubuntu apt.

@maverick1601

  1. I do not necessarily need to update the clone directly. It is fine for me to reboot into it and then perform the update.
  2. The parts in /var that are really variable should be kept as separate file systems, that is right.
  3. The apt-based distro could have been Nexenta Core, the base distro which is also used in the commercial Nexenta Storage solution.

@toobuntu

@aarcane I think what you describe is the apt-clone [0,1] utility from Nexenta, not to be confused with the much more recent apt-clone package [2] in Debian and Ubuntu, which serves a different purpose but was given the same name out of ignorance of Nexenta's work. apt-clone was distributed as part of the apt package and is a Perl script wrapper for apt-get that creates a clone of the current root and then proceeds with the package action. Usage is identical to apt-get. Ubuntu has apt-btrfs-snapshot, which could be interesting to compare. Perhaps call whatever results apt-zfs-snapshot.
[0] http://apt.nexenta.org/wip/dists/unstable/main/source/admin/apt_0.8.0nexenta8.tar.gz, a dead link, so I pasted the perl script at http://pastebin.com/ppuweWGe (expires 16-Feb-2014)
[1] See http://lwn.net/Articles/334756/
[2] https://launchpad.net/ubuntu/+source/apt-clone (a utility to clone the packages-state of a system and restore it on another system)

DilOS claims to "contain modified APT and DPKG tools for better works with zones and ZFS features." I have not browsed the DilOS source code, though.
