zfs trouble on ARM64: segmentation fault #586

Open
samip5 opened this issue Jan 20, 2025 · 6 comments

@samip5
Contributor

samip5 commented Jan 20, 2025

This is not great. How would I even go about debugging this, given that Talos doesn't boot properly as a result?

Running on an Oracle Ampere instance.

user: warning: [2025-01-20T09:56:31.57410882Z]: [talos] [initramfs] enabling system extension zfs 2.2.7-v1.9.2
user: warning: [2025-01-20T09:56:32.18043182Z]: [talos] service[ext-zfs-service](Starting): Starting service
user: warning: [2025-01-20T09:56:32.18533482Z]: [talos] service[ext-zfs-service](Waiting): Waiting for service "containerd" to be "up", service "udevd" to be "up", service "cri" to be "up", file "/dev/zfs" to exist
kern: warning: [2025-01-20T09:56:32.64627282Z]: zfs: module license 'CDDL' taints kernel.
kern: warning: [2025-01-20T09:56:32.65022382Z]: zfs: module license taints kernel.
user: warning: [2025-01-20T09:56:33.19103082Z]: [talos] service[ext-zfs-service](Waiting): Waiting for service "containerd" to be "up", service "udevd" to be "up", service "cri" to be registered, file "/dev/zfs" to exist
kern:  notice: [2025-01-20T09:56:33.27346082Z]: ZFS: Loaded module v2.2.7-1, ZFS pool version 5000, ZFS filesystem version 5
user: warning: [2025-01-20T09:56:34.19160382Z]: [talos] service[ext-zfs-service](Waiting): Waiting for service "cri" to be registered
user: warning: [2025-01-20T09:56:34.96757382Z]: [talos] task startAllServices (1/1): service "apid" to be "up", service "auditd" to be "up", service "containerd" to be "up", service "cri" to be "up", service "etcd" to be "up", service "ext-iscsid" to be "up", service "ext-tgtd" to be "up", service "ext-zfs-service" to be "up", service "kubelet" to be "up", service "machined" to be "up", service "syslogd" to be "up", service "trustd" to be "up", service "udevd" to be "up"
user: warning: [2025-01-20T09:56:35.19158382Z]: [talos] service[ext-zfs-service](Waiting): Waiting for service "cri" to be "up"
user: warning: [2025-01-20T09:56:35.97000382Z]: [talos] service[ext-zfs-service](Preparing): Running pre state
user: warning: [2025-01-20T09:56:35.97765882Z]: [talos] service[ext-zfs-service](Preparing): Creating service runner
user: warning: [2025-01-20T09:56:36.06776182Z]: [talos] service[ext-zfs-service](Running): Started task ext-zfs-service (PID 5315) for container ext-zfs-service
user: warning: [2025-01-20T09:56:36.52519982Z]: [talos] service[ext-zfs-service](Waiting): Error running Containerd(ext-zfs-service), going to restart until it succeeds: task "ext-zfs-service" failed: exit code 1
user: warning: [2025-01-20T09:56:41.59867282Z]: [talos] service[ext-zfs-service](Running): Started task ext-zfs-service (PID 5621) for container ext-zfs-service

talosctl logs ext-zfs-service:

0 / 0 keys successfully loaded
2025/01/20 09:56:36 zfs-service: zpool import error: signal: segmentation fault
no pools available to import
@jfroy
Contributor

jfroy commented Jan 20, 2025

This suggests the zpool program is crashing. You can spawn a privileged system pod and try to debug zpool, or try to install zpool in that system pod (using the distro’s package manager) and see if that also crashes. I’ve run zfs commands inside pods created by https://github.com/kvaps/kubectl-node-shell .
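For anyone reading the log above: the text "signal: segmentation fault" is how Go's os/exec package describes a child process that was killed by SIGSEGV, as opposed to one that exited with a non-zero code. The sketch below assumes, without having checked the extension's source, that zfs-service is a Go wrapper that launches zpool through os/exec. runAndClassify is a hypothetical helper, and sh -c 'kill -s SEGV $$' only stands in for a crashing binary.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"syscall"
)

// runAndClassify is a hypothetical helper: it runs a command and reports
// whether it succeeded, exited with a non-zero code, or was killed by a signal.
func runAndClassify(name string, args ...string) (string, error) {
	err := exec.Command(name, args...).Run()
	if err == nil {
		return "ok", nil
	}
	var exitErr *exec.ExitError
	if !errors.As(err, &exitErr) {
		// The command never started (for example, the binary is not on PATH).
		return "", err
	}
	// On Unix, Sys() exposes the raw wait status, which records the signal.
	if ws, ok := exitErr.Sys().(syscall.WaitStatus); ok && ws.Signaled() {
		return fmt.Sprintf("killed by %v", ws.Signal()), err
	}
	return fmt.Sprintf("exit code %d", exitErr.ExitCode()), err
}

func main() {
	// Stand-in for a crashing binary: a shell that sends SIGSEGV to itself.
	status, err := runAndClassify("sh", "-c", "kill -s SEGV $$")
	fmt.Println(status) // killed by segmentation fault
	// err reads "signal: segmentation fault", the same text as in the log,
	// possibly followed by " (core dumped)" depending on the core dump settings.
	fmt.Println(err)
}
```

A non-zero exit would point at zpool reporting an error; a signal means the process itself crashed, which is why debugging the binary directly in a privileged pod is the next step.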

@samip5
Contributor Author

samip5 commented Jan 20, 2025

The weird thing is that it did manage to mount my pool.

samip5 changed the title from "zfs import fails on ARM64: segmentation fault" to "zfs trouble on ARM64: segmentation fault" on Jan 22, 2025.
@samip5
Contributor Author

samip5 commented Jan 22, 2025

The zfs binary seems to be segfaulting, while the zpool binary is fine.
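One way to confirm which binary is crashing from inside a privileged debug pod is a small Go program along these lines. It is a sketch under assumptions: the OpenZFS userland (zpool and zfs, both of which accept a version subcommand) is on the pod's PATH, and probe is a hypothetical helper, not part of Talos or the zfs extension.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// probe is a hypothetical helper: it runs each binary with the same arguments
// and describes how it terminated. Output is discarded; only the status matters.
func probe(bins []string, args ...string) map[string]string {
	results := make(map[string]string, len(bins))
	for _, bin := range bins {
		cmd := exec.Command(bin, args...)
		err := cmd.Run()
		switch {
		case cmd.ProcessState == nil:
			// The process never started, e.g. the binary is not on PATH.
			results[bin] = "not started: " + err.Error()
		case cmd.ProcessState.Success():
			results[bin] = "ok"
		default:
			// On Unix the wait status tells a signal apart from a normal exit.
			ws, ok := cmd.ProcessState.Sys().(syscall.WaitStatus)
			if ok && ws.Signaled() {
				results[bin] = fmt.Sprintf("crashed: %v", ws.Signal())
			} else {
				results[bin] = fmt.Sprintf("exit code %d", cmd.ProcessState.ExitCode())
			}
		}
	}
	return results
}

func main() {
	// Assumes zpool and zfs are installed in the debug pod; "version" does not touch any pool.
	for bin, status := range probe([]string{"zpool", "zfs"}, "version") {
		fmt.Printf("%s: %s\n", bin, status)
	}
}
```

Using a subcommand that does not touch any pool keeps the check free of side effects. A result of "crashed: segmentation fault" for zfs alongside "ok" for zpool would match the behaviour described in this comment.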

@DavidIlie

Did you end up finding a solution? I have three Dell R630s, and one of the three nodes is having this same issue when starting up a brand-new cluster.

@samip5
Contributor Author

samip5 commented Jan 31, 2025

> Did you end up finding a solution? I have three Dell R630s, and one of the three nodes is having this same issue when starting up a brand-new cluster.

It managed to mount the pool, so I don't know what the problem was about; I'm able to schedule pods and so on.

@simlun

simlun commented Feb 2, 2025

Hello. I get the same error from "talosctl logs ext-zfs-service" on a Raspberry Pi 4, on both Talos 1.9.1 and 1.9.2. Segfaults ain’t fun.

If someone can give me some pointers, I’d love to help out by sharing logs, etc.
