"No space left on device" error in build container on Ubuntu/LVM system #178

Closed
huebner-m opened this issue Jun 7, 2022 · 3 comments

@huebner-m
Contributor

When using the build container on an Ubuntu system whose filesystems sit on LVM, EasyBuild commands fail with "No space left on device" errors, for example:

== Temporary log file in case of crash /tmp/eb-1dz87d45/easybuild-6qm9ucqp.log
== found valid index for /cvmfs/pilot.eessi-hpc.org/versions/2021.12/software/linux/x86_64/amd/zen2/software/EasyBuild/4.5.1/easybuild/easyconfigs, so using it...
== processing EasyBuild easyconfig /cvmfs/pilot.eessi-hpc.org/versions/2021.12/software/linux/x86_64/amd/zen2/software/EasyBuild/4.5.1/easybuild/easyconfigs/c/CUDA/CUDA-11.3.1.eb
== FAILED: Installation ended unsuccessfully (build directory: /home/mhuebner/.local/easybuild/build/CUDA/11.3.1/system-system): build failed (first 300 chars): Failed to copy file
/cvmfs/pilot.eessi-hpc.org/versions/2021.12/software/linux/x86_64/amd/zen2/software/EasyBuild/4.5.1/lib/python3.9/site-packages/easybuild/easyblocks/c/cuda.py to 
/tmp/eb-1dz87d45/reprod_20220603144917_236/easyblocks/cuda.py: [Errno 28] No space left on device: '/tmp/eb-1dz87d45/r (took 0 secs)
== Results of the build can be found in the log file(s) /tmp/eb-1dz87d45/easybuild-CUDA-11.3.1-20220603.144917.WLmxe.log

ERROR: Build of /cvmfs/pilot.eessi-hpc.org/versions/2021.12/software/linux/x86_64/amd/zen2/software/EasyBuild/4.5.1/easybuild/easyconfigs/c/CUDA/CUDA-11.3.1.eb failed (err: "build failed (first 300 chars): Failed to copy file /cvmfs/pilot.eessi-hpc.org/versions/2021.12/software/linux/x86_64/amd/zen2/software/EasyBuild/4.5.1/lib/python3.9/site-packages/easybuild/easyblocks/c/cuda.py to /tmp/eb-1dz87d45/reprod_20220603144917_236/easyblocks/cuda.py: [Errno 28] No space left on device: '/tmp/eb-1dz87d45/r")
CUDA installation failed, please check EasyBuild logs...

It's not a disk space/inode issue:

$ df -h /tmp/
Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-ubuntu--lv  196G   72G  115G  39% /tmp
$ df -hi /tmp/
Filesystem                        Inodes IUsed IFree IUse% Mounted on
/dev/mapper/ubuntu--vg-ubuntu--lv    13M  985K   12M    8% /tmp

and the mount options also seem fine (on the host, /tmp is part of the / filesystem):

/dev/mapper/ubuntu--vg-ubuntu--lv on / type ext4 (rw,relatime)
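
The two df checks above can be scripted so they are easy to repeat on other candidate paths. This is only a minimal sketch: the helper names df_use_pct and below_limit are hypothetical, and it assumes single-line df records (as produced by df -P).

```shell
# Hypothetical helper: print the Use% / IUse% column (without the % sign)
# from `df` output read on stdin. Assumes one record per line (df -P).
df_use_pct() {
    awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Hypothetical helper: succeed if the usage read from stdin is below
# the given limit in percent (default 95).
below_limit() {
    limit=${1:-95}
    pct=$(df_use_pct)
    [ -n "$pct" ] && [ "$pct" -lt "$limit" ]
}

# Usage (not executed here):
#   df -P  /tmp | below_limit 95 && echo "blocks OK"
#   df -Pi /tmp | below_limit 95 && echo "inodes OK"
```

If both checks pass and the error persists, as in this report, the ENOSPC is unlikely to come from plain space or inode exhaustion.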

Binding /tmp in the container to other host paths did not fix the problem (note that those paths were also on LVM).

The error does not occur when I use /dev/shm instead of /tmp as the source of the container's bind mounts (including the /tmp bind). The main difference is that /dev/shm is a tmpfs filesystem:

$ df -h /dev/shm
Filesystem      Size  Used Avail Use% Mounted on
tmpfs            63G  8.3G   55G  14% /dev/shm
$ mount | grep /shm
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)

This suggests a strange interplay between Singularity/overlayfs and the LVM-backed filesystem. Has anyone seen something like this before?

@trz42
Collaborator

trz42 commented Jun 7, 2022

Do you have a sequence of commands to reproduce what you tried to do?

@huebner-m
Contributor Author

You should be able to reproduce this behavior with the following commands (I noticed it while testing the CUDA support, hence some GPU-specific modifications):

# more or less following the instructions in https://eessi.github.io/docs/software_layer/build_nodes/
export EESSI_TMPDIR=/tmp/$USER/EESSI
mkdir -p $EESSI_TMPDIR
mkdir -p $EESSI_TMPDIR/{home,overlay-upper,overlay-work,opt/eessi}
mkdir -p $EESSI_TMPDIR/{var-lib-cvmfs,var-run-cvmfs}

export SINGULARITY_CACHEDIR=$EESSI_TMPDIR/singularity_cache
# link NVIDIA driver and CUDA libs to singularity
libs=$(find /usr/lib/x86_64-linux-gnu/ -path '*libnvidia*' | tr '\n' ',')
libs_cuda=$(find /usr/lib/x86_64-linux-gnu/ -path '*libcuda*' | tr '\n' ',')
export SINGULARITY_BIND="$EESSI_TMPDIR/var-run-cvmfs:/var/run/cvmfs,$EESSI_TMPDIR/var-lib-cvmfs:/var/lib/cvmfs,${EESSI_TMPDIR}/opt:/opt,${libs},${libs_cuda}"
export SINGULARITY_HOME="$EESSI_TMPDIR/home:/home/$USER"
export EESSI_PILOT_READONLY="container:cvmfs2 pilot.eessi-hpc.org /cvmfs_ro/pilot.eessi-hpc.org"
echo $EESSI_TMPDIR/overlay-upper
export EESSI_PILOT_WRITABLE_OVERLAY="container:fuse-overlayfs -o lowerdir=/cvmfs_ro/pilot.eessi-hpc.org -o upperdir=$EESSI_TMPDIR/overlay-upper -o workdir=$EESSI_TMPDIR/overlay-work /cvmfs/pilot.eessi-hpc.org"
echo $EESSI_TMPDIR/overlay-upper
# start the container
singularity shell --nv --fusemount "$EESSI_PILOT_READONLY" --fusemount "$EESSI_PILOT_WRITABLE_OVERLAY" docker://ghcr.io/eessi/build-node:debian10

# inside the container
export EESSI_PILOT_VERSION='2021.12'
/cvmfs/pilot.eessi-hpc.org/versions/${EESSI_PILOT_VERSION}/compat/linux/$(uname -m)/startprefix
git clone https://github.com/huebner-m/software-layer.git
cd software-layer
git checkout add_gpu_support
cd gpu_support
export INSTALL_WO_GPU=true
./add_nvidia_support.sh

@huebner-m
Contributor Author

The observed issue seems to be related to EESSI/filesystem-layer#110.

Local tests succeeded after setting CVMFS_HIDE_MAGIC_XATTRS=yes in default.local. I consider this issue resolved with this workaround.
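
For reference, a minimal sketch of applying this setting idempotently. The helper name set_cvmfs_option is hypothetical, and the location of default.local (for example under /etc/cvmfs, on the host or inside the container, depending on where the CVMFS client runs) is an assumption, not something stated in this thread.

```shell
# Hypothetical helper: set KEY=VALUE in a CVMFS client config file,
# replacing any earlier assignment of the same key.
set_cvmfs_option() {
    conf_file=$1
    key=$2
    value=$3
    touch "$conf_file"
    # Keep every line except previous assignments of this key.
    # grep exits non-zero when nothing is left to print, hence "|| true".
    grep -v "^${key}=" "$conf_file" > "${conf_file}.tmp" || true
    mv "${conf_file}.tmp" "$conf_file"
    printf '%s=%s\n' "$key" "$value" >> "$conf_file"
}

# Usage (not executed here; the path is an assumption):
#   set_cvmfs_option /etc/cvmfs/default.local CVMFS_HIDE_MAGIC_XATTRS yes
```

The CVMFS client probably has to pick up the changed configuration (for example by remounting the repository) before the setting takes effect; that step is not covered by this sketch.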

TopRichard added a commit to TopRichard/bot-software-layer1 that referenced this issue Oct 9, 2023
…/2023.1-foss/2022a

{2023.06}[foss/2022a] GROMACS V2023.1