
Allow Nvidia driver script to set recommendations for LD_PRELOAD #754

Merged: 33 commits merged into EESSI:2023.06-software.eessi.io on Jan 17, 2025


@ocaisa (Member) commented Sep 27, 2024

No description provided.


eessi-bot bot commented Sep 27, 2024

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-compat, eessi-hpc.org-2023.06-software, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software

Instance boegel-bot-deucalion is configured to build for:

  • architectures: aarch64/a64fx
  • repositories: eessi.io-2023.06-software


eessi-bot bot commented Sep 27, 2024

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi-hpc.org-2023.06-software, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software, eessi.io-2023.06-compat

@ocaisa ocaisa marked this pull request as ready for review September 27, 2024 16:02
@ocaisa (Member Author) commented Sep 27, 2024

Example output:

[rocky@ip-172-31-27-81 software-layer]$  ./scripts/gpu_support/nvidia/link_nvidia_host_libraries.sh --ld-preload --no-download
Found NVIDIA GPU driver version 545.23.08
Found host CUDA version 12.3
Using default list of libraries
Matched 48 CUDA Libraries

When attempting to use LD_PRELOAD we exclude anything related to graphics
libXext.so.6 is NOT in the provided  preload list, filtering /lib64/libGL.so.1.
libXext.so.6 is NOT in the provided  preload list, filtering /lib64/libGL.so.
libXext.so.6 is NOT in the provided  preload list, filtering /lib64/libGLX_nvidia.so.0.
libXext.so.6 is NOT in the provided  preload list, filtering /lib64/libGLX.so.0.
libXext.so.6 is NOT in the provided  preload list, filtering /lib64/libGLX.so.
libwayland-server.so.0 is NOT in the provided  preload list, filtering /lib64/libnvidia-egl-wayland.so.1.
libXext.so.6 is NOT in the provided  preload list, filtering /lib64/libnvidia-fbc.so.1.
libXext.so.6 is NOT in the provided  preload list, filtering /lib64/libnvidia-fbc.so.
libXNVCtrl.so.0 is NOT in the provided  preload list, filtering /lib64/libnvidia-gtk3.so.545.23.08.

The recommended way to use LD_PRELOAD is to only use it when you need to:

export EESSI_GPU_LD_PRELOAD="/lib64/libcuda.so.1:/lib64/libcuda.so:/lib64/libcudadebugger.so.1:/lib64/libnvcuvid.so.1:/lib64/libnvcuvid.so:/lib64/libnvidia-cfg.so.1:/lib64/libnvidia-cfg.so:/lib64/libnvidia-eglcore.so.545.23.08:/lib64/libnvidia-encode.so.1:/lib64/libnvidia-encode.so:/lib64/libnvidia-glcore.so.545.23.08:/lib64/libnvidia-glsi.so.545.23.08:/lib64/libnvidia-glvkspirv.so.545.23.08:/lib64/libnvidia-gpucomp.so.545.23.08:/lib64/libnvidia-ml.so.1:/lib64/libnvidia-ml.so:/lib64/libnvidia-nvvm.so.4:/lib64/libnvidia-nvvm.so:/lib64/libnvidia-opencl.so.1:/lib64/libnvidia-opticalflow.so.1:/lib64/libnvidia-ptxjitcompiler.so.1:/lib64/libnvidia-ptxjitcompiler.so:/lib64/libnvidia-rtcore.so.545.23.08:/lib64/libnvidia-tls.so.545.23.08:/lib64/libnvoptix.so.1:/lib64/libOpenCL.so.1"
export EESSI_OVERRIDE_GPU_CHECK="1"

Then you can set LD_PRELOAD only when you want to run a GPU application, e.g.,
    LD_PRELOAD="$EESSI_GPU_LD_PRELOAD" device_query
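The output above recommends setting LD_PRELOAD only for the command that needs it. A minimal sketch of that pattern as a shell function, assuming `EESSI_GPU_LD_PRELOAD` has already been set as shown; the helper name `run_with_gpu_preload` is hypothetical and not part of the script. A `VAR=value command` prefix passes the variable to that one command only, so the calling shell's environment stays unchanged.

```shell
#!/bin/bash
# Hypothetical helper (not part of link_nvidia_host_libraries.sh):
# run a single command with LD_PRELOAD taken from EESSI_GPU_LD_PRELOAD.
run_with_gpu_preload() {
    if [ -z "${EESSI_GPU_LD_PRELOAD:-}" ]; then
        echo "EESSI_GPU_LD_PRELOAD is not set" >&2
        return 1
    fi
    # The assignment prefix applies to this one command only;
    # LD_PRELOAD is not modified in the calling shell.
    LD_PRELOAD="$EESSI_GPU_LD_PRELOAD" "$@"
}
```

For example, `run_with_gpu_preload device_query` would behave like the `LD_PRELOAD="$EESSI_GPU_LD_PRELOAD" device_query` line above, while refusing to run if the variable was never set.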

@boegel boegel added 2023.06-software.eessi.io 2023.06 version of software.eessi.io accel:nvidia labels Oct 9, 2024
@boegel (Contributor) commented Oct 9, 2024

@ocaisa There are duplicate entries here: libcuda.so is a symlink to libcuda.so.1, so only one is needed.

# Filter out all symlinks and libraries that have missing library dependencies under EESSI
filtered_libraries=()
for library in "${matched_libraries[@]}"; do
    if [ ! -L "$library" ]; then
@ocaisa (Member Author) replied:

This is too aggressive; instead, we should just resolve the symlink and remove duplicate entries.

@ocaisa (Member Author) commented Oct 17, 2024

bot: build repo:eessi.io-2023.06-software arch:x86_64/generic


eessi-bot bot commented Oct 17, 2024

Updates by the bot instance eessi-bot-mc-aws:
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/generic from ocaisa

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/generic
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/generic resulted in:

Updates by the bot instance boegel-bot-deucalion:
  • account ocaisa has NO permission to send commands to the bot


eessi-bot bot commented Oct 17, 2024

Updates by the bot instance eessi-bot-mc-azure:
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/generic from ocaisa

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/generic
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/generic resulted in:

    • no jobs were submitted


eessi-bot bot commented Oct 17, 2024

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.10/pr_754/23806

date | job status | comment
Oct 17 12:33:27 UTC 2024 | submitted | job id 23806 awaits release by job manager
Oct 17 12:33:30 UTC 2024 | released | job awaits launch by Slurm scheduler
Oct 17 12:34:35 UTC 2024 | running | job 23806 is running
Oct 17 12:40:51 UTC 2024 | finished | 😁 SUCCESS
Details
✅ job output file slurm-23806.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-generic-1729168457.tar.gz size: 0 MiB (4682 bytes)
entries: 1
modules under 2023.06/software/linux/x86_64/generic/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/generic/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/generic
2023.06/scripts/gpu_support/nvidia/link_nvidia_host_libraries.sh
Oct 17 12:40:51 UTC 2024 | test result | 😁 SUCCESS
ReFrame Summary
[ OK ] ( 1/10) EESSI_LAMMPS_lj %scale=1_node %device_type=cpu %module_name=LAMMPS/29Aug2024-foss-2023b-kokkos /aeb2d9df @BotBuildTests:x86-64-generic-node+default
P: perf: 484.052 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 2/10) EESSI_LAMMPS_lj %scale=1_node %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos /04ff9ece @BotBuildTests:x86-64-generic-node+default
P: perf: 507.606 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 3/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_allreduce %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /31ac6ab9 @BotBuildTests:x86-64-generic-node+default
P: latency: 5.5 us (r:0, l:None, u:None)
[ OK ] ( 4/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_allreduce %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /f3be40a2 @BotBuildTests:x86-64-generic-node+default
P: latency: 5.3 us (r:0, l:None, u:None)
[ OK ] ( 5/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_alltoall %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /10e66fba @BotBuildTests:x86-64-generic-node+default
P: latency: 7.98 us (r:0, l:None, u:None)
[ OK ] ( 6/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_alltoall %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /5be57ae7 @BotBuildTests:x86-64-generic-node+default
P: latency: 7.91 us (r:0, l:None, u:None)
[ OK ] ( 7/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_latency %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /c8c9aff5 @BotBuildTests:x86-64-generic-node+default
P: latency: 0.62 us (r:0, l:None, u:None)
[ OK ] ( 8/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_latency %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /9795e491 @BotBuildTests:x86-64-generic-node+default
P: latency: 0.64 us (r:0, l:None, u:None)
[ OK ] ( 9/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /48da21c5 @BotBuildTests:x86-64-generic-node+default
P: bandwidth: 10600.45 MB/s (r:0, l:None, u:None)
[ OK ] (10/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /1b8c1ca2 @BotBuildTests:x86-64-generic-node+default
P: bandwidth: 10212.21 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 10/10 test case(s) from 10 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-23806.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@ocaisa ocaisa added the ready-to-deploy Mark a PR as ready to deploy label Oct 17, 2024
@ocaisa ocaisa changed the title Allow Nvidia driver script to set LD_PRELOAD Allow Nvidia driver script to set recommendations for LD_PRELOAD Oct 17, 2024
scripts/gpu_support/nvidia/link_nvidia_host_libraries.sh: 10 review threads, marked Outdated and resolved
@TopRichard (Collaborator) commented:

Also tested the script within eessi_container:

Found host CUDA version 9.0
Found NVIDIA GPU driver version 535.129.03
Using downloaded list of libraries
Matched 41 CUDA Libraries
The host GPU driver libraries (v535.129.03) have already been linked! (based on /cvmfs/software.eessi.io/host_injections/nvidia/aarch64/host/driver_version.txt)
Successfully created symlink between /cvmfs/software.eessi.io/host_injections/nvidia/aarch64/latest and lib in /cvmfs/software.eessi.io/host_injections/2023.06/compat/linux/aarch64
Host NVIDIA GPU drivers linked successfully for EESSI

Accepted all except one

Co-authored-by: TopRichard
@ocaisa (Member Author) commented Nov 7, 2024

@TopRichard This will need to be re-tested now to make sure the changes haven't had an unintended impact

@ocaisa (Member Author) commented Jan 16, 2025

bot: build repo:eessi.io-2023.06-software arch:x86_64/generic


eessi-bot bot commented Jan 16, 2025

Updates by the bot instance eessi-bot-mc-aws:
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/generic from ocaisa

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/generic
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/generic resulted in:


eessi-bot bot commented Jan 16, 2025

Updates by the bot instance eessi-bot-mc-azure:
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/generic from ocaisa

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/generic
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/generic resulted in:

    • no jobs were submitted

gpu-bot-ugent bot commented Jan 16, 2025

Updates by the bot instance eessi-bot-vsc-ugent:
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/generic from ocaisa

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/generic
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/generic resulted in:

    • no jobs were submitted


eessi-bot bot commented Jan 16, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.01/pr_754/40795

date | job status | comment
Jan 16 13:57:47 UTC 2025 | submitted | job id 40795 awaits release by job manager
Jan 16 13:57:55 UTC 2025 | released | job awaits launch by Slurm scheduler
Jan 16 14:02:58 UTC 2025 | running | job 40795 is running
Jan 16 14:10:06 UTC 2025 | finished | 😁 SUCCESS
Details
✅ job output file slurm-40795.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-generic-1737036231.tar.gz size: 0 MiB (4715 bytes)
entries: 1
modules under 2023.06/software/linux/x86_64/generic/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/generic/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/generic
2023.06/scripts/gpu_support/nvidia/link_nvidia_host_libraries.sh
Jan 16 14:10:06 UTC 2025 | test result | 😁 SUCCESS
ReFrame Summary
[ OK ] ( 1/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/29Aug2024-foss-2023b-kokkos %scale=1_node /aeb2d9df @BotBuildTests:x86-64-generic-node+default
P: perf: 452.106 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 2/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos %scale=1_node /04ff9ece @BotBuildTests:x86-64-generic-node+default
P: perf: 464.798 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 3/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_allreduce %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /31ac6ab9 @BotBuildTests:x86-64-generic-node+default
P: latency: 5.01 us (r:0, l:None, u:None)
[ OK ] ( 4/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_allreduce %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /f3be40a2 @BotBuildTests:x86-64-generic-node+default
P: latency: 5.14 us (r:0, l:None, u:None)
[ OK ] ( 5/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_alltoall %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /10e66fba @BotBuildTests:x86-64-generic-node+default
P: latency: 7.81 us (r:0, l:None, u:None)
[ OK ] ( 6/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_alltoall %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /5be57ae7 @BotBuildTests:x86-64-generic-node+default
P: latency: 7.72 us (r:0, l:None, u:None)
[ OK ] ( 7/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_latency %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /c8c9aff5 @BotBuildTests:x86-64-generic-node+default
P: latency: 0.62 us (r:0, l:None, u:None)
[ OK ] ( 8/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_latency %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /9795e491 @BotBuildTests:x86-64-generic-node+default
P: latency: 0.64 us (r:0, l:None, u:None)
[ OK ] ( 9/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /48da21c5 @BotBuildTests:x86-64-generic-node+default
P: bandwidth: 10317.54 MB/s (r:0, l:None, u:None)
[ OK ] (10/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /1b8c1ca2 @BotBuildTests:x86-64-generic-node+default
P: bandwidth: 10306.94 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 10/10 test case(s) from 10 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-40795.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Jan 17 10:59:44 UTC 2025 | uploaded | transfer of eessi-2023.06-software-linux-x86_64-generic-1737036231.tar.gz to S3 bucket succeeded

@TopRichard (Collaborator) commented Jan 17, 2025

Quoting @ocaisa: "@TopRichard This will need to be re-tested now to make sure the changes haven't had an unintended impact"

Re-testing:

Apptainer> /cvmfs/software.eessi.io/versions/2023.06/scripts/gpu_support/nvidia/link_nvidia_host_libraries.sh
Found host CUDA version 9.0
Found NVIDIA GPU driver version 535.129.03
Using downloaded list of libraries
Matched 41 CUDA Libraries
Successfully created symlink between latest and host in /cvmfs/software.eessi.io/host_injections/nvidia/aarch64
Successfully created symlink between /cvmfs/software.eessi.io/host_injections/nvidia/aarch64/latest and lib in /cvmfs/software.eessi.io/host_injections/2023.06/compat/linux/aarch64
Host NVIDIA GPU drivers linked successfully for EESSI

@TopRichard TopRichard added bot:deploy Ask bot to deploy missing software installations to EESSI and removed bot:deploy Ask bot to deploy missing software installations to EESSI labels Jan 17, 2025
@bedroge bedroge added the bot:deploy Ask bot to deploy missing software installations to EESSI label Jan 17, 2025
@TopRichard TopRichard removed the ready-to-deploy Mark a PR as ready to deploy label Jan 17, 2025
@boegel (Contributor) previously requested changes Jan 17, 2025

@boegel left a comment

I know this has just been deployed, so this review comes way too late, but I had a bunch of draft remarks that I somehow never submitted.

Take it as input for a follow-up PR...

@@ -1,144 +1,416 @@
#!/bin/bash

# This script links host libraries related to GPU drivers to a location where
# they can be found by the EESSI linker
# they can be found by the EESSI linker (or sets LD_PRELOAD as an
@boegel (Contributor):

It doesn't set $LD_PRELOAD; it prints how $EESSI_*LD_PRELOAD can be set.
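For context on this remark: a script that is executed (rather than sourced) runs in a child process, so any `export` it performs cannot reach the caller's shell, which is why printing recommended `export` lines is the practical option. A small self-contained demonstration, using a temporary script and a hypothetical variable name `EESSI_DEMO_VAR`:

```shell
#!/bin/bash
# Create a throwaway script that exports a (hypothetical) variable.
demo_script=$(mktemp)
printf 'export EESSI_DEMO_VAR=set\n' > "$demo_script"

unset EESSI_DEMO_VAR

# Executing the script: the export happens in a child process and is lost.
bash "$demo_script"
after_exec="${EESSI_DEMO_VAR:-unset}"

# Sourcing the script: the export happens in the current shell.
source "$demo_script"
after_source="${EESSI_DEMO_VAR:-unset}"

rm -f "$demo_script"
echo "after executing: $after_exec, after sourcing: $after_source"
```

This prints `after executing: unset, after sourcing: set`.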


get_nvlib_list() {
    local nvliblist_url="https://raw.githubusercontent.com/apptainer/apptainer/main/etc/nvliblist.conf"
    local default_nvlib_list=(
@boegel (Contributor):

Can we add a comment here mentioning how this list was compiled, and with which driver version this current list corresponds?

        "tls_test_.so"
    )

# Check if the function was called with the "default" argument
@boegel (Contributor):

Nitpick: please fix the indentation of the comment.


# Check if curl failed (i.e., the content is empty)
if [ -z "$nvliblist_content" ]; then
    # Failed to download nvliblist.conf, using default list instead
@boegel (Contributor):

Should we print a warning here (via echo_yellow) that the download failed, and that we're doing a fallback?
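A possible shape for the suggested warning, as a sketch only: `echo_yellow` is defined here as a minimal stand-in for the helper of that name the script is assumed to use, and `fetch_nvlib_list` is a hypothetical name. It also assumes that library entries in `nvliblist.conf` are the non-comment lines containing `.so` (the file can also list binaries).

```shell
#!/bin/bash
# Minimal stand-in for the script's echo_yellow helper (assumption).
echo_yellow() { printf '\033[33m%s\033[0m\n' "$*"; }

# Hypothetical helper: print library names from the nvliblist.conf at $1,
# or the default names passed as the remaining arguments if the download fails.
fetch_nvlib_list() {
    local url="$1"
    shift
    local content=""
    if command -v curl >/dev/null 2>&1; then
        # --fail: an HTTP error yields no output instead of an error page
        content=$(curl --silent --fail --location "$url" 2>/dev/null) || content=""
    fi
    if [ -z "$content" ]; then
        echo_yellow "WARNING: could not download $url, falling back to the default list of libraries" >&2
        printf '%s\n' "$@"
        return 0
    fi
    # Drop comment lines and keep only entries that look like shared libraries
    printf '%s\n' "$content" | grep -v '^#' | grep '\.so'
}
```

Usage might look like `fetch_nvlib_list "https://raw.githubusercontent.com/apptainer/apptainer/main/etc/nvliblist.conf" libcuda.so libnvidia-ml.so`; writing the warning to stderr keeps the library list on stdout clean for callers that capture it.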


# Check if umask allows global read
if [ "$umask_octal" -gt 022 ]; then
    fatal_error "The current umask ($current_umask) does not allow global read permissions, you'll want everyone to be able to read the created directory."
@boegel (Contributor):

Make a suggestion on how to fix this in the error message?

Also, can't we set $UMASK here to dance around this ourselves?
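Two sketches for this comment, assuming `$umask_octal` holds a string such as `0027`. First, bash's `[` builtin compares `-gt` operands as decimal integers, so a umask like `0007` (decimal 7) would pass the check even though it removes all permissions for others; testing the relevant bits with bash arithmetic avoids that. Second, instead of failing, the directory creation can run in a subshell with `umask 022`, leaving the caller's umask untouched. Both function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical check: succeed only if the umask (an octal string such as
# 0022) leaves read and execute permission for group and others, which is
# what other users need to list and enter a created directory.
umask_allows_global_read() {
    local mask="$1"
    # 8#... forces octal interpretation; 055 = r-x bits for group and others
    (( (8#$mask & 8#055) == 0 ))
}

# Hypothetical helper: create a directory under a known umask without
# changing the umask of the calling shell (the subshell keeps it local).
mkdir_world_readable() {
    (
        umask 022
        mkdir -p "$1"
    )
}
```

With this, `umask_allows_global_read 0022` succeeds while `umask_allows_global_read 0007` fails, and `mkdir_world_readable` yields mode 755 directories regardless of the caller's umask.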

filtered_libraries=()
compat_filtered_libraries=()
for library in "${matched_libraries[@]}"; do
    # Run ldd on the given binary and filter for "not found" libraries
@boegel (Contributor):

Not a binary, but a library.

compat_filtered_libraries=()
for library in "${matched_libraries[@]}"; do
    # Run ldd on the given binary and filter for "not found" libraries
    not_found_libs=$(ldd "$library" 2>/dev/null | grep "not found" | awk '{print $1}')
@boegel (Contributor):

We should avoid relying on awk: it may not be there, while cut is much more likely to be available.
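A sketch of the ldd filter without awk, assuming glibc-style ldd output in which each dependency line typically starts with a tab (for example `<tab>libXext.so.6 => not found`). A plain `cut -d' ' -f1` would keep that leading tab, whereas bash's `read` drops leading whitespace when splitting fields. The function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read ldd output on stdin and print the names of
# libraries reported as "not found", without relying on awk.
missing_libs_from_ldd() {
    local name rest
    # read splits on whitespace and drops the leading tab ldd emits
    grep "not found" | while read -r name rest; do
        printf '%s\n' "$name"
    done
}
```

In the loop above this would be used as `not_found_libs=$(ldd "$library" 2>/dev/null | missing_libs_from_ldd)`.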

# Find the host ldconfig
host_ldconfig=$(get_host_ldconfig)
# Gather libraries on the host (_must_ be host ldconfig)
host_libraries=$($host_ldconfig -p | awk '{print $NF}')
@boegel (Contributor):

If we can, we should avoid relying on awk, since it may not be installed; use cut instead?
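Similarly for `ldconfig -p`, a sketch assuming its usual output format: a header line followed by entries such as `<tab>libcuda.so.1 (libc6,x86-64) => /lib64/libcuda.so.1`. Keeping only the text after the last ` => ` with bash parameter expansion avoids awk and also skips the header line, whose last field `awk '{print $NF}'` would pass through as well. The function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read `ldconfig -p` output on stdin and print the
# library paths (the part after the last " => "), without using awk.
paths_from_ldconfig() {
    local line
    while IFS= read -r line; do
        case "$line" in
            *" => "*) printf '%s\n' "${line##* => }" ;;
        esac
    done
}
```

It would slot in as `host_libraries=$($host_ldconfig -p | paths_from_ldconfig)`.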

# Check if it is missing an so dep under EESSI
if [[ -z "$not_found_libs" ]]; then
    # Resolve any symlink
    realpath_library=$(realpath "$library")
@boegel (Contributor):

There's a small chance that realpath is not available, so perhaps we should check for this early and exit with an error if it's not available
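An early check along those lines could look like the sketch below. `require_commands` is a hypothetical name; it prints a message and returns non-zero rather than calling the script's own `fatal_error`, whose exact behaviour isn't shown here. The same check could cover other external tools the script relies on, such as `ldd`.

```shell
#!/bin/bash
# Hypothetical helper: verify that every named command is available
# before doing any work, reporting all missing ones at once.
require_commands() {
    local cmd missing=()
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || missing+=("$cmd")
    done
    if [ "${#missing[@]}" -gt 0 ]; then
        echo "ERROR: required command(s) not found: ${missing[*]}" >&2
        return 1
    fi
}
```

Called near the top of the script, e.g. `require_commands realpath ldd || exit 1`, this fails before any symlinks are created.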

# Resolve any symlink
realpath_library=$(realpath "$library")
if [[ ! " ${filtered_libraries[@]} " =~ " $realpath_library " ]]; then
    filtered_libraries+=("$realpath_library")
@boegel (Contributor):

The line above assumes there are spaces?

Suggested change
filtered_libraries+=("$realpath_library")
filtered_libraries+=(" ${realpath_library} ")
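On the question above: in the `=~` test the spaces come from padding the space-joined array (`" ${filtered_libraries[@]} "`), so the array entries themselves don't need them, although the substring match could misfire if a path contained spaces. An alternative technique, sketched here and not what this PR does, is to key a bash 4+ associative array on the resolved path; the function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: resolve each library path to its real file and
# print each real file once, in first-seen order (requires bash >= 4).
dedupe_resolved_libraries() {
    local -A seen=()
    local library resolved
    for library in "$@"; do
        resolved=$(realpath "$library") || continue
        if [[ -z "${seen[$resolved]:-}" ]]; then
            seen[$resolved]=1
            printf '%s\n' "$resolved"
        fi
    done
}
```

The result could be collected with `mapfile -t filtered_libraries < <(dedupe_resolved_libraries "${matched_libraries[@]}")`, so that `libcuda.so` and `libcuda.so.1` collapse into a single entry.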

@boegel (Contributor) commented Jan 17, 2025

@bedroge This was deployed, so PR should be merged too?

@bedroge (Collaborator) commented Jan 17, 2025

Quoting @boegel: "@bedroge This was deployed, so PR should be merged too?"

Yes, the tarball has been ingested.

@bedroge bedroge dismissed boegel’s stale review January 17, 2025 16:00

This has already been deployed, requested changes can be done in a follow-up PR.

@bedroge bedroge merged commit 525cb32 into EESSI:2023.06-software.eessi.io Jan 17, 2025
50 checks passed

eessi-bot bot commented Jan 17, 2025

PR merged! Moved ['/project/def-users/SHARED/jobs/2024.10/pr_754/23806', '/project/def-users/SHARED/jobs/2025.01/pr_754/40795'] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2025.01.17


eessi-bot bot commented Jan 17, 2025

PR merged! Moved [] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2025.01.17

Labels
2023.06-software.eessi.io 2023.06 version of software.eessi.io accel:nvidia bot:deploy Ask bot to deploy missing software installations to EESSI
Projects
None yet

4 participants