Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libnvidia-container: include binaries from driver package #372320

Merged

Conversation

msanft
Copy link
Contributor

@msanft msanft commented Jan 9, 2025

There's really no reason to require the binaries from the nvidia driver package to be passed to libnvidia-container via the PATH. We do the same thing via driverLink for the libraries already, so let's be consistent here.

This also fixes a segfault in the patch I wrote in #366855, which causes uninitialized memory to be written if binaries are mounted into the container. Thus, I scrap the complicated patch entirely here, going back to just including the binaries from the driver package and not via PATH.

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 25.05 Release Notes (or backporting 24.11 and 25.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

There's really no reason to require the binaries from the nvidia driver package to be passed to libnvidia-container via the PATH. We do the same thing via driverLink for the libraries already, so let's be consistent here.
@github-actions github-actions bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10 labels Jan 9, 2025
@nix-owners nix-owners bot requested a review from cpcloud January 9, 2025 09:32
@msanft
Copy link
Contributor Author

msanft commented Jan 9, 2025

nixpkgs-review result

Generated using nixpkgs-review.

Command: nixpkgs-review pr 372320


x86_64-linux

✅ 10 packages built:
  • apptainer
  • apptainer-overriden-nixos
  • libnvidia-container
  • nvidia-container-toolkit
  • nvidia-container-toolkit.tools
  • nvidia-docker
  • singularity
  • singularity-overriden-nixos
  • udocker
  • udocker.dist

@msanft msanft force-pushed the msanft/libnvidia-container/driver-bins branch 3 times, most recently from a831dbe to 07db7b5 Compare January 9, 2025 13:27
@msanft msanft marked this pull request as draft January 9, 2025 13:27
@msanft msanft marked this pull request as ready for review January 9, 2025 13:46
@msanft
Copy link
Contributor Author

msanft commented Jan 9, 2025

Tested this to work, also in conjunction with #372148

NixOS@24d9b8a
introduced a bug, that, if `PATH` was actually set to a value
when libnvidia-container was ran, caused a segfault.
This fixes this bug by scrapping the dynamic `PATH` behavior in favor
of simply using the driver package's `driverLink` abstraction for that,
which is more fit for the purpose than `PATH` either way.
@wegank wegank added the 12.approvals: 1 This PR was reviewed and approved by one reputable person label Jan 9, 2025
@ConnorBaker ConnorBaker added the 6.topic: cuda Parallel computing platform and API label Jan 14, 2025
Comment on lines +20 to 25
+ // TODO: Remove this patch once `virtualisation.docker.enableNvidia` is removed from NixOS.
+ // It only exists to maintain compatibility with the old nvidia-docker package.
+ if ((env = "/run/nvidia-docker/bin:/run/nvidia-docker/extras/bin:@driverLink@/bin") == NULL) {
error_setx(err, "environment variable PATH not found");
return (-1);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless you wanted to keep the if statement for some reason, I'm okay with this being removed entirely.

@ConnorBaker ConnorBaker merged commit 2f6a15c into NixOS:master Jan 14, 2025
34 of 37 checks passed
+ if ((os_path = secure_getenv("PATH")) == NULL) {
+ // TODO: Remove this patch once `virtualisation.docker.enableNvidia` is removed from NixOS.
+ // It only exists to maintain compatibility with the old nvidia-docker package.
+ if ((env = "/run/nvidia-docker/bin:/run/nvidia-docker/extras/bin:@driverLink@/bin") == NULL) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do late-binding for nixos (/run paths) but not other systems (e.g. via PATH), and why not just always use the early-bound @driverLink@/bin?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken, /run/nvidia-docker/bin are remnants of virtualisation.docker.enableNvidia. This patch goes back to always using that, while adding @driverLink@/bin. I had added additional support for PATH before (#366855), but things went south there as I introduced a bug. Looking back at that, I decided to go back to a static path, which I did in this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: cuda Parallel computing platform and API 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10 12.approvals: 1 This PR was reviewed and approved by one reputable person
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

5 participants