Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up application startup by using ld.so.cache #207061

Open
wants to merge 2 commits into
base: staging
Choose a base branch
from

Conversation

winterqt
Copy link
Member

@winterqt winterqt commented Dec 21, 2022

Description of changes

This PR does two things:

  1. Patch glibc's dynamic loader to load the ld.so cache from $ORIGIN/../etc/ld.so.cache instead of /etc/ld.so.cache, and to prefer the cache over RPATH entries
  2. Introduce a new setup hook, generate-ld-cache, to facilitate the generation of these caches

This was originally implemented by Guix's Ludovic Courtès in August of 2021 (after Ricardo came up with the idea) -- see their blog post and their discussion on the change for more information + some benchmarks. The glibc patch is copied verbatim from the Guix codebase.

As noted in the linked Guix issue, there are a few things to consider here:

  • The hard-coded $ORIGIN/../etc/ld.so.cache path means that it can only be used with first-level sub-directories like bin and sbin
  • There is a possibility that users could be tricked into loading a malicious ld.so.cache
  • If you read the discussion, you'll note that I omitted one of their points here: that transient dependencies aren't added to the cache. Our implementation actually does include transient dependencies, because we use ldd to get the required dependencies, as opposed to just using the data from the ELF.

When discussing this with pennae, they brought up the idea of embedding an absolute path to the cache in an ELF section, and adjusting the patch to do that. We may want to do this instead; I'd appreciate thoughts on the matter.

I'd like to thank the Guix folks mentioned above for originally coming up with and implementing this idea, as well as pennae for nerd sniping me into looking at perf stuff, and their thoughts on the matter.

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.05 Release Notes (or backporting 22.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
    • (Release notes changes) Ran nixos/doc/manual/md-to-db.sh to update generated release notes
  • Fits CONTRIBUTING.md.

@github-actions github-actions bot added the 6.topic: stdenv Standard environment label Dec 21, 2022
@winterqt winterqt changed the title glibc: load $ORIGIN/../etc/ld.so.cache when available Speed up application startup by using ld.so.cache Dec 21, 2022
@winterqt
Copy link
Member Author

After looking around some more, I stumbled across @fzakaria'a nix-harden-needed, which may be a better choice over Guix's solution (per the README, they want to integrate it into Nixpkgs).

(Farid: hi, figured I'd loop you in here, considering you have interest in this problem :)

@fzakaria
Copy link
Contributor

fzakaria commented Dec 21, 2022

After looking around some more, I stumbled across @fzakaria'a nix-harden-needed, which may be a better choice over Guix's solution (per the README, they want to integrate it into Nixpkgs).

(Farid: hi, figured I'd loop you in here, considering you have interest in this problem :)

Hi Winter!

Thank you for pinging me.
This is something I would love to solve at nixpkgs level -- I've just been intimated doing the solution although I've audited the code and have a good sense of what to do.

My hot-take:
1 - we can take https://github.com/fzakaria/shrinkwrap and rewrite it in C (it's using LIEF C-library) so the closure is very small and we can include it in the bootstrap binary.
With shrinkwrap we can take a much simpler "patchelf" style approach and fixup the binaries.
2 - overtime do harden-needed approach to make it more "robust"
3 - I really want to investigate using $ORIGIN in the path so that the binaries are relocatable for any nix-store path.
(This is something Spack package manager accomplishes already)
(We actually need to upstream $ORIGIN support for PT_INTERP)

Edit: This is a great PR btw -- very concise if it works. Only unfortunate thing is patching but maybe you can upstream something to make it configurable based on an ELF file section.

It has the added benefit of supporting LD_LIBRARY_PATH which the other options don't (maybe though that's a feature?)

@fzakaria
Copy link
Contributor

Here is a document of some other much more "visionary" ideas I have for Nix + NixOS.

Breaking_Apart_the_Shared_Object_Monolith.pdf

@winterqt
Copy link
Member Author

So I hope you don't mind me asking this here (if you'd rather discuss over e.g. IRC, Matrix, or email to avoid cluttering up this PR, let me know), but what does Shrinkwrap provide that this cache approach doesn't? They seemingly both eliminate the intensive RPATH traversal.

I know there's one advantage you list in the README:

Secondly, shared dynamic objects may be found due to the fact that they are cached during the linking stage. Meaning, if another shared object requires the same dependency but failed to specify where to find it, it may still properly resolved if discovered earlier in the linking process. This is extremely error prone and changing any of the executable's dependencies can change the link order and potentially cause the binary to no longer work.

but I admit I'm confused as to what a practical example of this would look like/what the problem is in general 😅. Do you mind explaining?

@winterqt
Copy link
Member Author

winterqt commented Dec 21, 2022

Edit: This is a great PR btw -- very concise if it works. Only unfortunate thing is patching but maybe you can upstream something to make it configurable based on an ELF file section.

It does work; I've tested it with hello, and any Guix user has been using it since last year. Not sure how much credit I can take, since it's just some Bash + the Guix folks' patch, but thank you :)

I think an advantage to this solution is definitely that it's minimally invasive and concise, as you mention.

It has the added benefit of supporting LD_LIBRARY_PATH which the other options don't (maybe though that's a feature?)

(Why doesn't Shrinkwrap support it, since it just lifts dependencies up to the parent ELF?)

@fzakaria
Copy link
Contributor

fzakaria commented Dec 21, 2022

Reached out on Matrix.
(@winterqt:nixos.dev)

@Mindavi
Copy link
Contributor

Mindavi commented Dec 21, 2022

Some questions:

  • Does this work when cross-compiling?
  • Are there any implications for content-addressed derivations with this?

@@ -71,7 +71,8 @@ let
../../build-support/setup-hooks/reproducible-builds.sh
../../build-support/setup-hooks/set-source-date-epoch-to-latest.sh
../../build-support/setup-hooks/strip.sh
] ++ lib.optionals hasCC [ cc ];
] ++ lib.optionals hostPlatform.isLinux [ ../../build-support/setup-hooks/generate-ld-cache.sh ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hostPlatform.isLinux is no specific enough, for example pkgsMusl doesn't have a ldconfig. Maybe hostPlatform.isLinux && hostPlatform.isGnu?

generating LD cache for ELF executables in /nix/store/bgm2nivmwfa94pw6xlial071z9
0vv5pg-nano-7.1
found lib: /nix/store/62hdd7jfmj90wrwl6dp94hj6xzl9cgvq-ncurses-6.3-p20220507/lib
/libncursesw.so.6
found lib: /nix/store/mpgmlwjdd9g61w92q5d9g8y7r66fp8j6-musl-1.2.3/lib/ld-musl-x8
6_64.so.1
found lib dirs: /nix/store/62hdd7jfmj90wrwl6dp94hj6xzl9cgvq-ncurses-6.3-p2022050
7/lib /nix/store/mpgmlwjdd9g61w92q5d9g8y7r66fp8j6-musl-1.2.3/lib
/nix/store/6j095nxqj7qylwbbp4m00kb640gca06x-generate-ld-cache.sh: line 41: ldcon
fig: command not found
failed to generate LD cache

(nix-build -A pkgsMusl.nano)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point this only works for Glibc

@pennae
Copy link
Contributor

pennae commented Dec 21, 2022

building stdenv comes up with some errors (that are probably not fatal only because shell is terrible):

gnum4> /nix/store/6j095nxqj7qylwbbp4m00kb640gca06x-generate-ld-cache.sh: line 5: ldd: command not found
xz> /nix/store/6j095nxqj7qylwbbp4m00kb640gca06x-generate-ld-cache.sh: line 5: ldd: command not found
zlib> /nix/store/6j095nxqj7qylwbbp4m00kb640gca06x-generate-ld-cache.sh: line 5: ldd: command not found
gettext> ldd: $warning: you do not have execution permission for `/nix/store/w8dzi3li1zx3dnpsilp3s9wah18g0f2b-gettext-0.21/lib/preloadable_libintl.so'
# etc

might be better to add this hook to the final linux stdenv stage only.

cross-building an armv7 test system from scratch fails at python:

error: builder for '/nix/store/2wx96x4ci01yw9p2wi3ad1kg78wijnp1-python3-3.10.9-env.drv' failed with exit code 1;
       last 1 log lines:
       > error: collision between `/nix/store/lc038xi6in0x4i11ypm4dcbhg1lla2ny-dtc-1.6.1/etc/ld.so.cache' and `/nix/store/yyvlli12hj3i442ww29pdcbmvdg2kh0k-python3-3.10.9/etc/ld.so.cache'
       For full logs, run 'nix log /nix/store/2wx96x4ci01yw9p2wi3ad1kg78wijnp1-python3-3.10.9-env.drv'.

buildEnv (and presumably symlinkJoin and friends) don't seem to like the approach :(

edit
cross-building does not look like it's going so well:

nix> /nix/store/6j095nxqj7qylwbbp4m00kb640gca06x-generate-ld-cache.sh: line 41: /nix/store/1ff0pbnn4jx6amdf9jxn0y0i39v02ngm-glibc-armv7l-unknown-linux-gnueabihf-2.35-224-bin/bin/ldconfig: cannot execute binary file: Exec format error

@winterqt
Copy link
Member Author

buildEnv (and presumably symlinkJoin and friends) don't seem to like the approach :(

Adding etc/ldconfig.so.cache to this block will fix buildEnv:
https://github.com/NixOS/nixpkgs/blob/339ff0387aa51495586b69c9052e573465126bcb/pkgs/build-support/buildenv/builder.pl#L125-133

Not sure what to do about symlinkJoin, though; maybe it doesn't suffer from this issue (or at least won't fail to build)? Will test ASAP.


fixupOutputHooks+=('if [ -z "${dontGenerateLDCache-}" ]; then generateLDCache "$prefix"; fi')

generateLDCache() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just use patchelf to list the dependencies rather than this bash ?
(does patchelf list the needed transitive? if not, you can write a tiny C binary using https://github.com/lief-project/LIEF)

Might be more maintainable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we've done some investigating and it looks like a purpose-built tool that crawls a dep tree and generates a cache file will be the best thing to do. ld and ldconfig apparently can't be made to work in cross-builds (ldd certainly not, for ldconfig we've found nothing doing it with glibc. only one instance with uclibc, which doesn't want to build for armv7 right now (and that doesn't bode well for using its ldconfig))

would very much like to make this rust though, C ought to disappear and rustc is in our closures anyways

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/reducing-stat-calls-for-library-loading-during-application-startup/24358/1

@tomberek
Copy link
Contributor

tomberek commented Jan 2, 2023

There was a similar patch and discussion from NixOS/patchelf#357 (comment) that started to fold this into the patchelf phase. I'm a fan of the ld.so.cache approach due to the possibility of providing a different cache en-masse, or to mutable locations for testing/debugging!) (but the shrinkwrap also has merits.

Looks like the ldd searching for transitive deps is the main difference. I fear this makes the approach more fragile. What are the pros/cons here?

When discussing this with pennae, they brought up the idea of embedding an absolute path to the cache in an ELF section, and adjusting the patch to do that. We may want to do this instead; I'd appreciate thoughts on the matter.

This would make it easier to then patch or redirect to a new place later. I can imagine a patchelf --set-ld-so-cache /some-project/etc/ld.so.cache or as a more focused alternative to lib.replaceDependency:

replaceLibDependency drv {
  drv = drv;
  from = openssl_old;
  to = openssl_new;
}

+ static const char store[] = @STORE_DIRECTORY@;
+ const char *origin = _dl_get_origin ();
+
+ /* Check whether ORIGIN is something like "/gnu/store/…-foo/bin". */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
+ /* Check whether ORIGIN is something like "/gnu/store/…-foo/bin". */
+ /* Check whether ORIGIN is something like "/nix/store/…-foo/bin". */

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2022-01-03-nixpkgs-architecture-team-meeting-23/24404/1

@infinisil infinisil added the significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc. label Apr 19, 2023
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/where-did-you-get-stuck-in-the-nix-ecosystem-tell-me-your-story/31415/2

@nyabinary
Copy link
Contributor

What the status on this currently?

@wegank wegank added 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md 2.status: merge conflict This PR has merge conflicts with the target branch labels Mar 19, 2024
@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Mar 20, 2024
@wegank wegank added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jul 4, 2024
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/where-did-you-get-stuck-in-the-nix-ecosystem-tell-me-your-story/31415/72

@diniamo
Copy link
Contributor

diniamo commented Sep 17, 2024

I don't understand, why aren't fzakaria's solutions plain better, compared to a cache?

@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Sep 17, 2024
@fzakaria
Copy link
Contributor

I have some interesting work in similar vein for relocation processing that is a clear win for Nixpkgs as well; I plan to speak + write about it.

@wegank wegank added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jan 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.status: merge conflict This PR has merge conflicts with the target branch 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md 6.topic: stdenv Standard environment 10.rebuild-darwin: 11-100 10.rebuild-linux: 501+ 10.rebuild-linux: 5001+ 10.rebuild-linux-stdenv This PR causes stdenv to rebuild significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.