Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add Support for Symlinks in tar Rule's Runfiles Handling #1036

Merged
merged 9 commits into from
Jan 31, 2025

Conversation

ewianda
Copy link
Contributor

@ewianda ewianda commented Jan 20, 2025

This Pull Request (PR) enhances the tar rule to better handle symlinks within runfiles, improving compatibility with various language ecosystems

Currently, symlinks within runfiles are not correctly reflected in the mtree_spec generated by Bazel, causing issues in packaging, especially for symlink-sensitive applications like Node.js with pnpm.


Changes

Symlink Detection and Resolution

  • Logic Added: Detect if a file in the runfiles is a symlink using readlink.
  • Runtime Resolution: Resolves symlinks post-build to ensure the mtree_spec accurately represents symlink structures.

MTree Mutation

  • Modification Script: Utilized awk scripting to adjust the mtree_spec after it is generated.
    • Reads each line of the mtree_spec.
    • Verifies if entries reported as files are actually symlinks using readlink -f.
    • Updates entries to show type=link and the correct link= path when applicable.

closes #603

tools/mtree/main.go Outdated Show resolved Hide resolved
@ewianda ewianda force-pushed the handle-symlinks branch 2 times, most recently from 8a5d7fc to e87fad1 Compare January 20, 2025 23:10
@thesayyn
Copy link
Collaborator

Looking great already.

@ewianda ewianda changed the title Handle symlinks Add Support for Symlinks in tar Rule's Runfiles Handling Jan 22, 2025
@ewianda ewianda force-pushed the handle-symlinks branch 3 times, most recently from 8421e11 to 95f66d4 Compare January 22, 2025 03:18
@ewianda ewianda changed the title Add Support for Symlinks in tar Rule's Runfiles Handling feat: add Support for Symlinks in tar Rule's Runfiles Handling Jan 22, 2025
@ewianda ewianda marked this pull request as ready for review January 22, 2025 03:21
@thesayyn
Copy link
Collaborator

You have some buildifier errors.

@ewianda
Copy link
Contributor Author

ewianda commented Jan 23, 2025

You have some buildifier errors.

Should be all good now

@ewianda
Copy link
Contributor Author

ewianda commented Jan 24, 2025

Hi @thesayyn I am noticing two potential issues

  • For relatively large application for example our next build with about a 150K lines of type=file, it takes about 5 minutes to do readlink
  • Some symlinks are relative, and because of the way the srcs files are laid out, readlink -f fails and readlink works. This implies that if we have to handle both cases in theory that will in theory increase the build time for resolving symlinks

I have thought about a few things

  1. Us the file.is_symlink feature only available in bazel 8 to filter out symlink files making the preserve_symlink feature a bazel 8+ feature
  2. Have a separate action for evaluating symlink using bash to parallelize the process, something like cat mtree.spec | xargs -P

@thesayyn
Copy link
Collaborator

I like the numbers, yes, i was also curious of performance implications for something like this. 5mins for 150k is unpleasant.

Some symlinks are relative, and because of the way the srcs files are laid out, readlink -f fails and readlink works. This implies that if we have to handle both cases in theory that will in theory increase the build time for resolving symlinks

Oh, yes, there must be more to the logic that what we currently have here; See: https://github.com/aspect-build/rules_js/blob/main/js/private/image/index.ts#L84 it's been in use for a while and covers some edge cases.

Us the file.is_symlink feature only available in bazel 8 to filter out symlink files making the preserve_symlink feature a bazel 8+ feature

That is possible, yes, but i don't it will report correctly for things declared as file but is symlink, eg treeartifact containing a symlink. there is no way to know about symlinks inside treeartifacts.

Have a separate action for evaluating symlink using bash to parallelize the process

I have little to no faith that it would be faster than what we right now, since which readlink is /usr/bin/readlink so it will likely call out to external binary and suffer from the same problem as awk.

My current thinking is that, while writing a go tool for this is convenient, its probably overkill, and will fall short if people need to customize it, i'd like to use that effort to make a proper awk setup.

https://github.com/thesayyn/prebuilt_toolchains has an AWK recipe for Bazel, i am planning to make a toolchain for it.

AWK has this concept of extensions https://www.gnu.org/software/gawk/manual/html_node/gawkextlib.html which we can use to introduce a new readlink function to get near native performance within awk, so instead of system function, we'd use readlink function which will just make a syscall within the same process.

References:

https://sourceforge.net/projects/gawkextlib/
https://www.gnu.org/software/gawk/manual/html_node/gawkextlib.html

@thesayyn
Copy link
Collaborator

All that said, i don't mind landing this as is, since its behind a flag, (which we should mark as experimental BTW), and get some feedback and interest before investing more into this.

@ewianda
Copy link
Contributor Author

ewianda commented Jan 24, 2025

Fair enough,
I will like to add the fallback of readlink -f to readlink to address the issue that I had with rules_python

@thesayyn
Copy link
Collaborator

thesayyn commented Jan 24, 2025

You need to update the docs because of the new flag.

bazel run @@//docs:update

@ewianda
Copy link
Contributor Author

ewianda commented Jan 24, 2025

Is experimental_preserve_symlink good enough at the macro level

@thesayyn
Copy link
Collaborator

No need for the experimental prefix, we can just put a EXPERIMENTAL! We may remove or change it at any point without further notice. in the doc of the attribute.

@ewianda
Copy link
Contributor Author

ewianda commented Jan 28, 2025

hi @thesayyn //lib/tests/tar:test_17_after_processing is failing for bazel 7.4.1 and I can't seem to be able to reproduce it locally

lib/tests/tar/BUILD.bazel Outdated Show resolved Hide resolved
@ewianda
Copy link
Contributor Author

ewianda commented Jan 29, 2025

@thesayyn, I added realpath to get the relative path.

From

AWK has this concept of extensions https://www.gnu.org/software/gawk/manual/html_node/gawkextlib.html which we can use to introduce a new readlink function to get near native performance within awk, so instead of system function, we'd use readlink function which will just make a syscall within the same process.

Will be possible to replace realpath as well

lib/tests/tar/BUILD.bazel Outdated Show resolved Hide resolved
@ewianda
Copy link
Contributor Author

ewianda commented Jan 31, 2025

@thesayyn made the suggested change

@thesayyn
Copy link
Collaborator

Hmm, the test is failing due to mtime, https://github.com/bazel-contrib/bazel-lib/actions/runs/13054891062/job/36493038666?pr=1036#step:6:252

Can you set it to a static value in your tests?

@ewianda
Copy link
Contributor Author

ewianda commented Jan 31, 2025

Hmm, the test is failing due to mtime, https://github.com/bazel-contrib/bazel-lib/actions/runs/13054891062/job/36493038666?pr=1036#step:6:252

Can you set it to a static value in your tests?

Just fixed it

@thesayyn thesayyn merged commit 219b907 into bazel-contrib:main Jan 31, 2025
29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FR]: tar rule should support runfiles with symlinks
2 participants