Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No option to preserve symlinks in pkg_tar #115

Open
colatkinson opened this issue Nov 10, 2019 · 38 comments
Open

No option to preserve symlinks in pkg_tar #115

colatkinson opened this issue Nov 10, 2019 · 38 comments
Labels
p4 An idea that we are not considering working on at this time.

Comments

@colatkinson
Copy link

The current behavior of pkg_tar is that if I have a file and a symlink to that file, and both are given as sources, the symlink is "expanded" and two copies of the original file are added to the repo.

For some context, my use case is that I'm trying to package an external toolchain's libc into a tar for use in Docker containers (cross-compiling is... fun). As you can imagine, this is massively ballooning the size of my images.

While I could certainly attempt to do something like filtering all the links out and then generating a symlinks dict to pass in, I was wondering if there would be interest in adding a flag to allow this behavior in the main rule. I think it could be useful in a variety of contexts, especially in generating library debs for use by others.

The behavior as I imagine it would be along the lines of passing a preserve_symlinks flag, which would then include symlinks if they only point to files in the archive. This validation should be simple enough, and would remove risks of non-hermeticity. Alternatively, symlinks pointing elsewhere could follow the current behavior, while hermetic symlinks could be preserved.

Please let me know if there are any concerns I'm missing, or if there's an obvious solution to my problem that I missed somewhere in the docs.

@aiuto aiuto added the P2 An issue that should be worked on when time is available label May 19, 2020
@aiuto
Copy link
Collaborator

aiuto commented May 19, 2020

Clearly a missing feature. I would like to see this in pkgfilegroup and reused across all rules.

@pudelkoM
Copy link

pudelkoM commented Jun 5, 2020

In case someone needs this and is fine with a workaround until this is properly fixed here, we wrote a Starlark wrapper that invokes pkg_tar with an appropriate flagfile to create the symlinks:

https://github.com/stratum/stratum/blob/3e8f0cdf1abdbdf5458b802058c8413084884420/bazel/rules/package_rule.bzl#L12-L47

@aiuto
Copy link
Collaborator

aiuto commented Aug 12, 2021

This should be doable now with pkg_mklink, but it won't do the auto-magic expand.
I think that feature is not that useful. In practice, what you usually want in your package is something like

  • a binary that is the result of a rule, and you want to place that somewhere like /usr/share/MyApp/bin/myapp
  • a symlink from /usr/bin/myapp to /usr/share/MyApp/bin/myapp

Bazel doesn't really deal with symlinks as a differentiable first-class file type that well, so it would be excessive engineering here to do that unless that was added.

@aiuto aiuto added p4 An idea that we are not considering working on at this time. and removed P2 An issue that should be worked on when time is available labels Aug 12, 2021
@bazaglia
Copy link

@aiuto I'm using aspect-build/rules_js, a new high performance alternative/better approach to build_bazel_rules_nodejs and it uses pnpm that heavily depends on symlinks. Unfortunately in Node.js it's not about building just a binary as you mentioned, there is a node_modules folder structure where the symlinks need to be preserved otherwise the modules resolution won't work. I think that feature is pretty useful.

@aiuto
Copy link
Collaborator

aiuto commented May 28, 2022

Can you provide a detailed writeup (preferablye in an external doc) about what rules_js does, and then what you need to accomplish. I won't have the time to do this research myself so I have to rely on support from the community to gather requirements for inputs we need to accept and final package layouts we want to achieve. Once we get a good set of use cases, I'm sure I can come up with an implementation plan.

@alexeagle
Copy link
Collaborator

https://pnpm.io/symlinked-node-modules-structure is the detailed writeup about the semantics required. The tar file needs to contain a node_modules directory with symlinks into a CAS "virtual store" as described there.

So we just need exactly what the OP requests: preserve the symlinks in the tar file so that they survive the trip into a docker image and then into the container runtime.

@alexeagle
Copy link
Collaborator

@aiuto this is now a significant blocker for rules_nodejs users ready to migrate to rules_js. Was that reply sufficient for you to accept a PR to add the preserve_symlinks option?

@kurt-google
Copy link

Just to chime in, I would also appreciate this working in pkg_tar. The use case is similar, I want to preserve file structures for/from toolchains. In my particular case the toolchain produces many aliases of large libraries as symlinks (think .so .so.1 .so.1.0.0 etc). And when there are 10 aliases of a 100MB binary it is quite impactful.

@aiuto
Copy link
Collaborator

aiuto commented Jun 28, 2022 via email

@iamricard
Copy link

👋🏼 @aiuto wondering if you had a chance to pick up the thread. we're looking into moving to rules_js, and this is the one thing (that we know) is blocking us. thanks!

@aiuto
Copy link
Collaborator

aiuto commented Jul 19, 2022

OK. I am in a mood to deal with this now.

I presume we can have dangling, arbitrary symlinks, so that is not the problem. My hunch is that you can make the structure you want with pkg_files, but then it goes bad (where bad means messing up relative symlinks) in one of several possible places.

  • pkg_tar consuming the symlink tree
  • pkg_tar consuming tar via deps.
  • pkg_deb consuming pkg_tar (not likely, all we do is read it and blatz it back out)
  • docker rules consuming pkg_tar/pkg_deb. Again, probably not because the issue is here, not there.

While we are at it..... the spec uses <store> a lot. Is there a need for something like parameterized output path rewrite?

@aiuto
Copy link
Collaborator

aiuto commented Jul 20, 2022

Is #603 the right kind of test case?

@aiuto
Copy link
Collaborator

aiuto commented Jul 20, 2022

I need help understanding the requirement here. Please look at #603. It

  • creates a tree of symlnks that looks like node_modules
  • passes that as srcs input to pkg_tar
  • passes that tarball as deps input to a second pkg_tar

It still comes out right

tar tvf bazel-bin/tests/tar/relative_links_re_tar.tar
drwxr-xr-x 0/0               0 1999-12-31 19:00 node_modules/
drwxr-xr-x 0/0               0 1999-12-31 19:00 node_modules/.pnpm/
drwxr-xr-x 0/0               0 1999-12-31 19:00 node_modules/.pnpm/[email protected]/
drwxr-xr-x 0/0               0 1999-12-31 19:00 node_modules/.pnpm/[email protected]/node_modules/
lr-xr-xr-x 0/0               0 1999-12-31 19:00 node_modules/.pnpm/[email protected]/node_modules/bar -> STORE/bar
lr-xr-xr-x 0/0               0 1999-12-31 19:00 node_modules/.pnpm/[email protected]/node_modules/qar -> ../../[email protected]/node_modules/qar
drwxr-xr-x 0/0               0 1999-12-31 19:00 node_modules/.pnpm/[email protected]/
drwxr-xr-x 0/0               0 1999-12-31 19:00 node_modules/.pnpm/[email protected]/node_modules/
lr-xr-xr-x 0/0               0 1999-12-31 19:00 node_modules/.pnpm/[email protected]/node_modules/bar -> ../../[email protected]/node_modules/bar
lr-xr-xr-x 0/0               0 1999-12-31 19:00 node_modules/.pnpm/[email protected]/node_modules/foo -> STORE/foo
lr-xr-xr-x 0/0               0 1999-12-31 19:00 node_modules/.pnpm/[email protected]/node_modules/qar -> ../../[email protected]/node_modules/qar
drwxr-xr-x 0/0               0 1999-12-31 19:00 node_modules/.pnpm/[email protected]/
drwxr-xr-x 0/0               0 1999-12-31 19:00 node_modules/.pnpm/[email protected]/node_modules/
lr-xr-xr-x 0/0               0 1999-12-31 19:00 node_modules/.pnpm/[email protected]/node_modules/qar -> STORE/qar
lr-xr-xr-x 0/0               0 1999-12-31 19:00 node_modules/foo -> .pnpm/[email protected]/node_modules/foo

What I think I am missing is how you build your input sources.

@aiuto
Copy link
Collaborator

aiuto commented Jul 20, 2022

Or..... maybe I am missing the obvious. Is the requirement:

Real check in files, or the result of a tree artifact.

  • bin/foo
  • foo -> bin/foo
  • bar -> ../../bar
pkg_tar(srcs=glob("[**]*), ...)

And that does not do the right thing for the two links?

... and the answer is yes. The current version of #603 flattens the symlinks to the content they point to, so I have a valid test case now. The question is how I want the API to look. What I really do not want is to implement in pkg_tar and then realize I needed it in pkg_files anyway.

@nacl
Copy link
Collaborator

nacl commented Jul 20, 2022

I was wondering something similar; it really depends on what rules_js is doing. The need to package everything up as a tar is also unclear to me.

In any case, it may not be possible (or convenient) to emit all the symlink providers.

@thesayyn
Copy link

thesayyn commented Jul 20, 2022

@aiuto findings are true. the underlying problem is that write_manifest constructs misleading manifests and build_tar blindly tries to build up the tar and fails.

image a rule like this

def incorrect_impl(ctx):
  actual_file = ctx.actions.declare_file("thefile.txt")
  symlink = ctx.actions.declare_file("this_is_a_symlink.txt")
  ctx.actions.symlink(symlink, target_file = actual_file)
  return DefaultInfo(files = depset([actual_file, symlink]))

pkg_tar will add two manifest entries;

  • file entry for thefile.txt
  • file entry for this_is_a_symlink.txt

but I don't think the problem is supposed to be fixed here in rules_pkg since there is no way pkg_tar could tell whether a file is a symlink or not.

the principled fix would be that bazel exposes two new fields to FileApi named is_symlink and symlink_target which points to another File

@thesayyn
Copy link

also, pkg_mklink is not helpful as there is a huge symlink forest for node_modules.

@aiuto
Copy link
Collaborator

aiuto commented Jul 20, 2022

the principled fix would be that bazel exposes two new fields to FileApi named is_symlink and symlink_target which points to another File

Yes. That really should happen. But that would be Bazel 7.x at this point, so I do want to come up with things I can do in the interim.

  • pkg_mklink is not so bad of the the forest can be built incrementally
  • we can probably do something in scanning tree artifacts to turn files into links
  • we can probably detect that input files are links or not, but that might not be compatible with remote build.

@thesayyn
Copy link

thesayyn commented Jul 20, 2022

I have already started on a fix that checks if any entry#[2] within the manifest is a symlink, if so then does a realpath on it then strips out execroot/wksp (to make it bazel-out/cfg/path/to) portion from it then does a lookup within the manifest to find the corresponding file. if found then rewrites the entry to be a symlink that points to the target file with its path in final tar.

I'd like to put up a PR for this to fix it in rules_pkg.

@thesayyn
Copy link

  • we can probably detect that input files are links or not, but that might not be compatible with remote build.

bazelbuild/bazel#15866 is working towards fixing symlinks with RBE

@aiuto aiuto changed the title No option to preserve relative symlinks in pkg_tar No option to preserve symlinks in pkg_tar Jul 22, 2022
@aiuto
Copy link
Collaborator

aiuto commented Jul 22, 2022

@thesayyn Can you take a look at #603 to confirm that //tests/tar:relative_links_tar reproduces the problem here.

If you were to patch it and do:

blaze build //tests/tar:relative_links_tar
tar tvf  bazel-bin/tests/tar/relative_links_tar.tar

you would see

$ tar tvf  bazel-bin/tests/tar/relative_links_tar.tar
-r-xr-xr-x 0/0            3993 1999-12-31 19:00 BUILD
drwxr-xr-x 0/0               0 1999-12-31 19:00 node_modules/
...
-r-xr-xr-x 0/0            3993 1999-12-31 19:00 outer_BUILD

This is correct for some people, and wrong for others. Since the native file is a symlink, they may have wanted outer_BUILD to simply be ../BUILD.

You say you are working on a "fix". I don't think a single fix is what we need. There has to be a way to control the behavior. And, I think it has to be in pkg_files rather than pkg_tar and pkg_zip individually. In some places we want to preserve links, but we might mix that with places where we follow them. Do you have a WIP PR?

@aiuto
Copy link
Collaborator

aiuto commented Jul 22, 2022

More thinking about this.

  • I believe that symlinks in tree artifacts should be preserved in pkg_tar or pkg_zip by default.
  • There is no need to implement an expand links option in pkg_tar or zip today.
  • We absolutely do need a way to instantiate or preserve links in source files and filegroup(). To make that most flexible, it should be at the pkg_files level rather than in pkg_zip.

@aiuto
Copy link
Collaborator

aiuto commented Jul 25, 2022

@thesayyn @alexeagle I don't want to wait for bazelbuild/bazel#15866 and bazel 6.x to make progress on this.

If you think the test cases in #603 cover enough of the space here, I will merge that PR as a baseline to work against and then continue a PR for this using those as a test case.

@iamricard
Copy link

sorry for not replying here after the ping! i honestly don't know enough to answer some of the questions, but maybe @alexeagle might?

@alexeagle
Copy link
Collaborator

@thesayyn is the most up-to-date. My impression since we landed https://github.com/aspect-build/rules_js/tree/main/e2e/js_image is that this issue is lower-severity.

@aiuto
Copy link
Collaborator

aiuto commented Aug 4, 2022

OK. I can still do some fixing on this now while I am trying to prep a 0.0.8 release.
If #603 has good test cases, I can use that as a basis for a PR that does something useful.

@joshwiden
Copy link

Hi all! I wanted to chime in to say that this issue also affects my team and we'd love to see a fix. I just don't want to see the momentum get lost after the recent comment saying that this issue may be lower-severity!

Our usage essentially boils down to:

pkg_files(
    name = "install_files",
    srcs = glob(
        include = [
            # some things with symlinks, such as:
            #   lib.so -> lib.so.6
            #   lib.so.6 -> lib.so.6.2.3
            #   lib.so.6.2.3
        ],
    ),
)

pkg_tar(
    name = "config",
    srcs = [
        ":install_files",
    ],
)

Thanks!

@aiuto
Copy link
Collaborator

aiuto commented Aug 10, 2022 via email

@thesayyn
Copy link

I need people to tell me if the test cases I have created in #603 (?)

Yes. Though this is only one part of the symlinks issue with pkg_tar. I have created #609 as well to demonstrate the issue with the node_module tree that rules_js lays out.

@aiuto
Copy link
Collaborator

aiuto commented Aug 30, 2022

OK. I'm unexpectedly tied up this week and can't do deep thinking but I can get people to review the old PRs so we can improve the baseline test cases.

@tcm-marcel
Copy link

We hit the same problem with the.so lib symlinks.
I quickly tried to patch rules_pkg to preserve symlinks, but it turned out to be harder than expected to differentiate manually created and sandbox-related symlinks. I will give it a second try at one point and let you know once I have a fix.

@thisgithappens
Copy link

Is there any update on this @aiuto? Would be nice if these symlinks were preserved. Even passing a pkg_mklink as a src to a pkg_tar or pkg_files doesn't preserve the link.

@aiuto
Copy link
Collaborator

aiuto commented Apr 16, 2024

I've not taken a look at this since 2022.

@thisgithappens
Copy link

What might be involved in getting a fix in for the test cases in #603? Seems like others confirmed those test cases were sufficient. I think this is still a valid feature/outcome of a package.

@peakschris
Copy link

peakschris commented May 23, 2024

This is also proving to be a blocker for us. We need to create docker containers that match our existing ones (from a different build system), which use symlinks. The pkg_tar used in our container builds flattens the links. We'd like links preserved.

The #603 test case matches our need

@alexeagle
Copy link
Collaborator

@thisgithappens @peakschris You might want to try https://github.com/aspect-build/bazel-lib/blob/main/docs/tar.md instead. It now has feature parity with strip_prefix and some other attributes of pkg_tar.

@peakschris
Copy link

@alexeagle that's an interesting idea, thanks for sharing. I think this would solve the issue for this particular archive, at the cost of bifurcating our archive rule usage.

I have thought several times about using aspect tar for all our archive needs. It looks like it has moved on since last time I looked! It would have the advantage of a more terse style than rules_pkg. However, I think we can't currently use aspect tar for all our archive needs, because:

  • we've written a bunch of custom rules to package files and their dependencies, based on rules_pkg manifest format
  • we need a single rules style to write all formats of archive; zip, tar.gz, etc
  • I can't see a way to combine multiple mtrees, which we would need to collect disjoint parts of our build into one archive (I'm sure this part would be easy for us to implement)

I'll continue to mull it over,

Thanks, Chris

@jortega0
Copy link

jortega0 commented Dec 4, 2024

For what it's worth, although pretty hacky, I put together this patch that looks for relative symlinks and at least resolves my use case for symlinking versioned .so files:

--- pkg/private/tar/build_tar.py	2024-11-22 13:48:24.028730145 -0500
+++ pkg/private/tar/build_tar.py	2024-11-22 13:48:18.793917324 -0500
@@ -301,6 +301,37 @@
             uname=names[0],
             gname=names[1])

+  def find_relative_symlink(self, filepath):
+    """
+    Recursively resolves symlinks and checks if any symlink points to a relative file.
+
+    Args:
+        filepath (str): The path to the file to start checking.
+
+    Returns:
+        str: The path to the symlink pointing to a relative file, or None if not found.
+    """
+    visited = set()
+
+    relative_link = None
+    while os.path.islink(filepath):
+        if filepath in visited:
+            # Break loops in case of circular symlinks
+            print("Detected circular symlink, breaking out of loop")
+            return None
+        visited.add(filepath)
+
+        target = os.readlink(filepath)
+        if os.path.isabs(target):
+            relative_link = None
+            filepath = target
+        else:
+            relative_link = target
+            break
+
+    return relative_link
+
+
   def add_manifest_entry(self, entry, file_attributes):
     # Use the pkg_tar mode/owner remapping as a fallback
     non_abs_path = entry.dest.strip('/')
@@ -322,8 +353,11 @@
         attrs['ids'] = (entry.uid, entry.gid)
       else:
         attrs['ids'] = (entry.uid, entry.uid)
+
     if entry.type == manifest.ENTRY_IS_LINK:
       self.add_link(entry.dest, entry.src, **attrs)
+    elif self.find_relative_symlink(entry.src):
+      self.add_link(entry.dest, self.find_relative_symlink(entry.src), **attrs)
     elif entry.type == manifest.ENTRY_IS_DIR:
       self.add_empty_dir(self.normalize_path(entry.dest), **attrs)
     elif entry.type == manifest.ENTRY_IS_TREE:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
p4 An idea that we are not considering working on at this time.
Projects
None yet
Development

No branches or pull requests