Skip to content

Commit

Permalink
Resolve addresses in ARGs passed to COPY (#20989)
Browse files Browse the repository at this point in the history
This MR will resolve addresses passed to COPY through an ARG.
This MR also add inference of bare files.

The existing Docker integration allows copying `package` targets into
the docker container, but requires specifying the anticipated output
path. This is confusing. This MR also add inference of bare files.

For example, the following now works:
```Dockerfile
FROM		python:3.9

ARG PEX_BIN="testprojects/src/python/hello/main:main"
ARG PEX_BIN_DOTTED_PATH="testprojects.src.python.native/main.pex"

COPY		${PEX_BIN} /app/pex_bin
COPY		$PEX_BIN_DOTTED_PATH /app/pex_var
COPY		testprojects.src.python.native/main.pex /app/pex_bin_dotted_path
```

This MR also add inference of bare files. I thought that the docs were
broken but I just missed that you needed a manual link but I already
implemented it so why not push.
```Dockerfile
FROM		python:3.9
COPY		testprojects/src/docker/file.txt /app/
```

closes #20718
  • Loading branch information
lilatomic authored Jun 15, 2024
1 parent e44697e commit ed3f23e
Show file tree
Hide file tree
Showing 12 changed files with 329 additions and 28 deletions.
15 changes: 13 additions & 2 deletions docs/docs/docker/index.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -80,12 +80,23 @@ The context is assembled as follows:

### Dependency inference support

When you `COPY` PEX binaries into your image, the dependency on the `pex_binary` target will be inferred, so you don't have to add that explicitly to the list of `dependencies` on your `docker_image` target.
When you `COPY` PEX binaries into your image, the dependency on the `pex_binary` target will be inferred, so you don't have to add that explicitly to the list of `dependencies` on your `docker_image` target. For example, the `pex_binary` target `src/python/helloworld/bin.pex` has the default `output_path` of `src.python.helloworld/bin.pex`. So, Pants can infer a dependency based on the line `COPY src.python.helloworld/bin.pex /bin/helloworld`. This inference is also done for targets referenced by their target address in build arguments, for example:

For example, the `pex_binary` target `src/python/helloworld/bin.pex` has the default `output_path` of `src.python.helloworld/bin.pex`. So, Pants can infer a dependency based on the line `COPY src.python.helloworld/bin.pex /bin/helloworld`.
```dockerfile
FROM python:3.9
ARG PEX_BIN=src:my_target
COPY $PEX_BIN /app/my_app
```

Inference for Go binaries and artifacts of other packaged targets is similar.

Inference on `file`/`files` targets is also done on files, for example:

```dockerfile
FROM python:3.9
COPY src/file.txt /app/
```

Inference is also supported for `docker_image` targets specified in build arguments, for example:

```dockerfile
Expand Down
4 changes: 4 additions & 0 deletions docs/notes/2.23.x.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,10 @@ for inspiration. To opt into the feature set the flag

### Backends

#### Docker

Docker inference is improved. Pants can now make inferences by target address for targets supporting `pants package`, and `file` targets can be included by filename. See the [documentation on Docker dependency inference](https://www.pantsbuild.org/2.23/docs/docker#dependency-inference-support) for details

#### Scala

Source files no longer produce a dependency on Scala plugins. If you are using a Scala plugin that is also required by the source code (such as acyclic), please add an explicit dependency or set the `packages` field on the artifact.
Expand Down
2 changes: 2 additions & 0 deletions src/python/pants/backend/docker/goals/package_image_test.py
Original file line number Diff line number Diff line change
Expand Up @@ -104,6 +104,7 @@ def assert_build(
process_assertions: Callable[[Process], None] | None = None,
exit_code: int = 0,
copy_sources: tuple[str, ...] = (),
copy_build_args=(),
build_context_snapshot: Snapshot = EMPTY_SNAPSHOT,
version_tags: tuple[str, ...] = (),
plugin_tags: tuple[str, ...] = (),
Expand All @@ -122,6 +123,7 @@ def build_context_mock(request: DockerBuildContextRequest) -> DockerBuildContext
digest=EMPTY_DIGEST,
source=os.path.join(address.spec_path, "Dockerfile"),
copy_source_paths=copy_sources,
copy_build_args=copy_build_args,
version_tags=version_tags,
),
build_args=rule_runner.request(DockerBuildArgs, [DockerBuildArgsRequest(tgt)]),
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -109,6 +109,7 @@ class DockerfileInfo:
source: str
build_args: DockerBuildArgs = DockerBuildArgs()
copy_source_paths: tuple[str, ...] = ()
copy_build_args: DockerBuildArgs = DockerBuildArgs()
from_image_build_args: DockerBuildArgs = DockerBuildArgs()
version_tags: tuple[str, ...] = ()

Expand Down Expand Up @@ -167,6 +168,9 @@ async def parse_dockerfile(request: DockerfileInfoRequest) -> DockerfileInfo:
*info["build_args"], duplicates_must_match=True
),
copy_source_paths=tuple(info["copy_source_paths"]),
copy_build_args=DockerBuildArgs.from_strings(
*info["copy_build_args"], duplicates_must_match=True
),
from_image_build_args=DockerBuildArgs.from_strings(
*info["from_image_build_args"], duplicates_must_match=True
),
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -59,10 +59,16 @@ def test_parsed_injectables(files: list[tuple[str, str]], rule_runner: RuleRunne
dockerfile_content = dedent(
"""\
ARG BASE_IMAGE=:base
ARG PEX_BIN=:hello
ARG PEX_BIN_DOTTED_PATH=dotted.path.as.arg/dpaa.pex
ARG OTHER_FILE=other/file
ARG NO_DEFAULT
FROM $BASE_IMAGE
COPY some.target/binary.pex some.target/tool.pex /bin
COPY --from=scratch this.is/ignored.pex /opt
COPY binary another/cli.pex tool /bin
COPY $NO_DEFAULT /app/not_references
COPY $OTHER_FILE ${PEX_BIN} ${PEX_BIN_DOTTED_PATH} :technically_a_file /app/hello.pex
"""
)

Expand All @@ -73,12 +79,28 @@ def test_parsed_injectables(files: list[tuple[str, str]], rule_runner: RuleRunne
addr = Address("test")
info = rule_runner.request(DockerfileInfo, [DockerfileInfoRequest(addr)])
assert info.from_image_build_args.to_dict() == {"BASE_IMAGE": ":base"}

# copy args
docker_copy_build_args = info.copy_build_args.to_dict()
assert docker_copy_build_args == {
"PEX_BIN": ":hello",
"PEX_BIN_DOTTED_PATH": "dotted.path.as.arg/dpaa.pex",
"OTHER_FILE": "other/file",
}
assert (
"NO_DEFAULT" not in docker_copy_build_args
), "ARG with no value should not be included as it cannot be a reference to a target"
assert (
docker_copy_build_args.get("OTHER_FILE") == "other/file"
), "A file reference should still be copied even if it isn't an output path or an obvious target"

assert info.copy_source_paths == (
"some.target/binary.pex",
"some.target/tool.pex",
"binary",
"another/cli.pex",
"tool",
":technically_a_file", # we don't resolve inline targets, since we'd need to rewrite the Dockerfile
)


Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ class ParsedDockerfileInfo:
source: str
build_args: tuple[str, ...] # "ARG_NAME=VALUE", ...
copy_source_paths: tuple[str, ...]
copy_build_args: tuple[str, ...] # "ARG_NAME=UPSTREAM_TARGET_ADDRESS", ...
from_image_build_args: tuple[str, ...] # "ARG_NAME=UPSTREAM_TARGET_ADDRESS", ...
version_tags: tuple[str, ...] # "STAGE TAG", ...

Expand Down Expand Up @@ -75,6 +76,11 @@ def main(*dockerfile_names: str) -> Iterator[ParsedDockerfileInfo]:
# import here to allow the rest of the file to be tested without a dependency on dockerfile
from dockerfile import Command, parse_file, parse_string # pants: no-infer-dep

@dataclass(frozen=True)
class CopyReferences:
in_arg: tuple[str, ...]
not_in_arg: tuple[str, ...]

@dataclass(frozen=True)
class ParsedDockerfile:
filename: str
Expand All @@ -93,6 +99,7 @@ def get_info(self) -> ParsedDockerfileInfo:
source=self.filename,
build_args=self.build_args(),
copy_source_paths=self.copy_source_paths(),
copy_build_args=self.copy_build_args(),
from_image_build_args=self.from_image_build_args(),
version_tags=self.baseimage_tags(),
)
Expand All @@ -102,14 +109,23 @@ def get_all(self, command_name: str) -> Iterator[Command]:
if command.cmd.upper() == command_name:
yield command

def from_image_build_args(self) -> tuple[str, ...]:
def arg_references(self):
"""Return ARGs which could have valid references."""
build_args = {
key: value.strip("\"'")
for key, has_value, value in [
build_arg.partition("=") for build_arg in self.build_args()
]
if has_value and valid_address(value)
if has_value
}
return build_args

def args_with_addresses(self):
"""All ARGs which have an Address as a value."""
return {k: v for k, v in self.arg_references().items() if valid_address(v)}

def from_image_build_args(self) -> tuple[str, ...]:
build_args = self.args_with_addresses()

return tuple(
f"{image_build_arg}={build_args[image_build_arg]}"
Expand All @@ -118,7 +134,7 @@ def from_image_build_args(self) -> tuple[str, ...]:
)

@staticmethod
def _get_image_ref_build_arg(image_ref: str) -> str | None:
def _extract_ref_from_arg(image_ref: str) -> str | None:
build_arg = re.match(r"\$\{?([a-zA-Z0-9_]+)\}?$", image_ref)
return build_arg.group(1) if build_arg else None

Expand All @@ -131,7 +147,7 @@ def from_image_build_arg_names(self) -> Iterator[str]:
FROM ${BASE_IMAGE}
"""
for cmd in self.get_all("FROM"):
build_arg = self._get_image_ref_build_arg(cmd.value[0])
build_arg = self._extract_ref_from_arg(cmd.value[0])
if build_arg:
yield build_arg

Expand Down Expand Up @@ -171,7 +187,7 @@ def _get_tag(image_ref: str) -> str | None:
"""The image ref is in the form `registry/repo/name[/...][:tag][@digest]` and where
`digest` is `sha256:hex value`, or a build arg reference with $ARG."""
if image_ref.startswith("$"):
build_arg = self._get_image_ref_build_arg(image_ref)
build_arg = self._extract_ref_from_arg(image_ref)
if build_arg:
return f"build-arg:{build_arg}"
parsed = re.match(_image_ref_regexp, image_ref)
Expand All @@ -195,10 +211,11 @@ def build_args(self) -> tuple[str, ...]:
"""Return all defined build args, including any default values."""
return tuple(cmd.original[4:].strip() for cmd in self.get_all("ARG"))

def copy_source_paths(self) -> tuple[str, ...]:
"""Return all files referenced from the build context using COPY instruction."""
def get_copy_references(self) -> CopyReferences:
"""Get all references (files and addresses) of COPY instructions, partitioned by whether
the appear in ARGs or not."""
# Exclude COPY --from instructions, as they don't refer to files from the build context.
return tuple(
copied_files = tuple(
chain(
*(
cmd.value[:-1]
Expand All @@ -207,6 +224,35 @@ def copy_source_paths(self) -> tuple[str, ...]:
)
)
)
arg_references = self.arg_references()

copy_in_arg = []
copy_not_in_arg = []
for f in copied_files:
argref = self._extract_ref_from_arg(f)
if argref:
if argref not in arg_references:
# if a COPYed file is referenced in an arg but the arg has no value,
# we can't resolve it in any way, so we omit it
continue
constructed_arg = f"{argref}={arg_references[argref]}"
copy_in_arg.append(constructed_arg)
else:
copy_not_in_arg.append(f)

return CopyReferences(tuple(copy_in_arg), tuple(copy_not_in_arg))

def copy_source_paths(self) -> tuple[str, ...]:
"""All files referenced from the build context using COPY instruction.
Does not include ones from ARGs
"""
return self.get_copy_references().not_in_arg

def copy_build_args(self) -> tuple[str, ...]:
"""All files and targets referenced from the build context in ARGs which are used by a
COPY instruction."""
return self.get_copy_references().in_arg

for parsed in map(ParsedDockerfile.from_file, dockerfile_names):
yield parsed.get_info()
Expand Down
54 changes: 48 additions & 6 deletions src/python/pants/backend/docker/util_rules/dependencies.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,8 @@
DockerBuildArgs,
DockerBuildArgsRequest,
)
from pants.base.glob_match_error_behavior import GlobMatchErrorBehavior
from pants.base.specs import FileLiteralSpec, RawSpecs
from pants.core.goals.package import AllPackageableTargets, OutputPathField
from pants.engine.addresses import Addresses, UnparsedAddressInputs
from pants.engine.rules import Get, collect_rules, rule
Expand Down Expand Up @@ -39,11 +41,9 @@ async def infer_docker_dependencies(
dockerfile_info = await Get(DockerfileInfo, DockerfileInfoRequest(request.field_set.address))
targets = await Get(Targets, Addresses([request.field_set.address]))
build_args = await Get(DockerBuildArgs, DockerBuildArgsRequest(targets.expect_single()))
merged_from_build_args = {
k: build_args.to_dict().get(k, v)
for k, v in dockerfile_info.from_image_build_args.to_dict().items()
}
dockerfile_build_args = {k: v for k, v in merged_from_build_args.items() if v}
dockerfile_build_args = dockerfile_info.from_image_build_args.with_overrides(
build_args
).nonempty()

putative_image_addresses = set(
await Get(
Expand All @@ -61,7 +61,25 @@ async def infer_docker_dependencies(
),
)
)
maybe_output_paths = set(dockerfile_info.copy_source_paths)
putative_copy_target_addresses = set(
await Get(
Addresses,
UnparsedAddressInputs(
dockerfile_info.copy_build_args.nonempty().values(),
owning_address=dockerfile_info.address,
description_of_origin=softwrap(
f"""
the COPY arguments from the file {dockerfile_info.source}
from the target {dockerfile_info.address}
"""
),
skip_invalid_addresses=True,
),
)
)
maybe_output_paths = set(dockerfile_info.copy_source_paths) | set(
dockerfile_info.copy_build_args.nonempty().values()
)

# NB: There's no easy way of knowing the output path's default file ending as there could
# be none or it could be dynamic. Instead of forcing clients to tell us, we just use all the
Expand All @@ -71,10 +89,12 @@ async def infer_docker_dependencies(
possible_file_endings = {PurePath(path).suffix[1:] or None for path in maybe_output_paths}
inferred_addresses = []
for target in all_packageable_targets:
# If the target is an image we depend on, add it
if target.address in putative_image_addresses:
inferred_addresses.append(target.address)
continue

# If the target looks like it could generate the file we're trying to COPY
output_path_field = target.get(OutputPathField)
possible_output_paths = {
output_path_field.value_or_default(file_ending=file_ending)
Expand All @@ -85,6 +105,28 @@ async def infer_docker_dependencies(
inferred_addresses.append(target.address)
break

# If the target has the same address as an ARG that will eventually be copied
if target.address in putative_copy_target_addresses:
inferred_addresses.append(target.address)

# add addresses from source paths if they are files directly
addresses_from_source_paths = await Get(
Targets,
RawSpecs(
description_of_origin="halp",
unmatched_glob_behavior=GlobMatchErrorBehavior.ignore,
file_literals=tuple(
FileLiteralSpec(e)
for e in [
*dockerfile_info.copy_source_paths,
*dockerfile_info.copy_build_args.nonempty().values(),
]
),
),
)

inferred_addresses.extend(e.address for e in addresses_from_source_paths)

return InferredDependencies(Addresses(inferred_addresses))


Expand Down
Loading

0 comments on commit ed3f23e

Please sign in to comment.