Terraform cache is not concurrency safe #21804

Open
lilatomic opened this issue Jan 2, 2025 · 1 comment · May be fixed by #21805
Labels
backend: Terraform Terraform backend-related issues bug

Comments

@lilatomic
Contributor

Describe the bug
We introduced the use of the Terraform provider cache for caching provider downloads (the long part of Terraform initialisation) in #21221. The cache is "not guaranteed to be concurrency safe", which is tracked in hashicorp/terraform#31964.

The specifics for us appear to be that, when init processes run concurrently for modules or deployments where there is no lockfile:

  1. they may only see some of the files that they downloaded (nonatomic extraction of the provider's zipfile?)
  2. they calculate an incorrect H1 hash because they are missing some of the provider's files (the H1 hash is a hash of the extracted contents of the provider's zipfile)
  3. this incorrect hash is then baked into the lockfile
  4. the lockfile is put in the digest as the result of running terraform init
  5. the digest contains symlinks to the provider cache. the cache is fixed as a concurrently-running process finishes extracting files
  6. subsequent runs of terraform (such as terraform validate) using the cursed lockfile fail because the H1 hash is incorrect
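
To illustrate steps 1 to 3, here is a minimal sketch of a directory hash in the style of the `h1:` scheme (a SHA-256 over a sorted list of per-file SHA-256 digests and names). It is a simplified illustration, not Terraform's implementation, and is not claimed to be byte-compatible with it; `dir_hash`, `demo` and the file names are hypothetical. It only shows that a process which sees a partially extracted provider directory computes a different hash from one that sees the complete directory.

```python
import base64
import hashlib
import tempfile
from pathlib import Path


def dir_hash(root: Path) -> str:
    """Hash a directory tree from the digest and relative name of every file."""
    summary = hashlib.sha256()
    names = sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())
    for name in names:
        file_digest = hashlib.sha256((root / name).read_bytes()).hexdigest()
        # One summary line per file: "<hex digest>  <relative name>\n"
        summary.update(f"{file_digest}  {name}\n".encode())
    return "h1:" + base64.b64encode(summary.digest()).decode()


def demo() -> tuple:
    """Return (hash of the complete directory, hash with one file missing)."""
    with tempfile.TemporaryDirectory() as tmp:
        provider = Path(tmp) / "provider"
        provider.mkdir()
        # Hypothetical provider contents, standing in for an extracted zipfile.
        (provider / "terraform-provider-example").write_bytes(b"binary contents")
        (provider / "LICENSE.txt").write_text("licence text")
        complete = dir_hash(provider)
        # Simulate a concurrent process that has not seen every file yet.
        (provider / "LICENSE.txt").unlink()
        partial = dir_hash(provider)
    return complete, partial


if __name__ == "__main__":
    complete, partial = demo()
    print("complete:", complete)
    print("partial: ", partial)
    print("hashes match:", complete == partial)
```

If the partial hash is the one written to the lockfile, later runs that see the complete directory will compute a different value, which matches the failure in step 6.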

Pants version
2.23


reproducer is a bit fiddly, but this is what I ended up with:

pants.toml:

```toml
[GLOBAL]
pants_version = "2.23.0"

backend_packages.add = [
  "pants.backend.experimental.terraform",
  "pants.backend.python",
]

[subprocess-environment]

[python]
interpreter_constraints = ["==3.9.*"]
enable_resolves = true

[python.resolves]
python-default = "python-default.lock"

[download-terraform]
extra_env_vars = [
  "PATH",
]
```


generator for resources:

```python
#!/usr/bin/env python3
from pathlib import Path
from textwrap import dedent


def gen_dir(n: int):
    # One directory per module, each with its own main.tf and BUILD file.
    d = Path(f"tf/tf{n}")
    d.mkdir(exist_ok=True, parents=True)
    with open(d / "main.tf", mode="w") as f:
        f.write(dedent("""
            terraform {
              required_providers {
                azuread = {
                  source = "hashicorp/azuread"
                  version = "> 2.15.0"
                }
                azurerm = {
                  source = "hashicorp/azurerm"
                  version = ">3.0.0"
                }
              }
            }
            resource "null_resource" "a" {
              count = 1
            }
        """))
    with open(d / "BUILD", mode="w") as f:
        f.write(f"""terraform_module(name="{n}")""")


# Generate ten modules that all require the same providers.
for i in range(0, 10):
    gen_dir(i)
```


and then `pants check --only=terraform-validate ::`
@lilatomic lilatomic added the bug label Jan 2, 2025
@lilatomic
Contributor Author

A current workaround is to generate lockfiles (for example with `pants generate-lockfiles --resolve="//path/to/module:module"`). With a lockfile present, an affected init process fails instead of recording an incorrect hash. The failed run has to be rerun, but this prevents a cursed lockfile from making it into the Pants cache.
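
For the reproducer above, the workaround commands could be built with a small script like the following. This is a sketch under assumptions: the helper names are hypothetical, and it assumes each module's resolve is named after its address in the same `//path/to/module:module` form as the command above (so `tf/tf0` with `name="0"` becomes `//tf/tf0:0`).

```python
import shlex


def generate_lockfile_command(address: str) -> list:
    """Build the `pants generate-lockfiles` invocation for one module address."""
    return ["pants", "generate-lockfiles", f"--resolve={address}"]


def reproducer_addresses(count: int = 10) -> list:
    """Addresses of the modules written by the generator script (assumed form)."""
    return [f"//tf/tf{n}:{n}" for n in range(count)]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for address in reproducer_addresses():
        print(shlex.join(generate_lockfile_command(address)))
```

Each printed command can be run by hand, or passed to `subprocess.run(cmd, check=True)` one at a time so that lockfile generation itself does not run concurrently.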

@lilatomic lilatomic added the backend: Terraform Terraform backend-related issues label Jan 2, 2025
@lilatomic lilatomic linked a pull request Jan 2, 2025 that will close this issue