add ability to download cached workspace (#520)
* create "stale" field on workspace state
  A provider that downloads its workspace state directly cannot assume that this state is a valid basis for a future incremental update, and should mark the downloaded workspace as stale.
  Signed-off-by: Will Murphy <[email protected]>
* WIP add configs
  Signed-off-by: Will Murphy <[email protected]>
* lint fix
  Signed-off-by: Will Murphy <[email protected]>
* [wip] working on vunnel results db listing
  Signed-off-by: Alex Goodman <[email protected]>
* update and tests for safe_extract_tar
  Now that we're using it for more than one thing, make an extractor that generally prevents path traversal.
  Signed-off-by: Will Murphy <[email protected]>
* [wip] adding tests for fetching listing and archives
  Signed-off-by: Alex Goodman <[email protected]>
* [wip] add more negative tests for provider tests
  Signed-off-by: Alex Goodman <[email protected]>
* unit test for new workspace changes
  Signed-off-by: Will Murphy <[email protected]>
* replace the workspace results instead of overlaying
  Signed-off-by: Will Murphy <[email protected]>
* clean up hasher implementation
  Signed-off-by: Alex Goodman <[email protected]>
* add tests for prep workspace from listing entry
  Signed-off-by: Will Murphy <[email protected]>
* do not include inputs in tar test fixture
  Signed-off-by: Alex Goodman <[email protected]>
* vunnel fetch existing workspace working
  Signed-off-by: Will Murphy <[email protected]>
* add unit test for full update flow
  Signed-off-by: Will Murphy <[email protected]>
* update existing unit tests for new config values
  Signed-off-by: Will Murphy <[email protected]>
* add unit test for default behavior of new configs
  Signed-off-by: Will Murphy <[email protected]>
* lint fix
  Signed-off-by: Will Murphy <[email protected]>
* add missing annotations import
  Signed-off-by: Will Murphy <[email protected]>
* Use 3.9 compatible annotations
  Relying on the from __future__ import annotations doesn't work with the mashumaro.
  Signed-off-by: Will Murphy <[email protected]>
* validate that enabling import results requires host and path
  Signed-off-by: Will Murphy <[email protected]>
* rename listing field and add schema
  Signed-off-by: Alex Goodman <[email protected]>
* only require github token when downloading
  Signed-off-by: Alex Goodman <[email protected]>
* add zstd support
  Signed-off-by: Alex Goodman <[email protected]>
* add tests for zstd support
  Signed-off-by: Alex Goodman <[email protected]>
* add tests for _has_newer_archive
  Signed-off-by: Will Murphy <[email protected]>
* fix tests for zstd
  Signed-off-by: Alex Goodman <[email protected]>
* show stderr to log when git commands fail
  Signed-off-by: Alex Goodman <[email protected]>
* move import_results to common field on provider
  Signed-off-by: Will Murphy <[email protected]>
* add concept for distribution version
  Signed-off-by: Alex Goodman <[email protected]>
* single source of truth for provider schemas
  Signed-off-by: Alex Goodman <[email protected]>
* add distribution-version to schema, provider state, and listing entry
  Signed-off-by: Alex Goodman <[email protected]>
* clear workspace on different dist version
  Signed-off-by: Alex Goodman <[email protected]>
* fix defaulting logic and update tests
  Signed-off-by: Will Murphy <[email protected]>
* default distribution version and path
  Signed-off-by: Will Murphy <[email protected]>
* make "" and None both use default path
  Signed-off-by: Will Murphy <[email protected]>

---------

Signed-off-by: Will Murphy <[email protected]>
Signed-off-by: Alex Goodman <[email protected]>
Co-authored-by: Alex Goodman <[email protected]>
1 parent 6b4fa38, commit 90b176c
Showing 41 changed files with 1,967 additions and 127 deletions.
@@ -0,0 +1,17 @@
# `ProviderState` JSON Schema

This schema governs the `listing.json` file used when providers are configured to fetch pre-computed results (by using `import_results_enabled`). The listing file tells the provider which results are available, where to fetch them from, and how to validate them.

See `src/vunnel.distribution.Listing` for the root object that represents this schema.

## Updating the schema

The JSON schema must be versioned manually: copy the existing JSON schema into a new `schema-x.y.z.json` file and make the necessary updates by hand (or with an online tool such as https://www.liquid-technologies.com/online-json-to-schema-converter).

This schema is versioned according to the "SchemaVer" guidelines, which diverge slightly from Semantic Versioning to better suit data models.

Given a version number format `MODEL.REVISION.ADDITION`:

- `MODEL`: increment when you make a breaking schema change which will prevent interaction with any historical data
- `REVISION`: increment when you make a schema change which may prevent interaction with some historical data
- `ADDITION`: increment when you make a schema change that is compatible with all historical data
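
The three bump rules above can be sketched as a small helper. This is only an illustration, not part of the commit: the function name `bump_schemaver` and the `change` labels are hypothetical, and resetting the lower components to zero follows the usual SchemaVer convention rather than anything stated in this README.

```python
def bump_schemaver(version: str, change: str) -> str:
    """Return the next MODEL.REVISION.ADDITION version for a kind of change.

    `change` is a hypothetical label: "model", "revision", or "addition".
    """
    model, revision, addition = (int(part) for part in version.split("."))
    if change == "model":
        # breaking change: no historical data can be read with the new schema
        return f"{model + 1}.0.0"
    if change == "revision":
        # some historical data may no longer be readable
        return f"{model}.{revision + 1}.0"
    if change == "addition":
        # all historical data remains compatible
        return f"{model}.{revision}.{addition + 1}"
    raise ValueError(f"unknown change kind: {change!r}")


# e.g. the new file name for a compatible change to schema-1.0.0.json
new_name = f"schema-{bump_schemaver('1.0.0', 'addition')}.json"
```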
@@ -0,0 +1,66 @@
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "schema": {
      "type": "object",
      "properties": {
        "version": {
          "type": "string"
        },
        "url": {
          "type": "string"
        }
      },
      "required": [
        "version",
        "url"
      ]
    },
    "provider": {
      "type": "string"
    },
    "available": {
      "type": "object",
      "properties": {
        "1": {
          "type": "array",
          "items": [
            {
              "type": "object",
              "properties": {
                "distribution_checksum": {
                  "type": "string"
                },
                "built": {
                  "type": "string"
                },
                "checksum": {
                  "type": "string"
                },
                "url": {
                  "type": "string"
                },
                "version": {
                  "type": "integer"
                }
              },
              "required": [
                "built",
                "checksum",
                "distribution_checksum",
                "url",
                "version"
              ]
            }
          ]
        }
      }
    }
  },
  "required": [
    "schema",
    "available",
    "provider"
  ]
}
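
To make the required fields of this schema concrete, here is a minimal standard-library sketch that reports which required keys a listing document lacks. It is not a JSON Schema validator and not code from the commit: the function name, the sample values, and the example URLs are all hypothetical. It is also stricter than the schema as written, which (in draft-04, where an array-form `items` is positional) only describes the first entry under the `"1"` key.

```python
from __future__ import annotations

import json

# required keys copied from the schema above
REQUIRED_TOP_LEVEL = ("schema", "available", "provider")
REQUIRED_ENTRY = ("built", "checksum", "distribution_checksum", "url", "version")


def missing_listing_fields(document: dict) -> list[str]:
    """Return dotted paths of required keys absent from a listing document."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in document]
    for version, entries in document.get("available", {}).items():
        for index, entry in enumerate(entries):
            missing.extend(
                f"available.{version}[{index}].{key}"
                for key in REQUIRED_ENTRY
                if key not in entry
            )
    return missing


# hypothetical listing.json content, for illustration only
sample = json.loads(
    """
    {
      "schema": {"version": "1.0.0", "url": "https://example.com/schema-1.0.0.json"},
      "provider": "example-provider",
      "available": {
        "1": [
          {
            "built": "2024-01-01T00:00:00Z",
            "url": "https://example.com/example-provider.tar.zst",
            "distribution_checksum": "sha256:1234567890abcdef1234567890abcdef",
            "checksum": "xxhash64:1234567890abcdef",
            "version": 1
          }
        ]
      }
    }
    """
)

problems = missing_listing_fields(sample)  # empty when nothing is missing
```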
@@ -0,0 +1,80 @@
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "title": "provider-workspace-state",
  "description": "describes the filesystem state of a provider workspace directory",
  "properties": {
    "provider": {
      "type": "string"
    },
    "urls": {
      "type": "array",
      "items": [
        {
          "type": "string"
        }
      ]
    },
    "store": {
      "type": "string"
    },
    "timestamp": {
      "type": "string"
    },
    "listing": {
      "type": "object",
      "properties": {
        "digest": {
          "type": "string"
        },
        "path": {
          "type": "string"
        },
        "algorithm": {
          "type": "string"
        }
      },
      "required": [
        "digest",
        "path",
        "algorithm"
      ]
    },
    "version": {
      "type": "integer",
      "description": "version describing the result data shape + the provider processing behavior semantics"
    },
    "distribution_version": {
      "type": "integer",
      "description": "version describing purely the result data shape"
    },
    "schema": {
      "type": "object",
      "properties": {
        "version": {
          "type": "string"
        },
        "url": {
          "type": "string"
        }
      },
      "required": [
        "version",
        "url"
      ]
    },
    "stale": {
      "type": "boolean",
      "description": "set to true if the workspace is stale and cannot be used for an incremental update"
    }
  },
  "required": [
    "provider",
    "urls",
    "store",
    "timestamp",
    "listing",
    "version",
    "schema"
  ]
}
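
As a rough sketch of how the optional `stale` and `distribution_version` fields might be consumed, the helper below decides whether an existing workspace state can seed an incremental update. The function name is hypothetical and this is not the commit's implementation. Two assumptions are baked in: a missing `stale` is treated as `false` (the schema does not require it), and a missing `distribution_version` is treated as `1`, mirroring the default on `ListingEntry`.

```python
def can_update_incrementally(state: dict, current_distribution_version: int) -> bool:
    """Decide whether a workspace state dict is a valid basis for an incremental update."""
    # a workspace marked stale (e.g. one that was downloaded rather than
    # built locally) must not be used as the basis for an incremental update
    if state.get("stale", False):
        return False
    # assumption: a different result data shape also rules out reuse
    return state.get("distribution_version", 1) == current_distribution_version


fresh = {"provider": "example-provider", "stale": False, "distribution_version": 1}
downloaded = {"provider": "example-provider", "stale": True, "distribution_version": 1}

reuse_fresh = can_update_incrementally(fresh, current_distribution_version=1)
reuse_downloaded = can_update_incrementally(downloaded, current_distribution_version=1)
```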
@@ -0,0 +1,89 @@
from __future__ import annotations

import datetime
import os
from dataclasses import dataclass, field
from urllib.parse import urlparse

import iso8601
from mashumaro.mixins.dict import DataClassDictMixin

from vunnel import schema as schema_def

DB_SUFFIXES = {".tar.gz", ".tar.zst"}


@dataclass
class ListingEntry(DataClassDictMixin):
    # the date this archive was built relative to the data enclosed in the archive
    built: str

    # the URL where the vunnel provider archive is located
    url: str

    # the digest of the archive referenced at the URL.
    # Note: all checksums are labeled with "algorithm:value" ( e.g. sha256:1234567890abcdef1234567890abcdef)
    distribution_checksum: str

    # the digest of the checksums file within the archive referenced at the URL
    # Note: all checksums are labeled with "algorithm:value" ( e.g. xxhash64:1234567890abcdef)
    enclosed_checksum: str

    # the provider distribution version this archive was built with (different than the provider version)
    distribution_version: int = 1

    def basename(self) -> str:
        basename = os.path.basename(urlparse(self.url, allow_fragments=False).path)
        if not _has_suffix(basename, suffixes=DB_SUFFIXES):
            msg = f"entry url is not a db archive: {basename}"
            raise RuntimeError(msg)

        return basename

    def age_in_days(self, now: datetime.datetime | None = None) -> int:
        if not now:
            now = datetime.datetime.now(tz=datetime.timezone.utc)
        return (now - iso8601.parse_date(self.built)).days


@dataclass
class ListingDocument(DataClassDictMixin):
    # mapping of provider versions to a list of ListingEntry objects denoting archives available for download
    available: dict[int, list[ListingEntry]]

    # the provider name this document is associated with
    provider: str

    # the schema information for this document
    schema: schema_def.Schema = field(default_factory=schema_def.ProviderListingSchema)

    @classmethod
    def new(cls, provider: str) -> ListingDocument:
        return cls(available={}, provider=provider)

    def latest_entry(self, schema_version: int) -> ListingEntry | None:
        if schema_version not in self.available:
            return None

        if not self.available[schema_version]:
            return None

        return self.available[schema_version][0]

    def add(self, entry: ListingEntry) -> None:
        if not self.available.get(entry.distribution_version):
            self.available[entry.distribution_version] = []

        self.available[entry.distribution_version].append(entry)

        # keep listing entries sorted by date (rfc3339 formatted entries, which iso8601 is a superset of)
        self.available[entry.distribution_version].sort(
            key=lambda x: iso8601.parse_date(x.built),
            reverse=True,
        )


def _has_suffix(el: str, suffixes: set[str] | None) -> bool:
    if not suffixes:
        return True
    return any(el.endswith(s) for s in suffixes)
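
The comments on `ListingEntry` say checksums are labeled as `algorithm:value`. A minimal sketch of checking downloaded bytes against such a label, using only the standard library, might look like the following. The function name and the algorithm allowlist are assumptions made for this example, not the commit's hasher; in particular `xxhash64` is not available in `hashlib` and would need a third-party package, so it is rejected here.

```python
import hashlib
import hmac

# assumption: only these hashlib-backed algorithms are accepted in this sketch
SUPPORTED_ALGORITHMS = ("sha256", "sha512")


def verify_labeled_digest(data: bytes, labeled: str) -> bool:
    """Check data against a digest written as "algorithm:value"."""
    algorithm, separator, expected = labeled.partition(":")
    if not separator or not expected:
        raise ValueError(f"digest is not labeled as algorithm:value: {labeled!r}")
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"unsupported digest algorithm: {algorithm!r}")
    actual = hashlib.new(algorithm, data).hexdigest()
    # constant-time comparison of the two hex strings
    return hmac.compare_digest(actual, expected.lower())


payload = b"example archive bytes"
label = "sha256:" + hashlib.sha256(payload).hexdigest()
matches = verify_labeled_digest(payload, label)
```

A caller would typically read the archive named by `ListingEntry.url`, then compare it against `distribution_checksum` before extracting anything.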