Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend precompile to support a DAG #792

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

fibonacci1729
Copy link

@fibonacci1729 fibonacci1729 commented Jan 9, 2025

Still WIP while I address some lingering TODOs but wanted to get this up for 👀 before too long.

This PR effectively breaks the assumption that layers input to precompile map 1:1 with layers returned. This is necessary to support things like component dependencies in the spin shim via composition where there is not a clear mapping of original to precompiled layers.

I tried to evolve the precompile API in the most straightforward way without requiring shim developers to have to manage a DAG themselves.

NOTE: have to update the other shims to conform to the new API which will be straightforward but i'd like to get feedback on this approach here first.

cc/ @Mossaka @jsturtevant @kate-goldenring

@jsturtevant
Copy link
Contributor

/cc @jprendes as well who is doing some larger changes to the engine

linking a few issues for context:
#504
spinframework/containerd-shim-spin#238

@@ -62,7 +65,7 @@ pub trait Engine: Clone + Send + Sync + 'static {
/// The cached, precompiled layers will be reloaded on subsequent runs.
/// The runtime is expected to return the same number of layers passed in, if the layer cannot be precompiled it should return `None` for that layer.
/// In some edge cases it is possible that the layers may already be precompiled and None should be returned in this case.
fn precompile(&self, _layers: &[WasmLayer]) -> Result<Vec<Option<Vec<u8>>>> {
async fn precompile(&self, _layers: &[WasmLayer]) -> Result<Vec<PrecompiledLayer>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this now provides a way to do more than just pre-compiling such as composing I wonder if we would want to rename it

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that definitely seems reasonable! Happy to call this process_layers or whatever makes sense.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should can_precompile() evolve as well? It seems a little strange to rename one and not the other. Additionally there are various labels applied with a *.precompile segment; should we rename those to process as well to maintain consistency?

@jsturtevant
Copy link
Contributor

a quick look and it is about what I was expecting for changes. It was helpful to see how it was used in a non trivial example with spinframework/containerd-shim-spin#259

Copy link
Collaborator

@jprendes jprendes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some concerns around the GC references, and how we link layers with precompiled artifacts, but I'm no expect on any of that, so feel free to correct me.

I would eventually like to split the precompilation from the Engine, which would be a significant breaking change. If you think there's a more drastic change that could help with this work, I'd like to hear it :-)

crates/containerd-shim-wasm/src/container/engine.rs Outdated Show resolved Hide resolved

let gc_label =
format!("containerd.io/gc.ref.content.precompile.{child_digest}");
parent_layer.labels.insert(gc_label, child_digest.clone());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, this means that as long as at least one parent is present, the precompilation won't be GCd.
It seems to be that should be that as soon as one parent is GCd, the precompilation should be GCd as well.

Could this lead to the situation where we have a very popular layer that parents many components (e.g., a virtual FS layer), and prevents all the precompilations from being GCd?

@jsturtevant, you know more about this than me :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good call out that I hadn't considered and wasn't an issue when we had 1to1 mapping.

We do have to put the reference on the content that we are compiling since the Image reference is mutable.

I don't know the best way to handle this. Maybe we could use a subscription to containerd image events to remove labels when an image is removed?

@cpuguy83 do you have any ideas on how we might be able to remove labels to avoid stale content?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what's happening here?

.labels
.into_iter()
.filter_map(|(key, child_digest)| {
if key.starts_with(&format!("{precompile_id}/child")) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, this label gets added to all parents of a precompiled layer.
If a popular layer is shared by many images (e.g., a layer for a virtual FS) and parents many precompilations, will this result in the precompiled artifact for all those images to be loaded here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC only the precompiled artifacts that are reachable via the child-label from the rooted image will be loaded here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should make sure we have a test case for this scenario

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jprendes Great catch; it turns out that this is the case and is highly undesirable. Unfortunately, i think we'll need to rethink the labeling scheme here..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After our community meeting was thinking on this a bit, instead of applying labels to try and track the state, could we create new content blob with the correct mappings? The content could then would tied to the Lifecyle of the image digest, and the pre-compiled components tied to the new blob. avoiding having labels propagate through out the rest of the content store.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsturtevant which mappings are you thinking about storing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

essentially the one returned from the preprocess/precompile call, Instead of distributing that information via labels to the affected layers, we could create a new content with the mappings. When the digest is called, it could look those up and gather the correct pre-compiled layers.

@fibonacci1729 fibonacci1729 marked this pull request as ready for review January 15, 2025 18:27
@fibonacci1729
Copy link
Author

fibonacci1729 commented Jan 15, 2025

Opening up for review after addressing some of the initial feedback. Pending some tests and resolution to some convos about gc / labeling scheme employed here, i think this is ready to go!

Additionally updated the wasmtime shim to reflect the new precompile API.

@fibonacci1729 fibonacci1729 force-pushed the precompiled branch 4 times, most recently from 69599ab to a579efa Compare January 15, 2025 18:59
Copy link
Contributor

@kate-goldenring kate-goldenring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understood this flow at one point and once again am a little lost in the weeds of the DAG. @fibonacci1729 Could you add some more comments to describe the functions' objectives and consider breaking the large functions out into smaller ones for better readability/understanding?

crates/containerd-shim-wasmtime/src/instance.rs Outdated Show resolved Hide resolved
@@ -406,32 +407,34 @@ impl Client {
.iter()
.filter(|x| is_wasm_layer(x.media_type(), T::supported_layers_types()));

let mut layers = vec![];
let mut all_layers = HashMap::new();
let media_type_label = precompile_label(T::name(), "media-type");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this "media-type" string coming from or how was it determined?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each PrecompiledLayer returned from precompile specifies the media-type of the new layer's bytes so that later when reading each child wasm layer we can synthesize a Descriptor.

}
};

let parent_digest = layer.config.digest().to_string();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is parents expected to never be empty?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment, i think that should be the assumption. Otherwise an empty parents set would imply a new layer that has no relation to the other layers in the image. I'll add a check in the loop above to ensure this precondition for now.

Copy link
Contributor

@kate-goldenring kate-goldenring Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way for runwasi to handle this? Add self as parent if none? If not, can you also specify this in the inline doc comments on the PrecompiledLayer structure?

/// A `PrecompiledLayer` represents the precompiled bytes of a layer and the digests of parent layers (if any) used to process it.
pub struct PrecompiledLayer {
    /// The media type this layer represents.
    pub media_type: String,
    /// The bytes of the precompiled layer.
    pub bytes: Vec<u8>,
    /// Digests of this layers' parents.
    /// TODO: elaborate on minimum parental requirement
    pub parents: BTreeSet<String>,
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, good catch. Will update the comment. I'd be wary of the self if none approach as that would introduce a cycle we'd have to manage when loading.

@jprendes
Copy link
Collaborator

The test in client.rs L.1074 needs to be updated to match the new method signature.
The method behavior probably needs to be updated to match the new expected behavior.

@fibonacci1729
Copy link
Author

@jprendes I'm developing using lima on macos and when i run the tests via (which is analogous to what the make targets are doing):

cargo test --target-dir /tmp/lima/target --package containerd-shim-wasm --lib -- sandbox::containerd::client::tests --show-output

I see this error for each of the precompile tests:

called `Result::unwrap()` on an `Err` value:  failed to import image

Which happens when oci_helpers::import_image is called from generate_test_container.

FWIW, i'm also running containerd along side the tests.

Also, ran the tests as root but same result.

Any insight into what i might be goofing up?

@jprendes
Copy link
Collaborator

I am not super familiar with lima, but the tests using the OCI image do invoke ctr. Maybe you need to add the lima wrapper there?
Are you building and running inside lima, or using lima just for ctr / c8d?
You also need to load the OCI image in c8d beforehand, IIRC it was with make load/oci.

@jsturtevant
Copy link
Contributor

lima ships with nerdctl which might need some updates to work with the oci images. Does lima also install ctr, if not you can get it from containerd's releases and should work.

@fibonacci1729
Copy link
Author

fibonacci1729 commented Jan 17, 2025

Thanks! I abandoned lima and switched over to setting up an azure Linux vm and got the tests working. Working to update the tests now to reflect the api changes.

@fibonacci1729
Copy link
Author

fibonacci1729 commented Jan 21, 2025

Updated the original tests; would appreciate a discerning eye to ensure i'm not horribly breaking any assumptions 😅 .

Working to add some new tests.

/// In some edge cases it is possible that the layers may already be precompiled and None should be returned in this case.
fn precompile(&self, _layers: &[WasmLayer]) -> Result<Vec<Option<Vec<u8>>>> {
bail!("precompile not supported");
fn precompile(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: why don't we define this fucntion to be async?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using async function in public traits is discouraged.
The reason is that usually you want your future to be Send so that you can execute it in a multithreaded runtime.
The async fn syntax doesn't enforce the Send bound, so the suggested approach is to desugar the async fn into an impl Future + Send.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly to avoid bringing in the async_trait dependency.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

async_trait also boxes the future, which has a tiny overhead and prevent some compiler optimization. There's a make_variant crate that does this with a macro, but I don't think it's worth bringing a dependency just for this small case.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh nice, TIL

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants