Flat data representation proposal: Enables zero copy shared memory, zero allocation return types, binary serialization #398
Comments
Of course the lowering of flat POD types would be identical to normal POD types; I consider (resource) handles as POD here. So the modifier only applies (recursively) to string and list representations. Update: (Resource) handles don't serialize well across systems, so this needs more thought on when to forbid them.
Having a "flat" binary representation of compound values could make a lot of sense and I've tried to imagine different ABI variations too (esp. in the context of streams, which help address the issue of not knowing how much buffer space to allocate since you can always just fill up one buffer, say "not done", and return for the next buffer). However, I've generally thought of this in terms of Canonical ABI options, since it's a low-level representation choice; is there a specific benefit to escalating this detail into the WIT-level type, where it applies to all languages and memory types (e.g., wasm-gc...)? Second, while I can see potential efficiency benefits to a flat binary representation, I don't see how this achieves "zero copy shared memory" -- it seems like the basic requirement to copy between separate components' separate linear memories remains? Lastly, I wasn't able to follow the "Buffer objects" section and how it relates to the
I started with a WIT marker because I assumed that the same interface might mix flat and normal ABI calls, but I am no longer sure about this, especially since flat types offer some unique benefits but are source-incompatible with the normal Vec and String types (Rust, similar for C++). Zero copy comes into view if you construct the lowered elements in place in shared memory (you use a buffer located in shared memory to construct everything) and use them on the receiver side without lifting. Of course, for wasm you need either multi-memory (shared pages) or mmap support to enable two components to access the same physical memory. Host (mmap) support could enable spatial freedom from interference: either a single component can write to a memory region, or multiple components can read from it, but never both at once. The host would handle the transition between these states (similar to what iceoryx does). This assumes that you have reached a state where copying information between components is more costly than remapping virtual memory. This is typical for large AI tensors and camera images. The flat
*) Or local buffers pre-allocated and then passed to functions to place the result into.
Ah I see, that's an interesting point. I suppose we have the option to say that a
Many folks have suggested using multi-memory as a solution to avoiding copies over the years, but we keep finding that, in practice, "regular" C/C++/Rust code can only access the default memory so if you use a shared non-default memory to pass values, you'll end up with 2 copies (source → shared → destination). I keep asking someone to show me real code that would achieve zero-copy in practice using multi-memory (b/c hypothetically it's possible), but I haven't seen it yet.
One way to amortize the cost of establishing a shared mapping is creating a long-lived connection between two components which they can use to repeatedly pass chunks of memory. My intuition is that streams might be the right abstraction here (for repeatedly passing a large (flat) element). So perhaps the
🤔 I feel that a proof of concept implementation might be a good idea to see how shared memory and flat types could work together to achieve zero copy. I will give it a try (most likely Rust and wasmtime based).
I feel that 'multi-memory' is more convenient for communication between the host and Wasm. I mean, if we give a Wasm module an additional imported memory that is provided by the host, the host can store data in that specific area, and Wasm can access it directly without needing to copy it from the host's memory to Wasm's linear memory.
@lum1n0us Do you know a good way to model access to a non-zero memory from a clang compiled language, e.g. C or Rust? Load and store intrinsics could be a solution, but that feels clumsy and cannot be passed via a pointer/reference argument to subroutines; segmented memory means that every load/store will pay a significant penalty when coding memory indices and offsets separately. I think mmap as an extension of memory-control is the most reasonable strategy I can come up with.
@cpetig Yes, I think that is the fundamental challenge we're working with here. And to summarize previous discussions: if the solution is to copy from the second-memory into the default-memory, I think we end up with something net worse, both in terms of performance (2 copies instead of one) and portability (since the entire contents of this non-default linear memory are now the host/guest interface, observable at all times at any address -- very likely to expose subtle impl differences that break programs in practice at scale over time).
I just created a working proof of concept crate for the flat data parsing and creation at https://github.com/cpetig/flat-types-rust. The API already looks usable but will need a lot of extensions to provide a nice DX. I kept the enum, struct and tuple APIs out of scope for now; likely a derivation macro will give this in a "somewhat" elegant way (set_X, get_X functions). I will continue my work on the shm wasm interface.
I started a first prototype of shared memory zero copy at https://github.com/cpetig/wasm-shm-test/blob/main/wit/shm.wit#L12 but haven't completed it yet.
This all started with defining zero copy shared memory over a WIT interface (the channel is a WIT resource, inspired by iceoryx2):
and on the receiver side
with a WIT definition similar to
This is all fine unless you try to place a list<string> inside the shared memory. This put me on a journey which culminated in this discussion issue, … after I figured out a way to express this in WIT (this is inspired by FlatBuffers and Cap'n Proto).
Flat marker
Adding a flat<T[, P]> marker, e.g. flat<list<string>, u16>, to arguments or results will change the data representation to a flat binary encoding: all pointers in list and string become of the second type and are relative to the current position. The same type is used for length encoding. The default pointer type P could be s32.
Passing an argument will follow the normal ownership rules, so imported functions only pass a view while exported functions pass ownership of the buffer. The flat type is represented by a classical (pointer, length) pair. See https://bytecodealliance.zulipchat.com/#narrow/stream/438936-SIG-Embedded/topic/Sept.2017th.202024.20Meeting/near/470965874 for data encoding examples.
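To make the relative-pointer idea concrete, here is a minimal Rust sketch of one possible layout for flat<list<string>, u16>. The exact layout is an assumption for illustration (little-endian fields, a (relative pointer, length) header for the list and for each string, string bytes appended after the element table); the encoding examples in the linked Zulip thread may differ. The function names encode_flat_list and read_flat_string are hypothetical and not part of any existing crate.

```rust
// Assumed layout (illustrative only, not the proposal's normative encoding):
//   [list ptr: u16][list len: u16][elem0 ptr][elem0 len]...[string bytes...]
// Every pointer is an offset relative to the position of the pointer field.

const FIELD: usize = 2; // size of one u16 pointer or length field
const HEADER: usize = 2 * FIELD; // (relative pointer, length)

fn to_u16(v: usize) -> Option<u16> {
    if v <= u16::MAX as usize { Some(v as u16) } else { None }
}

fn put_u16(buf: &mut [u8], pos: usize, value: u16) {
    buf[pos..pos + FIELD].copy_from_slice(&value.to_le_bytes());
}

fn get_u16(buf: &[u8], pos: usize) -> Option<usize> {
    let bytes = buf.get(pos..pos + FIELD)?;
    Some(u16::from_le_bytes([bytes[0], bytes[1]]) as usize)
}

/// Encode a list of strings; returns None if an offset or length
/// does not fit into the u16 pointer type.
fn encode_flat_list(items: &[&str]) -> Option<Vec<u8>> {
    let table_start = HEADER; // element headers follow the list header
    let mut buf = vec![0u8; table_start + items.len() * HEADER];

    // List header: the pointer field sits at offset 0.
    put_u16(&mut buf, 0, to_u16(table_start)?);
    put_u16(&mut buf, FIELD, to_u16(items.len())?);

    for (i, s) in items.iter().enumerate() {
        let field_pos = table_start + i * HEADER;
        let target = buf.len(); // string bytes are appended at the end
        put_u16(&mut buf, field_pos, to_u16(target - field_pos)?);
        put_u16(&mut buf, field_pos + FIELD, to_u16(s.len())?);
        buf.extend_from_slice(s.as_bytes());
    }
    Some(buf)
}

/// Read element `index` without lifting: the returned &str borrows
/// directly from the flat buffer. Out-of-range data yields None.
fn read_flat_string(buf: &[u8], index: usize) -> Option<&str> {
    let table = get_u16(buf, 0)?; // relative to position 0
    let count = get_u16(buf, FIELD)?;
    if index >= count {
        return None;
    }
    let field_pos = table + index * HEADER;
    let start = field_pos + get_u16(buf, field_pos)?;
    let len = get_u16(buf, field_pos + FIELD)?;
    std::str::from_utf8(buf.get(start..start + len)?).ok()
}
```

Because all offsets are relative, the buffer stays valid when it is copied or mapped at a different address, which is the property that makes it usable in shared memory or as a serialization format.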
Returning a flat data type would change to a caller provided buffer (uninitialized) as the last argument (also (pointer,length)). The call returns the used length (0 indicates error/buffer overflow). This makes the call defined with respect to (partial) ownership transfer.
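As a rough illustration of this return convention, the following Rust sketch writes a flat string into a caller-provided buffer and returns the used length, with 0 signalling an error or overflow. The function name return_flat_string and the 4-byte (u16 relative pointer, u16 length) header are assumptions for this example. For simplicity the buffer is an initialized &mut [u8]; a truly uninitialized buffer would need something like MaybeUninit in Rust.

```rust
const HEADER: usize = 4; // assumed: u16 relative pointer + u16 length

/// Hypothetical callee: places a flat-encoded string into `out` and
/// returns the number of bytes used, or 0 if the value does not fit.
/// No heap allocation happens on either side of the call.
fn return_flat_string(value: &str, out: &mut [u8]) -> usize {
    let used = HEADER + value.len();
    if value.len() > u16::MAX as usize || used > out.len() {
        return 0; // error / buffer overflow: nothing is written
    }
    // The string bytes start right after the header, so the pointer,
    // relative to its own position (offset 0), is HEADER.
    out[0..2].copy_from_slice(&(HEADER as u16).to_le_bytes());
    out[2..4].copy_from_slice(&(value.len() as u16).to_le_bytes());
    out[HEADER..used].copy_from_slice(value.as_bytes());
    used
}
```

A caller would typically pass a stack or shared-memory buffer and retry with a larger one (or treat it as an error) when the call returns 0.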
Similarly to async with WASI 0.3 and future<T>, this could become a general option to apply to all functions, making #385 unnecessary, because this is more flexible and more storage efficient.
Buffer objects
Obtaining these buffers from the IPC component requires two new WIT return types: buffer-mut<T> and buffer-view<T> (read-only). Both would encode as (pointer, length) and require a drop method to indicate that the buffer/view is no longer in use.
Side benefits
This data representation can also be used as a disk or network encoding of data expressed in WIT (make sure to version your WIT description).
API considerations
True zero copy construction of these flat data types requires knowing the size of a list in advance and passing it to the constructor, so that objects can be placed linearly in the buffer. Relative pointers could be unsigned to simplify the encoding logic.
See the links in https://bytecodealliance.zulipchat.com/#narrow/stream/438936-SIG-Embedded/topic/Sept.2017th.202024.20Meeting/near/470497166 for API examples in Rust and C++.
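The construction constraint described in this section could look roughly like the following Rust sketch. It is not the API from the linked examples or from the flat-types-rust crate; FlatListBuilder and its methods are hypothetical names, and the layout (u16 unsigned relative pointers, a (pointer, length) header per list and string) is an assumption. The point it illustrates is that the element count is passed to the constructor so the element table can be reserved first and string bytes can then be placed linearly behind it.

```rust
const HEADER: usize = 4; // assumed: u16 relative pointer + u16 length

/// Hypothetical in-place builder for a flat list of strings.
struct FlatListBuilder<'a> {
    buf: &'a mut [u8],
    count: usize,  // number of elements announced up front
    filled: usize, // number of elements written so far
    end: usize,    // next free byte in the buffer
}

impl<'a> FlatListBuilder<'a> {
    /// Reserves the list header and the element table for `count` strings.
    fn new(buf: &'a mut [u8], count: usize) -> Option<Self> {
        let end = HEADER + count * HEADER;
        if count > u16::MAX as usize || end > buf.len() {
            return None;
        }
        buf[0..2].copy_from_slice(&(HEADER as u16).to_le_bytes());
        buf[2..4].copy_from_slice(&(count as u16).to_le_bytes());
        Some(Self { buf, count, filled: 0, end })
    }

    /// Writes the next string directly into the buffer (no temporary copy).
    fn push(&mut self, s: &str) -> Option<()> {
        if self.filled >= self.count {
            return None; // more elements than announced
        }
        let field = HEADER + self.filled * HEADER;
        let rel = self.end - field; // unsigned: data always lies behind the table
        let new_end = self.end + s.len();
        if new_end > self.buf.len() || rel > u16::MAX as usize || s.len() > u16::MAX as usize {
            return None;
        }
        self.buf[field..field + 2].copy_from_slice(&(rel as u16).to_le_bytes());
        self.buf[field + 2..field + 4].copy_from_slice(&(s.len() as u16).to_le_bytes());
        self.buf[self.end..new_end].copy_from_slice(s.as_bytes());
        self.end = new_end;
        self.filled += 1;
        Some(())
    }

    /// Returns the used length once all announced elements are present.
    fn finish(self) -> Option<usize> {
        if self.filled == self.count { Some(self.end) } else { None }
    }
}
```

Since data is only ever appended behind the table, every relative pointer is non-negative, which is why an unsigned pointer type is sufficient in this sketch.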
PS: I initially represented read-only flat types by address only (as the length can be calculated from the data), but this feels counterproductive from a verification and storing perspective.