This document defines the Canonical ABI used to convert between the values and functions of components in the Component Model and the values and functions of modules in Core WebAssembly. See the AST explainer for a walkthrough of the static structure of a component and the async explainer for a high-level description of the async model being specified here.
- Supporting definitions
- Canonical definitions
  - `canon lift`
  - `canon lower`
  - `canon resource.new`
  - `canon resource.drop`
  - `canon resource.rep`
  - `canon context.get` 🔀
  - `canon context.set` 🔀
  - `canon backpressure.set` 🔀
  - `canon task.return` 🔀
  - `canon yield` 🔀
  - `canon waitable-set.new` 🔀
  - `canon waitable-set.wait` 🔀
  - `canon waitable-set.poll` 🔀
  - `canon waitable-set.drop` 🔀
  - `canon waitable.join` 🔀
  - `canon subtask.drop` 🔀
  - `canon {stream,future}.new` 🔀
  - `canon {stream,future}.{read,write}` 🔀
  - `canon {stream,future}.cancel-{read,write}` 🔀
  - `canon {stream,future}.close-{readable,writable}` 🔀
  - `canon error-context.new` 🔀
  - `canon error-context.debug-message` 🔀
  - `canon error-context.drop` 🔀
The Canonical ABI specifies, for each component function signature, a
corresponding core function signature and the process for reading
component-level values into and out of linear memory. While a full formal
specification would specify the Canonical ABI in terms of macro-expansion into
Core WebAssembly instructions augmented with a new set of (spec-internal)
administrative instructions, the informal presentation here instead specifies
the process in terms of Python code that would be logically executed at
validation- and run-time by a component model implementation. The Python code
is presented by interleaving definitions with descriptions and eliding some
boilerplate. For a complete listing of all Python definitions in a single
executable file with a small unit test suite, see the canonical-abi directory.
The convention followed by the Python code below is that all traps are raised
by explicit `trap()`/`trap_if()` calls; Python `assert()` statements should
never fire and are only included as hints to the reader. Similarly, there
should be no uncaught Python exceptions.
While the Python code for lifting and lowering values appears to create an
intermediate copy when lifting linear memory into high-level Python values, a
real implementation should be able to fuse lifting and lowering into a single
direct copy from the source linear memory into the destination linear memory.
Similarly, while the Python code specifies async using Python's `asyncio`
library, a real implementation is expected to implement this specified behavior
using other lower-level concurrency primitives such as fibers or Core
WebAssembly stack-switching.
Lastly, independently of Python, the Canonical ABI defined below assumes that
out-of-memory conditions (such as `memory.grow` returning `-1` from within
`realloc`) will trap (via `unreachable`). This significantly simplifies the
Canonical ABI by avoiding the need to support the complicated protocols
necessary to support recovery in the middle of nested allocations. (Note: by
nature of eliminating `realloc`, switching to lazy lowering would obviate this
issue, allowing guest wasm code to handle failure by eagerly returning some
value of the declared return type to indicate failure.)
Most Canonical ABI definitions depend on some ambient information which is
established by the `canon lift`- or `canon lower`-defined function that is
being called:
- the ABI options supplied via `canonopt`
- the containing component instance
- the `Task` or `Subtask` used to lower or lift, respectively, `borrow` handles

These three pieces of ambient information are stored in a `LiftLowerContext`
object that is threaded through all the Python functions below as the `cx`
parameter/field.
class LiftLowerContext:
opts: CanonicalOptions
inst: ComponentInstance
borrow_scope: Optional[Task|Subtask]
def __init__(self, opts, inst, borrow_scope = None):
self.opts = opts
self.inst = inst
self.borrow_scope = borrow_scope
The `borrow_scope` field may be `None` if the types being lifted/lowered are
known to not contain `borrow`. The `CanonicalOptions`, `ComponentInstance`,
`Task` and `Subtask` classes are defined next.

The `CanonicalOptions` class contains all the possible `canonopt` immediates
that can be passed to the `canon` definition being implemented.
@dataclass
class CanonicalOptions:
memory: Optional[bytearray] = None
string_encoding: Optional[str] = None
realloc: Optional[Callable] = None
post_return: Optional[Callable] = None
sync: bool = True # = !canonopt.async
callback: Optional[Callable] = None
always_task_return: bool = False
(Note that the `async` `canonopt` is inverted to `sync` here for the practical
reason that `async` is a keyword and most branches below want to start with the
`sync = True` case.)
The following Python classes define spec-internal state and utility methods
that are used to define the externally-visible behavior of the Canonical ABI's
lifting, lowering and built-in definitions below. These fields are chosen for
simplicity over performance and thus an optimizing implementation is expected
to use a more optimized representation as long as it preserves the same
externally-visible behavior. Some specific examples of expected optimizations
are noted below.
The `ComponentInstance` class contains all the relevant per-component-instance
state that the definitions below use to implement the Component Model's runtime
behavior and enforce invariants.
class ComponentInstance:
resources: Table[ResourceHandle]
waitables: Table[Waitable]
waitable_sets: Table[WaitableSet]
error_contexts: Table[ErrorContext]
may_leave: bool
backpressure: bool
calling_sync_export: bool
calling_sync_import: bool
pending_tasks: list[tuple[Task, asyncio.Future]]
starting_pending_task: bool
def __init__(self):
self.resources = Table[ResourceHandle]()
self.waitables = Table[Waitable]()
self.waitable_sets = Table[WaitableSet]()
self.error_contexts = Table[ErrorContext]()
self.may_leave = True
self.backpressure = False
self.calling_sync_export = False
self.calling_sync_import = False
self.pending_tasks = []
self.starting_pending_task = False
These fields will be described as they are used by the following definitions.
The generic `Table` class, used by the `resources`, `waitables`,
`waitable_sets` and `error_contexts` fields of `ComponentInstance` above,
encapsulates a single mutable, growable array of elements that are represented
in Core WebAssembly as `i32` indices into the array.
ElemT = TypeVar('ElemT')
class Table(Generic[ElemT]):
array: list[Optional[ElemT]]
free: list[int]
MAX_LENGTH = 2**28 - 1
def __init__(self):
self.array = [None]
self.free = []
def get(self, i):
trap_if(i >= len(self.array))
trap_if(self.array[i] is None)
return self.array[i]
def add(self, e):
if self.free:
i = self.free.pop()
assert(self.array[i] is None)
self.array[i] = e
else:
i = len(self.array)
trap_if(i > Table.MAX_LENGTH)
self.array.append(e)
return i
def remove(self, i):
e = self.get(i)
self.array[i] = None
self.free.append(i)
return e
`Table` maintains a dense array of elements that can contain holes created by
the `remove` method. When table elements are accessed (e.g., by `canon_lift`
and `resource.rep`, below), both a bounds check and a hole check are therefore
necessary. Upon initialization, table element `0` is allocated and set to
`None`, effectively reserving index `0`, which is both useful for catching
null/uninitialized accesses and allows `0` to serve as a sentinel value.

The `add` and `remove` methods work together to maintain a free list of holes
that are used in preference to growing the table. The free list is represented
as a Python list here, but an optimizing implementation could instead store the
free list in the free elements of `array`.

The `MAX_LENGTH` limit of `2**28 - 1` ensures that the high 4 bits of table
indices are unset and available for other use in guest code (e.g., for tagging,
packed words or sentinel values).
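For example, the free list causes removed indices to be reused before the table
grows. A small sketch (assuming the `Table` class above is in scope; the
element values are arbitrary):

```python
t = Table()
i = t.add('a')           # index 0 is reserved, so the first add returns 1
j = t.add('b')
assert (i, j) == (1, 2)
t.remove(i)              # index 1 becomes a hole on the free list
assert t.add('c') == 1   # the hole is reused before the array grows
```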
The `ResourceHandle` class defines the elements of the `resources` field of
`ComponentInstance`:
class ResourceHandle:
rt: ResourceType
rep: int
own: bool
borrow_scope: Optional[Task]
num_lends: int
def __init__(self, rt, rep, own, borrow_scope = None):
self.rt = rt
self.rep = rep
self.own = own
self.borrow_scope = borrow_scope
self.num_lends = 0
The `rt` and `rep` fields of `ResourceHandle` store the `rt` and `rep`
parameters passed to the `resource.new` call that created this handle. The
`rep` field is currently fixed to be an `i32`, but will be generalized in the
future to other types.

The `own` field indicates whether this element was created from an `own` type
(or, if false, a `borrow` type).

The `borrow_scope` field stores the `Task` that lowered the borrowed handle as
a parameter. When a component only uses sync-lifted exports, due to lack of
reentrance, there is at most one `Task` alive in a component instance at any
time and thus an optimizing implementation doesn't need to store the `Task` per
`ResourceHandle`.

The `num_lends` field maintains a conservative approximation of the number of
live handles that were lent from this `own` handle (by calls to
`borrow`-taking functions). This count is maintained by the
`Subtask.add_lender` and `Subtask.finish` methods (defined below) and is
ensured to be zero when an `own` handle is dropped.
The `ResourceType` class represents a runtime instance of a resource type that
has been created either by the host or a component instance (where multiple
component instances of the same static component create multiple `ResourceType`
instances). `ResourceType` Python object identity is used by trapping guards on
the `rt` field of `ResourceHandle` (above) and thus resource type equality is
not defined structurally (on the contents of `ResourceType`).
class ResourceType(Type):
impl: ComponentInstance
dtor: Optional[Callable]
dtor_sync: bool
dtor_callback: Optional[Callable]
def __init__(self, impl, dtor = None, dtor_sync = True, dtor_callback = None):
self.impl = impl
self.dtor = dtor
self.dtor_sync = dtor_sync
self.dtor_callback = dtor_callback
A "buffer" is an abstract region of memory that can either be read-from or written-to. This region of memory can either be owned by the host or by wasm. Currently wasm memory is always 32-bit linear memory, but soon 64-bit and GC memory will be added. Thus, buffers provide an abstraction over at least 4 different "kinds" of memory.
(Currently, buffers are only created implicitly as part of stream and future
built-ins such as `stream.read`. However, in the future, explicit
component-level buffer types and canonical built-ins may be added to allow
explicitly creating buffers and passing them between components.)
All buffers have an associated component-level value type `t` and a `remain`
method that returns how many `t` values may still be read or written. Thus
buffers hide their original/complete size. A "readable buffer" allows reading
`t` values from the buffer's memory. A "writable buffer" allows writing `t`
values into the buffer's memory. Buffers are represented by the following 3
abstract Python classes:
class Buffer:
MAX_LENGTH = 2**30 - 1
t: ValType
remain: Callable[[], int]
class ReadableBuffer(Buffer):
read: Callable[[int], list[any]]
class WritableBuffer(Buffer):
write: Callable[[list[any]]]
As preconditions (internally ensured by the Canonical ABI code below):
- `read` may only be passed a positive number less than or equal to `remain`
- `write` may only be passed a non-empty list of length less than or equal to `remain` containing values of type `t`
Since `read` and `write` are synchronous Python functions, buffers inherently
guarantee synchronous access to a fixed-size backing memory and are thus
distinguished from streams (which provide asynchronous operations for reading
and writing an unbounded number of values to potentially-different regions of
memory over time).
The `ReadableBuffer` and `WritableBuffer` abstract classes may either be
implemented by the host or by another wasm component. In the latter case, these
abstract classes are implemented by the concrete `ReadableBufferGuestImpl` and
`WritableBufferGuestImpl` classes which eagerly check alignment and range when
the buffer is constructed so that `read` and `write` are infallible operations
(modulo traps):
class BufferGuestImpl(Buffer):
cx: LiftLowerContext
t: ValType
ptr: int
progress: int
length: int
def __init__(self, t, cx, ptr, length):
trap_if(length == 0 or length > Buffer.MAX_LENGTH)
if t:
trap_if(ptr != align_to(ptr, alignment(t)))
trap_if(ptr + length * elem_size(t) > len(cx.opts.memory))
self.cx = cx
self.t = t
self.ptr = ptr
self.progress = 0
self.length = length
def remain(self):
return self.length - self.progress
class ReadableBufferGuestImpl(BufferGuestImpl):
def read(self, n):
assert(n <= self.remain())
if self.t:
vs = load_list_from_valid_range(self.cx, self.ptr, n, self.t)
self.ptr += n * elem_size(self.t)
else:
vs = n * [()]
self.progress += n
return vs
class WritableBufferGuestImpl(BufferGuestImpl, WritableBuffer):
def write(self, vs):
assert(len(vs) <= self.remain())
if self.t:
store_list_into_valid_range(self.cx, vs, self.ptr, self.t)
self.ptr += len(vs) * elem_size(self.t)
else:
assert(all(v == () for v in vs))
self.progress += len(vs)
When `t` is `None` (arising from `stream` and `future` with empty element
types), the core-wasm-supplied `ptr` is entirely ignored, while the `length`
and `progress` are still semantically meaningful. Source bindings may represent
this case with a generic stream/future of unit type or a distinct type that
conveys events without values.
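A small sketch of this `t is None` case (assuming the classes above are in
scope; `cx` can be `None` because no memory is touched):

```python
buf = ReadableBufferGuestImpl(None, None, ptr = 0, length = 3)
assert buf.remain() == 3
assert buf.read(2) == [(), ()]   # two "values" are read, but they carry no data
assert buf.remain() == 1
```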
The `load_list_from_valid_range` and `store_list_into_valid_range` functions
that do all the heavy lifting are shared with function parameter/result lifting
and lowering and defined below.
The `ContextLocalStorage` class implements context-local storage, with each new
`Task` getting a fresh, zero-initialized `ContextLocalStorage` that can be
accessed by core wasm code using `canon context.{get,set}`. (In the future,
when threads are integrated, each `thread.spawn`ed thread would also get a
fresh, zero-initialized `ContextLocalStorage`.)
class ContextLocalStorage:
LENGTH = 2
array: list[int]
def __init__(self):
self.array = [0] * ContextLocalStorage.LENGTH
def set(self, i, v):
assert(types_match_values(['i32'], [v]))
self.array[i] = v
def get(self, i):
return self.array[i]
A `Task` object is created for each call to `canon_lift` and is implicitly
threaded through all core function calls. This implicit `Task` parameter
represents the "current task". This section will introduce the `Task` class in
pieces, starting with the fields and initialization:
class Task:
opts: CanonicalOptions
inst: ComponentInstance
ft: FuncType
caller: Optional[Task]
on_return: Optional[Callable]
on_block: Callable[[Awaitable], Awaitable]
num_subtasks: int
num_borrows: int
context: ContextLocalStorage
def __init__(self, opts, inst, ft, caller, on_return, on_block):
self.opts = opts
self.inst = inst
self.ft = ft
self.caller = caller
self.on_return = on_return
self.on_block = on_block
self.num_subtasks = 0
self.num_borrows = 0
self.context = ContextLocalStorage()
Using a conservative syntactic analysis of a complete component, an optimizing
implementation can statically eliminate fields when a particular feature (such
as `borrow` or `async`) is not present in the component definition.
There is a global singleton `asyncio.Lock` named `Task.current` upon which all
non-current tasks are parked `await`ing. When the Canonical ABI specifies that
the current task may cooperatively yield to another task, it does so by having
the current task release and reacquire `Task.current`, allowing a non-current
task to acquire `Task.current` to become the current task. Without this extra
layer of synchronization, normal Python `async`/`await` semantics would imply
that execution could switch at any `await` point, which would behave more like
preemptive multi-threading instead of cooperative asynchronous functions.

By default, current-task-switching happens when a task blocks on an
`asyncio.Awaitable` to resolve via the `sync_on_block` function:
current = asyncio.Lock()
async def sync_on_block(a: Awaitable):
Task.current.release()
v = await a
await Task.current.acquire()
return v
The `on_block` field of `Task` either stores `sync_on_block` or
`async_on_block` (defined below) and is implicitly supplied as an argument by
the task's caller to allow the caller to configure what happens when the callee
blocks.

The `Task.enter` method is called immediately after constructing a `Task` to
perform all the guards and lowering necessary before the task's core wasm entry
point can start running:
async def enter(self, on_start):
assert(Task.current.locked())
self.trap_if_on_the_stack(self.inst)
if not self.may_enter(self) or self.inst.pending_tasks:
f = asyncio.Future()
self.inst.pending_tasks.append((self, f))
await self.on_block(f)
assert(self.may_enter(self) and self.inst.starting_pending_task)
self.inst.starting_pending_task = False
if self.opts.sync:
self.inst.calling_sync_export = True
cx = LiftLowerContext(self.opts, self.inst, self)
return lower_flat_values(cx, MAX_FLAT_PARAMS, on_start(), self.ft.param_types())
The `trap_if_on_the_stack` method (defined next) prevents unexpected
reentrance, enforcing a component invariant.

The `may_enter` guard enforces backpressure. Backpressure allows an overloaded
component instance to safely avoid being forced to continually allocate more
memory (for lowered arguments and per-task state) until OOM. Backpressure is
also applied automatically when attempting to enter a component instance that
is in the middle of a synchronous operation and thus non-reentrant. If
`may_enter` (defined below) signals that backpressure is needed, `enter` blocks
by putting itself into the per-`ComponentInstance` `pending_tasks` queue and
blocking until released by `maybe_start_pending_task` (defined below). The
`or self.inst.pending_tasks` disjunct ensures fairness, preventing continual
new tasks from starving pending tasks.

The `calling_sync_export` flag set by `enter` (and cleared by `exit`) is used
by `may_enter` to prevent sync-lifted export calls from overlapping.

Once all the guards and bookkeeping have been done, the `enter` method lowers
the given arguments into the callee's memory (possibly executing `realloc`),
returning the final set of flat arguments to pass into the core wasm callee.
The `Task.trap_if_on_the_stack` method called by `enter` prevents reentrance
using the `caller` field of `Task`, which points to the task's supertask in the
async call tree defined by structured concurrency. Structured concurrency is
necessary to distinguish between the deadlock-hazardous kind of reentrance
(where the new task is a transitive subtask of a task already running in the
same component instance) and the normal kind of async reentrance (where the new
task is just a sibling of any existing tasks running in the component
instance). Note that, in the future, there will be a way for a function to opt
in (via function type attribute) to the hazardous kind of reentrance, which
will nuance this test.
def trap_if_on_the_stack(self, inst):
c = self.caller
while c is not None:
trap_if(c.inst is inst)
c = c.caller
An optimizing implementation can avoid the O(n) loop in
`trap_if_on_the_stack` in several ways:
- Reentrance by a child component can (often) be statically ruled out when the parent component doesn't both lift and lower the child's imports and exports (i.e., "donut wrapping").
- Reentrance of the root component by the host can either be asserted not to happen or be tracked in a per-root-component-instance flag.
- When a potentially-reenterable child component only lifts and lowers synchronously, reentrance can be tracked in a per-component-instance flag.
- For the remaining cases, the live instances on the stack can be maintained in a packed bit-vector (assigning each potentially-reenterable async component instance a static bit position) that is passed by copy from caller to callee.
The `Task.may_enter` method called by `enter` blocks a new task from starting
for three reasons:
- The core wasm code has explicitly indicated that it is overloaded by calling `backpressure.set` to set the `backpressure` field of `ComponentInstance`.
- The component instance is currently blocked on a synchronous call from core wasm into a Canonical ABI built-in and is thus not currently in a reentrant state.
- The current pending call is to a synchronously-lifted export and there is already a synchronously-lifted export in progress.
def may_enter(self, pending_task):
return not self.inst.backpressure and \
not self.inst.calling_sync_import and \
not (self.inst.calling_sync_export and pending_task.opts.sync)
Notably, the above definition of `may_enter` only prevents synchronously lifted
tasks from overlapping. Asynchronously lifted tasks are allowed to freely
overlap (with each other and synchronously-lifted tasks). This allows
purely-synchronous toolchains to stay simple and ignore asynchrony while
enabling more-advanced hybrid use cases (such as "background tasks"), putting
the burden of not interfering with the synchronous tasks on the toolchain.
The `Task.maybe_start_pending_task` method unblocks pending tasks enqueued by
`enter` above once `may_enter` is true for the pending task. One key property
ensured by the trio of `enter`, `may_enter` and `maybe_start_pending_task` is
that `pending_tasks` are only allowed to start one at a time, ensuring that if
an overloaded component instance enables backpressure, builds up a large queue
of pending tasks, and then disables backpressure, there will not be a
thundering herd of tasks started all at once that OOM the component before it
has a chance to re-enable backpressure. To ensure this property, the
`starting_pending_task` flag is set here and cleared when the pending task
actually resumes execution in `enter`, preventing more pending tasks from being
started in the interim. Lastly, `maybe_start_pending_task` is only called at
specific points (`wait_on` and `exit`, below) where the core wasm code has had
the opportunity to re-enable backpressure if need be.
def maybe_start_pending_task(self):
if self.inst.starting_pending_task:
return
for i,(pending_task,pending_future) in enumerate(self.inst.pending_tasks):
if self.may_enter(pending_task):
self.inst.pending_tasks.pop(i)
self.inst.starting_pending_task = True
pending_future.set_result(None)
return
Notably, the loop in `maybe_start_pending_task` allows pending async tasks to
start even when there is a blocked pending sync task ahead of them in the
`pending_tasks` queue.
The `Task.yield_` method is called by `canon yield` or, when a `callback` is
used, when core wasm returns the "yield" code to the event loop. Yielding
allows the runtime to switch execution to another task without having to wait
for any external I/O, as emulated in the Python definition by waiting on
`asyncio.sleep(0)` using the `wait_on` method (defined next):
async def yield_(self, sync):
await self.wait_on(asyncio.sleep(0), sync)
The `Task.wait_on` method defines how to block the current task until a given
Python awaitable is resolved. By calling the `on_block` callback, `wait_on`
allows other tasks to execute before the current task resumes, but which other
tasks depends on the `sync` boolean parameter:
- When blocking synchronously, only tasks in other component instances may execute; the current component instance must not observe any interleaved execution. This is achieved by setting `calling_sync_import` during `on_block`, which is checked by both `may_enter` (to prevent reentrance) as well as `wait_on` itself in the asynchronous case (to prevent resumption of already-started async tasks).
- When blocking asynchronously, other tasks in the same component instance may execute. In this case, it's possible that one of those other tasks performed a synchronous `wait_on` (which set `calling_sync_import`), in which case this task must wait until it is reenabled. This reenablement is signalled asynchronously via the `async_waiting_tasks` `asyncio.Condition`.
async_waiting_tasks = asyncio.Condition(current)
async def wait_on(self, awaitable, sync):
assert(not self.inst.calling_sync_import)
if sync:
self.inst.calling_sync_import = True
v = await self.on_block(awaitable)
self.inst.calling_sync_import = False
self.async_waiting_tasks.notify_all()
else:
self.maybe_start_pending_task()
v = await self.on_block(awaitable)
while self.inst.calling_sync_import:
Task.current.release()
await self.async_waiting_tasks.wait()
return v
The `Task.call_sync` method is used by `canon_lower` to make a synchronous call
to `callee` given `*args` and works just like the `sync` case of `wait_on`
above, except that `call_sync` avoids unconditionally blocking (by calling
`on_block`). Instead, the caller simply passes its own `on_block` callback to
the callee, so that the caller blocks iff the callee blocks. This means that
N-deep synchronous callstacks avoid the overhead of async calls if none of the
calls in the stack actually block on external I/O.
async def call_sync(self, callee, *args):
assert(not self.inst.calling_sync_import)
self.inst.calling_sync_import = True
v = await callee(*args, self.on_block)
self.inst.calling_sync_import = False
self.async_waiting_tasks.notify_all()
return v
The `Task.call_async` method is used by `canon_lower` to make an asynchronous
call to `callee`. To implement an "asynchronous" call using Python's `asyncio`
machinery, `call_async` spawns the call to `callee` in a separate
`asyncio.Task` and waits until either this `asyncio.Task` blocks (by calling
the given `async_on_block` callback) or returns without blocking:
async def call_async(self, callee, *args):
ret = asyncio.Future()
async def async_on_block(a: Awaitable):
if not ret.done():
ret.set_result(None)
else:
Task.current.release()
v = await a
await Task.current.acquire()
return v
async def do_call():
await callee(*args, async_on_block)
if not ret.done():
ret.set_result(None)
else:
Task.current.release()
asyncio.create_task(do_call())
await ret
An important behavioral property specified by this code is that, whether or not
the callee blocks, control flow is unconditionally transferred back to the
`async` caller without releasing the `Task.current` lock, meaning that no other
tasks get to run (like `call_sync` and unlike `wait_on`). In particular, the
`acquire()` and `release()` of `Task.current` in this code only happens in the
callee's `asyncio.Task` once the callee has blocked the first time and
transferred control flow back to the `async` caller. The net effect is that an
async call starts like a normal sync call and only forks off an async execution
context if the callee blocks. This is useful for enabling various optimizations
biased towards the non-blocking case.

Another important behavioral property specified by this code is that, when an
`async` call is made, even if the callee makes nested synchronous
cross-component calls, if any transitive callee blocks, control flow is
promptly returned back to the `async` caller, setting aside the whole
synchronous segment of the call stack that blocked. This is enabled by (and is
the entire point of) `on_block` being a closure value dynamically supplied by
the caller.
As a side note: this whole idiosyncratic use of `asyncio` (with `Task.current`
and `on_block`) is emulating algebraic effects as realized by the Core
WebAssembly stack-switching proposal. If Python had the equivalent of
stack-switching's `suspend` and `resume` instructions, `Task.current` and
`on_block` would disappear; all callsites of `on_block` would `suspend` with a
`block` effect and `call_async` would `resume` with a `block` handler. Indeed,
a runtime that has implemented stack-switching could implement `async` by
compiling down to Core WebAssembly using `cont.new`/`suspend`/`resume`.
The `Task.return_` method is called by either `canon_task_return` or
`canon_lift` (both defined below) to lift result values and pass them to the
caller using the caller's `on_return` callback. Using a callback instead of
simply returning the values from `canon_lift` enables the callee to keep
executing after returning its results. There is a dynamic error if
`task.return` is not called exactly once or if the callee has not dropped all
borrowed handles by the time `task.return` is called. This means that the
caller can assume that ownership of all its lent borrowed handles has been
returned to it when it is notified that the `Subtask` has reached the
`RETURNED` state.
def return_(self, flat_results):
trap_if(not self.on_return)
trap_if(self.num_borrows > 0)
if self.opts.sync and not self.opts.always_task_return:
maxflat = MAX_FLAT_RESULTS
else:
maxflat = MAX_FLAT_PARAMS
ts = self.ft.result_types()
cx = LiftLowerContext(self.opts, self.inst, self)
vs = lift_flat_values(cx, maxflat, CoreValueIter(flat_results), ts)
self.on_return(vs)
self.on_return = None
The maximum number of flattened core wasm values depends on whether this is a
normal synchronous call (in which return values are returned by core wasm) or a
newer async or synchronous-using-`always-task-return` call, in which case
return values are passed as parameters to `canon task.return`.
Lastly, the `Task.exit` method is called when the task has signalled that it
intends to exit. This method guards that the various obligations of the callee
implied by the Canonical ABI have in fact been met and also performs final
bookkeeping that matches the initial bookkeeping performed by `enter`. Lastly,
when a `Task` exits, it attempts to start another pending task which, in
particular, may be a synchronous task unblocked by the clearing of
`calling_sync_export`.
def exit(self):
assert(Task.current.locked())
trap_if(self.num_subtasks > 0)
trap_if(self.on_return)
assert(self.num_borrows == 0)
if self.opts.sync:
assert(self.inst.calling_sync_export)
self.inst.calling_sync_export = False
self.maybe_start_pending_task()
A "waitable" is anything that can be stored in the component instance's
waitables
table. Currently, there are 5 different kinds of waitables:
subtasks and the 4 combinations of the
readable and writable ends of futures and streams.
Waitables deliver "events" which are values of the following EventTuple
type.
The two int
"payload" fields of EventTuple
store core wasm i32
s and are
to be interpreted based on the EventCode
. The meaning of the different
EventCode
s and their payloads will be introduced incrementally below by the
code that produces the events (specifically, in subtask_event
and
copy_event
).
class CallState(IntEnum):
STARTING = 1
STARTED = 2
RETURNED = 3
class EventCode(IntEnum):
NONE = 0
CALL_STARTING = CallState.STARTING
CALL_STARTED = CallState.STARTED
CALL_RETURNED = CallState.RETURNED
STREAM_READ = 5
STREAM_WRITE = 6
FUTURE_READ = 7
FUTURE_WRITE = 8
EventTuple = tuple[EventCode, int, int]
The `Waitable` class factors out the state and behavior common to all 5 kinds
of waitables, which are each defined as subclasses of `Waitable` below. Every
`Waitable` can store at most one pending event in its `pending_event` field
which will be delivered to core wasm as soon as the core wasm code explicitly
waits on this `Waitable` (which may take an arbitrarily long time). A
`pending_event` is represented in the Python code below as a closure so that
the closure can specify behaviors that trigger right before events are
delivered to core wasm and so that the closure can compute the event based on
the state of the world at delivery time (as opposed to when `pending_event` was
first set). Currently, `pending_event` holds a closure of either the
`subtask_event` or `copy_event` functions defined below. An optimizing
implementation would avoid closure allocation by inlining a union containing
the closure fields directly in the `Waitable` element of the `waitables` table.

A waitable can belong to at most one "waitable set" (defined next) which is
referred to by the `maybe_waitable_set` field. A `Waitable`'s `pending_event`
is delivered when core wasm code waits on a task's waitable set (via
`waitable-set.wait` or, when using `callback`, by returning to the event loop).
The `maybe_has_pending_event` field of `WaitableSet` stores an `asyncio.Event`
boolean which is used to wait for any `Waitable` element to make progress. As
the name implies, `maybe_has_pending_event` is allowed to have false positives
(but not false negatives) and thus it's not `clear()`ed even when a
`Waitable`'s `pending_event` is cleared (because other `Waitable`s in the set
may still have a pending event).

In addition to being waited on as a member of a task's waitable set, a
`Waitable` can also be waited on individually (e.g., as part of a synchronous
`stream.cancel-read`). This is enabled by the `has_pending_event_` field which
stores an `asyncio.Event` boolean that `is_set()` iff `pending_event` is set.

Based on all this, the `Waitable` class is defined as follows:
class Waitable:
pending_event: Optional[Callable[[], EventTuple]]
has_pending_event_: asyncio.Event
maybe_waitable_set: Optional[WaitableSet]
def __init__(self):
self.pending_event = None
self.has_pending_event_ = asyncio.Event()
self.maybe_waitable_set = None
def set_event(self, pending_event):
self.pending_event = pending_event
self.has_pending_event_.set()
if self.maybe_waitable_set:
self.maybe_waitable_set.maybe_has_pending_event.set()
def has_pending_event(self):
assert(self.has_pending_event_.is_set() == bool(self.pending_event))
return self.has_pending_event_.is_set()
async def wait_for_pending_event(self):
return await self.has_pending_event_.wait()
def get_event(self) -> EventTuple:
assert(self.has_pending_event())
pending_event = self.pending_event
self.pending_event = None
self.has_pending_event_.clear()
return pending_event()
def join(self, maybe_waitable_set):
if self.maybe_waitable_set:
self.maybe_waitable_set.elems.remove(self)
self.maybe_waitable_set = maybe_waitable_set
if maybe_waitable_set:
maybe_waitable_set.elems.append(self)
if self.has_pending_event():
maybe_waitable_set.maybe_has_pending_event.set()
def drop(self):
assert(not self.has_pending_event())
self.join(None)
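As a small illustration of why `pending_event` is a closure (a sketch assuming
the `Waitable` class just defined and the `EventCode` enum above), the event
payload is computed at delivery time, not when the event was set:

```python
w = Waitable()
progress = {'count': 0}
w.set_event(lambda: (EventCode.STREAM_READ, progress['count'], 0))
progress['count'] = 42               # more progress happens before delivery
assert w.get_event() == (EventCode.STREAM_READ, 42, 0)
assert not w.has_pending_event()     # delivery clears the pending event
```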
A "waitable set" contains a collection of waitables that can be waited on or
polled for any element to make progress. Although the WaitableSet
class
below represents elems
as a list
and implements poll
with an O(n) search,
because a waitable can be associated with at most one set and can contain at
most one pending event, a real implemenation could instead store a list of
waitables-with-pending-events as a linked list embedded directly in the
Waitable
table element to avoid the separate allocation while providing O(1)
polling.
class WaitableSet:
elems: list[Waitable]
maybe_has_pending_event: asyncio.Event
num_waiting: int
def __init__(self):
self.elems = []
self.maybe_has_pending_event = asyncio.Event()
self.num_waiting = 0
async def wait(self) -> EventTuple:
self.num_waiting += 1
while True:
await self.maybe_has_pending_event.wait()
if (e := self.poll()):
self.num_waiting -= 1
return e
def poll(self) -> Optional[EventTuple]:
random.shuffle(self.elems)
for w in self.elems:
assert(self is w.maybe_waitable_set)
if w.has_pending_event():
assert(self.maybe_has_pending_event.is_set())
return w.get_event()
self.maybe_has_pending_event.clear()
return None
def drop(self):
trap_if(len(self.elems) > 0)
trap_if(self.num_waiting > 0)
The `WaitableSet.drop` method traps if dropped while it still contains elements
(whose `Waitable.maybe_waitable_set` field would become dangling) or if it is
being waited upon by another `Task`.

Note: the `random.shuffle` in `poll` is meant to give runtimes the semantic
freedom to schedule delivery of events non-deterministically (e.g., taking into
account priorities); runtimes do not have to literally randomize event
delivery.
While `canon_lift` creates `Task` objects when called, `canon_lower` creates
`Subtask` objects when called. If the callee (being `canon_lower`ed) is another
component's (`canon_lift`ed) function, there will thus be a `Subtask`+`Task`
pair created. However, if the callee is a host-defined function, the `Subtask`
will stand alone. Thus, in general, the call stack at any point in time when
wasm calls a host-defined import will have the form:
[Host caller] -> [Task] -> [Subtask+Task]* -> [Subtask] -> [Host callee]
The `Subtask` class is simpler than `Task` and only manages a few fields of
state that are relevant to the caller. As with `Task`, this section will
introduce `Subtask` incrementally, starting with its fields and initialization:
class Subtask(Waitable):
state: CallState
lenders: list[ResourceHandle]
finished: bool
supertask: Optional[Task]
def __init__(self):
self.state = CallState.STARTING
self.lenders = []
self.finished = False
self.supertask = None
The `state` field of `Subtask` holds a `CallState` enum value (defined above as
part of the definition of `EventCode`) that describes the callee's current
state along the linear progression from `STARTING` to `STARTED` to `RETURNED`.
Although `Subtask` derives from `Waitable`, `__init__` does not initialize the
`Waitable` base object. Instead, the `Waitable` base is only initialized when
the `Subtask` is added to the component instance's `waitables` table, which in
turn only happens if the call is `async` and blocks. In this case, the
`Subtask.add_to_waitables` method is called:
def add_to_waitables(self, task):
assert(not self.supertask)
self.supertask = task
self.supertask.num_subtasks += 1
Waitable.__init__(self)
return task.inst.waitables.add(self)
The `num_subtasks` increment ensures that the parent `Task` cannot exit without
having waited for all its subtasks to return (or, in the future, be cancelled),
thereby preserving structured concurrency.
The `Subtask.add_lender` method is called by `lift_borrow` (below). This method
increments the `num_lends` counter on the handle being lifted, which is guarded
to be zero by `canon_resource_drop` (below). The `Subtask.finish` method is
called when the subtask returns its value to the caller, at which point all the
borrowed handles are logically returned to the caller by decrementing all the
`num_lends` counts that were initially incremented.
def add_lender(self, lending_handle):
assert(not self.finished and self.state != CallState.RETURNED)
lending_handle.num_lends += 1
self.lenders.append(lending_handle)
def finish(self):
assert(not self.finished and self.state == CallState.RETURNED)
for h in self.lenders:
h.num_lends -= 1
self.finished = True
Note that the `lenders` list usually has a fixed size (in all cases except when
a function signature has `borrow`s in `list`s or `stream`s) and thus can be
stored inline in the native stack frame.

The `Subtask.drop` method is only called for `Subtask`s that have been
`add_to_waitables()`ed and checks that the callee has been allowed to return
its value to the caller.
def drop(self):
trap_if(not self.finished)
assert(self.state == CallState.RETURNED)
self.supertask.num_subtasks -= 1
Waitable.drop(self)
Values of `stream` type are represented in the Canonical ABI as `i32` indices
into the component instance's `waitables` table referring to either the
readable or writable end of a stream. Reading from the readable end of a stream
is achieved by calling `stream.read` and supplying a `WritableBuffer`.
Conversely, writing to the writable end of a stream is achieved by calling
`stream.write` and supplying a `ReadableBuffer`. The runtime waits until both a
readable and writable buffer have been supplied and then performs a direct copy
between the two buffers. This rendezvous-based design avoids the need for an
intermediate buffer and copy (unlike, e.g., a Unix pipe; a Unix pipe would
instead be implemented as a resource type owning the buffer memory and two
streams, one going in and one coming out).

As with functions and buffers, native host code can be on either side of a
stream. Thus, streams are defined in terms of an abstract interface that can be
implemented and consumed by wasm or host code (with all {wasm,host} pairings
being possible and well-defined). Since a `stream` in a function parameter or
result type always represents the transfer of the readable end of a stream, the
abstract stream interface is `ReadableStream` and allows a (wasm or host)
client to asynchronously read multiple values from a (wasm or host) producer.
(The absence of a dual `WritableStream` abstract interface reflects the fact
that there is no Component Model type for passing the writable end of a
stream.)
RevokeBuffer = Callable[[], None]
OnPartialCopy = Callable[RevokeBuffer, None]
OnCopyDone = Callable[[], None]
class ReadableStream:
t: ValType
read: Callable[[WritableBuffer, OnPartialCopy, OnCopyDone], Literal['done','blocked']]
cancel: Callable[[], None]
close: Callable[[]]
closed: Callable[[], bool]
closed_with_error: Callable[[], Optional[ErrorContext]]
The key operation is `read`, which works as follows:
- `read` is non-blocking, returning `'blocked'` if it would have blocked.
- The `On*` callbacks are only called after `read` returns `'blocked'`.
  - `OnCopyDone` is called to indicate that the caller has regained ownership of the buffer.
  - `OnPartialCopy` is called to indicate a partial write has been made to the buffer, but there may be further writes made in the future, so the caller has not regained ownership of the buffer.
- The `RevokeBuffer` callback passed to `OnPartialCopy` allows the caller of `read` to synchronously regain ownership of the buffer.
- `cancel` is also non-blocking, but does not guarantee that ownership of the buffer has been returned; `cancel` only lets the caller request that `read` call one of the `On*` callbacks ASAP (which may or may not happen during `cancel`).
- The client may not call `read` or `close` while there is still an unfinished `read` of the same `ReadableStream`.
The `On*` callbacks are a spec-internal detail used to specify the allowed
concurrent behaviors of `stream.{read,write}` and are not exposed directly to
core wasm code. Specifically, the point of the `On*` callbacks is to specify
that multiple writes are allowed into the same `WritableBuffer` up until the
point where either the buffer is full (at which point `OnCopyDone` is called)
or the calling core wasm code receives the `STREAM_READ` progress event (in
which case `RevokeBuffer` is called). This reduces the number of task-switches
required by the spec, particularly when streaming between two components.
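For illustration, here is a minimal sketch of a host-implemented
`ReadableStream` (the class name and fields beyond the `ReadableStream`
interface are hypothetical): it serves values from a fixed host-side list,
never blocks, and therefore never invokes the `On*` callbacks:

```python
class HostListStream(ReadableStream):
  def __init__(self, t, values):
    self.t = t
    self.values = list(values)        # remaining values to deliver
    self.closed_ = (len(self.values) == 0)
  def read(self, dst, on_partial_copy, on_copy_done):
    if self.closed_:
      return 'done'
    n = min(len(self.values), dst.remain())
    dst.write(self.values[:n])
    del self.values[:n]
    if not self.values:
      self.close()                    # a fixed source: close once drained
    return 'done'                     # completed synchronously; callbacks unused
  def cancel(self):
    pass                              # no read ever blocks, so nothing to cancel
  def close(self):
    self.closed_ = True
  def closed(self):
    return self.closed_
  def closed_with_error(self):
    return None                       # this sketch never closes with an error
```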
The `ReadableStreamGuestImpl` class implements `ReadableStream` for streams
created by wasm (via `stream.new`). This class contains both a `read` method
that is called through the abstract `ReadableStream` interface (by the host or
another component) as well as a `write` method that is always called by
`stream.write` from the component instance that called `stream.new`. `write`
follows the same exact rules as `read` above, except with the `WritableBuffer`
replaced by a `ReadableBuffer`. This allows both `read` and `write` to be
implemented by a single underlying `copy` method parameterized by the direction
of the copy. Given this, the implementation of `ReadableStreamGuestImpl` is
derived fairly directly from the abstract rules listed above:
class ReadableStreamGuestImpl(ReadableStream):
impl: ComponentInstance
closed_: bool
errctx: Optional[ErrorContext]
pending_buffer: Optional[Buffer]
pending_on_partial_copy: Optional[OnPartialCopy]
pending_on_copy_done: Optional[OnCopyDone]
def __init__(self, t, inst):
self.t = t
self.impl = inst
self.closed_ = False
self.errctx = None
self.reset_pending()
def reset_pending(self):
self.pending_buffer = None
self.pending_on_partial_copy = None
self.pending_on_copy_done = None
def read(self, dst, on_partial_copy, on_copy_done):
return self.copy(dst, on_partial_copy, on_copy_done, self.pending_buffer, dst)
def write(self, src, on_partial_copy, on_copy_done):
return self.copy(src, on_partial_copy, on_copy_done, src, self.pending_buffer)
def copy(self, buffer, on_partial_copy, on_copy_done, src, dst):
if self.closed_:
return 'done'
elif not self.pending_buffer:
self.pending_buffer = buffer
self.pending_on_partial_copy = on_partial_copy
self.pending_on_copy_done = on_copy_done
return 'blocked'
else:
ncopy = min(src.remain(), dst.remain())
assert(ncopy > 0)
dst.write(src.read(ncopy))
if self.pending_buffer.remain() > 0:
self.pending_on_partial_copy(self.reset_pending)
else:
self.cancel()
return 'done'
def cancel(self):
pending_on_copy_done = self.pending_on_copy_done
self.reset_pending()
pending_on_copy_done()
def close(self, errctx = None):
if not self.closed_:
self.closed_ = True
self.errctx = errctx
if self.pending_buffer:
self.cancel()
def closed(self):
return self.closed_
def closed_with_error(self):
assert(self.closed_)
return self.errctx
The `impl` field records the component instance that created this stream and is
used by `lower_stream` below.

The `copy` method assumes that the calling Canonical ABI definitions have
already guarded to ensure that there is at most one active `read` or `write`
call in progress per stream. Thus, when `copy` is called and there is already a
`pending_buffer`, `copy` can simply assume that it has both a `ReadableBuffer`
and `WritableBuffer` in `src` and `dst`, respectively, and proceed with the
copy.

While the abstract `ReadableStream` interface allows `cancel` to return without
having returned ownership of the buffer (which, in general, is necessary for
various host APIs), when wasm is implementing the stream, `cancel` always
returns ownership of the buffer immediately by synchronously calling the
`on_copy_done` callback.

The `close` method takes an optional `error-context` value which can only be
supplied to the writable end of a stream via `stream.close-writable`. Thus, for
the readable end of the stream, `close` is effectively nullary, matching
`ReadableStream.close`.
Given the above, we can define the `{Readable,Writable}StreamEnd` classes that
are actually stored in the `waitables` table. The classes are almost entirely
symmetric, with the only difference being that `WritableStreamEnd` has an
additional `paired` boolean field that is used by `{lift,lower}_stream` below
to ensure that there is at most one readable end of a stream. The `copying`
field tracks whether there is an asynchronous read or write in progress and is
maintained by the definitions of `stream.{read,write}` below. Importantly,
`paired`, `copying`, and the inherited fields of `Waitable` are per-end, not
per-stream (unlike the fields of `ReadableStreamGuestImpl` shown above, which
are per-stream and shared by both ends via their common `stream` field).
class StreamEnd(Waitable):
stream: ReadableStream
copying: bool
def __init__(self, stream):
Waitable.__init__(self)
self.stream = stream
self.copying = False
def drop(self, errctx):
trap_if(self.copying)
self.stream.close(errctx)
Waitable.drop(self)
class ReadableStreamEnd(StreamEnd):
def copy(self, dst, on_partial_copy, on_copy_done):
return self.stream.read(dst, on_partial_copy, on_copy_done)
class WritableStreamEnd(StreamEnd):
paired: bool = False
def copy(self, src, on_partial_copy, on_copy_done):
return self.stream.write(src, on_partial_copy, on_copy_done)
Dropping a stream end from the `waitables` table while an asynchronous read or
write is in progress traps, since the async read or write cannot be cancelled
without blocking and `drop` (called by `stream.close-{readable,writable}`) is
synchronous and non-blocking. This means that client code must take care to
wait for these operations to finish before closing.

The `{Readable,Writable}StreamEnd.copy` method is called polymorphically by the
shared definition of `stream.{read,write}` below. While the static type of
`StreamEnd.stream` is `ReadableStream`, a `WritableStreamEnd` always points to
a `ReadableStreamGuestImpl` object, which is why `WritableStreamEnd.copy` can
unconditionally call `stream.write`.
Given the above definitions for `stream`, `future` can be simply defined as a
`stream` that transmits only 1 value before automatically closing itself. This
can be achieved by simply wrapping the `on_copy_done` callback (defined above)
and closing once a value has been read-from or written-to the given buffer:
class FutureEnd(StreamEnd):
def close_after_copy(self, copy_op, buffer, on_copy_done):
assert(buffer.remain() == 1)
def on_copy_done_wrapper():
if buffer.remain() == 0:
self.stream.close()
on_copy_done()
ret = copy_op(buffer, on_partial_copy = None, on_copy_done = on_copy_done_wrapper)
if ret == 'done' and buffer.remain() == 0:
self.stream.close()
return ret
class ReadableFutureEnd(FutureEnd):
def copy(self, dst, on_partial_copy, on_copy_done):
return self.close_after_copy(self.stream.read, dst, on_copy_done)
class WritableFutureEnd(FutureEnd):
paired: bool = False
def copy(self, src, on_partial_copy, on_copy_done):
return self.close_after_copy(self.stream.write, src, on_copy_done)
def drop(self, errctx):
trap_if(not self.stream.closed() and not errctx)
FutureEnd.drop(self, errctx)
The `future.{read,write}` built-ins fix the buffer length to `1`, ensuring that
the `assert(buffer.remain() == 1)` holds. Because of this, there are no partial
copies and `on_partial_copy` is never called.

The additional `trap_if` in `WritableFutureEnd.drop` ensures that the only
valid way to close a future without writing its value is to close it in error.
In the explainer, component value types are classified as either fundamental or
specialized, where the specialized value types are defined by expansion into
fundamental value types. In most cases, the canonical ABI of a specialized
value type is the same as its expansion so, to avoid repetition, the other
definitions below use the following `despecialize` function to replace
specialized value types with their expansion:
def despecialize(t):
match t:
case TupleType(ts) : return RecordType([ FieldType(str(i), t) for i,t in enumerate(ts) ])
case EnumType(labels) : return VariantType([ CaseType(l, None) for l in labels ])
case OptionType(t) : return VariantType([ CaseType("none", None), CaseType("some", t) ])
case ResultType(ok, err) : return VariantType([ CaseType("ok", ok), CaseType("error", err) ])
case _ : return t
The specialized value types `string` and `flags` are missing from this list
because they are given specialized canonical ABI representations distinct from
their respective expansions.
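For example (a sketch assuming the type constructors used above are in scope),
`option<u32>` despecializes into the equivalent two-case variant, while
`string` is left alone because it has its own canonical representation:

```python
assert isinstance(despecialize(OptionType(U32Type())), VariantType)
assert isinstance(despecialize(StringType()), StringType)
```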
The `contains_borrow` and `contains_async_value` predicates return whether the
given type contains a `borrow` or `stream`/`future`, respectively.
def contains_borrow(t):
return contains(t, lambda u: isinstance(u, BorrowType))
def contains_async_value(t):
return contains(t, lambda u: isinstance(u, StreamType | FutureType))
def contains(t, p):
t = despecialize(t)
match t:
case None:
return False
case PrimValType() | OwnType() | BorrowType():
return p(t)
case ListType(u) | StreamType(u) | FutureType(u):
return p(t) or contains(u, p)
case RecordType(fields):
return p(t) or any(contains(f.t, p) for f in fields)
case VariantType(cases):
return p(t) or any(contains(c.t, p) for c in cases)
case FuncType():
return any(p(u) for u in t.param_types() + t.result_types())
case _:
assert(False)
Each value type is assigned an alignment which is used by subsequent Canonical
ABI definitions. Presenting the definition of `alignment` piecewise, we start
with the top-level case analysis:
def alignment(t):
match despecialize(t):
case BoolType() : return 1
case S8Type() | U8Type() : return 1
case S16Type() | U16Type() : return 2
case S32Type() | U32Type() : return 4
case S64Type() | U64Type() : return 8
case F32Type() : return 4
case F64Type() : return 8
case CharType() : return 4
case StringType() : return 4
case ErrorContextType() : return 4
case ListType(t, l) : return alignment_list(t, l)
case RecordType(fields) : return alignment_record(fields)
case VariantType(cases) : return alignment_variant(cases)
case FlagsType(labels) : return alignment_flags(labels)
case OwnType() | BorrowType() : return 4
case StreamType() | FutureType() : return 4
List alignment is the same as tuple alignment when the length is fixed and otherwise uses the alignment of pointers.
def alignment_list(elem_type, maybe_length):
if maybe_length is not None:
return alignment(elem_type)
return 4
Record alignment is tuple alignment, with the definitions split for reuse below:
def alignment_record(fields):
a = 1
for f in fields:
a = max(a, alignment(f.t))
return a
As an optimization, `variant` discriminants are represented by the smallest
integer covering the number of cases in the variant (with cases numbered in
order from `0` to `len(cases)-1`). Depending on the payload type, this can
allow more compact representations of variants in memory. This smallest integer
type is selected by the following function, used above and below:
def alignment_variant(cases):
return max(alignment(discriminant_type(cases)), max_case_alignment(cases))
def discriminant_type(cases):
n = len(cases)
assert(0 < n < (1 << 32))
match math.ceil(math.log2(n)/8):
case 0: return U8Type()
case 1: return U8Type()
case 2: return U16Type()
case 3: return U32Type()
def max_case_alignment(cases):
a = 1
for c in cases:
if c.t is not None:
a = max(a, alignment(c.t))
return a
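As a worked example (a sketch assuming the type constructors above), a variant
with 300 cases needs a 2-byte discriminant, so its alignment is the maximum of
2 and the largest payload alignment:

```python
cases = [CaseType(str(i), None) for i in range(299)] + [CaseType('last', F64Type())]
assert isinstance(discriminant_type(cases), U16Type)
assert alignment_variant(cases) == 8   # max(2, alignment of the f64 payload)
```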
As an optimization, `flags` are represented as packed bit-vectors. Like
`variant` discriminants, `flags` use the smallest integer that fits all the
bits, falling back to sequences of `i32`s when there are more than 32 flags.
def alignment_flags(labels):
n = len(labels)
assert(0 < n <= 32)
if n <= 8: return 1
if n <= 16: return 2
return 4
Each value type is also assigned an `elem_size` which is the number of bytes
used when values of the type are stored as elements of a `list`. Having this
byte size be a static property of the type instead of attempting to use a
variable-length element-encoding scheme both simplifies the implementation and
maps well to languages which represent `list`s as random-access arrays. Empty
types, such as records with no fields, are not permitted, to avoid
complications in source languages.
def elem_size(t):
match despecialize(t):
case BoolType() : return 1
case S8Type() | U8Type() : return 1
case S16Type() | U16Type() : return 2
case S32Type() | U32Type() : return 4
case S64Type() | U64Type() : return 8
case F32Type() : return 4
case F64Type() : return 8
case CharType() : return 4
case StringType() : return 8
case ErrorContextType() : return 4
case ListType(t, l) : return elem_size_list(t, l)
case RecordType(fields) : return elem_size_record(fields)
case VariantType(cases) : return elem_size_variant(cases)
case FlagsType(labels) : return elem_size_flags(labels)
case OwnType() | BorrowType() : return 4
case StreamType() | FutureType() : return 4
def elem_size_list(elem_type, maybe_length):
if maybe_length is not None:
return maybe_length * elem_size(elem_type)
return 8
def elem_size_record(fields):
s = 0
for f in fields:
s = align_to(s, alignment(f.t))
s += elem_size(f.t)
assert(s > 0)
return align_to(s, alignment_record(fields))
def align_to(ptr, alignment):
return math.ceil(ptr / alignment) * alignment
def elem_size_variant(cases):
s = elem_size(discriminant_type(cases))
s = align_to(s, max_case_alignment(cases))
cs = 0
for c in cases:
if c.t is not None:
cs = max(cs, elem_size(c.t))
s += cs
return align_to(s, alignment_variant(cases))
def elem_size_flags(labels):
n = len(labels)
assert(0 < n <= 32)
if n <= 8: return 1
if n <= 16: return 2
return 4
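As a worked example (a sketch assuming the definitions above), a record with a
`u8` field followed by a `u32` field occupies 8 bytes: 1 byte for the `u8`, 3
bytes of padding, and 4 bytes for the `u32`:

```python
fields = [FieldType('a', U8Type()), FieldType('b', U32Type())]
assert alignment_record(fields) == 4
assert elem_size_record(fields) == 8
```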
The `load` function defines how to read a value of a given value type `t` out
of linear memory starting at offset `ptr`, returning the value represented as a
Python value. Presenting the definition of `load` piecewise, we start with the
top-level case analysis:
def load(cx, ptr, t):
assert(ptr == align_to(ptr, alignment(t)))
assert(ptr + elem_size(t) <= len(cx.opts.memory))
match despecialize(t):
case BoolType() : return convert_int_to_bool(load_int(cx, ptr, 1))
case U8Type() : return load_int(cx, ptr, 1)
case U16Type() : return load_int(cx, ptr, 2)
case U32Type() : return load_int(cx, ptr, 4)
case U64Type() : return load_int(cx, ptr, 8)
case S8Type() : return load_int(cx, ptr, 1, signed = True)
case S16Type() : return load_int(cx, ptr, 2, signed = True)
case S32Type() : return load_int(cx, ptr, 4, signed = True)
case S64Type() : return load_int(cx, ptr, 8, signed = True)
case F32Type() : return decode_i32_as_float(load_int(cx, ptr, 4))
case F64Type() : return decode_i64_as_float(load_int(cx, ptr, 8))
case CharType() : return convert_i32_to_char(cx, load_int(cx, ptr, 4))
case StringType() : return load_string(cx, ptr)
case ErrorContextType() : return lift_error_context(cx, load_int(cx, ptr, 4))
case ListType(t, l) : return load_list(cx, ptr, t, l)
case RecordType(fields) : return load_record(cx, ptr, fields)
case VariantType(cases) : return load_variant(cx, ptr, cases)
case FlagsType(labels) : return load_flags(cx, ptr, labels)
case OwnType() : return lift_own(cx, load_int(cx, ptr, 4), t)
case BorrowType() : return lift_borrow(cx, load_int(cx, ptr, 4), t)
case StreamType(t) : return lift_stream(cx, load_int(cx, ptr, 4), t)
case FutureType(t) : return lift_future(cx, load_int(cx, ptr, 4), t)
Integers are loaded directly from memory, with their high-order bit interpreted according to the signedness of the type.
def load_int(cx, ptr, nbytes, signed = False):
return int.from_bytes(cx.opts.memory[ptr : ptr+nbytes], 'little', signed = signed)
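For example, the same four little-endian bytes decode to different Python
integers depending on the declared signedness:

```python
assert int.from_bytes(b'\xff\xff\xff\xff', 'little', signed = False) == 0xffffffff  # u32
assert int.from_bytes(b'\xff\xff\xff\xff', 'little', signed = True) == -1           # s32
```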
Integer-to-boolean conversion treats `0` as `false` and all other bit-patterns
as `true`:
def convert_int_to_bool(i):
assert(i >= 0)
return bool(i)
Floats are loaded directly from memory, with the sign and payload information of NaN values discarded. Consequently, there is only one unique NaN value per floating-point type. This reflects the practical reality that some languages and protocols do not preserve these bits. In the Python code below, this is expressed as canonicalizing NaNs to a particular bit pattern.
See the comments about lowering of float values for a discussion of possible optimizations.
DETERMINISTIC_PROFILE = False # or True
CANONICAL_FLOAT32_NAN = 0x7fc00000
CANONICAL_FLOAT64_NAN = 0x7ff8000000000000
def canonicalize_nan32(f):
if math.isnan(f):
f = core_f32_reinterpret_i32(CANONICAL_FLOAT32_NAN)
assert(math.isnan(f))
return f
def canonicalize_nan64(f):
if math.isnan(f):
f = core_f64_reinterpret_i64(CANONICAL_FLOAT64_NAN)
assert(math.isnan(f))
return f
def decode_i32_as_float(i):
return canonicalize_nan32(core_f32_reinterpret_i32(i))
def decode_i64_as_float(i):
return canonicalize_nan64(core_f64_reinterpret_i64(i))
def core_f32_reinterpret_i32(i):
return struct.unpack('<f', struct.pack('<I', i))[0] # f32.reinterpret_i32
def core_f64_reinterpret_i64(i):
return struct.unpack('<d', struct.pack('<Q', i))[0] # f64.reinterpret_i64
An i32
is converted to a char
(a Unicode Scalar Value) by dynamically
testing that its unsigned integral value is in the valid Unicode Code Point
range and not a Surrogate:
def convert_i32_to_char(cx, i):
assert(i >= 0)
trap_if(i >= 0x110000)
trap_if(0xD800 <= i <= 0xDFFF)
return chr(i)
Strings are loaded from two i32
values: a pointer (offset in linear memory)
and a number of bytes. There are three supported string encodings in canonopt
:
UTF-8, UTF-16 and latin1+utf16
. This last option allows a dynamic
choice between Latin-1 and UTF-16, indicated by the high bit of the second
i32
. String values include their original encoding and byte length as a
"hint" that enables store_string
(defined below) to make better up-front
allocation size choices in many cases. Thus, the value produced by
load_string
isn't simply a Python str
, but a tuple containing a str
,
the original encoding and the original byte length.
String = tuple[str, str, int]
def load_string(cx, ptr) -> String:
begin = load_int(cx, ptr, 4)
tagged_code_units = load_int(cx, ptr + 4, 4)
return load_string_from_range(cx, begin, tagged_code_units)
UTF16_TAG = 1 << 31
def load_string_from_range(cx, ptr, tagged_code_units) -> String:
match cx.opts.string_encoding:
case 'utf8':
alignment = 1
byte_length = tagged_code_units
encoding = 'utf-8'
case 'utf16':
alignment = 2
byte_length = 2 * tagged_code_units
encoding = 'utf-16-le'
case 'latin1+utf16':
alignment = 2
if bool(tagged_code_units & UTF16_TAG):
byte_length = 2 * (tagged_code_units ^ UTF16_TAG)
encoding = 'utf-16-le'
else:
byte_length = tagged_code_units
encoding = 'latin-1'
trap_if(ptr != align_to(ptr, alignment))
trap_if(ptr + byte_length > len(cx.opts.memory))
try:
s = cx.opts.memory[ptr : ptr+byte_length].decode(encoding)
except UnicodeError:
trap()
return (s, cx.opts.string_encoding, tagged_code_units)
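As a small non-normative illustration of the latin1+utf16 tagging scheme (the local name tagged is purely illustrative), a producer that stored 5 UTF-16 code units sets the high bit of the second i32 and load_string_from_range recovers a 10-byte range:
# Illustrative only:
tagged = 5 | UTF16_TAG                  # second i32 written by the producer
assert bool(tagged & UTF16_TAG)         # selects the 'utf-16-le' branch above
assert 2 * (tagged ^ UTF16_TAG) == 10   # byte length decoded from linear memory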
Error context values are lifted directly from the per-component-instance
error_contexts
table:
def lift_error_context(cx, i):
return cx.inst.error_contexts.get(i)
Lists and records are loaded by recursively loading their elements/fields:
def load_list(cx, ptr, elem_type, maybe_length):
if maybe_length is not None:
return load_list_from_valid_range(cx, ptr, maybe_length, elem_type)
begin = load_int(cx, ptr, 4)
length = load_int(cx, ptr + 4, 4)
return load_list_from_range(cx, begin, length, elem_type)
def load_list_from_range(cx, ptr, length, elem_type):
trap_if(ptr != align_to(ptr, alignment(elem_type)))
trap_if(ptr + length * elem_size(elem_type) > len(cx.opts.memory))
return load_list_from_valid_range(cx, ptr, length, elem_type)
def load_list_from_valid_range(cx, ptr, length, elem_type):
a = []
for i in range(length):
a.append(load(cx, ptr + i * elem_size(elem_type), elem_type))
return a
def load_record(cx, ptr, fields):
record = {}
for field in fields:
ptr = align_to(ptr, alignment(field.t))
record[field.label] = load(cx, ptr, field.t)
ptr += elem_size(field.t)
return record
As a technical detail: the align_to
in the loop in load_record
is
guaranteed to be a no-op on the first iteration because the record as
a whole starts out aligned (as asserted at the top of load
).
Variants are loaded using the order of the cases in the type to determine the
case index, assigning 0
to the first case, 1
to the next case, etc.
While the code below appears to perform case-label lookup at runtime, a normal
implementation can build the appropriate index tables at compile-time so that
variant-passing is always O(1) and does not involve string operations.
def load_variant(cx, ptr, cases):
disc_size = elem_size(discriminant_type(cases))
case_index = load_int(cx, ptr, disc_size)
ptr += disc_size
trap_if(case_index >= len(cases))
c = cases[case_index]
ptr = align_to(ptr, max_case_alignment(cases))
if c.t is None:
return { c.label: None }
return { c.label: load(cx, ptr, c.t) }
Flags are converted from a bit-vector to a dictionary whose keys are
derived from the ordered labels of the flags
type. The code here takes
advantage of Python's support for integers of arbitrary width.
def load_flags(cx, ptr, labels):
i = load_int(cx, ptr, elem_size_flags(labels))
return unpack_flags_from_int(i, labels)
def unpack_flags_from_int(i, labels):
record = {}
for l in labels:
record[l] = bool(i & 1)
i >>= 1
return record
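For example (illustrative only), the bit-vector 0b101 unpacked against three labels yields:
assert unpack_flags_from_int(0b101, ['read', 'write', 'exec']) == \
       {'read': True, 'write': False, 'exec': True}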
own
handles are lifted by removing the handle from the current component
instance's handle table, so that ownership is transferred to the lowering
component. The lifting operation fails if unique ownership of the handle isn't
possible, for example if the index was actually a borrow
or if the own
handle is currently being lent out as borrows.
def lift_own(cx, i, t):
h = cx.inst.resources.remove(i)
trap_if(h.rt is not t.rt)
trap_if(h.num_lends != 0)
trap_if(not h.own)
return h.rep
The abstract lifted value for handle types is currently just the internal
resource representation i32
, which is kept opaque from the receiving
component (it's stored in the handle table and only accessed indirectly via
index). (This assumes that resource representations are immutable. If
representations were to become mutable, the address of the mutable cell would
be passed as the lifted value instead.)
In contrast to own
, borrow
handles are lifted by reading the representation
from the source handle, leaving the source handle intact in the current
component instance's handle table:
def lift_borrow(cx, i, t):
assert(isinstance(cx.borrow_scope, Subtask))
h = cx.inst.resources.get(i)
trap_if(h.rt is not t.rt)
cx.borrow_scope.add_lender(h)
return h.rep
The Subtask.add_lender
participates in the enforcement of the dynamic borrow
rules, which keep the source handle alive until the end of the call (as a
conservative upper bound on how long the borrow
handle can be held). Note
that add_lender
is called for borrowed source handles so that they must be
kept alive until the subtask completes, which in turn prevents the current task
from task.return
ing while its non-returned subtask still holds a
transitively-borrowed handle.
Streams and futures are lifted in almost the same way, with the only difference
being that it is a dynamic error to attempt to lift a future
that has already
been successfully read (which will leave it closed()
):
def lift_stream(cx, i, t):
return lift_async_value(ReadableStreamEnd, WritableStreamEnd, cx, i, t)
def lift_future(cx, i, t):
v = lift_async_value(ReadableFutureEnd, WritableFutureEnd, cx, i, t)
trap_if(v.closed())
return v
def lift_async_value(ReadableEndT, WritableEndT, cx, i, t):
assert(not contains_borrow(t))
e = cx.inst.waitables.get(i)
match e:
case ReadableEndT():
trap_if(e.copying)
cx.inst.waitables.remove(i)
case WritableEndT():
trap_if(e.paired)
e.paired = True
case _:
trap()
trap_if(e.stream.t != t)
return e.stream
Since the waitables
table is heterogeneous, dynamic checks are
necessary to ensure that the i
index actually refers to a stream/future end
of the correct type.
Lifting the readable end (of a stream or future) transfers ownership of the readable end and traps if a read was in progress (which would now be dangling).
Lifting the writable end of a stream leaves the writable end in place (allowing
there to be a write in progress) and shares the writable end's contained
ReadableStreamGuestImpl
object with the readable end created by
lower_stream
, pairing the two ends together. As an invariant, each stream can
have at most one readable and writable end, so the paired
field of
Writable{Stream,Future}End
is used to track whether there is already a
readable end and trapping if another would be created.
The store
function defines how to write a value v
of a given value type
t
into linear memory starting at offset ptr
. Presenting the definition of
store
piecewise, we start with the top-level case analysis:
def store(cx, v, t, ptr):
assert(ptr == align_to(ptr, alignment(t)))
assert(ptr + elem_size(t) <= len(cx.opts.memory))
match despecialize(t):
case BoolType() : store_int(cx, int(bool(v)), ptr, 1)
case U8Type() : store_int(cx, v, ptr, 1)
case U16Type() : store_int(cx, v, ptr, 2)
case U32Type() : store_int(cx, v, ptr, 4)
case U64Type() : store_int(cx, v, ptr, 8)
case S8Type() : store_int(cx, v, ptr, 1, signed = True)
case S16Type() : store_int(cx, v, ptr, 2, signed = True)
case S32Type() : store_int(cx, v, ptr, 4, signed = True)
case S64Type() : store_int(cx, v, ptr, 8, signed = True)
case F32Type() : store_int(cx, encode_float_as_i32(v), ptr, 4)
case F64Type() : store_int(cx, encode_float_as_i64(v), ptr, 8)
case CharType() : store_int(cx, char_to_i32(v), ptr, 4)
case StringType() : store_string(cx, v, ptr)
case ErrorContextType() : store_int(cx, lower_error_context(cx, v), ptr, 4)
case ListType(t, l) : store_list(cx, v, ptr, t, l)
case RecordType(fields) : store_record(cx, v, ptr, fields)
case VariantType(cases) : store_variant(cx, v, ptr, cases)
case FlagsType(labels) : store_flags(cx, v, ptr, labels)
case OwnType() : store_int(cx, lower_own(cx, v, t), ptr, 4)
case BorrowType() : store_int(cx, lower_borrow(cx, v, t), ptr, 4)
case StreamType(t) : store_int(cx, lower_stream(cx, v, t), ptr, 4)
case FutureType(t) : store_int(cx, lower_future(cx, v, t), ptr, 4)
Integers are stored directly into memory. Because the input domain is exactly
the integers in range for the given type, no extra range checks are necessary;
the signed
parameter is only present to ensure that the internal range checks
of int.to_bytes
are satisfied.
def store_int(cx, v, ptr, nbytes, signed = False):
cx.opts.memory[ptr : ptr+nbytes] = int.to_bytes(v, nbytes, 'little', signed = signed)
Floats are stored directly into memory, with the sign and payload bits of NaN values modified non-deterministically. This reflects the practical reality that different languages, protocols and CPUs have different effects on NaNs.
Although this non-determinism is expressed in the Python code below as generating a "random" NaN bit-pattern, native implementations do not need to use the same "random" algorithm, or even any random algorithm at all. Hosts may instead choose to canonicalize to an arbitrary fixed NaN value, or even to the original value of the NaN before lifting, allowing them to optimize away both the canonicalization of lifting and the randomization of lowering.
When a host implements the deterministic profile, NaNs are canonicalized to a particular NaN bit-pattern.
def maybe_scramble_nan32(f):
if math.isnan(f):
if DETERMINISTIC_PROFILE:
f = core_f32_reinterpret_i32(CANONICAL_FLOAT32_NAN)
else:
f = core_f32_reinterpret_i32(random_nan_bits(32, 8))
assert(math.isnan(f))
return f
def maybe_scramble_nan64(f):
if math.isnan(f):
if DETERMINISTIC_PROFILE:
f = core_f64_reinterpret_i64(CANONICAL_FLOAT64_NAN)
else:
f = core_f64_reinterpret_i64(random_nan_bits(64, 11))
assert(math.isnan(f))
return f
def random_nan_bits(total_bits, exponent_bits):
fraction_bits = total_bits - exponent_bits - 1
bits = random.getrandbits(total_bits)
bits |= ((1 << exponent_bits) - 1) << fraction_bits
bits |= 1 << random.randrange(fraction_bits - 1)
return bits
def encode_float_as_i32(f):
return core_i32_reinterpret_f32(maybe_scramble_nan32(f))
def encode_float_as_i64(f):
return core_i64_reinterpret_f64(maybe_scramble_nan64(f))
def core_i32_reinterpret_f32(f):
return struct.unpack('<I', struct.pack('<f', f))[0] # i32.reinterpret_f32
def core_i64_reinterpret_f64(f):
return struct.unpack('<Q', struct.pack('<d', f))[0] # i64.reinterpret_f64
The integral value of a char
(a Unicode Scalar Value) is a valid unsigned
i32
and thus no runtime conversion or checking is necessary:
def char_to_i32(c):
i = ord(c)
assert(0 <= i <= 0xD7FF or 0xD800 <= i <= 0x10FFFF)
return i
Storing strings is complicated by the goal of attempting to optimize the
different transcoding cases. In particular, one challenge is choosing the
linear memory allocation size before examining the contents of the string.
The reason for this constraint is that, in some settings where single-pass
iterators are involved (host calls and post-MVP adapter functions), examining
the contents of a string more than once would require making an engine-internal
temporary copy of the whole string, which the component model specifically aims
not to do. To avoid multiple passes, the canonical ABI instead uses a realloc
approach to update the allocation size during the single copy. A blind
realloc
approach would normally suffer from multiple reallocations per string
(e.g., using the standard doubling-growth strategy). However, as already shown
in load_string
above, string values come with two useful hints: their
original encoding and byte length. From this hint data, store_string
can do a
much better job minimizing the number of reallocations.
We start with a case analysis to enumerate all the meaningful encoding
combinations, subdividing the latin1+utf16
encoding into either latin1
or
utf16
based on the UTF16_TAG
flag set by load_string
:
def store_string(cx, v: String, ptr):
begin, tagged_code_units = store_string_into_range(cx, v)
store_int(cx, begin, ptr, 4)
store_int(cx, tagged_code_units, ptr + 4, 4)
def store_string_into_range(cx, v: String):
src, src_encoding, src_tagged_code_units = v
if src_encoding == 'latin1+utf16':
if bool(src_tagged_code_units & UTF16_TAG):
src_simple_encoding = 'utf16'
src_code_units = src_tagged_code_units ^ UTF16_TAG
else:
src_simple_encoding = 'latin1'
src_code_units = src_tagged_code_units
else:
src_simple_encoding = src_encoding
src_code_units = src_tagged_code_units
match cx.opts.string_encoding:
case 'utf8':
match src_simple_encoding:
case 'utf8' : return store_string_copy(cx, src, src_code_units, 1, 1, 'utf-8')
case 'utf16' : return store_utf16_to_utf8(cx, src, src_code_units)
case 'latin1' : return store_latin1_to_utf8(cx, src, src_code_units)
case 'utf16':
match src_simple_encoding:
case 'utf8' : return store_utf8_to_utf16(cx, src, src_code_units)
case 'utf16' : return store_string_copy(cx, src, src_code_units, 2, 2, 'utf-16-le')
case 'latin1' : return store_string_copy(cx, src, src_code_units, 2, 2, 'utf-16-le')
case 'latin1+utf16':
match src_encoding:
case 'utf8' : return store_string_to_latin1_or_utf16(cx, src, src_code_units)
case 'utf16' : return store_string_to_latin1_or_utf16(cx, src, src_code_units)
case 'latin1+utf16' :
match src_simple_encoding:
case 'latin1' : return store_string_copy(cx, src, src_code_units, 1, 2, 'latin-1')
case 'utf16' : return store_probably_utf16_to_latin1_or_utf16(cx, src, src_code_units)
The simplest 4 cases above can compute the exact destination size and then copy with a simple loop (that possibly inflates Latin-1 to UTF-16 by injecting a 0 byte after every Latin-1 byte).
MAX_STRING_BYTE_LENGTH = (1 << 31) - 1
def store_string_copy(cx, src, src_code_units, dst_code_unit_size, dst_alignment, dst_encoding):
dst_byte_length = dst_code_unit_size * src_code_units
trap_if(dst_byte_length > MAX_STRING_BYTE_LENGTH)
ptr = cx.opts.realloc(0, 0, dst_alignment, dst_byte_length)
trap_if(ptr != align_to(ptr, dst_alignment))
trap_if(ptr + dst_byte_length > len(cx.opts.memory))
encoded = src.encode(dst_encoding)
assert(dst_byte_length == len(encoded))
cx.opts.memory[ptr : ptr+len(encoded)] = encoded
return (ptr, src_code_units)
The choice of MAX_STRING_BYTE_LENGTH
constant ensures that the high bit of a
string's byte length is never set, keeping it clear for UTF16_TAG
.
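This relationship can be checked directly (illustrative only):
assert MAX_STRING_BYTE_LENGTH == 0x7fffffff
assert MAX_STRING_BYTE_LENGTH & UTF16_TAG == 0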
The 2 cases of transcoding into UTF-8 share an algorithm that starts by optimistically assuming that each code unit of the source string fits in a single UTF-8 byte and then, failing that, reallocates to a worst-case size, completes the copy, and finishes with a shrinking reallocation.
def store_utf16_to_utf8(cx, src, src_code_units):
worst_case_size = src_code_units * 3
return store_string_to_utf8(cx, src, src_code_units, worst_case_size)
def store_latin1_to_utf8(cx, src, src_code_units):
worst_case_size = src_code_units * 2
return store_string_to_utf8(cx, src, src_code_units, worst_case_size)
def store_string_to_utf8(cx, src, src_code_units, worst_case_size):
assert(src_code_units <= MAX_STRING_BYTE_LENGTH)
ptr = cx.opts.realloc(0, 0, 1, src_code_units)
trap_if(ptr + src_code_units > len(cx.opts.memory))
for i,code_point in enumerate(src):
if ord(code_point) < 2**7:
cx.opts.memory[ptr + i] = ord(code_point)
else:
trap_if(worst_case_size > MAX_STRING_BYTE_LENGTH)
ptr = cx.opts.realloc(ptr, src_code_units, 1, worst_case_size)
trap_if(ptr + worst_case_size > len(cx.opts.memory))
encoded = src.encode('utf-8')
cx.opts.memory[ptr+i : ptr+len(encoded)] = encoded[i : ]
if worst_case_size > len(encoded):
ptr = cx.opts.realloc(ptr, worst_case_size, 1, len(encoded))
trap_if(ptr + len(encoded) > len(cx.opts.memory))
return (ptr, len(encoded))
return (ptr, src_code_units)
Converting from UTF-8 to UTF-16 performs an initial worst-case size allocation (assuming each UTF-8 byte encodes a whole code point that inflates into a two-byte UTF-16 code unit) and then does a shrinking reallocation at the end if multiple UTF-8 bytes were collapsed into a single 2-byte UTF-16 code unit:
def store_utf8_to_utf16(cx, src, src_code_units):
worst_case_size = 2 * src_code_units
trap_if(worst_case_size > MAX_STRING_BYTE_LENGTH)
ptr = cx.opts.realloc(0, 0, 2, worst_case_size)
trap_if(ptr != align_to(ptr, 2))
trap_if(ptr + worst_case_size > len(cx.opts.memory))
encoded = src.encode('utf-16-le')
cx.opts.memory[ptr : ptr+len(encoded)] = encoded
if len(encoded) < worst_case_size:
ptr = cx.opts.realloc(ptr, worst_case_size, 2, len(encoded))
trap_if(ptr != align_to(ptr, 2))
trap_if(ptr + len(encoded) > len(cx.opts.memory))
code_units = int(len(encoded) / 2)
return (ptr, code_units)
The next transcoding case handles latin1+utf16
encoding, where the general
goal is to fit the incoming string into Latin-1 if possible based on the code
points of the incoming string. The algorithm speculates that all code points
do fit into Latin-1 and then falls back to a worst-case allocation size when
a code point is found outside Latin-1. In this fallback case, the
previously-copied Latin-1 bytes are inflated in place, inserting a 0 byte
after every Latin-1 byte (iterating in reverse to avoid clobbering later
bytes):
def store_string_to_latin1_or_utf16(cx, src, src_code_units):
assert(src_code_units <= MAX_STRING_BYTE_LENGTH)
ptr = cx.opts.realloc(0, 0, 2, src_code_units)
trap_if(ptr != align_to(ptr, 2))
trap_if(ptr + src_code_units > len(cx.opts.memory))
dst_byte_length = 0
for usv in src:
if ord(usv) < (1 << 8):
cx.opts.memory[ptr + dst_byte_length] = ord(usv)
dst_byte_length += 1
else:
worst_case_size = 2 * src_code_units
trap_if(worst_case_size > MAX_STRING_BYTE_LENGTH)
ptr = cx.opts.realloc(ptr, src_code_units, 2, worst_case_size)
trap_if(ptr != align_to(ptr, 2))
trap_if(ptr + worst_case_size > len(cx.opts.memory))
for j in range(dst_byte_length-1, -1, -1):
cx.opts.memory[ptr + 2*j] = cx.opts.memory[ptr + j]
cx.opts.memory[ptr + 2*j + 1] = 0
encoded = src.encode('utf-16-le')
cx.opts.memory[ptr+2*dst_byte_length : ptr+len(encoded)] = encoded[2*dst_byte_length : ]
if worst_case_size > len(encoded):
ptr = cx.opts.realloc(ptr, worst_case_size, 2, len(encoded))
trap_if(ptr != align_to(ptr, 2))
trap_if(ptr + len(encoded) > len(cx.opts.memory))
tagged_code_units = int(len(encoded) / 2) | UTF16_TAG
return (ptr, tagged_code_units)
if dst_byte_length < src_code_units:
ptr = cx.opts.realloc(ptr, src_code_units, 2, dst_byte_length)
trap_if(ptr != align_to(ptr, 2))
trap_if(ptr + dst_byte_length > len(cx.opts.memory))
return (ptr, dst_byte_length)
The final transcoding case takes advantage of the extra heuristic
information that the incoming UTF-16 bytes were intentionally chosen over
Latin-1 by the producer, indicating that they probably contain code points
outside Latin-1 and thus probably require inflation. Based on this
information, the transcoding algorithm pessimistically allocates storage for
UTF-16, deflating at the end if indeed no non-Latin-1 code points were
encountered. This Latin-1 deflation ensures that if a group of components
are all using latin1+utf16
and one component over-uses UTF-16, other
components can recover the Latin-1 compression. (The Latin-1 check can be
inexpensively fused with the UTF-16 validate+copy loop.)
def store_probably_utf16_to_latin1_or_utf16(cx, src, src_code_units):
src_byte_length = 2 * src_code_units
trap_if(src_byte_length > MAX_STRING_BYTE_LENGTH)
ptr = cx.opts.realloc(0, 0, 2, src_byte_length)
trap_if(ptr != align_to(ptr, 2))
trap_if(ptr + src_byte_length > len(cx.opts.memory))
encoded = src.encode('utf-16-le')
cx.opts.memory[ptr : ptr+len(encoded)] = encoded
if any(ord(c) >= (1 << 8) for c in src):
tagged_code_units = int(len(encoded) / 2) | UTF16_TAG
return (ptr, tagged_code_units)
latin1_size = int(len(encoded) / 2)
for i in range(latin1_size):
cx.opts.memory[ptr + i] = cx.opts.memory[ptr + 2*i]
ptr = cx.opts.realloc(ptr, src_byte_length, 1, latin1_size)
trap_if(ptr + latin1_size > len(cx.opts.memory))
return (ptr, latin1_size)
Error context values are lowered by storing them directly into the
per-component-instance error_contexts
table and passing the i32
index to
wasm.
def lower_error_context(cx, v):
return cx.inst.error_contexts.add(v)
Lists and records are stored by recursively storing their elements and are symmetric to the loading functions. Unlike strings, lists can simply allocate based on the up-front knowledge of length and static element size.
def store_list(cx, v, ptr, elem_type, maybe_length):
if maybe_length is not None:
assert(maybe_length == len(v))
store_list_into_valid_range(cx, v, ptr, elem_type)
return
begin, length = store_list_into_range(cx, v, elem_type)
store_int(cx, begin, ptr, 4)
store_int(cx, length, ptr + 4, 4)
def store_list_into_range(cx, v, elem_type):
byte_length = len(v) * elem_size(elem_type)
trap_if(byte_length >= (1 << 32))
ptr = cx.opts.realloc(0, 0, alignment(elem_type), byte_length)
trap_if(ptr != align_to(ptr, alignment(elem_type)))
trap_if(ptr + byte_length > len(cx.opts.memory))
store_list_into_valid_range(cx, v, ptr, elem_type)
return (ptr, len(v))
def store_list_into_valid_range(cx, v, ptr, elem_type):
for i,e in enumerate(v):
store(cx, e, elem_type, ptr + i * elem_size(elem_type))
def store_record(cx, v, ptr, fields):
for f in fields:
ptr = align_to(ptr, alignment(f.t))
store(cx, v[f.label], f.t, ptr)
ptr += elem_size(f.t)
Variant values are represented as Python dictionaries containing exactly one
entry whose key is the label of the lifted case and whose value is the
(optional) case payload. While this code appears to do an O(n) search of the
variant
type for a matching case label, a normal implementation can
statically fuse store_variant
with its matching load_variant
to ultimately
build a dense array that maps producer's case indices to the consumer's case
indices.
def store_variant(cx, v, ptr, cases):
case_index, case_value = match_case(v, cases)
disc_size = elem_size(discriminant_type(cases))
store_int(cx, case_index, ptr, disc_size)
ptr += disc_size
ptr = align_to(ptr, max_case_alignment(cases))
c = cases[case_index]
if c.t is not None:
store(cx, case_value, c.t, ptr)
def match_case(v, cases):
[label] = v.keys()
[index] = [i for i,c in enumerate(cases) if c.label == label]
[value] = v.values()
return (index, value)
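For example (illustrative only, assuming the CaseType constructor from the complete Python definitions), the lifted value {'ok': 42} matched against a result-like case list yields case index 0 and payload 42:
cases = [CaseType('ok', U32Type()), CaseType('err', StringType())]
assert match_case({'ok': 42}, cases) == (0, 42)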
Flags are converted from a dictionary to a bit-vector by iterating through the labels of the flags type in the order they were listed in the type definition and OR-ing all the bits together. Flag lifting/lowering can be statically fused into array/integer operations (with a simple byte copy when the label lists are the same) to avoid any string operations in a similar manner to variants.
def store_flags(cx, v, ptr, labels):
i = pack_flags_into_int(v, labels)
store_int(cx, i, ptr, elem_size_flags(labels))
def pack_flags_into_int(v, labels):
i = 0
shift = 0
for l in labels:
i |= (int(bool(v[l])) << shift)
shift += 1
return i
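Packing is the inverse of unpack_flags_from_int shown earlier (illustrative only):
assert pack_flags_into_int({'read': True, 'write': False, 'exec': True},
                           ['read', 'write', 'exec']) == 0b101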
Finally, own
and borrow
handles are lowered by initializing new handle
elements in the current component instance's resources
table. The increment
of num_borrows
is complemented by a decrement in canon_resource_drop
and
ensures that all borrowed handles are dropped before the end of the task.
def lower_own(cx, rep, t):
h = ResourceHandle(t.rt, rep, own = True)
return cx.inst.resources.add(h)
def lower_borrow(cx, rep, t):
assert(isinstance(cx.borrow_scope, Task))
if cx.inst is t.rt.impl:
return rep
h = ResourceHandle(t.rt, rep, own = False, borrow_scope = cx.borrow_scope)
h.borrow_scope.num_borrows += 1
return cx.inst.resources.add(h)
The special case in lower_borrow
is an optimization, recognizing that, when
a borrowed handle is passed to the component that implemented the resource
type, the only thing the borrowed handle is good for is calling
resource.rep
, so lowering might as well avoid the overhead of creating an
intermediate borrow handle.
Lowering a stream
or future
is almost entirely symmetric. The
trap_if(v.closed())
in lift_future
ensures the validity of the
assert(not v.closed())
in lower_future
.
def lower_stream(cx, v, t):
return lower_async_value(ReadableStreamEnd, WritableStreamEnd, cx, v, t)
def lower_future(cx, v, t):
assert(not v.closed())
return lower_async_value(ReadableFutureEnd, WritableFutureEnd, cx, v, t)
def lower_async_value(ReadableEndT, WritableEndT, cx, v, t):
assert(isinstance(v, ReadableStream))
assert(not contains_borrow(t))
if isinstance(v, ReadableStreamGuestImpl) and cx.inst is v.impl:
for i,e in enumerate(cx.inst.waitables.array):
if isinstance(e, WritableEndT) and e.stream is v:
break
assert(e.paired)
e.paired = False
assert(i <= Table.MAX_LENGTH < 2**31)
return i | (2**31)
else:
return cx.inst.waitables.add(ReadableEndT(v))
In the ordinary (else
) case, the ReadableStream
value (which may be
implemented by the host or wasm) is stored in a new readable end in the
waitables
table.
The interesting case is when a component instance receives a ReadableStream
for which the component instance already holds the writable end. Without specially
handling this case, this would lead to copies from a single linear memory into
itself which is both inefficient and raises subtle semantic interleaving
questions that we would rather avoid. To avoid both, this case is detected and
the index of the existing writable end is returned instead of a new readable
end, setting the high bit to signal this fact to guest code. Guest code must
therefore handle this special case by collapsing the two ends of the stream to
work fully within guest code (since the Canonical ABI is now wholly unnecessary
to pass values from writer to reader). The O(N) search through the waitables
table is expected to be optimized away by instead storing a pointer or index
of the writable end in the stream itself (alongside the impl
field).
With only the definitions above, the Canonical ABI would be forced to place all
parameters and results in linear memory. While this is necessary in the general
case, in many cases performance can be improved by passing small-enough values
in registers by using core function parameters and results. To support this
optimization, the Canonical ABI defines flatten_functype
to map component
function types to core function types by attempting to decompose all the
non-dynamically-sized component value types into core value types.
For a variety of practical reasons, we need to limit the total number of flattened parameters and results, falling back to storing everything in linear memory when these limits are exceeded. The number of flattened results is currently limited to 1 due to various parts of the toolchain (notably the C ABI) not yet being able to express multi-value returns. Hopefully this limitation is temporary and can be lifted before the Component Model is fully standardized.
When there are too many flat values, in general, a single i32
pointer can be
passed instead (pointing to a tuple in linear memory). When lowering into
linear memory, this requires the Canonical ABI to call realloc
(in lower
below) to allocate space to put the tuple. As an optimization, when lowering
the return value of an imported function (via canon lower
), the caller can
have already allocated space for the return value (e.g., efficiently on the
stack), passing in an i32
pointer as a parameter instead of returning an
i32
as a return value.
Given all this, the top-level definition of flatten_functype
is:
MAX_FLAT_PARAMS = 16
MAX_FLAT_RESULTS = 1
def flatten_functype(opts, ft, context):
flat_params = flatten_types(ft.param_types())
flat_results = flatten_types(ft.result_types())
if opts.sync:
if len(flat_params) > MAX_FLAT_PARAMS:
flat_params = ['i32']
if len(flat_results) > MAX_FLAT_RESULTS:
match context:
case 'lift':
flat_results = ['i32']
case 'lower':
flat_params += ['i32']
flat_results = []
return CoreFuncType(flat_params, flat_results)
else:
match context:
case 'lift':
if opts.callback:
flat_results = ['i32']
else:
flat_results = []
case 'lower':
if len(flat_params) > 1:
flat_params = ['i32']
if len(flat_results) > 0:
flat_params += ['i32']
flat_results = ['i32']
return CoreFuncType(flat_params, flat_results)
def flatten_types(ts):
return [ft for t in ts for ft in flatten_type(t)]
As shown here, the core signatures of async
functions use lower limits on the
maximum number of parameters (1) and results (0) passed as scalars before
falling back to passing through memory.
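The following non-normative sketch shows both fallbacks for a synchronous function with 20 u32 parameters and a string result; the CanonicalOptions construction mirrors its use in canon_resource_drop below, and the local names are illustrative:
# Illustrative only:
sync_opts = CanonicalOptions(sync = True)
ft = FuncType([U32Type()] * 20, [StringType()])
lifted = flatten_functype(sync_opts, ft, 'lift')
lowered = flatten_functype(sync_opts, ft, 'lower')
assert lifted.params == ['i32'] and lifted.results == ['i32']      # args pointer in, results pointer out
assert lowered.params == ['i32', 'i32'] and lowered.results == []  # args pointer plus caller-supplied retptr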
Presenting the definition of flatten_type
piecewise, we start with the
top-level case analysis:
def flatten_type(t):
match despecialize(t):
case BoolType() : return ['i32']
case U8Type() | U16Type() | U32Type() : return ['i32']
case S8Type() | S16Type() | S32Type() : return ['i32']
case S64Type() | U64Type() : return ['i64']
case F32Type() : return ['f32']
case F64Type() : return ['f64']
case CharType() : return ['i32']
case StringType() : return ['i32', 'i32']
case ErrorContextType() : return ['i32']
case ListType(t, l) : return flatten_list(t, l)
case RecordType(fields) : return flatten_record(fields)
case VariantType(cases) : return flatten_variant(cases)
case FlagsType(labels) : return ['i32']
case OwnType() | BorrowType() : return ['i32']
case StreamType() | FutureType() : return ['i32']
List flattening of a fixed-length list uses the same flattening as a tuple
(via flatten_record
below).
def flatten_list(elem_type, maybe_length):
if maybe_length is not None:
return flatten_type(elem_type) * maybe_length
return ['i32', 'i32']
Record flattening simply flattens each field in sequence.
def flatten_record(fields):
flat = []
for f in fields:
flat += flatten_type(f.t)
return flat
Variant flattening is more involved due to the fact that each case payload can
have a totally different flattening. Rather than giving up when there is a type
mismatch, the Canonical ABI relies on the fact that the 4 core value types can
be easily bit-cast between each other and defines a join
operator to pick the
tightest approximation. What this means is that, regardless of the dynamic
case, all flattened variants are passed with the same static set of core types,
which may involve, e.g., reinterpreting an f32
as an i32
or zero-extending
an i32
into an i64
.
def flatten_variant(cases):
flat = []
for c in cases:
if c.t is not None:
for i,ft in enumerate(flatten_type(c.t)):
if i < len(flat):
flat[i] = join(flat[i], ft)
else:
flat.append(ft)
return flatten_type(discriminant_type(cases)) + flat
def join(a, b):
if a == b: return a
if (a == 'i32' and b == 'f32') or (a == 'f32' and b == 'i32'): return 'i32'
return 'i64'
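For example (illustrative only, again assuming the CaseType constructor), a variant whose cases carry an f32 and a u64 flattens to a discriminant plus a single joined i64 slot:
assert flatten_variant([CaseType('a', F32Type()), CaseType('b', U64Type())]) == ['i32', 'i64']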
Values are lifted by iterating over a list of parameter or result Core WebAssembly values:
class CoreValueIter:
values: list[int|float]
i: int
def __init__(self, vs):
self.values = vs
self.i = 0
def next(self, t):
v = self.values[self.i]
self.i += 1
match t:
case 'i32': assert(isinstance(v, int) and 0 <= v < 2**32)
case 'i64': assert(isinstance(v, int) and 0 <= v < 2**64)
case 'f32': assert(isinstance(v, (int,float)))
case 'f64': assert(isinstance(v, (int,float)))
case _ : assert(False)
return v
def done(self):
return self.i == len(self.values)
The match
is only used for spec-level assertions; no runtime typecase is
required.
The lift_flat
function defines how to convert a list of core values into a
single high-level value of type t
. Presenting the definition of lift_flat
piecewise, we start with the top-level case analysis:
def lift_flat(cx, vi, t):
match despecialize(t):
case BoolType() : return convert_int_to_bool(vi.next('i32'))
case U8Type() : return lift_flat_unsigned(vi, 32, 8)
case U16Type() : return lift_flat_unsigned(vi, 32, 16)
case U32Type() : return lift_flat_unsigned(vi, 32, 32)
case U64Type() : return lift_flat_unsigned(vi, 64, 64)
case S8Type() : return lift_flat_signed(vi, 32, 8)
case S16Type() : return lift_flat_signed(vi, 32, 16)
case S32Type() : return lift_flat_signed(vi, 32, 32)
case S64Type() : return lift_flat_signed(vi, 64, 64)
case F32Type() : return canonicalize_nan32(vi.next('f32'))
case F64Type() : return canonicalize_nan64(vi.next('f64'))
case CharType() : return convert_i32_to_char(cx, vi.next('i32'))
case StringType() : return lift_flat_string(cx, vi)
case ErrorContextType() : return lift_error_context(cx, vi.next('i32'))
case ListType(t, l) : return lift_flat_list(cx, vi, t, l)
case RecordType(fields) : return lift_flat_record(cx, vi, fields)
case VariantType(cases) : return lift_flat_variant(cx, vi, cases)
case FlagsType(labels) : return lift_flat_flags(vi, labels)
case OwnType() : return lift_own(cx, vi.next('i32'), t)
case BorrowType() : return lift_borrow(cx, vi.next('i32'), t)
case StreamType(t) : return lift_stream(cx, vi.next('i32'), t)
case FutureType(t) : return lift_future(cx, vi.next('i32'), t)
Integers are lifted from core i32
or i64
values using the signedness of the
target type to interpret the high-order bit. When the target type is narrower
than an i32
, the Canonical ABI ignores the unused high bits (like load_int
).
The conversion logic here assumes that i32
values are always represented as
unsigned Python int
s and thus lifting to a signed type performs a manual 2s
complement conversion in Python (which would be a no-op in hardware).
def lift_flat_unsigned(vi, core_width, t_width):
i = vi.next('i' + str(core_width))
assert(0 <= i < (1 << core_width))
return i % (1 << t_width)
def lift_flat_signed(vi, core_width, t_width):
i = vi.next('i' + str(core_width))
assert(0 <= i < (1 << core_width))
i %= (1 << t_width)
if i >= (1 << (t_width - 1)):
return i - (1 << t_width)
return i
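For example (illustrative only), lifting the unsigned core value 0xfffffffe as an s32 produces -2:
assert lift_flat_signed(CoreValueIter([0xfffffffe]), 32, 32) == -2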
The contents of strings and variable-length lists are stored in memory so
lifting these types is essentially the same as loading them from memory; the
only difference is that the pointer and length come from i32
values instead
of from linear memory. Fixed-length lists are lifted the same way as a
tuple (via lift_flat_record
below).
def lift_flat_string(cx, vi):
ptr = vi.next('i32')
packed_length = vi.next('i32')
return load_string_from_range(cx, ptr, packed_length)
def lift_flat_list(cx, vi, elem_type, maybe_length):
if maybe_length is not None:
a = []
for i in range(maybe_length):
a.append(lift_flat(cx, vi, elem_type))
return a
ptr = vi.next('i32')
length = vi.next('i32')
return load_list_from_range(cx, ptr, length, elem_type)
Records are lifted by recursively lifting their fields:
def lift_flat_record(cx, vi, fields):
record = {}
for f in fields:
record[f.label] = lift_flat(cx, vi, f.t)
return record
Variants are also lifted recursively. Lifting a variant must carefully follow
the definition of flatten_variant
above, consuming the exact same core types
regardless of the dynamic case payload being lifted. Because of the join
performed by flatten_variant
, we need a more-permissive value iterator that
reinterprets between the different types appropriately and also traps if the
high bits of an i64
are set for a 32-bit type:
def lift_flat_variant(cx, vi, cases):
flat_types = flatten_variant(cases)
assert(flat_types.pop(0) == 'i32')
case_index = vi.next('i32')
trap_if(case_index >= len(cases))
class CoerceValueIter:
def next(self, want):
have = flat_types.pop(0)
x = vi.next(have)
match (have, want):
case ('i32', 'f32') : return decode_i32_as_float(x)
case ('i64', 'i32') : return wrap_i64_to_i32(x)
case ('i64', 'f32') : return decode_i32_as_float(wrap_i64_to_i32(x))
case ('i64', 'f64') : return decode_i64_as_float(x)
case _ : assert(have == want); return x
c = cases[case_index]
if c.t is None:
v = None
else:
v = lift_flat(cx, CoerceValueIter(), c.t)
for have in flat_types:
_ = vi.next(have)
return { c.label: v }
def wrap_i64_to_i32(i):
assert(0 <= i < (1 << 64))
return i % (1 << 32)
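For example (illustrative only), wrapping simply discards the high 32 bits:
assert wrap_i64_to_i32(0x1_0000_0003) == 3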
Finally, flags are lifted by lifting to a record the same way as when loading flags from linear memory.
def lift_flat_flags(vi, labels):
assert(0 < len(labels) <= 32)
i = vi.next('i32')
return unpack_flags_from_int(i, labels)
The lower_flat
function defines how to convert a value v
of a given type
t
into zero or more core values. Presenting the definition of lower_flat
piecewise, we start with the top-level case analysis:
def lower_flat(cx, v, t):
match despecialize(t):
case BoolType() : return [int(v)]
case U8Type() : return [v]
case U16Type() : return [v]
case U32Type() : return [v]
case U64Type() : return [v]
case S8Type() : return lower_flat_signed(v, 32)
case S16Type() : return lower_flat_signed(v, 32)
case S32Type() : return lower_flat_signed(v, 32)
case S64Type() : return lower_flat_signed(v, 64)
case F32Type() : return [maybe_scramble_nan32(v)]
case F64Type() : return [maybe_scramble_nan64(v)]
case CharType() : return [char_to_i32(v)]
case StringType() : return lower_flat_string(cx, v)
case ErrorContextType() : return lower_error_context(cx, v)
case ListType(t, l) : return lower_flat_list(cx, v, t, l)
case RecordType(fields) : return lower_flat_record(cx, v, fields)
case VariantType(cases) : return lower_flat_variant(cx, v, cases)
case FlagsType(labels) : return lower_flat_flags(v, labels)
case OwnType() : return [lower_own(cx, v, t)]
case BorrowType() : return [lower_borrow(cx, v, t)]
case StreamType(t) : return [lower_stream(cx, v, t)]
case FutureType(t) : return [lower_future(cx, v, t)]
Since component-level values are assumed in-range and, as previously stated,
core i32
values are always internally represented as unsigned int
s,
unsigned integer values need no extra conversion. Signed integer values are
converted to unsigned core i32
s by 2s complement arithmetic (which again
would be a no-op in hardware):
def lower_flat_signed(i, core_bits):
if i < 0:
i += (1 << core_bits)
return [i]
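For example (illustrative only), this is the inverse of the lift_flat_signed example above:
assert lower_flat_signed(-2, 32) == [0xfffffffe]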
Since strings and variable-length lists are stored in linear memory, lowering
can reuse the previous definitions; only the resulting pointers are returned
differently (as i32
values instead of as a pair in linear memory).
Fixed-length lists are lowered the same way as tuples (via lower_flat_record
below).
def lower_flat_string(cx, v):
ptr, packed_length = store_string_into_range(cx, v)
return [ptr, packed_length]
def lower_flat_list(cx, v, elem_type, maybe_length):
if maybe_length is not None:
assert(maybe_length == len(v))
flat = []
for e in v:
flat += lower_flat(cx, e, elem_type)
return flat
(ptr, length) = store_list_into_range(cx, v, elem_type)
return [ptr, length]
Records are lowered by recursively lowering their fields:
def lower_flat_record(cx, v, fields):
flat = []
for f in fields:
flat += lower_flat(cx, v[f.label], f.t)
return flat
Variants are also lowered recursively. Symmetric to lift_flat_variant
above,
lower_flat_variant
must consume all flattened types of flatten_variant
,
manually coercing the otherwise-incompatible type pairings allowed by join
:
def lower_flat_variant(cx, v, cases):
case_index, case_value = match_case(v, cases)
flat_types = flatten_variant(cases)
assert(flat_types.pop(0) == 'i32')
c = cases[case_index]
if c.t is None:
payload = []
else:
payload = lower_flat(cx, case_value, c.t)
for i,(fv,have) in enumerate(zip(payload, flatten_type(c.t))):
want = flat_types.pop(0)
match (have, want):
case ('f32', 'i32') : payload[i] = encode_float_as_i32(fv)
case ('i32', 'i64') : payload[i] = fv
case ('f32', 'i64') : payload[i] = encode_float_as_i32(fv)
case ('f64', 'i64') : payload[i] = encode_float_as_i64(fv)
case _ : assert(have == want)
for _ in flat_types:
payload.append(0)
return [case_index] + payload
Finally, flags are lowered by packing the flags into an i32
bitvector.
def lower_flat_flags(v, labels):
assert(0 < len(labels) <= 32)
return [pack_flags_into_int(v, labels)]
The lift_flat_values
function defines how to lift a list of core
parameters or results (given by the CoreValueIter
vi
) into a tuple
of component-level values with types ts
.
def lift_flat_values(cx, max_flat, vi, ts):
flat_types = flatten_types(ts)
if len(flat_types) > max_flat:
return lift_heap_values(cx, vi, ts)
else:
return [ lift_flat(cx, vi, t) for t in ts ]
def lift_heap_values(cx, vi, ts):
ptr = vi.next('i32')
tuple_type = TupleType(ts)
trap_if(ptr != align_to(ptr, alignment(tuple_type)))
trap_if(ptr + elem_size(tuple_type) > len(cx.opts.memory))
return list(load(cx, ptr, tuple_type).values())
Symmetrically, the lower_flat_values
function defines how to lower a
list of component-level values vs
of types ts
into a list of core
values. As already described for flatten_functype
above,
lowering handles the greater-than-max_flat
case by either allocating
storage with realloc
or accepting a caller-allocated buffer as an
out-param:
def lower_flat_values(cx, max_flat, vs, ts, out_param = None):
cx.inst.may_leave = False
flat_types = flatten_types(ts)
if len(flat_types) > max_flat:
flat_vals = lower_heap_values(cx, vs, ts, out_param)
else:
flat_vals = []
for i in range(len(vs)):
flat_vals += lower_flat(cx, vs[i], ts[i])
cx.inst.may_leave = True
return flat_vals
def lower_heap_values(cx, vs, ts, out_param):
tuple_type = TupleType(ts)
tuple_value = {str(i): v for i,v in enumerate(vs)}
if out_param is None:
ptr = cx.opts.realloc(0, 0, alignment(tuple_type), elem_size(tuple_type))
flat_vals = [ptr]
else:
ptr = out_param.next('i32')
flat_vals = []
trap_if(ptr != align_to(ptr, alignment(tuple_type)))
trap_if(ptr + elem_size(tuple_type) > len(cx.opts.memory))
store(cx, tuple_value, tuple_type, ptr)
return flat_vals
The may_leave
flag is guarded by canon_lower
below to prevent a component
from calling out of the component while in the middle of lowering, ensuring
that the relative ordering of the side effects of lifting followed by lowering
cannot be observed and thus an implementation may reliably fuse lifting with
lowering when making a cross-component call to avoid the intermediate copy.
Using the above supporting definitions, we can describe the static and dynamic
semantics of component-level canon
definitions. The following subsections
cover each of these canon
cases.
For a canonical definition:
(canon lift $callee:<funcidx> $opts:<canonopt>* (func $f (type $ft)))
validation specifies:
$callee
must have typeflatten_functype($opts, $ft, 'lift')
$f
is given type$ft
- a
memory
is present if required by lifting and is a subtype of(memory 1)
- a
realloc
is present if required by lifting and has type(func (param i32 i32 i32 i32) (result i32))
- if
async
is set, apost-return
function may not be set - if a
post-return
is present, it has type(func (param flatten_functype({}, $ft, 'lift').results))
When instantiating component instance $inst
:
- Define
$f
to be the partially-bound closurecanon_lift($opts, $inst, $ft, $callee)
The resulting function $f
takes 4 runtime arguments:
caller
: the caller'sTask
or, if this lifted function is being called by the host,None
on_start
: a nullary function that must be called to return the caller's arguments as a list of component-level valueson_return
: a unary function that must be called afteron_start
, passing the list of component-level return valueson_block
: a unary async function taking anasyncio.Future
toawait
, and returning the future's result. This function will be called (by the Canonical ABI) instead of rawawait
for any blocking operation.
The indirection of on_start
and on_return
is used to model the
interleaving of reading arguments out of the caller's stack and memory and
writing results back into the caller's stack and memory, which will vary in
async calls.
If $f
ends up being called by the host, the host is responsible for, in a
host-defined manner, conjuring up component-level values suitable for passing
into lower
and, conversely, consuming the component values produced by
lift
. For example, if the host is a native JS runtime, the JavaScript
embedding would specify how native JavaScript values are converted to and from
component values. Alternatively, if the host is a Unix CLI that invokes
component exports directly from the command line, the CLI could choose to
automatically parse argv
into component-level values according to the
declared types of the export. In any case, canon lift
specifies how these
variously-produced values are consumed as parameters (and produced as results)
by a single host-agnostic component.
Based on this, canon_lift
is defined in chunks as follows:
async def canon_lift(opts, inst, ft, callee, caller, on_start, on_return, on_block):
task = Task(opts, inst, ft, caller, on_return, on_block)
flat_args = await task.enter(on_start)
flat_ft = flatten_functype(opts, ft, 'lift')
assert(types_match_values(flat_ft.params, flat_args))
Each call to canon lift
creates a new Task
and waits to enter the task,
allowing the task to express backpressure (for several independent reasons
listed in Task.may_enter
above) before lowering the arguments into the
callee's memory.
In the synchronous case, if the always-task-return
ABI option is set, the lifted
core wasm code must call canon task.return
to return a value before returning
to canon lift
(or else there will be a trap in task.exit()
), which allows
the core wasm to do cleanup and finalization before returning. Otherwise,
if always-task-return
is not set, canon lift
calls task.return
when
core wasm returns (which lifts the values returned by core wasm) and then calls
the post-return
function to let core wasm do cleanup and finalization. In
the future, post-return
and the option to not set always-task-return
may
be deprecated and removed.
if opts.sync:
flat_results = await call_and_trap_on_throw(callee, task, flat_args)
if not opts.always_task_return:
assert(types_match_values(flat_ft.results, flat_results))
task.return_(flat_results)
if opts.post_return is not None:
[] = await call_and_trap_on_throw(opts.post_return, task, flat_results)
task.exit()
return
Next, the asynchronous non-callback
case is simple and requires task.return
to be called, effectively implying always-task-return
. Asynchronous waiting
happens by core wasm calling waitable-set.wait
.
else:
if not opts.callback:
[] = await call_and_trap_on_throw(callee, task, flat_args)
assert(types_match_values(flat_ft.results, []))
task.exit()
return
In contrast, the asynchronous callback
case does asynchronous waiting in
the event loop, with core wasm (repeatedly) returning instructions for what
to do next:
else:
[packed] = await call_and_trap_on_throw(callee, task, flat_args)
s = None
while True:
code,si = unpack_callback_result(packed)
if si != 0:
s = task.inst.waitable_sets.get(si)
match code:
case CallbackCode.EXIT:
task.exit()
return
case CallbackCode.YIELD:
await task.yield_(opts.sync)
e = None
case CallbackCode.WAIT:
trap_if(not s)
e = await task.wait_on(s.wait(), sync = False)
case CallbackCode.POLL:
trap_if(not s)
await task.yield_(opts.sync)
e = s.poll()
if e:
event, p1, p2 = e
else:
event, p1, p2 = (EventCode.NONE, 0, 0)
[packed] = await call_and_trap_on_throw(opts.callback, task, [event, p1, p2])
One detail worth noting here is that the index of the waitable set does not
need to be returned every time; as an optimization to avoid a waitable_sets
table access on every turn of the event loop, if the returned waitable set
index is 0
(which is an invalid table index anyway), the previous waitable
set will be used.
The bit-packing scheme used for the i32
packed
return value is defined as
follows:
class CallbackCode(IntEnum):
EXIT = 0
YIELD = 1
WAIT = 2
POLL = 3
MAX = 3
def unpack_callback_result(packed):
code = packed & 0xf
trap_if(code > CallbackCode.MAX)
assert(packed < 2**32)
assert(Table.MAX_LENGTH < 2**28)
waitable_set_index = packed >> 4
return (CallbackCode(code), waitable_set_index)
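For example (illustrative only), a callback that wants to wait on the waitable set stored at index 3 returns the packed value below, which unpacks back into the (code, index) pair:
assert unpack_callback_result((3 << 4) | CallbackCode.WAIT) == (CallbackCode.WAIT, 3)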
The ability to asynchronously wait, poll, yield and exit is thus available to
both the callback
and non-callback
cases, making callback
just an
optimization to avoid allocating stacks for async languages that have avoided
the need for stack-switching by design (e.g., async
/await
in JS, Python,
C# and Rust).
Uncaught Core WebAssembly exceptions result in a trap at component
boundaries. Thus, if a component wishes to signal an error, it must use some
sort of explicit type such as result
(whose error
case particular language
bindings may choose to map to and from exceptions):
async def call_and_trap_on_throw(callee, task, args):
try:
return await callee(task, args)
except CoreWebAssemblyException:
trap()
For a canonical definition:
(canon lower $callee:<funcidx> $opts:<canonopt>* (core func $f))
where $callee
has type $ft
, validation specifies:
$f
is given typeflatten_functype($opts, $ft, 'lower')
- a
memory
is present if required by lifting and is a subtype of(memory 1)
- a
realloc
is present if required by lifting and has type(func (param i32 i32 i32 i32) (result i32))
- there is no
post-return
in$opts
- if
contains_async_value($ft)
, then$opts.async
must be set
When instantiating component instance $inst
:
- Define
$f
to be the partially-bound closure:canon_lower($opts, $ft, $callee)
The resulting function $f
takes 2 runtime arguments:
task
: theTask
that was created bycanon_lift
when entering the current component instanceflat_args
: the list of core values passed by the core function caller
Given this, canon_lower
is defined in chunks as follows:
async def canon_lower(opts, ft, callee, task, flat_args):
trap_if(not task.inst.may_leave)
subtask = Subtask()
cx = LiftLowerContext(opts, task.inst, subtask)
Each call to canon lower
creates a new Subtask
. However, this Subtask
is
only added to the waitables
table if async
is specified and the callee
blocks. In any case, this Subtask
is used as the borrow_scope
for borrow
handles lifted during the call, ensuring that owned handles are not dropped before
Subtask.finish()
is called.
The on_start
and on_return
callbacks are called by the callee
(which is canon_lift
, when wasm calls wasm) to produce and consume
lifted arguments and results, resp. These callbacks are also used
by canon_lift
to update the CallState
of the Subtask
and notify
the async caller of progress via the on_progress
local callback,
which is assigned to do something useful in the async
-blocked case
below.
flat_ft = flatten_functype(opts, ft, 'lower')
assert(types_match_values(flat_ft.params, flat_args))
flat_args = CoreValueIter(flat_args)
flat_results = None
on_progress = lambda: None
def on_start():
subtask.state = CallState.STARTED
on_progress()
return lift_flat_values(cx, max_flat_params, flat_args, ft.param_types())
def on_return(vs):
subtask.state = CallState.RETURNED
on_progress()
nonlocal flat_results
flat_results = lower_flat_values(cx, max_flat_results, vs, ft.result_types(), flat_args)
assert(flat_args.done())
The max_flat_{params,results}
variables used above are defined in
the synchronous and asynchronous branches next:
if opts.sync:
assert(not contains_async_value(ft))
max_flat_params = MAX_FLAT_PARAMS
max_flat_results = MAX_FLAT_RESULTS
await task.call_sync(callee, task, on_start, on_return)
assert(subtask.state == CallState.RETURNED)
subtask.finish()
assert(types_match_values(flat_ft.results, flat_results))
In the synchronous case, Task.call_sync
ensures a fully-synchronous
call to callee
(that prevents any interleaved execution until
callee
returns). The not contains_async_value(ft)
assertion is
ensured by validation and reflects the fact that a function that takes
or returns a future
or stream
is extremely likely to deadlock if
called in this manner (since the whole point of these types is to
allow control flow to switch back and forth between caller and
callee).
The asynchronous case uses the Task.call_async
method (defined above) to
immediately return control flow back to the async
caller if callee
blocks:
else:
max_flat_params = 1
max_flat_results = 0
await task.call_async(callee, task, on_start, on_return)
match subtask.state:
case CallState.RETURNED:
subtask.finish()
flat_results = [0]
case _:
subtaski = subtask.add_to_waitables(task)
def on_progress():
def subtask_event():
if subtask.state == CallState.RETURNED:
subtask.finish()
return (EventCode(subtask.state), subtaski, 0)
subtask.set_event(subtask_event)
assert(0 < subtaski <= Table.MAX_LENGTH < 2**28)
assert(0 <= int(subtask.state) < 2**4)
flat_results = [int(subtask.state) | (subtaski << 4)]
return flat_results
If the callee
reached the RETURNED
state, the call returns 0
, signalling
to the caller that the parameters have been read and the results have been
written to the outparam buffer. Note that the callee may have blocked after
calling task.return
and thus the callee may still be executing concurrently.
However, all the caller needs to know is that it has received its return value
(and all borrowed handles have been returned) and thus the subtask is ready to
be dropped (via subtask.drop
).
If callee
did not reach the RETURNED
state, it must have blocked and so
the Subtask
is added to the current component instance's waitables
table,
eagerly returning the i32
index packed with the CallState
of the Subtask
.
If the returned CallState
is STARTING
, the caller must keep the memory
pointed to by the first i32
parameter valid until task.wait
indicates that
subtask i
has advanced to STARTED
or RETURNED
. Similarly, if the returned
state is STARTED
, the caller must keep the memory pointed to by the final
i32
parameter valid until task.wait
indicates that the subtask has advanced
to RETURNED
.
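As a non-normative illustration of this packing (mirroring the final lines of canon_lower above), a subtask that blocked after reaching STARTED and was added at waitables index 5 produces:
# Illustrative only:
packed = int(CallState.STARTED) | (5 << 4)
assert packed & 0xf == CallState.STARTED and packed >> 4 == 5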
The on_progress
callback is called by on_start
and on_return
above and
sets a pending event on the Subtask
any time there is progress. If the
subtask advances CallState
multiple times before the pending event is
delivered, the pending event is overwritten in-place, delivering only the
most-recent CallState
to wasm. Once CallState.RETURNED
is delivered, the
subtask is finish()
ed, which returns ownership of borrowed handles to the
caller and allows the subtask to be dropped from the waitables
table.
The above definitions of sync/async canon_lift
/canon_lower
ensure that a
sync-or-async canon_lift
may call a sync-or-async canon_lower
, with all
combinations working. This is why the Task
class, which is used for both sync
and async canon_lift
calls, contains the code for handling async-lowered
subtasks. As mentioned above, conservative syntactic analysis of all canon
definitions in a component can statically rule out combinations so that, e.g.,
a DAG of all-sync components use a plain synchronous callstack and a DAG of all
async callback
components use only an event loop without fibers. It's only
when async
(without a callback
) or various compositions of async and sync
components are used that fibers (or Asyncify) are required to implement the
above async rules.
Since any cross-component call necessarily transits through a statically-known
canon_lower
+canon_lift
call pair, an AOT compiler can fuse canon_lift
and
canon_lower
into a single, efficient trampoline. In the future this may allow
efficient compilation of permissive subtyping between components (including the
elimination of string operations on the labels of records and variants) as well
as post-MVP adapter functions.
For a canonical definition:
(canon resource.new $rt (core func $f))
validation specifies:
$rt
must refer to a locally-defined (not imported) resource type$f
is given type(func (param $rt.rep) (result i32))
, where$rt.rep
is currently fixed to bei32
.
Calling $f
invokes the following function, which adds an owning handle
containing the given resource representation in the current component
instance's resources
table:
async def canon_resource_new(rt, task, rep):
trap_if(not task.inst.may_leave)
h = ResourceHandle(rt, rep, own = True)
i = task.inst.resources.add(h)
return [i]
For a canonical definition:
(canon resource.drop $rt $async? (core func $f))
validation specifies:
- $rt must refer to a resource type
- $f is given type (func (param i32))
Calling $f
invokes the following function, which removes the handle from the
current component instance's resources
table and, if the handle was owning,
calls the resource's destructor.
async def canon_resource_drop(rt, sync, task, i):
trap_if(not task.inst.may_leave)
inst = task.inst
h = inst.resources.remove(i)
trap_if(h.rt is not rt)
trap_if(h.num_lends != 0)
flat_results = [] if sync else [0]
if h.own:
assert(h.borrow_scope is None)
if inst is rt.impl:
if rt.dtor:
await rt.dtor(h.rep)
else:
if rt.dtor:
caller_opts = CanonicalOptions(sync = sync)
callee_opts = CanonicalOptions(sync = rt.dtor_sync, callback = rt.dtor_callback)
ft = FuncType([U32Type()],[])
callee = partial(canon_lift, callee_opts, rt.impl, ft, rt.dtor)
flat_results = await canon_lower(caller_opts, ft, callee, task, [h.rep])
else:
task.trap_if_on_the_stack(rt.impl)
else:
h.borrow_scope.num_borrows -= 1
return flat_results
In general, the call to a resource's destructor is treated like a
cross-component call (as-if the destructor was exported by the component
defining the resource type). This means that cross-component destructor calls
follow the same concurrency rules as normal exports. However, since there are
valid reasons to call resource.drop
in the same component instance that
defined the resource, which would otherwise trap at the reentrance guard of
Task.enter
, an exception is made when the resource type's
implementation-instance is the same as the current instance (which is
statically known for any given canon resource.drop
).
When a destructor isn't present, the rules still perform a reentrance check since this is the caller's responsibility and the presence or absence of a destructor is an encapsulated implementation detail of the resource type.
For a canonical definition:
(canon resource.rep $rt (core func $f))
validation specifies:
- $rt must refer to a locally-defined (not imported) resource type
- $f is given type (func (param i32) (result $rt.rep)), where $rt.rep is currently fixed to be i32.
Calling $f
invokes the following function, which extracts the resource
representation from the handle.
async def canon_resource_rep(rt, task, i):
h = task.inst.resources.get(i)
trap_if(h.rt is not rt)
return [h.rep]
Note that the "locally-defined" requirement above ensures that only the component instance defining a resource can access its representation.
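To make the handle-table behaviour of resource.new, resource.rep and resource.drop above concrete, here is a self-contained, simplified model. It omits the spec's free list, lend counting, borrow scopes and destructor call, and all names are illustrative:

```python
class Trap(Exception): pass
def trap_if(cond):
  if cond: raise Trap()

class ResourceHandle:
  def __init__(self, rt, rep, own = True):
    self.rt, self.rep, self.own = rt, rep, own

class SimpleTable:
  def __init__(self):
    self.slots = [None]            # index 0 is reserved, as in the spec's tables
  def add(self, e):
    self.slots.append(e)
    return len(self.slots) - 1
  def get(self, i):
    trap_if(i >= len(self.slots) or self.slots[i] is None)
    return self.slots[i]
  def remove(self, i):
    e = self.get(i)
    self.slots[i] = None
    return e

rt = object()                                     # stands for a locally-defined resource type
resources = SimpleTable()
i = resources.add(ResourceHandle(rt, rep = 42))   # ~ canon resource.new
assert resources.get(i).rep == 42                 # ~ canon resource.rep
h = resources.remove(i)                           # ~ canon resource.drop (destructor elided)
assert h.own
```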
For a canonical definition:
(canon context.get $t $i (core func $f))
validation specifies:
- $t must be i32 (for now; see here)
- $i must be less than 2
- $f is given type (func (result i32))
Calling $f
invokes the following function, which reads the context-local
storage of the current task:
async def canon_context_get(t, i, task):
assert(t == 'i32')
assert(i < ContextLocalStorage.LENGTH)
return [task.context.get(i)]
For a canonical definition:
(canon context.set $t $i (core func $f))
validation specifies:
- $t must be i32 (for now; see here)
- $i must be less than 2
- $f is given type (func (param $v i32))
Calling $f
invokes the following function, which writes to the context-local
storage of the current task:
async def canon_context_set(t, i, task, v):
assert(t == 'i32')
assert(i < ContextLocalStorage.LENGTH)
task.context.set(i, v)
return []
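The following sketch (assuming the two-slot, zero-initialized layout implied by the asserts above; the class name is illustrative) shows the effect a guest runtime observes when it uses context.set/context.get, e.g. to stash a pointer to its per-task TLS block in slot 0:

```python
class ContextLocalStorageSketch:
  LENGTH = 2                       # matches the "$i must be less than 2" rule above
  def __init__(self):
    self.array = [0] * self.LENGTH # assumed zero-initialized per task
  def get(self, i):
    assert(0 <= i < self.LENGTH)
    return self.array[i]
  def set(self, i, v):
    assert(0 <= i < self.LENGTH)
    self.array[i] = v

ctx = ContextLocalStorageSketch()
ctx.set(0, 0x1000)                 # e.g. a pointer to a guest-managed TLS area
assert ctx.get(0) == 0x1000 and ctx.get(1) == 0
```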
For a canonical definition:
(canon backpressure.set (core func $f))
validation specifies:
- $f is given type (func (param $enabled i32))
Calling $f
invokes the following function, which sets the backpressure
flag on the current ComponentInstance
:
async def canon_backpressure_set(task, flat_args):
trap_if(task.opts.sync)
task.inst.backpressure = bool(flat_args[0])
return []
The backpressure
flag is read by Task.enter
(defined above) to prevent new
tasks from entering the component instance and forcing the guest code to
consume resources.
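As a rough illustration (not the spec's Task.enter, and using a cooperative polling loop purely for brevity), the following asyncio sketch shows the intended effect: while the flag is set, a new task is parked before it can start, and clearing the flag lets it proceed:

```python
import asyncio

class InstanceGateSketch:
  def __init__(self):
    self.backpressure = False      # toggled by backpressure.set in the real ABI

  async def enter(self, name):
    while self.backpressure:       # the spec integrates this with its scheduler
      await asyncio.sleep(0)       # rather than polling; this is just a sketch
    print(f"{name} entered")

async def main():
  gate = InstanceGateSketch()
  gate.backpressure = True         # as-if the guest called backpressure.set(1)
  pending = asyncio.create_task(gate.enter("task-2"))
  await asyncio.sleep(0)           # task-2 is parked behind the flag
  gate.backpressure = False        # as-if the guest called backpressure.set(0)
  await pending                    # task-2 now enters

asyncio.run(main())
```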
For a canonical definition:
(canon task.return (result $t)? $opts (core func $f))
validation specifies:
- $f is given type flatten_functype($opts, (func (param $t)?), 'lower')
Calling $f
invokes the following function which uses Task.return_
to lift
and pass the results to the caller:
async def canon_task_return(task, result_type, opts, flat_args):
trap_if(not task.inst.may_leave)
trap_if(task.opts.sync and not task.opts.always_task_return)
trap_if(result_type != task.ft.results)
trap_if(opts != task.opts)
task.return_(flat_args)
return []
The trap_if(result_type != task.ft.results)
guard ensures that, in a
component with multiple exported functions of different types, task.return
is
not called with a mismatched result type (which, due to indirect control flow,
can in general only be caught dynamically).
The trap_if(opts != task.opts)
guard ensures that the return value is lifted
the same way as the canon lift
from which this task.return
is returning.
This ensures that AOT fusion of canon lift
and canon lower
can generate
a thunk that is indirectly called by task.return
after these guards.
For a canonical definition:
(canon yield $async? (core func $f))
validation specifies:
- $f is given type (func)
Calling $f
calls Task.yield_
to allow other tasks to execute:
async def canon_yield(sync, task):
trap_if(not task.inst.may_leave)
trap_if(task.opts.callback and not sync)
await task.yield_(sync)
return []
If async
is not set, no other tasks in the same component instance can
execute; however, tasks in other component instances may execute. This allows
a long-running task in one component to avoid starving other components without
needing to support full reentrancy.
The guard preventing async
use of yield
when a callback
has
been used preserves the invariant that producer toolchains using
callback
never need to handle multiple overlapping callback
activations.
For a canonical definition:
(canon waitable-set.new (core func $f))
validation specifies:
- $f is given type (func (result i32))
Calling $f
invokes the following function, which adds an empty waitable set
to the component instance's waitable_sets
table:
async def canon_waitable_set_new(task):
trap_if(not task.inst.may_leave)
return [ task.inst.waitable_sets.add(WaitableSet()) ]
For a canonical definition:
(canon waitable-set.wait $async? (memory $mem) (core func $f))
validation specifies:
- $f is given type (func (param $si i32) (param $ptr i32) (result i32))
Calling $f
invokes the following function which waits for progress to be made
on a waitable in the given waitable set (indicated by index $si
), then returns its EventCode and writes the payload values into linear memory:
async def canon_waitable_set_wait(sync, mem, task, si, ptr):
trap_if(not task.inst.may_leave)
trap_if(task.opts.callback and not sync)
s = task.inst.waitable_sets.get(si)
e = await task.wait_on(s.wait(), sync)
return unpack_event(mem, task, ptr, e)
def unpack_event(mem, task, ptr, e: EventTuple):
event, p1, p2 = e
cx = LiftLowerContext(CanonicalOptions(memory = mem), task.inst)
store(cx, p1, U32Type(), ptr)
store(cx, p2, U32Type(), ptr + 4)
return [event]
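Since store writes each payload value as a little-endian i32, the 8-byte region pointed to by ptr ends up with p1 at offset 0 and p2 at offset 4. The following standalone sketch (illustrative only, not spec code) shows that layout:

```python
import struct

def write_event_payload(mem: bytearray, ptr: int, p1: int, p2: int):
  struct.pack_into('<II', mem, ptr, p1, p2)         # two little-endian u32s

mem = bytearray(16)
write_event_payload(mem, ptr = 8, p1 = 3, p2 = 0)   # e.g. waitable index 3, payload 0
assert struct.unpack_from('<II', mem, 8) == (3, 0)
```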
If async
is not set, wait_on
will prevent other tasks from executing in
the same component instance, which can be useful for producer toolchains in
situations where interleaving is not supported. However, this is generally
worse for concurrency and thus producer toolchains should set async
when
possible.
wait
can be called from a synchronously-lifted export so that even
synchronous code can make concurrent import calls. In these synchronous cases,
though, the automatic backpressure (applied by Task.enter
) will ensure there
is only ever at most one synchronously-lifted task executing in a component
instance at a time.
The guard preventing async
use of wait
when a callback
has been used
preserves the invariant that producer toolchains using callback
never need to
handle multiple overlapping callback activations.
For a canonical definition:
(canon waitable-set.poll $async? (memory $mem) (core func $f))
validation specifies:
- $f is given type (func (param $si i32) (param $ptr i32) (result i32))
Calling $f
invokes the following function, which returns NONE
(0
) instead
of blocking if there is no event available, and otherwise returns the event the
same way as wait
.
async def canon_waitable_set_poll(sync, mem, task, si, ptr):
trap_if(not task.inst.may_leave)
trap_if(task.opts.callback and not sync)
s = task.inst.waitable_sets.get(si)
await task.yield_(sync)
if (e := s.poll()):
return unpack_event(mem, task, ptr, e)
return [EventCode.NONE]
When async
is set, waitable-set.poll
can yield to other tasks (in this or other
components) as part of polling for an event.
The guard preventing async
use of waitable-set.poll
when a callback
has
been used preserves the invariant that producer toolchains using
callback
never need to handle multiple overlapping callback
activations.
For a canonical definition:
(canon waitable-set.drop (core func $f))
validation specifies:
- $f is given type (func (param i32))
Calling $f
invokes the following function, which removes the given
waitable set from the component instance table, performing the guards defined
by WaitableSet.drop
above:
async def canon_waitable_set_drop(task, i):
trap_if(not task.inst.may_leave)
s = task.inst.waitable_sets.remove(i)
s.drop()
return []
For a canonical definition:
(canon waitable.join (core func $f))
validation specifies:
- $f is given type (func (param $wi i32) (param $si i32))
Calling $f
invokes the following function:
async def canon_waitable_join(task, wi, si):
trap_if(not task.inst.may_leave)
w = task.inst.waitables.get(wi)
if si == 0:
w.join(None)
else:
w.join(task.inst.waitable_sets.get(si))
return []
Note that tables do not allow elements at index 0
, so 0
is a valid sentinel
that tells join
to remove the given waitable from any set that it is
currently a part of. Waitables can be a member of at most one set, so if the
given waitable is already in one set, it will be transferred.
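These membership rules can be summarized by a small standalone model (not the spec's Waitable/WaitableSet classes): a waitable is in at most one set, joining a second set transfers it, and joining None (the index-0 case) removes it:

```python
class WaitableSketch:
  def __init__(self):
    self.maybe_set = None
  def join(self, maybe_set):
    if self.maybe_set is not None:
      self.maybe_set.elems.remove(self)
    self.maybe_set = maybe_set
    if maybe_set is not None:
      maybe_set.elems.add(self)

class WaitableSetSketch:
  def __init__(self):
    self.elems = set()

w, s1, s2 = WaitableSketch(), WaitableSetSketch(), WaitableSetSketch()
w.join(s1); assert w in s1.elems
w.join(s2); assert w not in s1.elems and w in s2.elems   # transferred between sets
w.join(None); assert w not in s2.elems                   # the index-0 sentinel case
```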
For a canonical definition:
(canon subtask.drop (core func $f))
validation specifies:
- $f is given type (func (param i32))
Calling $f
removes the subtask at the given index from the current
component instance's waitables
table, performing the guards and bookkeeping
defined by Subtask.drop()
.
async def canon_subtask_drop(task, i):
trap_if(not task.inst.may_leave)
s = task.inst.waitables.remove(i)
trap_if(not isinstance(s, Subtask))
s.drop()
return []
For canonical definitions:
(canon stream.new $t (core func $f))
(canon future.new $t (core func $f))
validation specifies:
- $f is given type (func (result i32))
Calling $f
calls canon_{stream,future}_new
which creates a new abstract
{stream, future} and adds a new writable end for it to the waitables
table
and returns its index.
async def canon_stream_new(elem_type, task):
trap_if(not task.inst.may_leave)
stream = ReadableStreamGuestImpl(elem_type, task.inst)
return [ task.inst.waitables.add(WritableStreamEnd(stream)) ]
async def canon_future_new(t, task):
trap_if(not task.inst.may_leave)
future = ReadableStreamGuestImpl(t, task.inst)
return [ task.inst.waitables.add(WritableFutureEnd(future)) ]
Because futures are just streams with extra limitations, here we see that a
WritableFutureEnd
shares the same ReadableStreamGuestImpl
type as
WritableStreamEnd
; the extra limitations are added by WritableFutureEnd
and
the future built-ins below.
For canonical definitions:
(canon stream.read $t $opts (core func $f))
(canon stream.write $t $opts (core func $f))
validation specifies:
- $f is given type (func (param i32 i32 i32) (result i32))
For canonical definitions:
(canon future.read $t $opts (core func $f))
(canon future.write $t $opts (core func $f))
validation specifies:
- $f is given type (func (param i32 i32) (result i32))
The implementations of these four built-ins all funnel down to a single
parameterized copy
function:
async def canon_stream_read(t, opts, task, i, ptr, n):
return await copy(ReadableStreamEnd, WritableBufferGuestImpl, EventCode.STREAM_READ,
t, opts, task, i, ptr, n)
async def canon_stream_write(t, opts, task, i, ptr, n):
return await copy(WritableStreamEnd, ReadableBufferGuestImpl, EventCode.STREAM_WRITE,
t, opts, task, i, ptr, n)
async def canon_future_read(t, opts, task, i, ptr):
return await copy(ReadableFutureEnd, WritableBufferGuestImpl, EventCode.FUTURE_READ,
t, opts, task, i, ptr, 1)
async def canon_future_write(t, opts, task, i, ptr):
return await copy(WritableFutureEnd, ReadableBufferGuestImpl, EventCode.FUTURE_WRITE,
t, opts, task, i, ptr, 1)
Introducing the copy
function in chunks, copy
first checks that the
Waitable
at index i
is the right type and that there is not already a copy
in progress. (In the future, this restriction could be relaxed, allowing a
finite number of pipelined reads or writes.) Then a readable or writable buffer
is created which (in Buffer
's constructor) eagerly checks the alignment and
bounds of (ptr, n).
async def copy(EndT, BufferT, event_code, t, opts, task, i, ptr, n):
trap_if(not task.inst.may_leave)
e = task.inst.waitables.get(i)
trap_if(not isinstance(e, EndT))
trap_if(e.stream.t != t)
trap_if(e.copying)
assert(not contains_borrow(t))
cx = LiftLowerContext(opts, task.inst, borrow_scope = None)
buffer = BufferT(t, cx, ptr, n)
Next, in the synchronous case, Task.wait_on
is used to synchronously and
uninterruptibly wait for the on_*
callbacks to indicate that the copy has made
progress. In the case of on_partial_copy
, this code carefully delays the call
to revoke_buffer
until right before control flow is returned back to the
calling core wasm code. This enables another task to potentially complete
multiple partial copies before having to context-switch back.
if opts.sync:
final_revoke_buffer = None
def on_partial_copy(revoke_buffer):
nonlocal final_revoke_buffer
final_revoke_buffer = revoke_buffer
if not async_copy.done():
async_copy.set_result(None)
on_copy_done = partial(on_partial_copy, revoke_buffer = lambda:())
if e.copy(buffer, on_partial_copy, on_copy_done) != 'done':
async_copy = asyncio.Future()
await task.wait_on(async_copy, sync = True)
final_revoke_buffer()
In the asynchronous case, the on_*
callbacks set a pending event on the
Waitable
which will be delivered to core wasm when core wasm calls
waitable-set.{wait,poll}
or, if using callback
, returns to the event loop.
Symmetric to the synchronous case, this code carefully delays calling
revoke_buffer
until the copy event is actually delivered to core wasm,
allowing multiple partial copies to complete in the interim, reducing overall
context-switching overhead.
else:
def copy_event(revoke_buffer):
revoke_buffer()
e.copying = False
return (event_code, i, pack_copy_result(task, buffer, e))
def on_partial_copy(revoke_buffer):
e.set_event(partial(copy_event, revoke_buffer))
def on_copy_done():
e.set_event(partial(copy_event, revoke_buffer = lambda:()))
if e.copy(buffer, on_partial_copy, on_copy_done) != 'done':
e.copying = True
return [BLOCKED]
return [pack_copy_result(task, buffer, e)]
However the copy completes, the results are reported to the caller via
pack_copy_result
:
BLOCKED = 0xffff_ffff
CLOSED = 0x8000_0000
def pack_copy_result(task, buffer, e):
if buffer.progress:
assert(buffer.progress <= Buffer.MAX_LENGTH < BLOCKED)
assert(not (buffer.progress & CLOSED))
return buffer.progress
elif e.stream.closed():
if (errctx := e.stream.closed_with_error()):
assert(isinstance(e, ReadableStreamEnd|ReadableFutureEnd))
errctxi = task.inst.error_contexts.add(errctx)
assert(errctxi != 0)
else:
errctxi = 0
assert(errctxi <= Table.MAX_LENGTH < BLOCKED)
assert(not (errctxi & CLOSED))
return errctxi | CLOSED
else:
return 0
The order of tests here indicates that, if some progress was made and then the
stream was closed, only the progress is reported and the CLOSED
status is
left to be discovered next time.
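Putting the three cases together, the following guest-side sketch (a hypothetical helper, not part of the ABI) interprets the i32 that a stream.{read,write} or future.{read,write} call hands back, whether that value is BLOCKED or comes from pack_copy_result:

```python
BLOCKED = 0xffff_ffff
CLOSED  = 0x8000_0000

def interpret_copy_result(packed: int):
  if packed == BLOCKED:
    return ('blocked', None)            # wait for the {STREAM,FUTURE}_{READ,WRITE} event
  if packed & CLOSED:
    errctxi = packed & ~CLOSED
    return ('closed', errctxi or None)  # 0 means closed without an error-context
  return ('progress', packed)           # number of elements copied so far

assert interpret_copy_result(4) == ('progress', 4)
assert interpret_copy_result(CLOSED | 1) == ('closed', 1)
assert interpret_copy_result(CLOSED) == ('closed', None)
assert interpret_copy_result(BLOCKED) == ('blocked', None)
```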
As asserted here, error-context
s are only possible on the readable end of a
stream or future (since, as defined below, only the writable end can close
the stream with an error-context
). Thus, error-context
s only flow in the
same direction as values, as an optional last value of the stream or future.
For canonical definitions:
(canon stream.cancel-read $t $async? (core func $f))
(canon stream.cancel-write $t $async? (core func $f))
(canon future.cancel-read $t $async? (core func $f))
(canon future.cancel-write $t $async? (core func $f))
validation specifies:
- $f is given type (func (param i32) (result i32))
The implementations of these four built-ins all funnel down to a single
parameterized cancel_copy
function:
async def canon_stream_cancel_read(t, sync, task, i):
return await cancel_copy(ReadableStreamEnd, EventCode.STREAM_READ, t, sync, task, i)
async def canon_stream_cancel_write(t, sync, task, i):
return await cancel_copy(WritableStreamEnd, EventCode.STREAM_WRITE, t, sync, task, i)
async def canon_future_cancel_read(t, sync, task, i):
return await cancel_copy(ReadableFutureEnd, EventCode.FUTURE_READ, t, sync, task, i)
async def canon_future_cancel_write(t, sync, task, i):
return await cancel_copy(WritableFutureEnd, EventCode.FUTURE_WRITE, t, sync, task, i)
async def cancel_copy(EndT, event_code, t, sync, task, i):
trap_if(not task.inst.may_leave)
e = task.inst.waitables.get(i)
trap_if(not isinstance(e, EndT))
trap_if(e.stream.t != t)
trap_if(not e.copying)
if not e.has_pending_event():
e.stream.cancel()
if not e.has_pending_event():
if sync:
await task.wait_on(e.wait_for_pending_event(), sync = True)
else:
return [BLOCKED]
code,index,payload = e.get_event()
assert(not e.copying and code == event_code and index == i)
return [payload]
The first check for e.has_pending_event()
catches the case where the copy has
already racily finished, in which case we must not call stream.cancel()
.
Calling stream.cancel()
may, but is not required to, recursively call one of
the on_*
callbacks (passed by canon_{stream,future}_{read,write}
above)
which will set a pending event that is caught by the second check for
e.has_pending_event()
.
If the copy hasn't been cancelled, the synchronous case uses Task.wait_on
to
synchronously and uninterruptibly wait for one of the on_*
callbacks to
eventually be called (which will set the pending event).
The asynchronous case simply returns BLOCKED
and the client code must wait
as usual for a {STREAM,FUTURE}_{READ,WRITE}
event. In this case, cancellation
has served only to asynchronously request that the host relinquish the buffer
ASAP without waiting for anything to be read or written.
If BLOCKED
is not returned, the pending event (which is necessarily a
copy_event
) is eagerly delivered to core wasm as the return value, thereby
saving an additional turn of the event loop. In this case, the core wasm
caller can assume that ownership of the buffer has been returned.
For canonical definitions:
(canon stream.close-readable $t (core func $f))
(canon future.close-readable $t (core func $f))
validation specifies:
- $f is given type (func (param i32))
and for canonical definitions:
(canon stream.close-writable $t (core func $f))
(canon future.close-writable $t (core func $f))
validation specifies:
- $f is given type (func (param i32 i32))
Calling $f
removes the readable or writable end of the stream or future at
the given index from the current component instance's waitables
table,
performing the guards and bookkeeping defined by
{Readable,Writable}{Stream,Future}End.drop()
above.
async def canon_stream_close_readable(t, task, i):
return await close(ReadableStreamEnd, t, task, i, 0)
async def canon_stream_close_writable(t, task, hi, errctxi):
return await close(WritableStreamEnd, t, task, hi, errctxi)
async def canon_future_close_readable(t, task, i):
return await close(ReadableFutureEnd, t, task, i, 0)
async def canon_future_close_writable(t, task, hi, errctxi):
return await close(WritableFutureEnd, t, task, hi, errctxi)
async def close(EndT, t, task, hi, errctxi):
trap_if(not task.inst.may_leave)
e = task.inst.waitables.remove(hi)
if errctxi == 0:
errctx = None
else:
errctx = task.inst.error_contexts.get(errctxi)
trap_if(not isinstance(e, EndT))
trap_if(e.stream.t != t)
e.drop(errctx)
return []
Note that only the writable ends of streams and futures can be closed with a
final error-context
value and thus error-context
s only flow in the same
direction as values, as an optional last value of the stream or future.
For a canonical definition:
(canon error-context.new $opts (core func $f))
validation specifies:
- $f is given type (func (param i32 i32) (result i32))
Calling $f
calls the following function which uses the $opts
immediate to
(non-deterministically) lift the debug message, create a new ErrorContext
value, store it in the per-component-instance error_contexts
table and
return its index.
@dataclass
class ErrorContext:
debug_message: String
async def canon_error_context_new(opts, task, ptr, tagged_code_units):
trap_if(not task.inst.may_leave)
if DETERMINISTIC_PROFILE or random.randint(0,1):
s = String(('', 'utf8', 0))
else:
cx = LiftLowerContext(opts, task.inst)
s = load_string_from_range(cx, ptr, tagged_code_units)
s = host_defined_transformation(s)
i = task.inst.error_contexts.add(ErrorContext(s))
return [i]
Supporting the requirement (introduced in the
explainer) that wasm code does not depend on
the contents of error-context
values for behavioral correctness, the debug
message is completely discarded non-deterministically or, in the deterministic
profile, always. Importantly (for performance), when the debug message is
discarded, it is not even lifted and thus the O(N) well-formedness conditions
are not checked. (Note that host_defined_transformation
is not defined by the
Canonical ABI and stands for an arbitrary host-defined function.)
For a canonical definition:
(canon error-context.debug-message $opts (core func $f))
validation specifies:
- $f is given type (func (param i32 i32))
Calling $f
calls the following function which uses the $opts
immediate to
lower the ErrorContext
's debug message. While producing an error-context
value may non-deterministically discard or transform the debug message, a
single error-context
value must return the same debug message from
error-context.debug-message
over time.
async def canon_error_context_debug_message(opts, task, i, ptr):
trap_if(not task.inst.may_leave)
errctx = task.inst.error_contexts.get(i)
cx = LiftLowerContext(opts, task.inst)
store_string(cx, errctx.debug_message, ptr)
return []
Note that ptr
points to an 8-byte region of memory into which will be stored
the pointer and length of the debug string (allocated via opts.realloc
).
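Concretely, after the call the guest reads two little-endian i32s from that region: the pointer to the lowered string and its length (in code units of the string encoding selected by $opts). A standalone sketch of that read (illustrative only, not spec code):

```python
import struct

def read_string_return_area(mem: bytes, ptr: int) -> tuple[int, int]:
  begin, tagged_code_units = struct.unpack_from('<II', mem, ptr)
  return begin, tagged_code_units

mem = bytearray(16)
struct.pack_into('<II', mem, 0, 0x100, 5)   # as-if written by store_string above
assert read_string_return_area(mem, 0) == (0x100, 5)
```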
For a canonical definition:
(canon error-context.drop (core func $f))
validation specifies:
- $f is given type (func (param i32))
Calling $f
calls the following function, which drops the error-context
value from the current component instance's error_contexts
table.
async def canon_error_context_drop(task, i):
trap_if(not task.inst.may_leave)
task.inst.error_contexts.remove(i)
return []
For a canonical definition:
(canon thread.spawn (type $ft) (core func $st))
validation specifies:
- $ft must refer to a shared function type; initially, only the type (func shared (param $c i32)) is allowed (see explanation below)
- $st is given type (func (param $f (ref null $ft)) (param $c i32) (result $e i32)).
Note: ideally, a thread could be spawned with arbitrary thread parameters. Currently, that would require additional work in the toolchain to support, so for simplicity, the current proposal simply fixes a single i32 parameter type. However, thread.spawn could be extended to allow arbitrary thread parameters in the future, once it's concretely beneficial to the toolchain. The inclusion of $ft ensures backwards compatibility for when arbitrary parameters are allowed.
Calling $st
checks that the reference $f
is not null. Then, it spawns a
thread which:
- invokes $f with $c
- executes $f until completion or trap in a shared context as described by the shared-everything threads proposal.
In pseudocode, $st
looks like:
def canon_thread_spawn(f, c):
trap_if(f is None)
if DETERMINISTIC_PROFILE:
return [-1]
def thread_start():
try:
f(c)
except CoreWebAssemblyException:
trap()
if spawn(thread_start):
return [0]
else:
return [-1]
For a canonical definition:
(canon thread.available_parallelism (core func $f))
validation specifies:
- $f is given type (func shared (result i32)).
Calling $f
returns the number of threads the underlying hardware can be
expected to execute in parallel. This value can be artificially limited by
engine configuration and is not allowed to change over the lifetime of a
component instance.
def canon_thread_available_parallelism():
if DETERMINISTIC_PROFILE:
return [1]
else:
return [NUM_ALLOWED_THREADS]