Module splitting and type section duplication #1530
To make a brief comment here summarising my suggestion in the meeting - an extension to If we just want the (complete) types section shared and nothing else, we could technically get away without repeated sections, just with a restricted concept of module fragments which are all the module sections after the type section (and at this point we're close to the raw "concatenate bytes instead" point that @rossberg made). My arguments in favour of a "module fragment" approach rather than a byte approach would be:
To acknowledge some arguments against:
To expand on this point, we could imagine a limited version of my API where the provided prefix module must have all its sections after the type section empty, and the streamed "module fragment" must declare exactly the sections after the type section in regular module order. If we knew the general solution with full repeated sections was on our roadmap, this might be an acceptable MVP. However, if we never planned to extend to the general thing, I think this solution would appear quite messy on its own (and would have questionable value over more naive byte-based or compression-based solutions).
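For illustration only, here is a minimal Python sketch of what the restricted stitching described above could look like at the byte level. Nothing here is a proposed or existing API: `stitch`, `section_ids`, and `read_u32_leb` are hypothetical names, the section order table covers only the core binary format's standard sections (sections from other proposals are omitted), and the check only enforces ordering, not that every section is present.

```python
WASM_HEADER = b"\x00asm\x01\x00\x00\x00"  # magic number plus version 1
TYPE_SECTION = 1
CUSTOM_SECTION = 0
# Non-custom section ids in the order the core binary format requires them
# (data count, id 12, precedes code, id 10).
SECTION_ORDER = [1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 10, 11]


def read_u32_leb(data, pos):
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def section_ids(data, pos=0):
    """Return the ids of the sections found in data, starting at pos."""
    ids = []
    while pos < len(data):
        section_id = data[pos]
        size, pos = read_u32_leb(data, pos + 1)
        pos += size
        if pos > len(data):
            raise ValueError("truncated section")
        ids.append(section_id)
    return ids


def stitch(prefix, fragment):
    """Concatenate a types-only prefix module with a fragment of later sections."""
    if not prefix.startswith(WASM_HEADER):
        raise ValueError("prefix is not a Wasm binary")
    prefix_ids = section_ids(prefix, len(WASM_HEADER))
    if [i for i in prefix_ids if i != CUSTOM_SECTION] != [TYPE_SECTION]:
        raise ValueError("prefix must contain exactly one type section")
    fragment_ids = [i for i in section_ids(fragment) if i != CUSTOM_SECTION]
    if TYPE_SECTION in fragment_ids:
        raise ValueError("fragment must not declare a type section")
    # Raises ValueError for section ids this sketch does not know about.
    ranks = [SECTION_ORDER.index(i) for i in fragment_ids]
    if ranks != sorted(set(ranks)):
        raise ValueError("fragment sections are repeated or out of order")
    return prefix + fragment
```

With a prefix holding an empty type section and a fragment holding empty function and code sections, `stitch` simply returns the header followed by the three sections in order, which is the "concatenate bytes" behaviour with a structural check in front of it.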
Sorry I missed the meeting. The notes aren't up yet, so I don't know all the discussion points. I echo the points @conrad-watt raised, to the extent I understand them without context. We should allow repeated sections; it's high time for this relaxation. Module fragments as a more structured concept sound more attractive than byte-level concatenation. I've had an offline conversation with @rossberg about "modules with holes", which, while not entirely baked, could allow an outer module to be filled in with one or more sections or function bodies. It seems like submodule granularity is something worth reasoning through.
@titzer, to clarify, module "fragments" would very much imply byte-level concatenation as well. A fragment as discussed in the meeting has no semantic meaning; it's just a piece of a module's binary representation that would be stitched together into an actual binary on the fly by the API. I agree that isn't very attractive. In the meeting I dared to call it a hack. My point in the meeting was that the sections relaxation isn't strictly needed for this, at least not for the type section use case prompting the discussion, though it would obviously make the stitching more flexible. That said, I'm very much in favour of allowing repeated sections in general.
As for a more structured solution, I suggested experimenting with type imports to prune the type definitions not relevant to each module itself. I could imagine this might help significantly, since the underlying problem is that we currently have to include the transitive closure of all type definitions used by a module. And the size of that tends to grow combinatorially with the depth of the dependency graph.
I plan to investigate the type imports approach first. Although the module concatenation solutions would solve the code size problem perfectly, we have enough problems as it is integrating Wasm into JS build and serving infrastructure, so I would like to avoid introducing a new “module fragment” entity into the ecosystem if possible.
When we discussed this in the CG, there were some questions about compression. I've now updated the original post with comparisons of compressed code size, including when using the primary split module as a compression dictionary for the other split modules.
Hello! I've been looking into more robust module splitting solutions, particularly for WasmGC modules. Here are the results of an experiment I did where I split calcworker_wasm.wasm from Google Sheets into 216 modules. Functions were automatically assigned to modules based on their original build targets rather than what would make sense to actually serve, but this should suffice to get an idea of the overheads involved.
First, here's how overall code size was affected, along with a breakdown by section of where the change came from.
We can probably reduce the code size due to imports and exports a bit more by moving more than just functions into the secondary modules. For example, if a secondary module is the only one that uses a particular imported global, it could just import the global directly. Today, the primary module imports and re-exports the global, then the secondary module imports it from the primary module.
But there's nothing we can do today to improve the code size of the type sections! The types are already arranged into minimal recursion groups and included only in modules where they are necessary for validation. Here's a breakdown of how many types each module uses either directly or indirectly. A directly used type is one that is in a rec group with a type that is directly allocated, accessed, cast, or otherwise referenced from the code. All other types are indirectly used, and are necessary to include only because they appear somewhere in the expanded definition of a directly used type.
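To make the direct/indirect distinction concrete, here is a small Python sketch of the classification as described above. It is not the tooling used for the experiment; the function name `classify_types`, the string type names, and the dictionary-based representation of recursion groups and definitions are all assumptions made for illustration.

```python
def classify_types(code_refs, rec_groups, definition_refs):
    """Split the types a module must include into direct and indirect uses.

    code_refs: types referenced from the module's code.
    rec_groups: list of recursion groups, each a list of type names.
    definition_refs: maps a type to the types mentioned in its definition.
    """
    group_of = {t: frozenset(group) for group in rec_groups for t in group}

    # A type is directly used if it shares a rec group with a type the code references.
    direct = set()
    for t in code_refs:
        direct |= group_of[t]

    # Everything else reachable through type definitions is indirectly used.
    # Whole rec groups are pulled in, since a group cannot be included partially.
    needed = set(direct)
    worklist = list(direct)
    while worklist:
        t = worklist.pop()
        for dep in definition_refs.get(t, ()):
            for member in group_of[dep]:
                if member not in needed:
                    needed.add(member)
                    worklist.append(member)
    return direct, needed - direct


# Hypothetical example: the code only touches A, but A's definition drags in
# the group {B, C}, and C in turn drags in D. E is never needed.
direct, indirect = classify_types(
    code_refs={"A"},
    rec_groups=[["A"], ["B", "C"], ["D"], ["E"]],
    definition_refs={"A": {"B"}, "B": {"C"}, "C": {"D"}},
)
```

In this example the module has to carry four types in its type section even though its code references only one of them, which is the effect the breakdown is measuring.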
On average, each type appears in about 10 modules, but is only directly used in 5.5 of them.
I'm interested in hearing what ideas folks have about how we could reduce the overhead of duplicated type sections. The best case would be that we could directly use the full type section from the primary module in each of the secondary modules without having to download it again. Another solution might look more like compile-time type imports that are able to abstract away the unused types, but that would still require repeating the used types. Either way, I don't have all the details worked out. Are there other or more complete ideas out there?
For completeness, here's how the code bloat looks when the modules are compressed.
Unsurprisingly, brotli compresses better than gzip, but because the baseline compression is better, the overhead from splitting is relatively worse. Using the primary module as a dictionary for the compression gives the best absolute and relative overheads. (Chrome supports shared dictionary compression as of a few weeks ago.)
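As a rough sketch of the dictionary idea, Python's standard `zlib` module accepts a preset dictionary through the `zdict` argument. This uses DEFLATE rather than brotli and is not how the numbers above were produced or how Chrome's shared dictionary support works; the helper names `deflate` and `inflate` are hypothetical. It only shows why content that repeats what the primary module already contains, such as a duplicated type section, compresses to very little when the primary module is the dictionary.

```python
import zlib


def deflate(data, dictionary=None):
    """Compress data, optionally priming the compressor with a preset dictionary."""
    if dictionary is None:
        compressor = zlib.compressobj(level=9)
    else:
        compressor = zlib.compressobj(level=9, zdict=dictionary)
    return compressor.compress(data) + compressor.flush()


def inflate(blob, dictionary=None):
    """Decompress data; the same dictionary must be supplied if one was used."""
    if dictionary is None:
        decompressor = zlib.decompressobj()
    else:
        decompressor = zlib.decompressobj(zdict=dictionary)
    return decompressor.decompress(blob) + decompressor.flush()
```

A caller would pass the bytes of the primary module as `dictionary` when compressing and decompressing each secondary module. Note that DEFLATE only looks back 32 KiB, so this sketch would not capture sharing across a large primary module the way a brotli or zstd dictionary could.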