Linking.md: Use multiple data and code sections #138

Open
sbc100 opened this issue Feb 21, 2020 · 4 comments

Comments

@sbc100
Member

sbc100 commented Feb 21, 2020

I'd like to propose that we move towards using multiple data and code sections in the object format.

This matches llvm's internal idea of what a section is. Today, if you iterate through the sections in an object file, you will only see a single code section, even though we default to -ffunction-sections. This means the linker is then forced to break down the monolithic code and data sections into sub-sections.
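To make the current layout concrete, here is a minimal Python sketch. It assumes a hand-built binary that contains only the module header and a single code section (id 10) holding three function bodies. That is not a valid module on its own, since there are no type or function sections. Iterating over the sections yields exactly one code section. Recovering per-function pieces requires walking that section's contents, which is roughly the extra work the linker has to do today.

```python
from typing import Iterator

MAGIC_AND_VERSION = b"\x00asm\x01\x00\x00\x00"
CODE_SECTION_ID = 10


def uleb128(value: int) -> bytes:
    """Minimal unsigned LEB128 encoding."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def read_uleb128(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 at pos; return (value, position after it)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def code_section(bodies: list[bytes]) -> bytes:
    """One monolithic code section: a vector of size-prefixed function bodies."""
    payload = uleb128(len(bodies)) + b"".join(uleb128(len(b)) + b for b in bodies)
    return bytes([CODE_SECTION_ID]) + uleb128(len(payload)) + payload


def iter_sections(module: bytes) -> Iterator[tuple[int, int, int]]:
    """Yield (section id, payload offset, payload size) for each section."""
    pos = len(MAGIC_AND_VERSION)
    while pos < len(module):
        size, payload = read_uleb128(module, pos + 1)
        yield module[pos], payload, size
        pos = payload + size


def split_code_section(module: bytes, offset: int, size: int) -> list[tuple[int, int]]:
    """Return (offset, size) of each function body inside one code section."""
    count, pos = read_uleb128(module, offset)
    chunks = []
    for _ in range(count):
        body_size, body = read_uleb128(module, pos)
        chunks.append((body, body_size))
        pos = body + body_size
    assert pos == offset + size, "code section size mismatch"
    return chunks


# Three bodies, each: no locals, i32.const n, end.
bodies = [bytes([0x00, 0x41, n, 0x0B]) for n in (1, 2, 3)]
module = MAGIC_AND_VERSION + code_section(bodies)
sections = list(iter_sections(module))
print([sid for sid, _, _ in sections])  # [10]: one code section for three functions
_, offset, size = sections[0]
print(split_code_section(module, offset, size))  # [(12, 4), (17, 4), (22, 4)]
```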

There are bugs popping up due to the fact that we don't currently map llvm's concept of a section onto a wasm section: https://reviews.llvm.org/D74531

There is a wasm proposal out to make repeated sections valid: https://github.com/WebAssembly/conditional-sections.

The fact that we can currently validate wasm object files with tools like wasm-validate is a feature I don't want to lose, so such tools would need to learn about conditional sections (or at least the multi-section part of it) before we would want to enable this by default.
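For reference, the check that would start failing is the section ordering rule: apart from custom sections (id 0), each known section may appear at most once and in a fixed order. The Python sketch below is a simplified version of that rule. It assumes only the MVP section ids plus the bulk-memory data count section (id 12, which must come before code), and it ignores section contents entirely. A second code section is reported as a violation, which is why tools would need to understand the multi-section part of the proposal first.

```python
from typing import Optional

CUSTOM_SECTION_ID = 0
# Required relative order of non-custom sections: type(1) ... element(9),
# then data count(12), code(10), data(11).
SECTION_ORDER = [1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 10, 11]
RANK = {sid: rank for rank, sid in enumerate(SECTION_ORDER)}


def first_order_violation(section_ids: list[int]) -> Optional[int]:
    """Return the index of the first section id that is unknown, repeated or
    out of order, or None if the whole sequence is acceptable."""
    last_rank = -1
    for index, sid in enumerate(section_ids):
        if sid == CUSTOM_SECTION_ID:
            continue  # custom sections may appear anywhere, any number of times
        rank = RANK.get(sid)
        if rank is None or rank <= last_rank:
            return index
        last_rank = rank
    return None


# An illustrative object file today: type, import, function, data count, code,
# data, then custom sections such as "linking" and "reloc.CODE".
print(first_order_violation([1, 2, 3, 12, 10, 11, 0, 0]))  # None
# The proposed layout: one code section per function.
print(first_order_violation([1, 2, 3, 12, 10, 10, 10, 11]))  # 5
```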

@dschuff
Member

dschuff commented Mar 9, 2020

I think this makes sense. IIUC the current state is that object files can't be loaded without being relocated (i.e. they can't run correctly) but they do validate, right? We could preserve that property by just declaring that they use the conditional-sections proposal, and that any tool that wants to process them has to support that proposal (and of course those tools would still maintain "mvp" object file support). I think that also means we can do it as soon as the proposal is stable enough and supported by tools; we don't necessarily have to wait until all the browsers support it (as long as we're comfortable with "shipping" before stage 4, at risk of having to break compatibility or maintain extra hacks if things change).

@aardappel

There are currently a lot of tools that will let you look at the contents of a .o even though they don't understand that it differs in some way from a regular .wasm, and it would be a shame to have all of those stop working. So we'd have to make an effort to fix all of them, and we're not the authors of all of them :)

Also, I am not following what information is gained by putting a function in a code section by itself, since a code section carries no information other than its size. It seems to me that linking data referring to segments of a code section, or to a whole code section, would be entirely equivalent. What am I missing?
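To make the question concrete, here is a minimal Python sketch that assumes three small function bodies and encodes them two ways: once as a single code section, and once as one code section per function. The bodies are byte-for-byte identical in both encodings. Each additional section only adds its own framing (the section id, the payload size, and a vector count of 1).

```python
CODE_SECTION_ID = 10


def uleb128(value: int) -> bytes:
    """Minimal unsigned LEB128 encoding."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def code_section(bodies: list[bytes]) -> bytes:
    """Section id, payload size, then a vector of size-prefixed function bodies."""
    payload = uleb128(len(bodies)) + b"".join(uleb128(len(b)) + b for b in bodies)
    return bytes([CODE_SECTION_ID]) + uleb128(len(payload)) + payload


def encode(bodies: list[bytes], one_section_per_function: bool) -> bytes:
    if one_section_per_function:
        return b"".join(code_section([body]) for body in bodies)
    return code_section(bodies)


bodies = [bytes([0x00, 0x41, n, 0x0B]) for n in (1, 2, 3)]  # no locals; i32.const n; end
single = encode(bodies, one_section_per_function=False)
split = encode(bodies, one_section_per_function=True)
print(len(single), len(split))  # 18 24: 3 extra framing bytes per additional section
```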

@sbc100
Member Author

sbc100 commented Mar 17, 2020

You are correct: it's very useful that many tools can inspect object files. Requiring those tools to be aware of the multi-section feature is (as far as I can tell) the main, perhaps the only, downside to this change.

But I think it's worth it. Aside from binaryen and wabt, how many other object inspection tools are out there? If it's only one or two, then I'm certainly prepared to do the work on them too.

The benefits are mostly for consistency and simplicity of internal representation within llvm. There are two primary places I'm thinking about:

  1. Any tool that uses llvm's libObjectFile to iterate through sections. We expect each function to be in its own section, since wasm is always -ffunction-sections. If I have 3 functions, I expect to see 3 code sections in the objdump output.

  2. The linker works at the granularity of sections. We currently subdivide the data and code sections into subsections (which we call "chunks" in the current wasm-ld code) in order to work around this.
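One consequence of the subdivision in point 2 is that relocation offsets in the object format are relative to the start of the whole section's contents. The linker therefore has to work out which function body each relocation belongs to. Below is a rough Python sketch of that mapping. It assumes a hypothetical `Chunk` model with chunks sorted by offset and made-up function names; wasm-ld's real data structures differ. With one section per function, each relocation would presumably already be tied to its own section, and this lookup would not be needed.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """One function body inside a monolithic code section (hypothetical model)."""
    name: str
    offset: int  # start of the body, relative to the code section's contents
    size: int
    relocs: list[int] = field(default_factory=list)  # chunk-relative offsets


def assign_relocations(chunks: list[Chunk], reloc_offsets: list[int]) -> None:
    """Attach each section-relative relocation offset to the chunk containing it.

    `chunks` must be sorted by offset and must not overlap."""
    starts = [chunk.offset for chunk in chunks]
    for off in reloc_offsets:
        i = bisect_right(starts, off) - 1  # last chunk starting at or before off
        if i < 0 or off >= chunks[i].offset + chunks[i].size:
            raise ValueError(f"relocation at {off:#x} is outside every function body")
        chunks[i].relocs.append(off - chunks[i].offset)


chunks = [Chunk("foo", 1, 5), Chunk("bar", 6, 10), Chunk("baz", 16, 4)]
assign_relocations(chunks, [3, 6, 15, 17])
print([(c.name, c.relocs) for c in chunks])
# [('foo', [2]), ('bar', [0, 9]), ('baz', [1])]
```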

Also, the motivating issue: https://reviews.llvm.org/D74531. Here clang expects the AST to live in its own "section", but in the current model data sections are not modeled as sections at all. Instead they are modeled as segments (sub-sections of the data section that llvm tools don't know about).

@tlively
Member

tlively commented Mar 17, 2020

There is precedent for requiring tools to implement stage 3 proposals to read object files: all object files currently contain a data count section whether or not bulk memory is enabled for their contents. So I think requiring tools to implement a proposal to continue reading object files is acceptable, as long as that proposal is reasonably stable and we are confident that it will eventually be standardized. I would not say the conditional sections proposal is quite there yet.

sbc100 added a commit to emscripten-core/emscripten that referenced this issue Dec 18, 2020
This is enough to make it work up until llvm-cov tries to read the
named data sections in the binary and can't find them. For this
final part to work we probably need to switch the object format to
using multiple code and data sections:
WebAssembly/tool-conventions#138

Not sure if it's worth submitting this part in isolation without
a fully working solution?

See #13046
sbc100 added a commit to emscripten-core/emscripten that referenced this issue Dec 26, 2020
sbc100 pushed a commit to llvm/llvm-project that referenced this issue Oct 19, 2021
Emit __clangast in custom section instead of named data segment
to find it while iterating sections.
This could be avoided if all data segments (the wasm sense) were
represented as their own sections (in the llvm sense).
This can be resolved by WebAssembly/tool-conventions#138

And the on-disk hashtable in clangast needs to be aligned by 4 bytes,
so add padding in the name length field in the custom section header.

The length of clangast section name can be represented in 1 byte
by leb128, and possible maximum pads are 3 bytes, so the section
name length won't be invalid in theory.

Fixes https://bugs.llvm.org/show_bug.cgi?id=35928

Differential Revision: https://reviews.llvm.org/D74531
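The padding trick described in this commit message relies on LEB128 allowing non-minimal encodings. The value 10 (the length of `__clangast`) fits in one byte, but it can also be written in 2, 3 or 4 bytes by adding continuation bytes. The minimal Python sketch below picks the width so that the bytes following the name start on a 4-byte boundary. It assumes the writer knows the file offset at which the name-length field begins; `field_offset` and both helper names are hypothetical, not LLVM APIs.

```python
def padded_uleb128(value: int, width: int) -> bytes:
    """Unsigned LEB128 using exactly `width` bytes (non-minimal when padded)."""
    out = bytearray()
    for i in range(width):
        byte = value & 0x7F
        value >>= 7
        if i < width - 1:
            byte |= 0x80  # continuation bit on every byte except the last
        out.append(byte)
    if value:
        raise ValueError("value does not fit in the requested width")
    return bytes(out)


def aligned_name_field(name: bytes, field_offset: int, align: int = 4) -> bytes:
    """Encode a custom section name (length + bytes) starting at file offset
    `field_offset` so that the section contents after it are `align`-aligned."""
    minimal_width = max(1, (len(name).bit_length() + 6) // 7)
    end = field_offset + minimal_width + len(name)
    pads = -end % align  # 0..align-1 extra bytes in the length field
    return padded_uleb128(len(name), minimal_width + pads) + name


for offset in (13, 14, 15):
    encoded = aligned_name_field(b"__clangast", offset)
    print(offset, encoded[:-10].hex(), (offset + len(encoded)) % 4)
# 13 0a 0
# 14 8a808000 0
# 15 8a8000 0
```

For a 10-byte name, the length field never needs more than 1 + 3 bytes, which matches the commit's observation that at most 3 padding bytes are required.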
compnerd pushed a commit to swiftlang/llvm-project that referenced this issue Oct 27, 2021

4 participants