-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Should we allow more identifiers instead of having @name? #21
Comments
Just thinking out loud about the implementation of this in the new custom section support in the interpreter: We can still use the current custom section handler for the name section, and for text parsing, instead of getting the names from It is a bit annoying that the AST does not keep the names, so they need to be re-parsed from text, but it's doable. |
This seems fine, as long as there is no expectation that we are getting into any of the Unicode messes surrounding normalisation etc., i.e., for determining id equality, we are comparing just raw code points (which may or may not be what users might expect). |
PS: Also, a reminder that Unicode spoofing has become a real security concern. Allowing completely arbitrary Unicode strings in something as central and sensitive as identifiers is elevating that risk. It's hard to predict how relevant that is for Wasm, but I could imagine security-sensitive hand-written Wasm in the future, which makes me feel uncomfortable about this. Perhaps we should try to get the opinion of some knowledgeable security experts? |
Just to jump into this, I've been working on an implementation of the Having a "binary-specific" name (from the
Given these problems, I don't think |
Currently in wasm-tools when a wasm file is printed any names of functions which are not valid identifiers cause a One gotcha though is that the custom name section has no requirements on uniqueness of identifiers but the text format does have requirements on this. For example this module: (module
(func (@name "foo"))
(func (@name "foo"))
) is not valid to print with For comparison, currently (module
(type (;0;) (func))
(func $foo (;0;) (type 0))
(func $#func1<foo> (@name "foo") (;1;) (type 0))
) where the second function gets |
Ahh, good point that we so far would not be able to round-trip binaries with repeated names in the name section. We could solve that by allowing repeated identifiers in the text format and using indices (with comments) rather than identifiers to disambiguate at the use sites. |
In addition to normal identifiers, support parsing identifiers of the format `$"..."`. This format is not yet allowed by the standard, but it is a popular proposed extension (see WebAssembly/spec#617 and WebAssembly/annotations#21). Binaryen has historically allowed a similar format and has supported arbitrary non-standard identifier characters, so it's much easier to support this extended syntax than to fix everything to use the restricted standard syntax.
In addition to normal identifiers, support parsing identifiers of the format `$"..."`. This format is not yet allowed by the standard, but it is a popular proposed extension (see WebAssembly/spec#617 and WebAssembly/annotations#21). Binaryen has historically allowed a similar format and has supported arbitrary non-standard identifier characters, so it's much easier to support this extended syntax than to fix everything to use the restricted standard syntax.
In addition to normal identifiers, support parsing identifiers of the format `$"..."`. This format is not yet allowed by the standard, but it is a popular proposed extension (see WebAssembly/spec#617 and WebAssembly/annotations#21). Binaryen has historically allowed a similar format and has supported arbitrary non-standard identifier characters, so it's much easier to support this extended syntax than to fix everything to use the restricted standard syntax.
@rossberg, what do we need to make progress here? Given that the names section is used only as a debugging aid, I haven't heard anyone else voice concerns about unicode spoofing risks. |
I'm okay doing this change, but it requires some spec work. The easy part is to extend the definition of identifiers to
But that's far from enough. The text format spec currently relies on equality of identifiers being completely syntactic. If we allow strings, then syntax and denotation can differ, so we'll need to introduce equivalence classes, and need to define that elaboration. For example |
After today's CG discussion, I think it makes more sense to keep the I would also suggest that we shouldn't worry about round-tripping identifiers. It's not a bad thing to put identifiers in the name section, or to pull identifiers from the name section, but identifiers don't really correspond to anything in the source language and hence I don't think it's that critical to preserve them. (To that end, I don't think I see the value of quoted identifiers if also keeping |
I was about to merge #23, but I don't mind either way. They do add some complexity to the spec, o.t.o.h. it's handy to have an escape mechanism for richer Wasm identifiers. How strongly does everybody feel? |
I very much want the extended identifiers, even though we're also keeping the name annotation. For the common case where there are not multiple or repeated names, the names can match 1:1, so being able to use the arbitrary names as identifiers is useful. If we don't have the extended identifier syntax, then tools would have to do their best to synthesize meaningful identifiers for names that are not also valid identifiers. |
I would also cast my vote in favor of extended identifiers. Synthesized identifiers are (subjectively to me) much harder to read than |
Okay, thanks, I'll move ahead with merging the PR then. |
In addition to normal identifiers, support parsing identifiers of the format `$"..."`. This format is not yet allowed by the standard, but it is a popular proposed extension (see WebAssembly/spec#617 and WebAssembly/annotations#21). Binaryen has historically allowed a similar format and has supported arbitrary non-standard identifier characters, so it's much easier to support this extended syntax than to fix everything to use the restricted standard syntax.
The utility of the
@name
annotation comes from being able to use more than the restricted set of ASCII characters allowed in the identifiers already present in the text format. But it's kind of awkward to have both an identifier and a name annotation in the text format when only one can appear in the binary name section. The status quo where you can have arbitrary strings in the name section but cannot represent them as identifiers is also awkward. Neither allow full bidirectional round-tripping of identifiers and names.I propose that instead of adding a @name annotation, we expand the grammar for identifiers to allow arbitrary strings. With the extended name section proposal, this would allow full bidirectional round tripping of identifiers/names.
The specific grammar for identifiers could be:
There would be no ambiguity in this grammar because strings always start with
"
, which is not included inidchar
.If this is the direction we decide to go in, I would suggest that the extension of the grammar be part of the extended name section proposal. cc @ashleynh
The text was updated successfully, but these errors were encountered: