Should we allow more identifiers instead of having @name? #21

tlively · 2023-08-31T16:41:15Z

The utility of the @name annotation comes from being able to use more than the restricted set of ASCII characters allowed in the identifiers already present in the text format. But it's kind of awkward to have both an identifier and a name annotation in the text format when only one can appear in the binary name section. The status quo where you can have arbitrary strings in the name section but cannot represent them as identifiers is also awkward. Neither allow full bidirectional round-tripping of identifiers and names.

I propose that instead of adding a @name annotation, we expand the grammar for identifiers to allow arbitrary strings. With the extended name section proposal, this would allow full bidirectional round tripping of identifiers/names.

The specific grammar for identifiers could be:

id ::= '$' idchar+ | '$' string

There would be no ambiguity in this grammar because strings always start with ", which is not included in idchar.

If this is the direction we decide to go in, I would suggest that the extension of the grammar be part of the extended name section proposal. cc @ashleynh

The text was updated successfully, but these errors were encountered:

yuri91 · 2023-09-01T08:56:26Z

Just thinking out loud about the implementation of this in the new custom section support in the interpreter:

We can still use the current custom section handler for the name section, and for text parsing, instead of getting the names from @name annotations, get them from the identifiers in the text.

It is a bit annoying that the AST does not keep the names, so they need to be re-parsed from text, but it's doable.

rossberg · 2023-09-06T15:09:34Z

This seems fine, as long as there is no expectation that we are getting into any of the Unicode messes surrounding normalisation etc., i.e., for determining id equality, we are comparing just raw code points (which may or may not be what users might expect).

rossberg · 2023-09-06T15:39:40Z

PS: Also, a reminder that Unicode spoofing has become a real security concern. Allowing completely arbitrary Unicode strings in something as central and sensitive as identifiers is elevating that risk. It's hard to predict how relevant that is for Wasm, but I could imagine security-sensitive hand-written Wasm in the future, which makes me feel uncomfortable about this. Perhaps we should try to get the opinion of some knowledgeable security experts?

dzfrias · 2023-09-19T22:19:24Z

Just to jump into this, I've been working on an implementation of the @name annotation at WebAssembly/wabt#2297, and I think this new syntax is a good idea from both a grammar standpoint and I believe is just more intuitive in general.

Having a "binary-specific" name (from the @name annotation) detached from the actual identifier posed a few of questions while I was been working on the implementation:

Should @name be valid when there's no corresponding $name when writing an identifier for a function, for example? I'd assume not, or else the grammar would have to change to support @name as a substitute for $name
When writing names from the binary format to the text format (maybe an identifier that has spaces in it), how should it be represented in the $name form if both $name and @name are required?

Given these problems, I don't think @name helps very much when trying to roundtrip invalid names. I think the $"name" syntax is a much better idea, both for implementing a tool like wasm2wat and wat2wasm and for when writing WAT files in general.

alexcrichton · 2023-10-11T08:18:44Z

Currently in wasm-tools when a wasm file is printed any names of functions which are not valid identifiers cause a (@name ...) annotation to get printed along with a synthesized $... identifer that is as well valid. I personally find the idea here, being able to print $"...", is a great idea as it would help simplify this and make it easier to read as well.

One gotcha though is that the custom name section has no requirements on uniqueness of identifiers but the text format does have requirements on this. For example this module:

(module
  (func (@name "foo"))
  (func (@name "foo"))
)

is not valid to print with $foo (or $"foo") for both identifiers. This means that if the (@name ...) annotation were removed then I don't think there would be a remaining means to print a module like this in the text representation.

For comparison, currently wasm-tools print over the above module prints:

(module
  (type (;0;) (func))
  (func $foo (;0;) (type 0))
  (func $#func1<foo> (@name "foo") (;1;) (type 0))
)

where the second function gets @name to indicate its name in addition to a synthesized $-name.

tlively · 2023-10-12T09:39:22Z

Ahh, good point that we so far would not be able to round-trip binaries with repeated names in the name section. We could solve that by allowing repeated identifiers in the text format and using indices (with comments) rather than identifiers to disambiguate at the use sites.

In addition to normal identifiers, support parsing identifiers of the format `$"..."`. This format is not yet allowed by the standard, but it is a popular proposed extension (see WebAssembly/spec#617 and WebAssembly/annotations#21). Binaryen has historically allowed a similar format and has supported arbitrary non-standard identifier characters, so it's much easier to support this extended syntax than to fix everything to use the restricted standard syntax.

tlively · 2024-04-03T17:05:59Z

@rossberg, what do we need to make progress here? Given that the names section is used only as a debugging aid, I haven't heard anyone else voice concerns about unicode spoofing risks.

rossberg · 2024-04-04T07:52:36Z

I'm okay doing this change, but it requires some spec work. The easy part is to extend the definition of identifiers to

id ::= '$' idchar+ | '$' name

But that's far from enough. The text format spec currently relies on equality of identifiers being completely syntactic. If we allow strings, then syntax and denotation can differ, so we'll need to introduce equivalence classes, and need to define that elaboration. For example $"\41" is equal not just to itself but also to $"A" (and presumably $A).

bvisness · 2024-05-21T17:00:21Z

After today's CG discussion, I think it makes more sense to keep the @name annotation, since it doesn't seem like identifiers will ever be able to express the full range of what the name section allows.

I would also suggest that we shouldn't worry about round-tripping identifiers. It's not a bad thing to put identifiers in the name section, or to pull identifiers from the name section, but identifiers don't really correspond to anything in the source language and hence I don't think it's that critical to preserve them. (To that end, I don't think I see the value of quoted identifiers if also keeping @name.)

rossberg · 2024-05-21T17:10:20Z

I was about to merge #23, but I don't mind either way. They do add some complexity to the spec, o.t.o.h. it's handy to have an escape mechanism for richer Wasm identifiers.

How strongly does everybody feel?

tlively · 2024-05-21T17:12:00Z

I very much want the extended identifiers, even though we're also keeping the name annotation. For the common case where there are not multiple or repeated names, the names can match 1:1, so being able to use the arbitrary names as identifiers is useful. If we don't have the extended identifier syntax, then tools would have to do their best to synthesize meaningful identifiers for names that are not also valid identifiers.

alexcrichton · 2024-05-21T17:16:51Z

I would also cast my vote in favor of extended identifiers. Synthesized identifiers are (subjectively to me) much harder to read than $"foo" and not having to synthesize an identifier at all would be useful.

rossberg · 2024-05-21T17:29:34Z

Okay, thanks, I'll move ahead with merging the PR then.

In addition to normal identifiers, support parsing identifiers of the format `$"..."`. This format is not yet allowed by the standard, but it is a popular proposed extension (see WebAssembly/spec#617 and WebAssembly/annotations#21). Binaryen has historically allowed a similar format and has supported arbitrary non-standard identifier characters, so it's much easier to support this extended syntax than to fix everything to use the restricted standard syntax.

dzfrias mentioned this issue Sep 20, 2023

write invalid names using @name annotation WebAssembly/wabt#2297

Open

tlively mentioned this issue Feb 6, 2024

[Parser] Support string-style identifiers WebAssembly/binaryen#6278

Merged

This was referenced Apr 3, 2024

Name annotations vs arbitrary identifiers #1

Closed

Status and path to Phase 4 #18

Open

rossberg mentioned this issue Apr 4, 2024

Allow quoted identifiers #23

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Should we allow more identifiers instead of having @name? #21

Should we allow more identifiers instead of having @name? #21

tlively commented Aug 31, 2023 •

edited

Loading

yuri91 commented Sep 1, 2023

rossberg commented Sep 6, 2023 •

edited

Loading

rossberg commented Sep 6, 2023

dzfrias commented Sep 19, 2023

alexcrichton commented Oct 11, 2023

tlively commented Oct 12, 2023

tlively commented Apr 3, 2024

rossberg commented Apr 4, 2024

bvisness commented May 21, 2024

rossberg commented May 21, 2024

tlively commented May 21, 2024

alexcrichton commented May 21, 2024

rossberg commented May 21, 2024

Should we allow more identifiers instead of having @name? #21

Should we allow more identifiers instead of having @name? #21

Comments

tlively commented Aug 31, 2023 • edited Loading

yuri91 commented Sep 1, 2023

rossberg commented Sep 6, 2023 • edited Loading

rossberg commented Sep 6, 2023

dzfrias commented Sep 19, 2023

alexcrichton commented Oct 11, 2023

tlively commented Oct 12, 2023

tlively commented Apr 3, 2024

rossberg commented Apr 4, 2024

bvisness commented May 21, 2024

rossberg commented May 21, 2024

tlively commented May 21, 2024

alexcrichton commented May 21, 2024

rossberg commented May 21, 2024

tlively commented Aug 31, 2023 •

edited

Loading

rossberg commented Sep 6, 2023 •

edited

Loading