Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TemplateLiteral support to JSXAttribute. #132

Closed
wants to merge 4 commits into from

Conversation

tolmasky
Copy link

@tolmasky tolmasky commented Dec 1, 2021

Given the comments in this Babel thread, here is my attempt at resolving the template string issue. This PR supports the "decode strings in the normal JavaScript manner", and my reasoning is below (there doesn't really seem to be a place to put the reasoning in the spec itself, although I have tried to clarify the encoding issues for future spec readers):

  1. As stated above, this spec change has template string attributes behave identically to their JavaScript counter-parts with regard to string encoding. In other words, <tag name = `[value]`/> is identical to writing <tag name = { `[value]` } />.
  2. The main reasoning here is that the primary purpose of having <tag name = "[value]"/> use the HTML-style string encoding is because there is a matching syntax production in HTML that we want to emulate. As a simple example, copying <tag name = "&amp;"> should not produce two surprisingly different results depending on whether it appears in HTML or JSX.
  3. However, there is no such matching production in HTML that we are attempting to support with our template strings. Short of having the template string incorporate the delimiting back-ticks and ignore any internal template elements, it will never match the expected behavior of HTML. (That is to say, in HTML <tag name = `[value]`> is like writing <tag name = "`[value]`">, with the additional complications since you don't have quote boundaries to clearly delimiter when the value ends, etc.).
  4. Template strings that provide a good escape hatch from the potentially surprising html entity encoding process to give you normal JS encoding. When used without any internal template elements, you essentially get JSX attributes that encode the same as JavaScript without the additional curly brace noise.

As @sebmarkbage mentioned in the above comment, it appears the sentiment is that we wish JSX simply used JS encoding across the board, so given that template strings are a new production that pose no backwards compatibility issues, it seems like a good opportunity to introduce a way to get this behavior without introducing breaking changes. If this is a direction we want JSX to go more generally, we also would have the opportunity to start telling people to use template strings across the board as a best practice, to clearly communicate in their code that they are using a JS version of strings and not an HTML one. Arguably, you could someday even deprecate double and single quoted attributes in favor of these and have a similar "JSX is not HTML" message when used to how it currently says "JSX is not XML" when namespaced tags are used. All these options are of course outside the scope of this one addition, but I bring them up to demonstrate that this simple addition opens up a lot of flexibility for creating smoother transitional paths to more breaking changes that have been discussed here for years.

NOTES

I've made a few other changes here to hopefully help people in the future:

  1. I've linked Syntax productions to their ECMAScript pages.
  2. I've tried to do the same with ESTree node elements (although outside the code block since you can't link there).
  3. I've attempted to explicitly note the HTML attribute encoding for JSXSingleStringCharacter and JSXDoubleStringCharacter as this appears to currently be absent from the spec (please correct me if I'm wrong here). This also through omission implies that the TemplateLiteral syntax production behaves identically to the ECMAScript one.
  4. I updated the @babel/parser link.

Closes #25
Closes #112

@tolmasky tolmasky changed the title First pass at adding TemplateLiteral support to JSXAttribute. Add TemplateLiteral support to JSXAttribute. Dec 1, 2021
@sebmarkbage
Copy link
Contributor

This is a somewhat unique case in that it's not a breaking change by itself. It would be a breaking change if everyone also switched the encoding.

We don't actually have to make that breaking change but say that the intention is to make the change at some point in the future when there's an appropriate time to version or introduce modes (like strict mode was). In that case, this is just jumping ahead and then it wouldn't be a breaking change later neither. In other words, if we declare the intention, we don't have to rewind this later on.

In that case I think it would be ok to land this.

@DanielRosenwasser is there anyone on the TypeScript side that has opinions about this?

@DanielRosenwasser
Copy link

DanielRosenwasser commented Dec 1, 2021

I have to discuss with our team - but could you update the title to reflect the change in encoding? Otherwise, it isn't obvious that this goes beyond adding template strings.

@tolmasky
Copy link
Author

tolmasky commented Dec 1, 2021

Hi @DanielRosenwasser,

Just wanting to clarify here, to make sure I understand the request and make certain that there's no misunderstanding (including on my part!). This PR is specifically not changing the encoding of any existing JSX syntax construct. This PR does however include documentation of the existing use of HTML encoding in JSX attributes that was previously not documented in the spec however.

The only actual goal of this PR is to add template strings as supported values to JSX attributes. It was asked in the thread what encoding these should have, simply because double-quote and single-quote strings have this (undocumented) subtle behavior that diverges from JS proper, and this PR just answers the question that this specific element requires no such divergence and behaves identically to having placed a template string in curly braces. Does that make sense? I'm happy to add stuff to the title, but I don't want to confuse anyone into thinking I am changing encodings on existing features or anything like that, the entire discussion around encodings is just a meta-question that was asked about this, and that should also be included in the documentation so that this confusion doesn't come up again in the future.

@tolmasky
Copy link
Author

tolmasky commented Dec 1, 2021

Also, don't know if the title change question was directed at me or not. If it wasn't, then no worries and I'm sure @sebmarkbage knows the right title change to make!

@DanielRosenwasser
Copy link

Hi @tolmasky, thanks for clarifying! We just discussed the issue at our design meeting, and I now understand that this is a clarification of the intended behavior for how <el foo="&emdash;" /> and the like are interpreted (and I guess TypeScript diverges from the intended behavior).

Taking a little more time to read the other issue, it seems like aligning with JS string semantics would have been desirable. Given the complexity of supporting another way to interpret string characters, and the refactoring hazard between <el foo="&emdash;" /> and <el foo={"&emdash;"} />, I feel the same. I wonder if other implementations are also in a similar spot.

I'll bring this back up with others.

@nicolo-ribaudo
Copy link

nicolo-ribaudo commented Dec 1, 2021

Babel currently support HTML entities in attributes, which makes it easier to copy-paste from HTML to JSX. However, we are not opposed to change it an align with JS strings: we will have the next major in the next few months, and we could introduce this breaking change.

I agree that template strings should align with JS semantics: since HTML doesn't support JS templates the difference will never cause copy-paste problems. However, I think we should work with the eslint-plugin-react authors to introduce a warning when using HTML entities in template attributes (and potentially also in "/'-style attributes).

Given the difference between TS and Babel, I don't think that this PR should mention the WHATWG encoding and we can just consider it "compiler-dependent behavior".

@RyanCavanaugh
Copy link

Babel currently support HTML entities in attributes

Is this an opt-in? In the repl today I see it transform <div title="&dash;"></div> to React.createElement("div", { title: "&dash;" });

@nicolo-ribaudo
Copy link

Uh well, &dash is missing: https://github.com/babel/babel/blob/main/packages/babel-parser/src/plugins/jsx/xhtml.js

@RyanCavanaugh
Copy link

LOL, I should buy a lottery ticket 👍

@DanielRosenwasser
Copy link

DanielRosenwasser commented Dec 2, 2021

@nicolo-ribaudo
Copy link

nicolo-ribaudo commented Dec 2, 2021

TS supports HTML entities too, and it we are not the only ones who missed &dash; 😛
https://www.typescriptlang.org/play?#code/DwEwlgbgBAhgvAIgGQBsAuBuJIYGcAWGCUA9AHxA

Jokes aside, it looks like both Babel and TS support HTML 4 but not HTML 5.

@RyanCavanaugh
Copy link

Can't believe dash didn't make the cut for HTML4. Anyway the complete list of HTML5 entities seems untenably large for every JSX transformer to include as a lookup table (it appears to be possibly thousands of entries long?), and it also seems awkward to say "HTML4 entities are encoded for convenience but not any higher", or "Entity transformation is a host-dependent function".

We should find some way to spec something that Babel and TS can both agree on, even if it's just a hardcoded list of ~223 entries. I don't want people's display changing based on which transformer they're using.

@Me1000
Copy link

Me1000 commented Dec 2, 2021

The whatwg hosts a json file of all the HTML5 entities: https://html.spec.whatwg.org/entities.json
It's 2231 entries (some of which are duplicative; e.g. &AMP;, &AMP, &amp;, and &amp), which honestly doesn't seem that terrible, especially considering the working group has said the list will not be expanded: https://github.com/whatwg/html/blob/main/FAQ.md#html-should-add-more-named-character-references

From a developer's perspective I'd find it quite odd if I tried to use an html5 entity and it wasn't working, only to learn that JSX only supports HTML 4 entities.

@tolmasky
Copy link
Author

tolmasky commented Dec 2, 2021

Should we perhaps split out discussion on the entities and so forth to a separate issue as its somewhat orthogonal to the template string stuff, and arguably more subtle as it also means things like not resolving JavaScript unicode escape sequences, etc.

@tolmasky
Copy link
Author

tolmasky commented Dec 4, 2021

Just wanted to follow up and confirm that there's nothing blocking this PR and that remaining open issues seem to be in existing syntax.

@nicolo-ribaudo
Copy link

I agree that the possible breaking change regarding "/' attributes is separate from this one. If I had to implement this in Babel behind a flag (while this is still a "proposal"), I'd start by throwing for HTML-entities used in template-like attributes so that then we can relax the restriction in either direction if we decide to align them with "/'-style attributes.

@tolmasky
Copy link
Author

tolmasky commented Dec 12, 2021

Just wanted to check in again as it's been a week, any reason not to merge this in?

@tolmasky
Copy link
Author

Just checking in again. Are there any blockers here? Happy to work on anything left to do.

@sebmarkbage
Copy link
Contributor

sebmarkbage commented Dec 20, 2021

Given that I'm leaving Facebook/Meta, we're still figuring out merge rights in this repo. Let's pick back up after the holidays.

@nicolo-ribaudo
Copy link

@tolmasky If you want, you could start working on a draft PR for @babel/parser, that we can merge when this is merged.

@tolmasky
Copy link
Author

tolmasky commented Jan 4, 2022

I hope everyone had a good new year! I am hoping to pick this back up now so we don't lose the thread on this. If I recall correctly, there were no outstanding issue with this PR and we simply wanted to wait until the holidays were over. Are we ready to go now?

@tolmasky
Copy link
Author

Just checking in again. My fear is that this is getting punted to whoever is coming onboard next and this will have to be relitigated all over again. Given that we've already determined this is a safe backward and forward-compatible change, and the unfortunate 7 year history of this request, I'm hoping we can avoid delaying this further.

@tolmasky
Copy link
Author

@tolmasky I understand your frustration with the long, arduous process. I feel you’re reading my comment incorrectly: I said this thread was all-over-the-place, not your contributions. To clarify: a) I did not mean to cause you grieve, b) I am not involved in this process, other than an outsider that’s interested in seeing JSX improve. I never said that your notes, that you removed, or links, that you now removed, should not land. I proposed to split them exactly because I do think they should land. And that separating them might make everything easier to discuss. I don’t think anyone is against your links. But separating them, makes this tough nut easier to grasp and discuss!

Hi @wooorm, I appreciate the reply and sorry if my tone was harsh with you. My actual frustration is around the kremlinology around the right ways to poke this PR in order to get it accepted. Especially because I think these links are actually very useful in reading and understanding that part of the spec. I certainly had another tab open with each one to double check that there wasn't any conflict, etc. And I understand that you simply think it should be split out. Perhaps in a repo with a different history I'd agree (well, weird hypothetical because such a repo would have just accepted the helpful links in this PR too, but I digress), but given this repo, I just don't think it will do anything other than create 2 PRs for me to track for weeks. Also, @Huxpro can just add them if he thinks that subset of changes is worthwhile and the rest needs further consideration. It's ~10 lines, he could do it with the built-in GitHub editor, I don't care who gets the credit for the commit, I just want the next person who comes to be able to click on a link.

@acdlite
Copy link

acdlite commented Feb 10, 2022

Hi @tolmasky, I think your frustrations are legitimate, just want to chime in here with a few bits of context:

  • Although @sebmarkbage is no longer with Meta/Facebook, he's still an active contributor to React and we consult with him regularly. We discussed this topic today during one of our team check ins.
  • Please try not to take out your (again, legitimate) frustrations on @Huxpro. He's new-ish to our team and has only recently been given ownership over this repo.
  • What you said here seems totally fair to me:

[...] I'd be fine with:

  • Closing the bug and flat out saying you don't like it and that's the end of it.
  • Making a specific ask or providing specific requirements as to what would convince you.
  • Saying that all changes are on hold until you come up with contributing guidelines (look, I'm giving you easy ways to get out of this!)
  • Admitting that you're new to the repo and that you're just not ready to make this kind of change any time soon to a repo that has a history of rarely changing.

We'll give a more definitive response soon. I'm sorry we haven't been as transparent or responsive as we could have been.

@tolmasky
Copy link
Author

Hi @acdlite, thanks so much for the update. That sounds good to me!

@magic-akari
Copy link

magic-akari commented Feb 11, 2022

I am new to the repo.
It looks like there are some non-technical issues being discussed here.


Please allow me to ask some more details on compatibility and inconsistency.

<div foo="&amp;" bar=`&amp;` baz={`&amp;`} />

This means that in some hypothetical future version of JSX2.0. We'll get a breaking change but keep the consistency version. Right?

React.createElement("div", {
  foo: "&amp;",
  bar: `&amp;`
  baz: `&amp;`
});

All foo bar and baz will get the same vlaue &amp;.
Although foo will get different results in the existing behavior. But since this is a breaking change and it keeps consistency, we are happy to accept it.

Personally, I prefer JS string encoding, HTML entity are annoying and confusing. But the inconsistency is more unacceptable.


For some time after this PR will be merged before the upcoming release of JSX2.0, we will get a non-breaking change but inconsistent behavior. Right?

React.createElement("div", {
  foo: "&",
  bar: `&amp;`
  baz: `&amp;`
});

The behavior of bar is the same as baz's, but different from bar's. That's what I'm concerned about.

Imagine a theoretical situation.

<div>
  <MySpan content={`&amp; Interpreted as ${userSelectedNote}`} />
</div>

Well, we are now ready to use the shiny JSX TemplateLiteral!

<div>
  <MySpan content=`&amp; Interpreted as ${userSelectedNote}` />
</div>

Hey! we want to color the userSelectedNote.

<div>
  <MySpan content=`&amp; Interpreted as ` />
  <MySpan color={themeColor} content={userSelectedNote} />
</div>

Since there is no substitutions in TemplateLiteral, let's switch it back to a normal string!
Maybe some formatter or linter well do it automatically.

<div>
  <MySpan content="&amp; Interpreted as " />
  <MySpan color={themeColor} content={userSelectedNote} />
</div>

And what happened now? I got unexpected results.


There is thus some sense that if JSX had originally been released 2 years later, after TemplateLiterals had already existed, that it would have just supported all three string literal types.

If that were true, we would get three consistent strings, since TemplateLiterals does not exist in HTML String and we choose JS string encoding.

Simply patching it with template string and leaving the other two strings alone will not take us back in time. We've gone into a different route, and the fix comes at a cost. The inconsistency was exacerbated by the introduction of TemplateLiterals.

Therefore I do not agree that this change is orthogonal to the entity problem.

What concerns me even more is if this PR is merged but JSX2.0 never arrive. Then we have to live with this inconsistency forever.

@Kingwl
Copy link

Kingwl commented Feb 11, 2022

Hi @tolmasky. Thanks for your effort here.

My point of view from the user's perspective is (Opinions are my own):

  1. Is the motivation clear? Yes. 1. To avoid unnecessary JSXExpression ({}). 2. To avoid html4 entries.

  2. Is this useful? Yes, nice to have. It's could avoid some effort for me (eventhough not too much).

  3. Is this a breaking change ? Nope. At least not breaking in syntax. It's Illegal in most of common parsing tools for now.

  4. Is this orthogonal with existed jsx string's entries issue ? Yes. It's a new syntax, Personally, I'm ok with new syntax has new semantic (or behavior).

  5. Should we change the existed jsx string's entries? Nope. It's a breaking change.

  6. Why just template literal? why not boolean? numeric? It's hard to answer:

    1. As a user, I am happy to see these too.
    2. As someone who tries to push something: I will keep the scope to a minimum so that I can get more consensus with everyone.
  7. Does anyone actually use entries inside jsx attribute values in real world? Honestly, It's (In this issue) my first time to know there's entries in string jsx attribute value.

    1. And I asked some colleagues around me, they also do not know or have not used this. To clearify: This is not a basis for anything, nor does it make any constructive sense.

IMO, the problem you need to convince someone of is: Should we continue to block other literal expressions if template literal has merged in.
And also: why now?

@tolmasky
Copy link
Author

tolmasky commented Feb 11, 2022

Hi @magic-akari, thanks for the question, scenarios like this are very helpful for thinking about this. As I'll show below, there is zero additional inconsistency from today's behavior. Let's go through your exact steps, only using the current JSX implementation. Let's begin with exactly where you started:

Imagine a theoretical situation.

<div>
  <MySpan content={`&amp; Interpreted as ${userSelectedNote}`} />
</div>

To be perfectly clear, the user would see &amp; in their rendering, not "&", both with today's JSX, and if this PR is accepted. Now, since we don't currently have a shiny new TemplateLiteral, we never go through the simplification step. But, just like before, eventually we want to color the note, so we break up the previous template expression above into two elements:

Hey! we want to color the userSelectedNote.

<div>
  <MySpan content={`&amp; Interpreted as `} />
  <MySpan color={themeColor} content={userSelectedNote} />
</div>

And, just like in your example, there's no longer any replacement in the TemplateLiteral, so we'll switch back to the normal string! Maybe some formatter or linter will do it automatically.

<div>
  <MySpan content="&amp; Interpreted as " />
  <MySpan color={themeColor} content={userSelectedNote} />
</div>

And what happened now? I got non-expected results... today. You can't tell me that this would be expected by your average JSX user, especially considering that this is an undocumented feature. I agree, it sucks... but the unexpectedness of this behavior has nothing to do with whether or not we remove the need for wrapping the template literal in an additional set of curly braces.

There is thus some sense that if JSX had originally been released 2 years later, after TemplateLiterals had already existed, that it would have just supported all three string literal types.

If that were true, we would get three consistent strings, since TemplateLiterals does not exist in HTML String and we choose JS string encoding.

No, there's a fairly good chance they would have the two sets of differing behavior, if you look at the original reasoning behind including HTML entities in the first place: single and double quoted strings appear in existing HTML, so the entire reason this HTML entities feature (that no one seems to like) exists is to preserve those single and double quoted strings if you half-haphazardly copy and paste them into your JSX. To be clear, since then, we've discovered that this is not really as important a use case as it was originally considered to be, especially considering that the "dream" of copy-pasting HTML never really manifested since things like class= and style= never worked. But, again, as an exercise of putting ourselves in their mindset at the time, there would have been this same reason for double and single-quoted strings, but not for template strings because there's no large existing plethora of backticked-strings out in HTML that would be nice to copy over to JSX. Additionally, HTML entities really can't work in template strings (consider for example the fact that <div value = `&copy;${"&copy"}`> would have to generate ©&copy;, if this were the case). As such, there's a pretty good chance that the compromise of "html entities in html strings, normal JS strings for template literals" would have been reached as the ultimate solution. And arguably, this would have been better for the same reason I outlined above: it would allow you to message to people to "just stick to template strings to avoid this whole entity mess".

What concerns me even more is if this PR is merged but JSX2.0 never arrive. Then we have to live with this inconsistency forever.

I will absolutely grant that if we merge this it will exist for a long time before JSX 2.0, because JSX 2.0 is probably never arriving (not unless Facebook is secretly working on unrelated an unannounced changes). You can look at this dead thread here where 2.0 was originally discussed 8 years ago. That being said, as I think I displayed above, this doesn't actually add any particular additional confusion IMO.

@Kingwl
Copy link

Kingwl commented Feb 11, 2022

Hi @magic-akari , Thanks for your feedback.

But the inconsistency is more unacceptable.

It's the same situation between template literal and string literal, eg:

const stringLiteral = 'line
feed' // Illegal
const noSubstitutionTemplateLiteral = `line
feed` // line\nfeed42
const templateLiteral = `line
feed${42}` // line\nfeed42

(And In my memory, there's some minor different for escape sequence too. Sorry I'm not pretty sure.)

They are all something like strings, but why they are inconsistency? because they are different concept even they looks same in most case.

It's also my point with this issue: they are different concept.

If that were true, we would get three consistent strings, since TemplateLiterals does not exist in HTML String and we choose JS string encoding.

To be clear: JSX does never claim the HTML compatibility.

This specification does not attempt to comply with any XML or HTML specification. JSX is designed as an ECMAScript feature and the similarity to XML is only for familiarity.

And also means: There's always JS.

The inconsistency was exacerbated by the introduction of TemplateLiterals.

It's the same as the inconsistent between string literals and the template literals. Put another way, this is what we(this PR) want .

And also a problem same as template literals: Why that inconsistent is acceptable but not this one?

@tolmasky

because JSX 2.0 is probably never arriving

lol.

@magic-akari
Copy link

I am the supporter of JS encoding string. I subconsciously treat JSX as sugar of JS.

So it surprises me when the following code behaves differently.

const foo = "&amp;";

<div foo={foo} bar="&amp;" />;

And the supporter of html encoding string well tell me "It follows the html encoding rules. Use {} to get back to the world of JS."
Okay, okay, it's logical and self-consistent, I'll just have to accept it.


Everything will change if this pull request is merged.

Hey, look at this! It behaves consistent, whatever the literial string is in or out the JSX.
Don't tell me to use {} to get back to the js world.
We are use the same string syantax in JS and it is JS encoding string as we expect.

const foo = "&amp;";

<div foo={foo} bar=`&amp;` />;

So, Why not change the existed jsx string's?
Sure, it is a breaking change.
But this pull request destroys the arguments of html encoding supporters.

@magic-akari
Copy link

@Kingwl

It's the same situation between template literal and string literal

No, they use a different syntax that is more easily noticeable.

The JSX string uses the same syntax as the JS string. The difference is in their contexts which is hard to notice.

The normal string with multiple lines are handy. I would be happy to have it in JS. But it is another case.

@Kingwl
Copy link

Kingwl commented Feb 11, 2022

@magic-akari Thanks for the response

Okay, okay, it's logical and self-consistent, I'll just have to accept it.

IMO, this is a concept that to help understanding the behavior. But Is that a spec or design goals? I'm not sure.

In other words, can this concept what the supporter of html encoding string told you be used as a rule for our future designs?

I'll just have to accept it.

I'm not a big fan about that too. we might just accept it because it's already became reality.

Everything will change if this pull request is merged.

Well. I don't think so. Nothing changed if you are not using this new syntax.
The difference will only happend If you are using this template literal.

So, Why not change the existed jsx string's?

As you said. it is a breaking change. It's might happend after jsx 2.0 which will probably never arriving.

The JSX string uses the same syntax as the JS string. The difference is in their contexts which is hard to notice.

It's the problem of existed jsx string. Not what we are talking about in this issue.
And that is what this proposal want which has designed to limit the scope to avoid the conflict and get more consensus.

BTW: noticeable or inconsistency is not a boolean. It's not only inconsistency or consistency. It's actually a bar that like I think this one is more consistency than that.

IMO It' might hard to be a strong blocker.

@Huxpro
Copy link
Contributor

Huxpro commented Feb 11, 2022

@Kingwl Just wanna share some technical fun facts.

Quoted strings in JSX behave differently with either JS or HTML: It supports multi-line (which JS string literal don't), but the line breaks are always trimmed to a single whitespace (which HTML attribute values don't). I haven't found the historical conversation yet.

(And In my memory, there's some minor different for escape sequence too. Sorry I'm not pretty sure.)

And from what I knew, the string literal and untagged template literal have the same escape sequence. Notably, all string literals and template literal refer to the same Escape Sequence back in ES2017. However, a template literal revision is later amended to ES2018 to relax the escape sequence restriction of only the tagged template literal to better support DSL usages.

@Kingwl
Copy link

Kingwl commented Feb 11, 2022

@Huxpro Thanks for the sharing.

only the tagged template literal

Oh yes. That's it. Thanks.

@wooorm
Copy link

wooorm commented Feb 11, 2022

but the line breaks are always trimmed to a single whitespace (which HTML attribute values don't). I haven't found the historical conversation yet.
@Huxpro

I mentioned it here: #132 (comment). Babel trims. TypeScript doesn’t. So It could be considered a bug in Babel that they trim.

As I noted in that comment though, it could be useful to specify this trimming: it would allow Prettier and other tools to improve the printing of attributes, because it can dedent and indent to make things look good while it doesn’t make a syntactical difference.


Addendum: Note that the behavior of character references and whitespace “originate” from JSXText:

<a b="c &amp; 
      d">
  e &amp; 
     f
</a>

Babel:

React.createElement("a", {b: "c &  d"}, "e & f")

TS:

React.createElement("a", { b: "c & \n      d" }, "e & f")

And that “braces can be used to enter JavaScript” if you want whitespace / character references:

<a>{`
  e &amp; 
     f
`}</a>

->

React.createElement("a", null, `
  e &amp; 
     f
`)

I think it makes a lot of sense that double/single quoted attributes work the same as text inside elements: that whitespace is trimmed and that character references work.

@tolmasky
Copy link
Author

tolmasky commented Feb 11, 2022

Everything will change if this pull request is merged.

That's not true, the amount of "confusion" will remain the same. In fact, I will argue that the amount of accidental surprising behavior will decrease. Why? Because you are less likely to manually convert <div value = `&amp;`> to <div value = "&amp;"> than you are to convert <div value = { `&amp;` }> to <div value = "&amp;">. There's no reason to convert `&amp;` to "&amp;". It's not like non-replacement template strings are "more expensive" in JS. It would be a purely a stylistic choice at that point, and if you had already been using replacements beforehand, there's a good reason to just keep it as a template string in case you want to use replacements in the future. You pay no cost. However, it does seem like a bunch of unnecessary extra text to see { `&amp` }, so you'd probably be even more inclined to make some change to it, leading to the dreaded "&amp;".

So, given the fact that the entity feature is completely undocumented, this PR would lead to equivalent "opportunity" for confusion to before, but there is a strong case for it resulting in *fewer& instances of that confusion.

So, Why not change the existed jsx string's?

This PR allows you to develop a linter rule that enforces always using template literals to get this behavior today without it being a breaking change. If you always use template literals you can completely escape the whole entity mess, and it's the only way to get this anytime soon. It is probably the only way to get this behavior in the next 5 years.

Please take a moment to read through the existing issues and PRs. The choice is not between this or a great JSX 2.0. The choice is this or nothing. I wish we could change all strings too! So does the original maintainer. We've been wishing it for 5 years. It's not going to happen. This repository can't even come together to make the completely uncontroversial decision to at least document the existing behavior to spare its poor users the pain of having to discover this feature on their own. The oldest open issue is 8 years old and just asks about documenting this existing behavior. But like clockwork, every issue or PR in this repository devolves into fantasies about what some new JSX might look like, and eventually everyone has to move on with their lives and forgets about the issue and it becomes abandoned. This is not an exaggeration, just look through the issues in this repository. It's OK to not like the template literals proposal, but don't have the reason be the mistaken belief that we're going to get "the real" fix instead. We're not.

@magic-akari
Copy link

The choice is not between this or a great JSX 2.0. The choice is this or nothing.

<div foo="&amp;" bar=`&amp;` baz={"&amp;"} />

Great JSX2.0(probably never arriving)

React.createElement("div", {
  foo: "&amp;",
  bar: `&amp;`,
  baz: "&amp;"
});

Nothing

React.createElement("div", {
  foo: "&",
  // no bar
  baz: "&amp;",
});

TemplateLiterial Patch

React.createElement("div", {
  foo: "&",
  bar: `&amp;`,
  baz: "&amp;"
});

Honestly speaking, if I can't choose Great JSX 2.0, I choose Nothing.

@tolmasky
Copy link
Author

Since we agree we're not discussing JSX 2.0 @magic-akari (which for the record, even if we did get it, would not necessarily do anything with HTML entities, that in itself is also an assumption, so it's 2 assumptions), let's stop adding it to our comparisons. Let's stick to the two concrete examples we know, and not insert a third fantasy example:

Nothing

React.createElement("div", {
  foo: "&",
  // no bar
  baz: "&amp;",
});

TemplateLiterial Patch

React.createElement("div", {
  foo: "&",
  bar: `&amp;`,
  baz: "&amp;"
});

They are equally "inconsistent". You may prefer one over the other, but no one is forcing you to use template literals, so I don't see the problem. I can appreciate the abstract concern you have with it, but if we talk about the practical implications for users, it is simply not realistic that this introduces "more instances" of running into the HTML entity disparity. Users are not currently foregoing from using template literals because they can't inline them directly, and as such, whenever they switch from a template literal to a normal string they run into the exact scenario you are concerned with.

As I mentioned in my previous comment, the best way to get a "no entities purely consistent" behavior is to accept this PR and then use a linter to enforce only using template literals:

<div value = "hi"> // error, use <div value = `hi`> instead to avoid accidentally HTML entity handling

This is what I reference in my original summary of providing a "best practices" path today to start transitioning the community to a no-html-entities future. It would be far more likely to succeed than trying to convince everyone to wrap all their non-template-strings in curly braces, which seems like a lot of boiler plate for nothing.

@Jack-Works
Copy link
Contributor

I support never converting HTML entity encoding in the property field for the following reason.
Those reasons are based on my personal experience.

  1. I believe most people don't know this feature.
  2. I believe people don't copy-paste HTML into JSX.
  3. I believe most people think JSX is closer to JavaScript than HTML/XML.
  4. It is not impossible to make a breaking change and migrate.

[2]: Please be aware of htmlFor, className and self-closing HTML (<img src="">). They cannot be pasted into JSX without modifying.

[3]: <></> or <Namespace.Component /> is far away from HTML/XML. XML-styled JSX <namespace:component /> is rarely used. (babel throws by default when using XML-style namespace JSX IIRC).

And new programmers to the front-end are almost always starting from modern frameworks.

[4]: Since the following code will not have entity encoding, that means this breaking change is statically analyzable and auto-fixable. I believe it's easy to run a migrate script on the codebase and fix all use cases.

Also, there is no problem with the ecosystem because library authors ship compiled JSX, not the JSX source. Therefore upgrading the toolchain in the application won't affect the interpretation of the dependencies. (Thus this problem is different from define/set on the class fields).

return <img alt={"&" + props.alt} />

I think maybe it's OK to have HTML entities encoding in the JSXText.

@tolmasky
Copy link
Author

tolmasky commented Feb 11, 2022

I support never converting HTML entity encoding in the property field for the following reason. Those reasons are based on my personal experience.

Hi @Jack-Works, I am begging us to not derail this. EVERYONE agrees with your position. NO ONE LIKES the HTML entity thing. Not even that, no one even fully understands it, since it's not properly documented and is only a subset of HTML entities. The original maintainer wishes he had never added it. You are 100% correct that we should get rid of it -- but we can't do that without shipping a breaking version, and this project has a hard enough time shipping backwards compatible features as it is. Scratch that, this project can't even get consensus about shipping updates to the documentation that would at least document the current behavior.

If everyone agrees that HTML entities are bad, then the only question would be "why add them in more places?" That's what's trying to be decided here, should template literals continue this behavior few people know about and no one likes once they find out about, only now in a feature that can't even plausibly offer a reason to do it like the original one did? No. Of course not. Let's add this simple feature and make it follow the pattern JS instead of HTML, such that <div value = `template`> is consistent with <div value = { `template` } >, and such that we can offer a transitional path to getting rid of HTML entities everywhere someday.

@tolmasky
Copy link
Author

Hi @Huxpro, @sebmarkbage, and @acdlite, this PR has now reached its "final form" of JSX PRs and GitHub issues. Within this very thread, we have had multiple people come in and say "I have a great idea, why not just remove HTML entities everywhere?" It's clear that it has reached sufficient length where no one is reading any of the preceding comments anymore, and I don't have the energy to repeatedly explain the same answer over and over. Scratch that, to explain the same question we are trying to answer over and over again. Please feel free to close the PR. I've been using my own private fork with template literals for a few months now, so this PR wouldn't really provide me with anything, I simply thought it might be nice to be able to offer this to people who don't have the time or expertise to manually edit Babel themselves. I'd close the PR myself, but given that there are other people that had some vested interest in it I consider it rude to do that, but don't mind (and encourage) you to do it. I think it might be most fitting for @sebmarkbage to come and close it actually. I wish he had just closed it to begin with. I know @acdlite said you guys are working on a proper response, but please don't feel pressured to do this for my sake anymore. This has all been a colossal waste of time, and I apologize for not heeding the warnings given to me originally, and I do want to sincerely apologize to you @Huxpro for having to deal with this as your initial introduction into Open Source. I do wish we would have gotten to meet under different circumstances, and I certainly harbor no ill will towards you at all and hope you also don't towards me. This repo is just cursed.

@wooorm
Copy link

wooorm commented Feb 11, 2022

As I mentioned in my previous comment, the best way to get a "no entities purely consistent" behavior is to accept this PR and then use a linter to enforce only using template literals:

I see this as bad. If such a rule would land in eslint-config-airbnb, eslint-plugin-react, etc, tons of newcomers would change what I see as readable JSX (double quotes) into backticks. Just to appease the tools. I don’t see it as “better”.


[2]: Please be aware of htmlFor, className [...]

This is a React-specific choice. They’re not dictated by JSX. Preact for one, but also others, don’t have this rule.

[4]: that means this breaking change is statically analyzable and auto-fixable.

How could tools differentiate between these: <a b="MIT &copy; Some Person" title="you can type &copy; in HTML to get a copyright sign" />). Should they both turn into a copyright sign?


If everyone agrees that HTML entities are bad

I personally don’t think they’re bad.

I’m 👍 on character references in double/single quoted attribute values. Because they also occur in JSXText.
I’m open to removing them if they’d be removed from JSXText too. However, those are used a lot in the wild (e.g. Google “JSX nbsp”).

Removing them from one but not the other, IMO, makes it harder to explain JSX. JSX isn’t pretty, but consistency helps.
I think removing character references from both is acceptable, but it makes it impossible to use both double and single quotes inside an attribute value, unless JS escapes (\") were added too. Allowing that, and in JSXText, would also break real content, unfortunately.

For whitespace in attribute values, I’d prefer that to follow Babel, which is the same as JSXText. Due to consistency and to help formatters.

Broadly, I hope that JSX remains a small language that has JS as an “escape hatch”. That it’s a small weird mix between HTML and JS, and with braces you get actual JS.
I can see some value in a big JSX 2.0 rewrite. But it may not happen. I think I’m against adding backticked strings with JS semantics to JSX as currently though, as its inconsistent (I don’t have a vote though, this is a Facebook project, but I as the MDX maintainer see this as a net-negative, it’s just an opinion).

@dantman
Copy link

dantman commented Feb 11, 2022

[4]: that means this breaking change is statically analyzable and auto-fixable.

How could tools differentiate between these: <a b="MIT &copy; Some Person" title="you can type &copy; in HTML to get a copyright sign" />). Should they both turn into a copyright sign?

Yes, both of them should turn into a copyright sign.

As you can see from the transpiled code, the html entity -> unicode character conversion is done by the JSX compiler. It's not done at runtime with any difference between types of properties. ALL of them turn into (escape sequences) unicode characters before any code is run.
Babel playground
TypeScript playground

@Huxpro
Copy link
Contributor

Huxpro commented Feb 25, 2022

First, I want to thank everyone that has been involved in this long conversation. I see that this change (and many other issues and PRs) had been debated for a long time and everyone feels more or less frustrated about the status quo of this repo. So my primary goal as the new repo maintainer is to make this repo healthy again ❤️. I’ll elaborate more on the “next step” section below.

This Proposal

The team understand this proposal is about adding template literal to JSX orthogonally with the original HTML-ish encoding of other JSX “strings” and this proposal alone is not a breaking change.

The team also shared many concerns that had been raised up by many others in this thread (THANK YOU!), such as the tension between HTML vs JavaScript encodings, and whether or not we should push the boundaries to other expressions so we can deliver a more cohesive mindset. We also observed that a consensus can't be easily reached just yet.

Our conclusion therefore is that we are hesitant to move forward on making this change. It’s not so much about this proposal itself, but its downstream effects. We are not convinced that it’s worth the churn for existing users. We will still be interested in incorporating this proposal as part of a larger and cohesive changes such as JSX 2.0.

Next Step

Repo Health

I want to thank @tolmasky for his valuable callouts on how we could improve from this regard. I’ll start with drafting a set of new guidelines which should clarify the requirements and expectations on different kinds of contributions (e.g bug reports; editorial and normative changes to the current JSX spec; and feature requests and discussion that may go to JSX 2.0).

I’ll also start with tagging and fixing stale issues and PRs so they are properly attributed.

Moving Forward

To re-gain a solid foundation to constructively move JSX forward, we plan to correctly document all the existing semantics in the spec to align with existing implementations such as Babel and TypeScript. Going forward, we want to minimize the inconsistencies between different implementations, and ensure we are in the same page whenever a change is needed to be made to the spec.

To get the ball rolling, I’ve made 2 PRs:

  1. Migrate to ecmarkup #135 to revamp the spec. This is already merged. Check out http://facebook.github.io/jsx/ to see how the new spec website looks and feels!
  2. [Normative] Capture the HTML entity behaviors #136 . I encourage everyone that has been interested in the current HTML character reference behaviors to continue the discussion there.

Regarding other topics e.g. white spaces inconsistencies, please feel free to file another issue to continue the discussion there.

Thank you 💙

@Huxpro Huxpro closed this Feb 28, 2022
Huxpro added a commit that referenced this pull request Mar 8, 2022
As a follow-up of <#132 (comment)>, we should clarify the requirements and expectations on different kinds of contributions.
@Huxpro Huxpro mentioned this pull request Mar 8, 2022
Huxpro added a commit that referenced this pull request Mar 8, 2022
As a follow-up of <#132 (comment)>, we should clarify the requirements and expectations on different kinds of contributions.
@Kingwl
Copy link

Kingwl commented Mar 7, 2023

Any update after 1 year later?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow template literal in JSXAttributeValue Add links to ECMA-262 entities