Common HTML named entity handled badly within JSX (.tsx) files #47030
Comments
TypeScript appears to match Babel here:
Interesting, yes, I see the original commit for Babel's src/plugins/jsx/xhtml.js from 7(?) years ago, since renamed to packages/babel-parser/src/plugins/jsx/xhtml.js. The same content is found both there and in src/compiler/transformers/jsx.ts: the same number of entities (253) and the same ordering. One can assume one is derived from the other, or both are derived from the same third source.

Ah, I think "from where" is solved. The list of entities contains all the named entities from HTML 4.01, with an addition. HTML 4.01 was ... a while ago. 2000?

The number currently supported, 253, indicates the scale of the problem. There are ~2100+ HTML5 entity names for over 1500 separate Unicode characters. TSC nears handling one-eighth of those.

I would not suggest attempting to handle all 2100+ of the HTML named entities. I would wish that a review of named entity support could add names beyond the bare-bones HTML 4.01 list. (Surely somewhere there is data on frequency of entity name usage?) But in any case, documenting that TSC is currently limited to those entity names published 21 years ago would be a good thing (even if potentially embarrassing).
TypeScript doesn't own the JSX specification. They'll follow whatever Facebook defines. TypeScript doesn't support a limited set of HTML entities; they support JSX. There's some traction happening regarding this subject: facebook/jsx#132 (comment)
I find zero embarrassment in exactly matching the behavior of the de facto reference implementation of the JSX transform 😃 The above linked issue (or its parent repo) would be the place to track change in this behavior. As-is, I don't want people randomly seeing
This issue has been marked 'Working as Intended' and has seen no recent activity. It has been automatically closed for house-keeping purposes.
This is now the specified behavior: facebook/jsx#136
Bug Report
I've encountered a known HTML named entity that is not recognized by TSC when present within a React JSX file (aka .tsx), which is instead then spat out unchanged into the HTML page in its original entity text form.
Specifically, using this named entity within JSX code does not work: the literal entity text is displayed in the browser window. The same entity does work when placed directly within index.html, and the numeric form works everywhere.

Looking at the intersection between the Unicode "General Punctuation" (2000–206F) sections "Spaces" (2000–200A) and "Format Characters" (200B–200F) (chart@Unicode...), the list of known HTML entity names (table@whatwg...), and the definition of HTML entity names known to TSC (TypeScript/src/compiler/transformers/jsx.ts...), TSC has 7 of these named HTML entities defined:
I have used 6 of these in projects.
The other known HTML entity names for 'spaces' are:
I have used the last 4 in projects as well, the last only for experiments.
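
The comparison described above can be sketched as a small set difference. This is only an illustration: `html4Names` holds the seven names that HTML 4.01 defined in U+2000–U+200F, `html5Sample` is a hand-picked, non-exhaustive sample of names from the WHATWG table in the same range, and `missingNames` is a hypothetical helper, not anything from TypeScript itself.

```typescript
// Names HTML 4.01 defined in U+2000–U+200F, with their code points.
const html4Names: Record<string, number> = {
  ensp: 0x2002,
  emsp: 0x2003,
  thinsp: 0x2009,
  zwnj: 0x200c,
  zwj: 0x200d,
  lrm: 0x200e,
  rlm: 0x200f,
};

// Assumption: an illustrative, non-exhaustive sample of additional names
// that the WHATWG named character reference table defines in the same range.
const html5Sample: Record<string, number> = {
  emsp13: 0x2004,
  emsp14: 0x2005,
  numsp: 0x2007,
  puncsp: 0x2008,
  hairsp: 0x200a,
  ZeroWidthSpace: 0x200b,
};

// Hypothetical helper: names present in `candidates` but absent from `known`.
function missingNames(
  known: Record<string, number>,
  candidates: Record<string, number>,
): string[] {
  const knownSet = new Set(Object.keys(known));
  return Object.keys(candidates).filter((name) => !knownSet.has(name));
}

for (const name of missingNames(html4Names, html5Sample)) {
  const hex = html5Sample[name].toString(16).toUpperCase();
  console.log(`&${name}; (U+${hex}) is not in the HTML 4.01 based list`);
}
```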
I believe it is true that the list of entities in jsx.ts has not changed since that file was created (rbuckton committed on Feb 16, 2016; ca. line 232). And above I have identified named entities missing from only two very small sections within Unicode.
It is certainly true that the workaround for entity names missing from TSC is to use the numeric entity references, but then matters of 'usability', 'cryptic', 'confusing', etc. arise.

I am wondering what policy you would use in deciding whether to include or not include additional entity names. I am hoping you can be somewhat less severe than WHATWG's "additions are bad" stance.
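
One way to soften the 'cryptic' aspect of the numeric workaround is to give the character a name in code. The sketch below assumes the entity in question is `&numsp;` (U+2007 FIGURE SPACE), as the search terms suggest; `FIGURE_SPACE` and `padWithFigureSpaces` are hypothetical names chosen for illustration.

```typescript
// Assumption: the character wanted is U+2007 FIGURE SPACE (HTML5 name: numsp).
const FIGURE_SPACE = "\u2007";

// In a .tsx file the constant can be used as an expression child, e.g.
//   <td>{FIGURE_SPACE}42</td>
// which avoids relying on named-entity decoding at all.

// Hypothetical helper: left-pad a number with figure spaces so that digits
// line up in a column when the font gives digits a uniform width.
function padWithFigureSpaces(value: number, width: number): string {
  const digits = String(value);
  const missing = Math.max(0, width - digits.length);
  return FIGURE_SPACE.repeat(missing) + digits;
}

console.log(JSON.stringify(padWithFigureSpaces(42, 4)));
```

The constant keeps the intent readable at the point of use, at the cost of one extra definition per character.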
In any case, documentation somewhere that TSC does not handle 'every' HTML named entity would be useful. Such as "CounterClockwiseContourIntegral" or "leftrightsquigarrow" or "angrtvbd"...
🔎 Search Terms
entity numsp thinsp HTML
🕗 Version & Regression Information
TSC 4.5.2
Inspecting source on GitHub shows this code section in jsx.ts has not changed in (4?) years.
🙁 Actual behavior
The HTML named entity, when present in a JSX file, is echoed unchanged to the web page and displayed there as its literal entity text.
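
The behavior can be modeled with a short sketch, assuming a decoder that consults a fixed table and passes unknown names through untouched. This is not the code in jsx.ts: `knownEntities` is a deliberately tiny hypothetical table and `decodeEntities` here is an illustrative stand-in. `&numsp;` is used as the unknown name on the assumption that it is the entity in question.

```typescript
// Hypothetical, deliberately tiny stand-in for a fixed named-entity table.
const knownEntities = new Map<string, number>([
  ["nbsp", 0x00a0],
  ["emsp", 0x2003],
  ["thinsp", 0x2009],
]);

// Decode decimal, hexadecimal, and named references. Names that are not in
// the table are returned unchanged, so they surface as literal text.
function decodeEntities(text: string): string {
  return text.replace(
    /&(?:#(\d+)|#x([0-9a-fA-F]+)|(\w+));/g,
    (match: string, dec?: string, hex?: string, name?: string) => {
      if (dec) return String.fromCodePoint(parseInt(dec, 10));
      if (hex) return String.fromCodePoint(parseInt(hex, 16));
      const codePoint = name === undefined ? undefined : knownEntities.get(name);
      return codePoint === undefined ? match : String.fromCodePoint(codePoint);
    },
  );
}

console.log(JSON.stringify(decodeEntities("1&thinsp;2"))); // known name: decoded
console.log(JSON.stringify(decodeEntities("1&numsp;2"))); // unknown name: left as-is
console.log(JSON.stringify(decodeEntities("1&#x2007;2"))); // numeric form: decoded
```

Under this model the numeric forms never depend on the table, which matches the observation that they work everywhere.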
🙂 Expected behavior
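Use of the named entity should have exactly the same result as use of its numeric form. Instead, JSX (.tsx) source containing the named entity appears in the browser window with the entity text unchanged, which is not surprising given that the generated JS code carries the same unconverted text.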