Add JA character set #588

Merged
merged 2 commits into from
Jan 17, 2025

palemieux
Contributor

Closes #544

@himorin
Contributor

himorin commented Jan 1, 2025

I'm not sure I am reading this correctly, but per ARIB-B52, the additions to the base character set are:

  • Coded Kanji set in JIS X0213:2004
  • Coded characters in JIS X0213:2004 Annex 5, Tables 1 and 2
  • Additional characters in ARIB-B52 Tables 5.2 and 5.3, which are outside JIS X0213:2004 (for example, not in the BMP of Unicode)

I believe the last one, "Gaiji characters", should be noted as "ARIB Gaiji characters" or similar, with a reference to Tables 5.2 and 5.3 added, since these tables list additional characters beyond JIS X0213:2004 together with their mappings to UCS. Also, I think the phrase "Gaiji characters" as used in section 5.5 does not refer to these characters.
The others all look OK to me: I take the "Kanji set" (the whole of JIS X 0213:2004) to be collections 285 (JIS X0208:1990 = levels 1/2 of X0213), 371 (JIS X0213:2004 levels 3/4), and 286 (the non-ideographic set); and Annex 5 Tables 1/2 are listed as four sets of 0xFFXX characters.
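
As a rough illustration of the "not in BMP" and "0xFFXX" distinctions mentioned above, here is a minimal Python sketch. The helper names (`plane`, `is_bmp`, `in_ffxx`, `non_bmp`) are hypothetical, and the sample characters are arbitrary code points chosen for illustration; they are not taken from the ARIB or JIS tables.

```python
def plane(ch: str) -> int:
    """Return the Unicode plane number of a single character (0 = BMP)."""
    return ord(ch) >> 16


def is_bmp(ch: str) -> bool:
    """True if the character lies in the Basic Multilingual Plane (U+0000..U+FFFF)."""
    return plane(ch) == 0


def in_ffxx(ch: str) -> bool:
    """True if the code point has the form 0xFFXX (U+FF00..U+FFFF)."""
    return 0xFF00 <= ord(ch) <= 0xFFFF


def non_bmp(text: str) -> list[str]:
    """List the code points in `text` that fall outside the BMP, as U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in text if not is_bmp(c)]


# Arbitrary sample code points (assumptions for illustration only).
sample = "\u5b57\uff21\U0002000B"
print([plane(c) for c in sample])
print(non_bmp(sample))
```

A check like this only says which plane or range a code point falls in; whether a character belongs to a given JIS or ARIB table still has to be read from the tables themselves.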

@palemieux
Contributor Author

I'm not sure I am reading this correctly, but per ARIB-B52, the additions to the base character set are:

@himorin You mean ARIB B62, right?

The liaison at #544 explicitly specifies "Additional ideographs and symbols defined in Table 5-2 in Vol.1, Part 2 of ARIB STD-B62". It does not mention Table 5-3, and Table 5-2 is in the subclause titled "5.5 Encoding of Gaiji Characters".

image

Contributor

@nigelmegitt nigelmegitt left a comment

I think @himorin is much better placed to approve this than me! It appears to do what the issue requested, though I haven't checked if in the long intervening period anything has changed, e.g. the addition of new Unicode code points to include.

@himorin
Contributor

himorin commented Jan 8, 2025

The so-called ARIB Gaiji, which are defined in Tables 5.2 and 5.3, were used to display glyphs outside JIS X 0213 (not outside Unicode), and were built from glyphs historically used in analog TV programs (hand-written open captions and so on). For historical reasons, almost all ARIB STD-B62 definitions are based on JIS X 0213 (the so-called JIS character set).
So, even though the character code used in ARIB STD-B62 is Unicode (ISO 10646:2017), as defined in section 5.1, the repertoire listed in section 5.2 is based on JIS X0213, which is referred to as collections 285, 286 and 371 in this PR. In addition, by the last bullet in the list in section 5.2, the additional symbols and characters shown in Table 5-2 and Table 5-3, the so-called ARIB Gaiji, are added to the repertoire based on their Unicode mapping. All of the so-called ARIB Gaiji were included in Unicode at some point, in version 5 or thereabouts.

My understanding of section 5.5, "Encoding of Gaiji Characters", is that it covers the use of glyphs not within the repertoire, by means of external font data in SVG or WOFF. Also, the text specifically mentions section 5.2: it is about encoding Gaiji characters, meaning characters not included in the repertoire defined in 5.2, and that repertoire includes everything in Tables 5.2 and 5.3, as above.

@palemieux palemieux merged commit 59e3da3 into issues/bump-imsc-1-3 Jan 17, 2025
1 check passed
@nigelmegitt nigelmegitt mentioned this pull request Jan 17, 2025
3 participants