Add Unicode UCDXML data source #3

Open
behnam opened this issue Apr 24, 2019 · 0 comments
Labels
help wanted Extra attention is needed Unicode


Source

License

UNICODE, INC. LICENSE AGREEMENT - DATA FILES AND SOFTWARE
https://www.unicode.org/license.html

Open Questions

  1. Should we set up a single repo with the complete UCD set, or also set up one or two additional repos for the no-Unihan and/or Unihan-only sets?
  2. Do we need to include both the grouped and flat files in the repo, or is one enough? If both, do they belong in two separate repos?

Other Notes

From https://www.unicode.org/Public/12.0.0/ucdxml/ucdxml.readme.txt:

While every effort has been made to ensure consistency of the 
XML representation with the UCD files, there may be some errors;
the UCD files are authoritative.


There are six files, available in zip/jar format; the size is that of
the archive:

                       flat       grouped

no Unihan data       897 KB        556 KB
Unihan data only   5,855 KB      5,862 KB
complete UCD       7,657 KB      6,420 KB

The flat versions do not use the group mechanism. The grouped versions
use the group mechanism, with groups corresponding approximately to
the blocks (a few blocks have been subdivided).

The "no Unihan data" files do not contain the properties expressed only
in the Unihan database. The "Unihan data only" files contain only
the properties and code points expressed in the Unihan database.
The "complete UCD" files reflect the complete UCD data.
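
To make the flat/grouped question more concrete, here is a minimal sketch of how a consumer might expand a grouped file into flat per-code-point records. It assumes, based on my reading of UAX #42, that a `group` element's attributes act as defaults for its child elements and that a child's own attributes override them. The namespace URI, the `repertoire`/`group`/`char` element names, and the tiny sample document are assumptions for illustration only; the sample is hand-written and is not real UCD data.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for UCDXML documents (per UAX #42); verify against a real file.
NS = "{http://www.unicode.org/ns/2003/ucd/1.0}"

# Hypothetical, hand-written sample in the grouped style; not real UCD data.
SAMPLE = """\
<ucd xmlns="http://www.unicode.org/ns/2003/ucd/1.0">
  <repertoire>
    <group gc="Lu" sc="Latn">
      <char cp="0041" na="LATIN CAPITAL LETTER A"/>
      <char cp="0061" na="LATIN SMALL LETTER A" gc="Ll"/>
    </group>
    <char cp="0020" na="SPACE" gc="Zs" sc="Zyyy"/>
  </repertoire>
</ucd>
"""


def flatten(xml_text):
    """Yield one attribute dict per code point element, with group defaults applied."""
    root = ET.fromstring(xml_text)
    repertoire = root.find(NS + "repertoire")
    for node in repertoire:
        if node.tag == NS + "group":
            for child in node:
                # Start from the group's attributes, then let the child override them.
                merged = dict(node.attrib)
                merged.update(child.attrib)
                yield merged
        else:
            # Elements outside any group already carry all of their attributes.
            yield dict(node.attrib)


records = list(flatten(SAMPLE))
for record in records:
    print(record["cp"], record["gc"], record["sc"])
```

If this reading of the group mechanism is right, the flat files can be derived from the grouped ones with a small amount of code, which may bear on whether both variants need to be stored in the repo.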