Greater BCP47 compatibility? #21

Open
despresc opened this issue Feb 15, 2021 · 7 comments
Labels
enhancement New feature or request

Comments

@despresc

despresc commented Feb 15, 2021

Are there plans for greater BCP47 compatibility? The tags es-419, en-gb, en-TP, and cmn, for instance, are not recognized by the library. That last tag is even the preferred encoding of Mandarin Chinese (over the currently-recognized zh-guoyu, and others) according to the IANA registry.

The Region component in particular seems odd: it fails to recognize any of the three-digit region codes (like 419) registered with the IANA, yet it accepts tags with three-digit country codes for countries that already have two-letter codes, as in zh-012. (Such codes must not be used in tags, according to https://tools.ietf.org/html/rfc5646#section-2.2.4, item 4.D).
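
To make the region point concrete, here is a minimal sketch of a registry-driven region check. It is written in Python for brevity and is not this library's API: the name `is_registered_region` is hypothetical, and the two sets are a hand-picked subset of the IANA Language Subtag Registry chosen only for illustration.

```python
import re

# Illustrative subset only. A real implementation would load these from the
# IANA Language Subtag Registry rather than hard-code them.
REGISTERED_ALPHA2 = {"GB", "US", "ES", "DZ", "TP"}
# UN M.49 codes that are registered: World, Americas,
# Latin America and the Caribbean.
REGISTERED_UN_M49 = {"001", "019", "419"}

# A region subtag is syntactically either two letters or three digits.
REGION_SHAPE = re.compile(r"(?:[A-Za-z]{2}|[0-9]{3})\Z")


def is_registered_region(subtag: str) -> bool:
    """Return True if `subtag` is a region found in the (sample) registry."""
    if not REGION_SHAPE.match(subtag):
        return False
    if subtag.isdigit():
        # 012 (Algeria) is absent because the two-letter code DZ exists.
        return subtag in REGISTERED_UN_M49
    # Letters compare case-insensitively; the registry stores uppercase.
    return subtag.upper() in REGISTERED_ALPHA2
```

With this sample data, `419` and `gb` are accepted while `012` is rejected, which is the behaviour the paragraph above asks for.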

There is also a small issue that may matter in the future, though it is unlikely: the primary language subtag is not defined to include all of ISO 639-1. The IANA will not register any future ISO 639-1 code for a language that is already covered by a three-letter code (see https://tools.ietf.org/html/rfc5646#page-11), so in that event this library would have to switch away from the iso639 package (assuming that package is updated to include the new code) to remain correct.
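
One way to avoid depending on ISO 639-1 directly is to derive the set of valid primary language subtags from the registry itself. The following is a rough sketch under stated assumptions: `SAMPLE_REGISTRY` is an abbreviated, hand-written excerpt in the registry's record layout (records separated by `%%`, `Field: value` lines, indented continuation lines), and `parse_records` and `language_subtags` are hypothetical helper names.

```python
# Abbreviated, hand-written sample in the registry's record layout.
# Fields such as Added are omitted here for brevity.
SAMPLE_REGISTRY = """\
%%
Type: language
Subtag: es
Description: Spanish
%%
Type: language
Subtag: cmn
Description: Mandarin Chinese
Macrolanguage: zh
%%
Type: region
Subtag: 419
Description: Latin America and the
  Caribbean
"""


def parse_records(text):
    """Split registry text into a list of {field: value} dictionaries.

    Simplification: if a field repeats within a record, the last value wins.
    """
    records, current, last_field = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.strip() == "%%":
            # Record separator: flush the record collected so far.
            if current:
                records.append(current)
            current, last_field = {}, None
        elif line[:1].isspace() and last_field is not None:
            # Indented line: continuation of the previous field's value.
            current[last_field] += " " + line.strip()
        elif ":" in line:
            field, _, value = line.partition(":")
            last_field = field.strip()
            current[last_field] = value.strip()
    if current:
        records.append(current)
    return records


def language_subtags(records):
    """Collect the primary language subtags, lowercased."""
    return {r["Subtag"].lower() for r in records if r.get("Type") == "language"}
```

Driving validation from parsed records like these means a newly registered subtag only requires refreshing the registry data, not waiting on an ISO 639 package.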

@despresc
Author

despresc commented Feb 16, 2021

Granted, this would represent a pretty significant rewrite of the package. If there is no appetite for that, I may write one that provides full BCP47 coverage.

@pbrisbin
Member

pbrisbin commented Apr 9, 2021

Sorry for the delayed reply! For some reason I wasn't notified of this Issue. 🤔

@eborden, do you have any thoughts here?

@eborden
Contributor

eborden commented Apr 9, 2021

@despresc I certainly see no problem with continuing to expand coverage of this library. Some decisions such as leveraging ISO639_1 from the iso639 package or Country from the country package were made for expediency and not full compliance. I'd actually love to see those packages improved to be in greater compliance, but I'd gladly welcome pull requests to this repository to continue expanding coverage.

pbrisbin added the enhancement label on Apr 9, 2021
@despresc
Author

Thanks for the reply! Actually, between opening this issue and your reply, I started work on my own package. I'm not sure how well it can be ported over, since it happens to represent BCP47 tags and subtags differently than how they are in this package, and the parsing and analysis flow is also a bit different. I think I may just continue working on it, since it's fairly well developed at this point. Sorry for the duplicated effort.

@cdparks
Contributor

cdparks commented Jun 29, 2021

@pbrisbin @eborden this has actually bitten us now - our region parser is overly case-sensitive. We accept en-GB (correctly) but not en-gb (incorrectly)
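
For reference, BCP47 tags are case-insensitive, and RFC 5646 section 2.1.1 describes a conventional casing that a parser can normalise to before matching. A minimal sketch of that convention follows, in Python rather than this library's code; `format_tag` and `tags_equal` are hypothetical names, and the function only adjusts case without validating the tag.

```python
def format_tag(tag: str) -> str:
    """Apply the conventional casing from RFC 5646 section 2.1.1.

    Everything is lowercase, except that two-character subtags are uppercased
    and four-character subtags are titlecased when they are neither the first
    subtag nor located after a singleton (such as "x" or "u").
    """
    out = []
    seen_singleton = False
    for index, subtag in enumerate(tag.split("-")):
        if index > 0 and not seen_singleton and len(subtag) == 2:
            out.append(subtag.upper())        # region, e.g. "gb" -> "GB"
        elif index > 0 and not seen_singleton and len(subtag) == 4:
            out.append(subtag.capitalize())   # script, e.g. "hant" -> "Hant"
        else:
            out.append(subtag.lower())
        if len(subtag) == 1:
            seen_singleton = True
    return "-".join(out)


def tags_equal(a: str, b: str) -> bool:
    """Compare two tags without regard to case."""
    return a.lower() == b.lower()
```

Under this sketch, `format_tag("en-gb")` gives `en-GB`, so a region parser that expects uppercase could be fed the normalised form instead of rejecting the lowercase input.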

@cdparks
Contributor

cdparks commented Jun 30, 2021

@despresc if you have time, can you verify that I've characterized these issues correctly in this comment? See also these pending tests.

@despresc
Author

despresc commented Jul 1, 2021

Yes, the es-419 and en-TP issues stem from country, which encodes a distinct but overlapping set of country codes from those in the IANA registry.

The cmn issue is due to iso639, which doesn't support the other standards in the ISO 639 series. I should mention that the primary language subtags are currently a strict subset of the ISO 639 codes, from what I recall. The registry doesn't contain every ISO 639-3 code, at least.
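
As a small illustration of the cmn point raised at the top of the thread: the registry marks some grandfathered tags as deprecated and gives a preferred value, so a library that recognises zh-guoyu could map it to cmn. This is only a sketch; the table is a hand-picked subset of the registry's entries, and `canonicalize` is a hypothetical name, not part of this package.

```python
# Hand-picked subset of grandfathered tags and their registry
# Preferred-Value entries. A real implementation would read these
# from the registry data.
GRANDFATHERED_PREFERRED = {
    "zh-guoyu": "cmn",
    "zh-hakka": "hak",
    "zh-xiang": "hsn",
}


def canonicalize(tag: str) -> str:
    """Replace a known grandfathered tag with its preferred value.

    Lookup is case-insensitive; unknown tags are returned unchanged.
    """
    return GRANDFATHERED_PREFERRED.get(tag.lower(), tag)
```

For example, `canonicalize("zh-Guoyu")` returns `cmn`, while a tag with no entry in the table, such as `en-GB`, passes through untouched.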
