add indigenous languages not in CLDR #10684
data/languages.json
@@ -37,6 +39,7 @@
"asa": {"nativeName": "Kipare"},
"ast": {"nativeName": "asturianu"},
"atj": {},
"aus": {"nativeName": "Generic Australian Aboriginal"},
We don’t have many intentional instances of language families in this list, but I suspect they all either omit the nativeName property or use a name that’s a compromise among the various languages within the family.
Is it still possible to translate it without the nativeName listed here? I don't know if there would be a compromise native name for aus covering all the languages that are part of aus.
As for the best English name, I would think "Australian languages" or "Australian Aboriginal languages".
We don’t have many intentional instances of language families in this list
Do we have any? From what I can see, we at least have no ISO 639-2 codes that have the scope "Collective".
Maybe it's best to omit codes for language families? I don't know anything about aus languages, but if there were a name:gem tagged somewhere, I would not find it very useful, as it could be anything from Icelandic, English, Dutch, Danish, Norwegian, Swedish, German or any other Germanic language. Btw, I actually found a couple of those in Romania (example), where the tag was apparently misused to tag a local German dialect (it would have been better to use something like [old_]name:bar), but I digress.
Is it still possible to translate it without the nativeName listed here?
To clarify: Currently none of the language names are translated by "us". All language name translations come from the CLDR project.
name:aus currently has more uses than any of the other languages listed at https://wiki.openstreetmap.org/wiki/Australian_Tagging_Guidelines/Australia's_First_Peoples#Mapping_Indigenous_Names
It's used where the specific language is not known and so allows us to still tag it, until the specific language can be determined, and we should have some label for it beyond just aus which is misleading as a label.
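The question of what label a code like aus gets in the dropdown comes down to a fallback chain. Below is a minimal sketch, assuming a lookup order of CLDR-translated name first, then the nativeName from data/languages.json, then the bare code. The function name `languageLabel` and the shapes of `cldrNames` and `languages` are hypothetical illustrations, not iD's actual implementation.

```javascript
// Hypothetical sketch of a label fallback chain for a language code.
// Assumptions (not taken from iD's source):
//   cldrNames: { [locale]: { [code]: localizedName } }  (names from CLDR)
//   languages: { [code]: { nativeName?: string } }      (like data/languages.json)
function languageLabel(code, locale, cldrNames, languages) {
  // 1. Prefer a name translated into the user's locale by CLDR.
  const localized = cldrNames[locale]?.[code];
  if (localized) return localized;

  // 2. Otherwise fall back to the nativeName, if the entry has one
  //    (entries such as "atj": {} do not).
  const native = languages[code]?.nativeName;
  if (native) return native;

  // 3. As a last resort, show the bare code.
  return code;
}
```

Under these assumptions, an entry without a nativeName and without a CLDR translation ends up labelled with its bare code, which is the situation described above for aus.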
Do we have any? From what I can see, we at least have no ISO 639-2 codes that have the scope "Collective".
Through CLDR, we have language codes that ISO 639-3 considers to be language families, such as Arabic, Chinese, and Kurdish. However, a single native name is pretty well established for those codes.
It's used where the specific language is not known and so allows us to still tag it, until the specific language can be determined, and we should have some label for it beyond just aus which is misleading as a label.
I agree that indigenous language families make more sense to list than language families that contain more mainstream languages. Often, sources disagree with SIL about the classification of specific languages beneath these families or consider them to be mere dialects. Many lack a strong written tradition, so a toponym’s spelling can vary as much by personal choice of orthography as by dialectology.
If we know of a toponym in a more specific language, then it can be tagged in that language, but it doesn’t surprise me that name:aus occurs more frequently than any of the more specific codes. The only issue is what to label it as. If there isn’t a better option, then the English name is fine, I guess, just a bit ironic.
Arabic, Chinese, and Kurdish
FWIW, these are considered macrolanguages in the standard, which is something slightly different from a language collective like aus.
Btw, the ISO standard lists aus simply as "Australian languages" in the code tables.
It's used where the specific language is not known
Ok, fine. aus should only be used for the super rare cases where there actually is no language code for a specific indigenous language.
I've updated the wiki.
In practice in OSM it's being used both where the language is not known at all, or the research is unclear or inconclusive, AND where the mapper doesn't know or doesn't want to specify which language it is.
Fabulous! I just added an entry to the wiki as I was mapping https://www.openstreetmap.org/way/395478653. It would be great to have these appear as their actual language names rather than the ISO code in the iD presets. I don't know enough about this specific file and how the nativeName field works to comment.
If you like, you can also add these languages to the list of territoryLanguages in scripts/build_data.js (see line 186); they are then included towards the top of the languages dropdown menu in the respective country.
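The effect of a territory-language list on the dropdown can be pictured as a stable reordering. This is a minimal sketch assuming `territoryLanguages` maps an uppercase territory code to an array of language codes; the function name `orderLanguagesForTerritory` and that data shape are hypothetical and not the actual structure in scripts/build_data.js.

```javascript
// Hypothetical sketch: list a territory's languages first in the dropdown.
// Assumption: territoryLanguages looks like { AU: ['en', 'aus'], ... }.
// This is not the real structure used in scripts/build_data.js.
function orderLanguagesForTerritory(allCodes, territory, territoryLanguages) {
  const preferred = territoryLanguages[territory] ?? [];
  const preferredSet = new Set(preferred);
  const available = new Set(allCodes);

  // Territory languages first (in their listed order), skipping any code
  // that isn't in the full list at all.
  const head = preferred.filter((code) => available.has(code));

  // Every other language keeps its original relative order.
  const tail = allCodes.filter((code) => !preferredSet.has(code));

  return [...head, ...tail];
}
```

For example, with `{ AU: ['en', 'aus'] }`, ordering `['de', 'en', 'aus', 'fr']` for `'AU'` moves en and aus to the front and leaves de and fr in their original order after them; a territory with no entry gets the list unchanged.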
The branch was updated from 2bf786d to eec31c5, and then from eec31c5 to 37186e9.
@k-yle regarding 1e980be: I would like to include this. The proposed script already seems quite fine. Would you like to create a PR for that (GitHub doesn't let me do it because your commit doesn't belong to any branch)? I would only propose changing the logic on line 303 so that it does not overwrite any names already in CLDR (which can happen if a future CLDR release includes new languages):
- if (value.names?.[language]) {
+ if (value.names?.[language] && !translatedLangsByCode[key]) {
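The proposed guard amounts to "fill gaps, never overwrite". Here is a minimal sketch of that merge, assuming `translatedLangsByCode` maps a language code to the name CLDR already provides for the locale being built, and that each extra entry carries a `names` object keyed by locale. The wrapper function `mergeExtraLanguageNames` is hypothetical; only the `if` condition comes from the suggestion above.

```javascript
// Hypothetical sketch of merging non-CLDR language names into one locale.
// Assumptions:
//   translatedLangsByCode: { [code]: nameFromCLDR }  for the current locale
//   extraLanguages:        { [code]: { names?: { [locale]: name } } }
//   language:              the locale currently being built, e.g. 'en'
function mergeExtraLanguageNames(translatedLangsByCode, extraLanguages, language) {
  const merged = { ...translatedLangsByCode };
  for (const [key, value] of Object.entries(extraLanguages)) {
    // Same condition as the proposed change: use the extra name only when it
    // exists for this locale AND CLDR has no name for this code yet.
    if (value.names?.[language] && !translatedLangsByCode[key]) {
      merged[key] = value.names[language];
    }
  }
  return merged;
}
```

With this guard, if a future CLDR release starts shipping a name for one of these codes, the CLDR name wins automatically and the extra entry simply stops being used for that locale.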
I didn't realise this was possible until I saw the CLDR override list today.
This PR adds 30 entries from the Australian wiki page. cc @andrewharvey, I see you edited this page recently.