add indigenous languages not in CLDR #10684

Merged
merged 1 commit into from
Jan 22, 2025

Conversation

@k-yle
Collaborator

k-yle commented Jan 16, 2025

I didn't realise this was possible until I saw the CLDR override list today.

This PR adds 30 entries from the Australian wiki page. cc @andrewharvey, I see you edited this page recently

data/languages.json
@@ -37,6 +39,7 @@
"asa": {"nativeName": "Kipare"},
"ast": {"nativeName": "asturianu"},
"atj": {},
"aus": {"nativeName": "Generic Australian Aboriginal"},
Collaborator

We don’t have many intentional instances of language families in this list, but I suspect they all either omit the nativeName property or use a name that’s a compromise among the various languages within the family.

Contributor

Is it still possible to translate it without the nativeName listed here? I don't know if there would be a compromise native name for aus across all the languages that are part of aus.

As for the best English, I would think "Australian languages" or "Australian Aboriginal languages".

Member

> We don’t have many intentional instances of language families in this list

Do we have any? From what I can see, we at least have no ISO 639-2 codes that have the scope "Collective".


Maybe it's best to omit codes for language families? I don't know anything about aus languages, but if there were a name:gem tagged somewhere, I would not find it very useful, as it could be anything from Icelandic, English, Dutch, Danish, Norwegian, Swedish, German or any other Germanic language. Btw, I actually found a couple of those in Romania (example), where the tag was apparently misused to tag a local German dialect (it would have been better as something like [old_]name:bar), but I digress.

Member

> Is it still possible to translate it without the nativeName listed here?

To clarify: Currently none of the language names are translated by "us". All language name translations come from the CLDR project.

Contributor

name:aus currently has more uses than any of the other language codes listed at https://wiki.openstreetmap.org/wiki/Australian_Tagging_Guidelines/Australia's_First_Peoples#Mapping_Indigenous_Names

It's used where the specific language is not known and so allows us to still tag it, until the specific language can be determined, and we should have some label for it beyond just aus which is misleading as a label.

Collaborator

@1ec5 Jan 16, 2025

> Do we have any? From what I can see, we at least have no ISO 639-2 codes that have the scope "Collective".

Through CLDR, we have language codes that ISO 639-3 considers to be language families, such as Arabic, Chinese, and Kurdish. However, a single native name is pretty well established for those codes.

> It's used where the specific language is not known and so allows us to still tag it, until the specific language can be determined, and we should have some label for it beyond just aus which is misleading as a label.

I agree that indigenous language families make more sense to list than language families that contain more mainstream languages. Often, sources disagree with SIL about the classification of specific languages beneath these families or consider them to be mere dialects. Many lack a strong written tradition, so a toponym’s spelling can vary as much by personal choice of orthography as by dialectology.

If we know of a toponym in a more specific language, then it can be tagged in that language, but it doesn’t surprise me that name:aus occurs more frequently than any of the more specific codes. The only issue is what to label it as. If there isn’t a better option, then the English is fine I guess, just a bit ironic.

Member

> Arabic, Chinese, and Kurdish

FWIW, these are considered macrolanguages in the standard, which is something slightly different from a language collective like aus.

Btw, the ISO standard lists aus simply as Australian languages in the code tables.


> It's used where the specific language is not known

Ok fine ☺️ , but perhaps the wiki needs to be updated then… I had understood the section to mean that aus should only be used in the very rare cases where there actually is no language code for a specific indigenous language.

Contributor

I've updated the wiki.

In practice in OSM, it's being used both where the language is not known at all or the research is unclear or inconclusive, and where the mapper doesn't know or doesn't want to specify which language it is.

@andrewharvey
Contributor

Fabulous! I did just add an entry on the wiki as I was mapping https://www.openstreetmap.org/way/395478653. Would be great to have these appear as their actual language names rather than the ISO code in the iD presets.

I don't know enough about this specific file and how the nativeName field works to comment.

Member

@tyrasd left a comment

If you like you can also add these languages to the list of territoryLanguages in scripts/build_data.js (see line 186): then they are included towards the top of the languages dropdown menu in the respective country.
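
As a rough illustration of that suggestion, the sketch below appends extra language codes to a per-territory list. It assumes `territoryLanguages` maps a lowercase territory code to an array of language codes; that shape, the helper name `addTerritoryLanguages`, and the sample values are assumptions for illustration, not the actual contents of scripts/build_data.js.

```javascript
// Hypothetical helper: append language codes to one territory's list.
// Assumes territoryLanguages looks like { au: ['en', ...], ... }.
function addTerritoryLanguages(territoryLanguages, territory, extraCodes) {
  const existing = territoryLanguages[territory] ?? [];
  // A Set keeps insertion order and drops duplicates, so codes that are
  // already listed keep their original position.
  const combined = [...new Set([...existing, ...extraCodes])];
  // Return a new object instead of mutating the input.
  return { ...territoryLanguages, [territory]: combined };
}

// Sample data, made up for this sketch.
const territoryLanguages = { au: ['en'] };
const updated = addTerritoryLanguages(territoryLanguages, 'au', ['aus', 'en']);
console.log(updated.au);
```

Appending rather than prepending keeps the existing ordering of the list intact; whether that is the desired position for the new codes depends on how the real script sorts them.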

@tyrasd added the localization label (Adapting iD across languages, regions, and cultures) Jan 16, 2025
@k-yle force-pushed the kh/cldr-missing branch 2 times, most recently from 2bf786d to eec31c5 January 20, 2025 09:36
@tyrasd tyrasd merged commit e123ec9 into develop Jan 22, 2025
4 checks passed
@tyrasd
Member

tyrasd commented Jan 22, 2025

@k-yle regarding 1e980be: I would like to include this. The proposed script seems to be already quite fine. Would you like to create a PR for that (github doesn't let me do it because your commit doesn't belong to any branch)?

My only proposal is to change the logic on line 303 so that it does not overwrite any names already in CLDR (which could happen if a future CLDR release adds new languages):

-    if (value.names?.[language]) {
+    if (value.names?.[language] && !translatedLangsByCode[key]) {
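
To make the effect of that guard concrete, here is a minimal standalone sketch. Only the condition itself comes from the suggestion above; the function name `mergeOverrideNames`, the shape of `overrides`, the assumption that `translatedLangsByCode` maps a language code to a display name, and the sample data are all hypothetical.

```javascript
// Hypothetical sketch: merge hand-maintained language names into a table
// of CLDR-derived names without overwriting anything CLDR already provides.
function mergeOverrideNames(translatedLangsByCode, overrides, language) {
  const result = { ...translatedLangsByCode };
  for (const [key, value] of Object.entries(overrides)) {
    // Same condition as the proposed change: only fill in codes that have
    // a name for this locale and are still missing from the CLDR table.
    if (value.names?.[language] && !result[key]) {
      result[key] = value.names[language];
    }
  }
  return result;
}

// Sample data, made up for this sketch.
const fromCldr = { de: 'German' };
const overrides = {
  de: { names: { en: 'Deutsch (override)' } }, // already in CLDR: ignored
  aus: { names: { en: 'Australian languages' } }, // missing from CLDR: added
  xyz: {} // no names at all: skipped
};
const merged = mergeOverrideNames(fromCldr, overrides, 'en');
console.log(merged);
```

With this guard, a name that CLDR starts shipping in a later release would take precedence automatically, and the manual entry would become a no-op rather than a conflict.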

@k-yle k-yle deleted the kh/cldr-missing branch January 22, 2025 21:29