Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(i18n): experimental i18n Support (only client/common/inspector/interactive) #2553

Draft
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

philtweir
Copy link
Contributor

Trial to see what would be involved in adding rust-i18n to support translatability. This only translates the crates atuin-common and atuin-client, plus inspector.rs and interactive.rs in the atuin crate.

This PR touches a lot of files: maybe overkill, but on the basis that every natural language string is a user-facing string, by the nature of a CLI tool, I have added t! basically everywhere. If a minimalist approach is better, easy to wind that back, but at least it is clear what the scale of changes is at the top end.

I am pretty new to this, so keen for feedback from knowledgable folks. Rust internationalization does not seem to be in a greatly active state - most of the packages have long-standing issues/PRs without a lot of movement, and no one package provides a one-stop shop. sys-locale and rust-i18n are relatively well-used recently (2nd and 6th in lib.rs i18n package stats) and fairly uninvasive.

Using English & Irish and only the interactive search as a minimal test. To use:

LANG=C atuin search -i
LANG=ga atuin search -i

The main action points would be adding t!(...) everywhere appropriate, in some places accounting for non-static return values, and then progressively adding translations to app.yml files as people wish to PR. The cargo i18n command creates a TODO.yml, so I imagine it is easiest to leave strings there until the first time they are actually translated into some language, then move them to app.yml

Screenshot From 2024-12-29 20-53-27

I have included one Chinese string (which would need confirmed by a native speaker!) to see how this works in a non-Latin charset in atuin-common but other than that, the examples are (English and) Irish.

Main Negatives

Dynamic String Rendering

The biggest logic impact is that, because the strings are dynamically composed based on the active locale, the compiler cannot check format strings where the translation string is parametrized, and that must be composed at runtime.

It also means that conditionally-rendered format strings (like debug!) now have t!(...) as a parameter instead, so the bulk of the string will be built regardless (this seems not a huge issue in an atuin context, but could be an argument for leaving them out of i18n). However, anywhere the static text can be split out from any parameters, I have done so, unless it seems possible that a translator might need to reorder the phrase in some language. E.g. "Loaded at: {time}" -> whole string parameterized, "Error in data: {err}" -> split and only "Error in data" translated

Risk of Missing Parameters

It is possible, especially with rare error messages, that a missing % in the format string, etc. could mean that the string is output without the parameter being rendered - e.g. Error found: {err} instead of Error found: %{err} will not interpolate err and this will only become obvious when it is hit.

Risk of Mistranslations

Self-explanatory - could confuse users, which could cause data loss, etc. in the (hopefully theoretical) worst case.

Lots of Code Touched

Naturally, as there are a lot of user-facing strings (esp. error messages), this means small changes on lots of lines. Many are non-trivial, like swapping a format! into a t! that with a separate format! call for each parameter that expects a debug formatter. So it is not fully-automatic - as I tried to minimize complexity by looking at the context - and needs checked for typos.

The biggest mitigation there would be to start with the happy-path, non-debug strings only. I guess though that could be frustrating for a translator who cannot change many of the outputs, and would have to come back to translate more strings each time the percentage of translatable strings grows.

Notes

  • as t!(...) can take arguments (e.g. translate %{time} ago) it does not return &'static str so, for FilterMode, I used lazy_static and elsewhere, switched to String or otherwise worked-around.
  • for simplicitly, mostly I used the untranslated strings as the keys, but probably it would be better to invent slug-like keys
  • initialization of the locale happens in each crate, but other than pulling the macro in, only a single line is required to call atuin_common::utils::set_locale(), which consistently sets the locale for each crate
  • rust-i18n has two translation string file format versions - one splitting by language, and another all combined, so all languages are keyed under a given translation string -- I have taken the second, which is recommended for ease of translation (as the original and other translations are right beside). OTOH, having separate language files would avoid merge conflicts between languages
  • I didn't see much benefit having per-crate tests, especially as the logic is all in rust-i18n, but I did put a couple of sense-checking integration tests in atuin_common::utils where the set_locale and get_locale commands live
  • Project Fluent looks like a solid, standardized system, and is supported (to some extent) by Weblate but does not really look active given the size of project
  • the rust-i18n approach is really simple, but could be more standard to use gettext?

Checks

  • I am happy for maintainers to push small adjustments to this PR, to speed up the review cycle
  • I have checked that there are no existing pull requests for the same thing

@philtweir philtweir changed the title Experimental i18n Support (only client/common/inspector/interactive) feat(i18n): experimental i18n Support (only client/common/inspector/interactive) Jan 19, 2025
@philtweir
Copy link
Contributor Author

Leaving this as draft to give a chance for general feedback!

"%{time} ago":
en: "%{time} ago"
ga: "%{time} ó shin"
time.y:
Copy link
Contributor Author

@philtweir philtweir Jan 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is maybe excessive, as SI units are supposed to have the same representation in every language, but it gives an idea how this could work (and non-SI units like days, hours, minutes, etc. do vary). if we can assume every language suffixes values with a (localized) unit and writes units in the same order, like 3h2m[en]==3u2n[ga], (notwithstanding general language direction), then this may not need to be a parameter

@atuin-bot
Copy link

This pull request has been mentioned on Atuin Community. There might be relevant details there:

https://forum.atuin.sh/t/internationalization/733/1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants