Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Project basic internationalization support #673

Merged
merged 22 commits into from
Feb 16, 2021
Merged

Project basic internationalization support #673

merged 22 commits into from
Feb 16, 2021

Conversation

bkmeneguello
Copy link
Contributor

@bkmeneguello bkmeneguello commented Feb 14, 2021

This PR adds basic i18n support with localization (l10n) to portuguese (pt_BR).
The localization process is based on QTranslator and self.tr and QCoreApplication.translate methods.
To detect the translation the translated file must be added to novelWriter.pro under SOURCES, and the command python setup.py qtlupdate must be invoked to update the .ts files.
To add support for more languages, the translation file names must be added to novelWriter.pro under TRANSLATIONS before the python setup.py qtlupdate command to be invoked. This creates the apropriate files in the languages directory.
To translate the language files the command linguist nw/languages/nw_<lang>.ts must be used.
After the translation is done, the compilation of the .ts files is made by python setup.py qtlrelease command, which generate the compiled .qm files for all TRANSLATIONS.

Fixes #93

@bkmeneguello bkmeneguello changed the base branch from main to dev February 14, 2021 20:13
@vkbo
Copy link
Owner

vkbo commented Feb 14, 2021

Thanks! That's quite a lot of work. Much appreciated.

There's a few consistent deviations from the coding style in this, so instead of going through and reviewing them I may just push a commit to change them instead of tagging them all. Mainly, they are:

  • The project uses camlCase, not snake_case. Same as the Qt library.
  • You are using the alternate convention for multi-line statements, not the one used consistently in the code. Every wrapped statement should end with the same level of indentation as it started. This saves quite a lot of horizontal space.
  • The order of imports don't follow the convention. Full imports should come before partial imports, which shoudl come before internal imports.

There are also two constants that have been translated that shouldn't. The "internal" and "enchant" should not be translated. That will break if someone changes language after the program has first been run. "enchant" is the name of a package, and should not be translated at all, and "internal" should only be translated when it's added to the GUI. I can rewrite the code to do a lookup for the GUI where the translation can be injected.

The root folder for language files should either be at the root of the project or in the assets folder. I'll have a look at how the files are processed to see what's best. Essentially, the files that must be shipped with the package should be in assets, and the source files for translations should be in a folder on root level. Ideally. I'm not sure what the Qt linguist tool expects.

@vkbo
Copy link
Owner

vkbo commented Feb 14, 2021

We also need to figure out how to make the tests run with the language framework. I see you only initialise it in the default run mode, which isn't the one used for most tests.

In any case, I think I will move the translation initialisation into the Config class, if possible, so that it can be overridden by user settings. This will also fix the issue here from where the path to the language files is extracted. The __file__ variable works when running from the source, but in the packaged versions of novelWriter, that will not work and the language path will not be found. Files inside the package have no direct path in those cases, so they have to be moved up one level.

@vkbo
Copy link
Owner

vkbo commented Feb 14, 2021

The test framework seems fine actually. The tests fail because of bugs introduced here.

@bkmeneguello
Copy link
Contributor Author

Thank you for prefer to push the changes instead of reviewing them all.
About the "internal" and "enchant" I tried to only translate them when the text is presented to GUI but maybe failed somewhere. The QT_TRANSLATE_NOOP is just a maker to allow pylupdate5 to detect those strings, the actual translation must be done a posteriori using QCoreApplication.translate. I think there could be better separation between internal and external constants, to avoid them being used (but I totally understand why to use them when the application is not translatable).
Sorry about the tests, I should be more careful with them.
I'm still checking more changes to be pushed, but this PR was taking so long to finish I had to rebase two times with lots of conflicts, so I preferred to push a less reviewed code so we could collaborate better to polish this feature.

@vkbo
Copy link
Owner

vkbo commented Feb 14, 2021

I've branched off the code for the 1.2 release path, which now lives in the testing branch. I am happy to leave the dev branch alone until we've finished with this.

Thanks for explaining the QT_TRANSLATE_NOOP. I wasn't quite getting what it did.

I'm currently working on a commit where I'm moving the initialisation of the translation objects into the CONFIG object. I've almost got it working. When it's moved there, the user config file determines the language for the GUI, which uses the system language as the default. When I get this working, I'll add an option to select GUI language to the Preferences dialog. So let me know if you make any changes to the content of the __init__.py file, because I'm moving all of the code out of there.

@bkmeneguello
Copy link
Contributor Author

Nice, you may continue your refactor, don't mind. I'm just looking for some composite translations to improve, like "Click the 'Save' button", which should be "Click the {0} button".format(self.locale().quoteString(self.tr("Save"))).
Also, a nice addition would be to allow cross-reference button labels so this could be interpolated easily so it could be "Click the {0} button".format(self.locale().quoteString(XXX.SaveLabel)) or something similar.

@vkbo
Copy link
Owner

vkbo commented Feb 14, 2021

Sounds good. I just finished the moving of the initialisation from __init__.py to config.py. I also moved the language folder up one level and renamed it to "i18n". This way it will work for both packaged and non-packaged distributions of novelWriter.

Changing the GUI language can now be achieved by editing the "guilang" option in the main configuration file, which also allows me to test the Portuguese version of the GUI. I'll probably start adding a Norwegian translation too soon to get a feel for that side of it. I'll do that in a later PR though.

Anyway, it's getting quite late here in Europe, so I'll pick it up again tomorrow after work.

@vkbo
Copy link
Owner

vkbo commented Feb 14, 2021

By the way, I noticed that the QLocale class actually supports both the ISO 639 and 3166 codes, so I'll look into replacing my iso.py file with the QLocale class instead. Since Qt provides its own translation files, that should mean we don't need to manually translate those long lists of countries and languages. That looks like quite a job.

Edit: https://doc.qt.io/qt-5/qlocale.html#QLocale-1

@vkbo vkbo added this to the Release 1.3 Beta 1 milestone Feb 14, 2021
@vkbo vkbo linked an issue Feb 14, 2021 that may be closed by this pull request
@vkbo
Copy link
Owner

vkbo commented Feb 15, 2021

I was reading up on the Qt documentation last night and noticed the QLocale and QTranslator objects already have all the logic for parsing localisation strings and figuring out which files to load. I replaced the regex-based and customised loading facility with Qt's own.

I also added a global CONFIG.qLocale object that can be used around the code for various localisation purposes.

@bkmeneguello
Copy link
Contributor Author

Yeah, about the QTranslator.load(QLocale, ... I tried to use that but after dig the source code to understand how it works I chose to implement the regex to have some missing features.
That loader have a simplistic heuristic that loads only the first match from the complete locale string, like ln_Scrp_CT > ln_CT > ln (ln = lang, Scrp = script, CT = country). But the translators can be installed layered, so the first one installed is the last one searched.
My heuristic is to install the "qt" and "qtbase" below "nw" and, for every case, have the order ln_Scrp_CT > ln_CT > ln_Scrp > ln. This way one could add a "pt" translation which would be the base for "pt_BR" and "pt_PT", and these more specific translations could add only their specific terms, instead of have to duplicate (and maintain) the entire translation.

@vkbo
Copy link
Owner

vkbo commented Feb 15, 2021

I got that, but this is what the Qt tool does as well: https://doc.qt.io/qt-5/qlocale.html#uiLanguages, but I see that I may need to do it in a loop. Let me test with English which is layered like that.

@bkmeneguello
Copy link
Contributor Author

Yes, this uiLanguages could be preferable to my regex in a reversed loop.

@vkbo
Copy link
Owner

vkbo commented Feb 15, 2021

That's what I was thinking, It supports all the standards and is maintained, which means I don't have to.

Also, it should not be reversed as the translators are used in the reverse order they are added. The one for pt is:

pt
pt-BR
pt-Latn-BR

So it will pick the last one first, and work its way up.

Anyway, I'll convert this back to a loop.

@vkbo
Copy link
Owner

vkbo commented Feb 15, 2021

Great. I'll finish this after work. The previous stuff is something I looked at last night.

About the ISO lookup tables. I also looked into what the QLocale class can do, and it has the ability to print the native language name for any given language code it supports. I propose to use this as the information on the status bar and in Preferences, instead of a localised translation of other languages. A lot of apps do this, and it make sense as a user who have multiple languages installed will presumably understand the native name for these languages.

At my end, the status bar will say "British English (en-GB)" and "Norsk Bokmål (nb-NO)" for the languages I use, and similar for Portuguese etc. Showing the code will also aid in describing them when you don't understand the native name, which may be the case for the dropdown list of available GUI languages in Preferences.

The bonus is of course that I can drop the whole iso.py file and people don't have to provide translations for it.

@bkmeneguello
Copy link
Contributor Author

I think is the best solution. The translation of iso.py was not that difficult because Wikipedia provides the translation table for ISO names, just have to copy them to "linguist" is the worst.
I'm checking more points to improve the i18n, by adjusting the date format, number format, lists, zeroes and so. But this needn't be addressed by this PR, can be added later.
I think an important adjustment is the cross-linking of terms, like button/menu names with messages, to ensure their translations are kept in sync. The first problem is the context of these names, since self.tr uses the object class name as context, referencing these names from outside could require making use of QCoreApplication.translate(context, ... which will not be practical. Another possibility is to initialize the translation strings into public attributes and use them instead, but this would require direct references to those instances.
I don't know you this could be properly solved, maybe this requires more thinking.

@vkbo
Copy link
Owner

vkbo commented Feb 15, 2021

The date format should be handled by Python as all dates/times that go onto the GUI are formatted with %X and %x, which should be localised to the system formats. Integers are formatted with f"{:n}" which is also localised by Python.

Any date/time that go onto filenames and into files are ISO 8601 formats and should not be localised.

With the changes I'm making now, I'm also making sure the QApplication defaults to the correct locale, so I noticed now that the QDoubleSpinBox widgets have the correct decimal separator for the selected language. This appears to be handled when I added the QLocale.setDefault() line in Config. There shouldn't be much left to localise after that. I think that also takes care of list sorting.

@vkbo
Copy link
Owner

vkbo commented Feb 15, 2021

I think an important adjustment is the cross-linking of terms, like button/menu names with messages, to ensure their translations are kept in sync. The first problem is the context of these names, since self.tr uses the object class name as context, referencing these names from outside could require making use of QCoreApplication.translate(context, ... which will not be practical. Another possibility is to initialize the translation strings into public attributes and use them instead, but this would require direct references to those instances.
I don't know you this could be properly solved, maybe this requires more thinking.

In addition to this, there is an issue with the translation macro from C++ that doesn't transfer well to Python (The QOBJECT macro used for C++ classes). There are issues with namespaces and class names when using subclasses on the Python side. I read an article about this a while back. I think the best solution is to do what you've already started doing and overload the self.tr, but just explicitly state the context name as a string rather than extract the internals. This also means we can make a global context under for instance CONFIG, or adjacent to it, that holds the global translator. Can even make a global TR object.

Also, you can use qApp.translate() instead of QCoreApplication.translate() I think. It's much shorter.

Edit: Also, this: https://stackoverflow.com/questions/28385477/pyside-qt-tr-does-not-translate-translate-does-context-wrong

@bkmeneguello
Copy link
Contributor Author

A possible solution, based on your last link is to have something like

from functools import partial
from QtCore import QCoreApplication
tr = partial(QCoreApplication.translate, "context")

WDYT?

nw/gui/build.py Outdated
Comment on lines 334 to 342
self.novelFiles.setToolTip(self.tr("Include files with layouts {0}.").format(
self.locale().createSeparatedList([
self.locale().quoteString(self.tr('Book')),
self.locale().quoteString(self.tr('Page')),
self.locale().quoteString(self.tr('Partition')),
self.locale().quoteString(self.tr('Chapter')),
self.locale().quoteString(self.tr('Unnumbered')),
self.locale().quoteString(self.tr('Scene')),
])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vkbo This is how I suggest working with lists and quoted terms.

Copy link
Owner

@vkbo vkbo Feb 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is too much to be honest. Translation already creates overhead, and this is just a tooltip.

Edit: The tooltip text is already localised, so if there is need for different symbols, then that should be done in the ts file, not like this.

@vkbo
Copy link
Owner

vkbo commented Feb 15, 2021

As for the translation of buttons, I see that the qt_pt.qm file in the qttranslations repo is a lot larger than the one shipped with my OS. Maybe it is more complete.

Update

Aha, it turns out the qt_xx.qm files are for Qt 4.x. The correct file for Qt 5 is qtbase_xx.qm. That doesn't exist for Portuguese at all (or Norwegian). No one has taken the time to make that file. No wonder the translations are incomplete.

Anyway, this is the main repo. It should be possible to submit translations.

https://code.qt.io/cgit/qt/qttranslations.git/

@vkbo
Copy link
Owner

vkbo commented Feb 15, 2021

I saw your ToDo tag in the numberToWord function, so I extended it to support multiple languages. It was hard coded to English. Now it should use the spell check language for the project, and if there is no function to do the conversion, it should fall back to just use plain numbers.

I also added the two forms of Norwegian as a test case, and it now picks up the project language setting and produces the correct number words.

@bkmeneguello
Copy link
Contributor Author

Translation updated. I'm done too.

@vkbo
Copy link
Owner

vkbo commented Feb 16, 2021

Great! I think this is in a good state now.

There is just one thing that I thought about just now, and that's the translation of the words "Note", "Comment" and "Synopsis" as used in the Tokenizer class. These should follow the project language setting, not the GUI language. The same labels when they appear on the GUI should follow the GUI language, but for the Tokenizer class they apply only to the generated text and export files.

I don't want to add a whole new QTranslator for just three words, so I'm thinking to add them to the already existing set of language dependent functions alongside the numberToWord functions.

In any case, I have another branch dealing with those functions, so I'll add that there instead. This PR seems complete now and I will merge it after a last full read through of the diff.

I still haven't decided on how to deal with the dialog button texts. I'm inclined to stick with your override approach as there are many languages that don't have qtbase_xx.qm files, including my own. That means I need to customise the message boxes too though, and manually account for differences between macOS button texts and Windows/Linux.

i18n/nw_pt.ts Outdated Show resolved Hide resolved
@bkmeneguello
Copy link
Contributor Author

Hmm, shouldn't this just be a blank translation?

indeed

@bkmeneguello
Copy link
Contributor Author

bkmeneguello commented Feb 16, 2021

I don't want to add a whole new QTranslator for just three words, so I'm thinking to add them to the already existing set of language dependent functions alongside the numberToWord functions.

About the numberToWord, what do you think about num2words?

@vkbo
Copy link
Owner

vkbo commented Feb 16, 2021

I'll make one last push to this. I found a few more errors and missing translations and {variable} type entries where they should be {0} for consistency.

I also want to remove most of the nested translations. I understand the point of having the order of items in a string in a different translation entry than the content, but in the majority of cases the order can be handled within a single entry.

Python is significantly slower than C/C++, and the only way to keep a Python GUI application snappy and responsive is to save clock cycles at every opportunity. I use a few other Python GUI applications, and as they grow, they get increasingly sluggish. novelWriter is still very responsive in most cases (with the exception of handling >1 MB text files) and I want to keep it this way.

The benefit of {0}: {1) versus Text: {0} is minimal from a translation point of view. One just needs to be a bit more attentive when translating to make sure all the {} entries are included.

I'll push the remaining changes after work.

@vkbo
Copy link
Owner

vkbo commented Feb 16, 2021

About the numberToWord, what do you think about num2words?

I've looked at it before, and decided against adding a dependency for such a simple feature. I am avoiding dependencies unless they are absolutely necessary. I spent nearly a week writing the ODT exporter to avoid adding odfpy as a dependency.

Edit: I'm going to add the German numberToWord function. I'll also do the French I think. My French is decent enough, but it's also an interesting case because it needs to differentiate between French, Swiss French and Canadian French. I'll make that a different PR though.

@vkbo
Copy link
Owner

vkbo commented Feb 16, 2021

Every time I look through the code, I find something. I found more missing translates, and I realise one of my earlier changes broke all the translations in the Constants context. Sorry! After two full passes through the code, I think I've found the majority of issues. After you've updated the translations, I think I will merge.

I've updated the .ts file, but there is a need for another pass at translations. Maybe you can copy/paste back the constants from earlier, and there are a few new entries here and there that need to be translated for the first time.

There are also some translations that are very long. I realise different languages need different length to achieve the same meaning, but some of the texts are so verbose that they mess up the GUI in several places. See for instance tab labels in Project Settings. I think it would improve the GUI if there are shorter terms that carry a close enough meaning to replace the English terms. I've picked English words often for their shortness on for instance tabs.

@bkmeneguello bkmeneguello changed the title Projeto basic internationalization support Project basic internationalization support Feb 16, 2021
@bkmeneguello
Copy link
Contributor Author

Translation updated

@vkbo
Copy link
Owner

vkbo commented Feb 16, 2021

Thanks! I'm going to merge now before anything else shows up.

I'd appreciate if you took a look at how the Portuguese text fits on the various GUI elements at some point.

@vkbo vkbo merged commit b935f8a into vkbo:dev Feb 16, 2021
@bkmeneguello
Copy link
Contributor Author

About the numberToWord, what do you think about num2words?

I've looked at it before, and decided against adding a dependency for such a simple feature. I am avoiding dependencies unless they are absolutely necessary. I spent nearly a week writing the ODT exporter to avoid adding odfpy as a dependency.

Edit: I'm going to add the German numberToWord function. I'll also do the French I think. My French is decent enough, but it's also an interesting case because it needs to differentiate between French, Swiss French and Canadian French. I'll make that a different PR though.

Do you think it worth to implement the number translation by hand for all languages? Even for a specialized project like "num2words" it's a complex task and, quoting your words, "It supports all the standards translations and is maintained, which means I you don't have to"

@vkbo
Copy link
Owner

vkbo commented Feb 16, 2021

I only need a tiny fraction of the functionality num2words has. The conversion of small integers to words is also a fairly simple piece of code. I really don't want to add external dependencies for minor features. Python dependencies are a nightmare to maintain as packages die all the time and are a big enough security risk that I don't want to more than the absolutely necessary. The three dependencies that I have now are so common that they are in practically all distros.

In any case, since I need translations for the words "Synopsis", "Comment" and "Notes" for the Tokenizer class, I'm opting for a simple lookup dictionary with those words and the first hundred numbers in a JSON file. The people that add translations can also provide those I hope. They take a few minutes to make. I just made three of them.

@vkbo
Copy link
Owner

vkbo commented Feb 16, 2021

On another note. I'm tempted to take the .qm files out of the repository. I don't like tracking binary files, especially since they are not small (relatively speaking). It's better to add them to the build pipeline and put them into the distributed packages.

What's the common practice on that? Do you know?

@bkmeneguello
Copy link
Contributor Author

bkmeneguello commented Feb 16, 2021 via email

@vkbo
Copy link
Owner

vkbo commented Feb 16, 2021

I think both are OK, the advantage of keeping the compiled files within the project is if someone tries a pip install git+ssh://... it will require the Qt5 installed in the system.

That's fine. Most people who do a pip install will do it from pypi, where I will include the .qm files.

@bkmeneguello bkmeneguello deleted the i18n branch October 26, 2021 23:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Translations of novelWriter
2 participants