From 6f4bc158f60218b6c459a5dd0746735435ad8181 Mon Sep 17 00:00:00 2001
From: "Peter M. Stahl"
Date: Tue, 29 Oct 2024 12:15:13 +0100
Subject: [PATCH] Add release notes

---
 README.md        |  6 ++++--
 RELEASE_NOTES.md | 26 ++++++++++++++++++++++++++
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index d5c3f4d9..64da2ffc 100644
--- a/README.md
+++ b/README.md
@@ -62,14 +62,16 @@ Because of that, the language models were then stored in NumPy arrays instead of
 dictionaries. Memory consumption reduced to approximately 800 MB but CPU
 performance dropped significantly. Both approaches were not satisfying.
 
-Starting from version 2.0.0, the pure Python implementation was replaced with
+Starting from version 2.0.0, the pure Python implementation is complemented by
 compiled Python bindings to the native
 [Rust implementation](https://github.com/pemistahl/lingua-rs) of *Lingua*.
 This decision has led to both quick performance and a small memory footprint
 of less than 1 GB. The pure Python implementation is still available in a
 [separate branch](https://github.com/pemistahl/lingua-py/tree/pure-python-impl)
 in this repository and will be kept up-to-date in subsequent 1.* releases.
-Both 1.* and 2.* versions will remain available on the Python package index (PyPI).
+There are environments that do not support native Python extensions such as
+[Juno](https://juno.sh/), so a pure Python implementation is still useful.
+Both 1.* and 2.* versions are available on the Python package index (PyPI).
 
 ## 4. Which languages are supported?
 
diff --git a/RELEASE_NOTES.md b/RELEASE_NOTES.md
index 40351156..c08f6e28 100644
--- a/RELEASE_NOTES.md
+++ b/RELEASE_NOTES.md
@@ -1,3 +1,29 @@
+## Lingua 1.4.0 (released on 29 Oct 2024)
+
+### Features
+
+- This release introduces an absolute confidence metric based on unique and most
+  common ngrams for each supported language. It allows to build
+  a language detector from a single language only. Such a detector serves as
+  a binary classifier, telling you whether some text is written in your selected
+  language or not. (#235)
+
+### Improvements
+
+- The new absolute confidence metric helps to improve accuracy in low accuracy mode.
+  The mean of average detection accuracy (single words, word pairs and sentences combined)
+  increases from 77% to 80%.
+
+### Bug Fixes
+
+- The tokenization of texts written in the Devanagari alphabet was flawed.
+  This has been fixed, leading to better detection accuracy for Hindi and Marathi.
+
+### Compatibility
+
+- The newest Python 3.13 is now officially supported.
+- Support for Python 3.8 and 3.9 has been dropped. The lowest supported Python version is 3.10 now.
+
 ## Lingua 1.3.5 (released on 03 Apr 2024)
 
 ### Improvements