Microsoft.Recognizers.Text provides robust recognition and resolution of entities such as numbers, units, and date/time, expressed in multiple languages. Chinese, English, French, Spanish, and Portuguese are fully supported; German, Japanese, Korean, and Dutch are partially supported. More languages are on the way.
Microsoft.Recognizers.Text powers pre-built entities in both LUIS (Language Understanding Intelligent Service) and Microsoft Bot Framework, and is also available as standalone packages (for the base classes and the different entity recognizers).
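To make "recognition and resolution" concrete, here is a toy TypeScript sketch. It does not use or reproduce the library: the `ToyResult` shape, the `recognizeToyNumbers` function, and the tiny word list are all hypothetical names invented for illustration. Recognition finds the span of text that mentions a number; resolution maps that span to a normalized value.

```typescript
// Toy illustration only: NOT the Microsoft.Recognizers.Text implementation.
// The result shape below is a hypothetical one chosen for this sketch.
interface ToyResult {
  text: string;                  // matched span as it appears in the input
  start: number;                 // index of the first matched character
  end: number;                   // index of the last matched character (inclusive)
  typeName: string;              // kind of entity found
  resolution: { value: string }; // normalized value
}

// Tiny English lexicon; a real recognizer covers far more forms and languages.
const NUMBER_WORDS: Record<string, number> = {
  zero: 0, one: 1, two: 2, three: 3, four: 4, five: 5,
  six: 6, seven: 7, eight: 8, nine: 9, ten: 10,
};

function recognizeToyNumbers(input: string): ToyResult[] {
  const results: ToyResult[] = [];
  // Recognition: scan for runs of digits or runs of letters.
  const pattern = /\d+|[A-Za-z]+/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(input)) !== null) {
    const text = match[0];
    const lower = text.toLowerCase();
    // Resolution: turn the matched text into a numeric value, if it is one.
    let value: number | undefined;
    if (/^\d+$/.test(text)) {
      value = Number(text);
    } else if (Object.prototype.hasOwnProperty.call(NUMBER_WORDS, lower)) {
      value = NUMBER_WORDS[lower];
    }
    if (value !== undefined) {
      results.push({
        text,
        start: match.index,
        end: match.index + text.length - 1,
        typeName: "number",
        resolution: { value: String(value) },
      });
    }
  }
  return results;
}
```

Calling `recognizeToyNumbers("I have two apples and 3 pears")` yields two results, one for `two` (resolved to `"2"`) and one for `3`. The real packages handle far richer input (compound numbers, units, dates, multiple cultures); consult their own documentation for the actual API.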
The Microsoft.Recognizers.Text packages currently target four platforms:
- C#/.NET - NuGet packages available at: https://www.nuget.org/profiles/Recognizers.Text
- JavaScript/TypeScript - NPM packages available at: https://www.npmjs.com/~recognizers.text
- Python (in progress)
- Java (in progress)
Contributions are very welcome, both for fixes and extensions in the currently supported languages and for expansion to new ones, especially Japanese, Italian, Korean, and Dutch. More info below.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.
Good starting points for contribution are:
- the list of open issues (especially those marked as `help wanted`);
- the json spec cases temporarily marked as `NotSupported` (Specs); and
- translating json test spec cases that work in English, but don't yet exist in a target language.
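As a rough aid for the second starting point, the sketch below filters a list of spec cases down to those flagged as not supported on a given platform. The `SpecCase` shape is an assumption made for this example (in particular, that `NotSupported` holds a comma-separated list of platform names); check the actual spec files before relying on it. `findUnsupportedCases` is a hypothetical helper, not part of the project.

```typescript
// Hypothetical shape of a JSON spec case; field names and semantics are
// assumptions for this sketch, not a documented schema.
interface SpecCase {
  Input: string;
  NotSupported?: string; // assumed: comma-separated platform names
  Results: unknown[];
}

// Return the cases whose NotSupported list mentions the given platform.
function findUnsupportedCases(cases: SpecCase[], platform: string): SpecCase[] {
  const wanted = platform.trim().toLowerCase();
  return cases.filter((c) => {
    const platforms = (c.NotSupported || "")
      .split(",")
      .map((p) => p.trim().toLowerCase());
    return platforms.indexOf(wanted) !== -1;
  });
}

// Made-up sample data, for illustration only.
const sampleCases: SpecCase[] = [
  { Input: "two", Results: [] },
  { Input: "a dozen", NotSupported: "javascript, python", Results: [] },
  { Input: "half", NotSupported: "java", Results: [] },
];

const todoForPython = findUnsupportedCases(sampleCases, "Python");
console.log(todoForPython.map((c) => c.Input));
```

Each case returned this way is a candidate for a fix on that platform: make the case pass, then remove the platform from its flag.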
The links below describe the project structure and provide both an overview and tips on how to contribute. Thank you!
- Overview and language resources
- Implementing language specific behaviour
- Test specs and testing in general
The table below summarizes the currently supported entities. Support for English is usually more complete than for other languages. The primary platform is .NET (shown in the table), and support should propagate to the other platforms.
Entity Type | EN | ZH-CN | NL | FR | DE | IT | JA | KO | PT | ES |
---|---|---|---|---|---|---|---|---|---|---|
Number (cardinal) | ✓ | ✓ | ✓ | ✓ | ✓ | SO | ✓ | ✓ | ✓ | ✓ |
Ordinal | ✓ | ✓ | ✓ | ✓ | ✓ | SO | ✓ | SO | ✓ | ✓ |
Percentage | ✓ | ✓ | ✓ | ✓ | ✓ | SO | ✓ | SO | ✓ | ✓ |
Unit - Age | ✓ | ✓ | ✓ | ✓ | ✓ | SO | ✓ | SO | ✓ | ✓ |
Unit - Currency | ✓ | ✓ | ✓ | ✓ | ✓ | SO | ✓ | ❌ | ✓ | ✓ |
Unit - Dimensions | ✓ | ✓ | ✓ | ✓ | ✓ | SO | ❌ | ❌ | ✓ | ✓ |
Unit - Temperature | ✓ | ✓ | ✓ | ✓ | ✓ | SO | ❌ | ❌ | ✓ | ✓ |
Choice - Boolean | ✓ | ✓ | ✓ | ✓ | ✓ | SO | ✓ | ❌ | ✓ | ✓ |
Seq. - E-mail | G | G* | G | G | G | G | G | G | G | G |
Seq. - GUID | G | G | G | G | G | G | G | G | G | G |
Seq. - Social | G | G | G | G | G | G | G | G | G | G |
Seq. - IP Address | G | G | G | G | G | G | G | G | G | G |
Seq. - Phone Number | G | G | G | G | G | G | G | G | G | G |
Seq. - URL | G | G* | G | G | G | G | G | G | G | G |
DateTime (+subtypes) | ✓ | ✓ | SP | ✓ | partial (bugs) | SO | SI min | SI min | ✓ | ✓ |
- G: Generic entity, not language-specific (* Unicode TLDs not supported);
- SO: Specs-only;
- SP: Partial specs;
- SI: Very initial specs.