Chinese language support #39
Comments
That would be really cool. We would need a contributor who speaks Chinese.
+1. I'm looking for an API for adding a custom analyzer.
Do you know of any good libraries that can analyse Chinese text?
These are very stable implementations:
Any progress? Or is there anything I can help with?
The following things need to be done:
To make progress on this issue we need a Chinese (Mandarin?) speaker to do 1 and 2.
I am a native Chinese speaker and I will try what you suggest this weekend.
Great! Even if you could point to a test dataset in Chinese that already exists, that would be a great help.
Hi,
Great! Your dataset is probably worthy of its own project and GitHub repo. The fastest way to get Chinese text indexed by Forage is to improve the tf-idf functionality of https://github.com/NaturalNode/natural . If
I skimmed the source code of They only use English stop words. The whole project is not designed for international language usage; there are so many hard-coded lines. See NaturalNode/natural#159 and NaturalNode/natural#177. The major concern is I'll try to hack into the TF-IDF module of
I think Have you ever done any benchmarks against
Re hacking tf-idf in Natural: That sounds like a good plan! Yes, Forage could be a competitor to Elasticsearch for some cases. I work a bit with Elasticsearch, but haven't yet done any benchmarks. I should definitely do that.
I recently rewrote a lot of this code, and some of my Chinese colleagues tell me that it is now working for Chinese text. If there are still things that don't work, please submit a test case :)
How can I define a Chinese word dictionary?
@andyhu yes, the best strategy is to insert a separator into the text before you index it, and then specify that separator when you index it.
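For anyone trying the separator approach, here is a minimal sketch of the pre-processing step, assuming you bring your own word list. It uses forward maximum matching, a simple dictionary-based segmenter chosen for illustration; it is not what Forage or natural does internally. The `segment` function and the sample dictionary are hypothetical, and how you then pass the separator to the indexer depends on the Forage version you are running.

```javascript
// Hypothetical helper, not part of Forage's or natural's API.
// Forward maximum matching: at each position take the longest dictionary
// word that starts there, falling back to a single character.
function segment(text, dictionary, separator = ' ') {
  const words = new Set(dictionary);
  // Longest dictionary entry, measured in code points (at least 1).
  const maxLen = Math.max(1, ...dictionary.map((w) => Array.from(w).length));
  const chars = Array.from(text); // split by code point, not UTF-16 unit
  const tokens = [];
  let i = 0;
  while (i < chars.length) {
    let match = chars[i]; // fallback: a single character
    for (let len = Math.min(maxLen, chars.length - i); len > 1; len--) {
      const candidate = chars.slice(i, i + len).join('');
      if (words.has(candidate)) {
        match = candidate;
        break;
      }
    }
    tokens.push(match);
    i += Array.from(match).length;
  }
  return tokens.join(separator);
}

// Illustrative dictionary only; a real one would be much larger.
const dictionary = ['中文', '搜索', '搜索引擎', '支持'];
console.log(segment('支持中文搜索引擎', dictionary));
```

Characters that are not covered by the dictionary come out as single-character tokens, so the quality of the index depends directly on the word list. Greedy matching can also pick the wrong split on ambiguous text, which is why dedicated segmentation libraries are usually a better choice for production use.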
Is there any plan to support the Chinese language in Forage?