Organizing International Code #159

Closed
kkoch986 opened this issue May 23, 2014 · 3 comments

Comments

@kkoch986
Member

Hey everyone,

As you can tell, natural is quickly becoming a behemoth of international NLP code. I am starting to feel that all the great effort put into algorithms for various languages is not matched by the quality of their integration into the library.

What I would like to do is come up with a better way to organize international algorithms to make both their use and further development easier.

My first intuition is to have some sort of setLanguage function that automatically hot-swaps in any algorithms we can provide in the requested language. This is good because people can go on using Natural as they always do, knowing that, where possible, the algorithms being used are in their preferred language.
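To make the setLanguage idea concrete, here is a minimal sketch. Everything in it is hypothetical: `setLanguage`, the `implementations` table and the regex "stemmers" are toy stand-ins, not natural's actual API or real Porter stemmers. It copies the requested language's algorithms onto the top-level object and falls back to the English version for anything that language does not provide.

```javascript
// Hypothetical sketch: none of these names are part of natural's real API.
// Toy per-language implementations (regex suffix strippers, not Porter).
const implementations = {
  en: { PorterStemmer: { stem: (w) => w.replace(/ing$/, "") } },
  es: { PorterStemmer: { stem: (w) => w.replace(/ando$/, "") } },
};

const natural = {
  language: "en",
  // Copy every algorithm available for `lang` onto the top-level object.
  // English is spread first, so anything `lang` lacks keeps the English one,
  // and an unknown language falls back to English entirely.
  setLanguage(lang) {
    const chosen = { ...implementations.en, ...(implementations[lang] || {}) };
    Object.assign(this, chosen);
    this.language = lang;
  },
};

// Start with the English defaults so natural.PorterStemmer always exists.
natural.setLanguage("en");
```

Callers keep writing `natural.PorterStemmer.stem(word)`; after `natural.setLanguage("es")` the same call uses the Spanish stand-in. One trade-off of this design is that it mutates shared global state, so two callers that want different languages in the same process would interfere with each other.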

The other option I thought of (much less radical and development-intensive) is to put each language into its own package, e.g. natural.es.PorterStemmer as opposed to natural.PorterStemmer_es. The only flaw is that higher-level algorithms which rely on lower-level algorithms may not use the correct one.
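The flaw with per-language packages can be illustrated with a small sketch. The `en`/`es` namespaces, the regex "stemmers" and the `stemTokens` helper are all hypothetical toy stand-ins, not code from natural. The point is that a higher-level function which hard-wires its lower-level dependency keeps using English even when it is reached through `natural.es`.

```javascript
// Hypothetical sketch of per-language namespaces; toy stemmers, not Porter.
const en = { PorterStemmer: { stem: (w) => w.replace(/ing$/, "") } };
const es = { PorterStemmer: { stem: (w) => w.replace(/ando$/, "") } };

// A higher-level algorithm that hard-wires its lower-level dependency:
// it always calls the English stemmer, whichever namespace exposes it.
function stemTokens(tokens) {
  return tokens.map((t) => en.PorterStemmer.stem(t));
}

// Both namespaces expose the same stemTokens function.
const natural = {
  en: { ...en, stemTokens },
  es: { ...es, stemTokens },
};
```

Here `natural.es.PorterStemmer.stem("hablando")` gives `"habl"`, but `natural.es.stemTokens(["hablando"])` returns the word unchanged, because the English stemmer was used under the hood.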

Either way, I think moving the code around a bit will make life a little easier.

I would love some feedback from the people creating and using the international algorithms, since I don't use them very often, so please leave your ideas/feedback!

Thanks,
-Ken

@martijndeb
Contributor

How about creating an algorithm object that maps algorithms to namespaces?
An algorithm would use natural.algorithms.PorterStemmer, which would be aliased to natural.es.PorterStemmer. If you require algorithms_es, it could then overwrite it. If it doesn't specify a PorterStemmer, the default object could still be the English one.
This way you can namespace everything correctly and it still solves the problem of lower level algorithms picking the wrong class.
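A minimal sketch of this alias-table idea follows, under stated assumptions: `natural.algorithms` is the proposed (not existing) alias object, `loadLanguage` is a hypothetical stand-in for "require algorithms_es", and the stemmer and tokenizer are toy implementations. Aliases start at the English defaults, loading a language overwrites only what it provides, and higher-level code resolves dependencies through the alias at call time.

```javascript
// Hypothetical sketch of the proposed natural.algorithms indirection.
// Toy implementations only; loadLanguage() is not a real natural function.
const natural = {
  en: {
    PorterStemmer: { stem: (w) => w.replace(/ing$/, "") },
    WordTokenizer: { tokenize: (s) => s.split(/\s+/) },
  },
  // The Spanish namespace only provides a stemmer.
  es: { PorterStemmer: { stem: (w) => w.replace(/ando$/, "") } },
  algorithms: {},
};

// Defaults: every alias starts out pointing at the English implementation.
Object.assign(natural.algorithms, natural.en);

// Stand-in for "require algorithms_es": overwrite only the aliases that the
// language provides; anything it does not specify keeps the English default.
function loadLanguage(lang) {
  Object.assign(natural.algorithms, natural[lang]);
}

// Higher-level algorithms look up their dependency through the alias at call
// time, so they pick up whichever implementation is currently loaded.
function stemTokens(tokens) {
  return tokens.map((t) => natural.algorithms.PorterStemmer.stem(t));
}
```

After `loadLanguage("es")`, `stemTokens(["hablando"])` returns `["habl"]`, while `natural.algorithms.WordTokenizer` is still the English tokenizer because the Spanish namespace does not define one.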

@kkoch986
Member Author

Yeah, that's what I think would be best. The challenge is that some of the algorithms call require directly to get the algorithms they depend on, which is why we need fixes like #155.

I think that type of namespacing solution would be best; I just have to do a little digging into the best way to implement it given what we already have.
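One common way to remove a hard-wired `require` of a specific implementation is to let the higher-level algorithm receive its dependency, with a default that preserves today's behaviour. This is only an illustrative sketch and is not necessarily what #155 does; `TokenStemmer` and both toy stemmers are hypothetical names.

```javascript
// Hypothetical sketch: the higher-level algorithm receives its stemmer
// instead of calling require() for one specific implementation itself.
const englishStemmer = { stem: (w) => w.replace(/ing$/, "") }; // toy stand-in
const spanishStemmer = { stem: (w) => w.replace(/ando$/, "") }; // toy stand-in

class TokenStemmer {
  // Default to the English stemmer so existing callers behave as before.
  constructor(stemmer = englishStemmer) {
    this.stemmer = stemmer;
  }

  // Stem every token with whichever stemmer this instance was given.
  stemAll(tokens) {
    return tokens.map((t) => this.stemmer.stem(t));
  }
}
```

`new TokenStemmer().stemAll(["running"])` keeps the English behaviour, while `new TokenStemmer(spanishStemmer).stemAll(["hablando"])` returns `["habl"]`. The injected default could just as well be the alias from a `natural.algorithms`-style table, which would combine both approaches discussed in this thread.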

-Ken

@kkoch986
Member Author

Merging into #228; hopefully we can get it solved there too.
