Can I limit the vocabulary? #51

Open
jiangfeng1124 opened this issue Apr 2, 2014 · 2 comments

Comments

@jiangfeng1124

Dear developers,

I did not find an option to limit the vocabulary. For example, I don't want to learn representations for words that occur fewer than 50 times in my corpus.
The reason is that if I use all the words (or only exclude the stop words), the vocabulary will be very large, which is undesirable.

Is there a convenient way to do this?
Thanks very much,
Jiang

@davidjurgens
Collaborator

Hi Jiang,

You'll need to compute the words you want to use first and then use the
--token-filter option to restrict which words are retained.
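
As a rough illustration of the first step (computing the words to keep), here is a minimal Python sketch that counts token frequencies and keeps only words seen at least 50 times. The function names, the whitespace tokenization, the lowercasing, and the one-word-per-line output are all assumptions made for this example; check the project documentation for the exact file format and argument syntax that --token-filter expects.

```python
from collections import Counter


def frequent_words(lines, min_count=50):
    """Return the sorted tokens that occur at least min_count times.

    Hypothetical helper: whitespace tokenization and lowercasing are
    assumptions and should match how the corpus is actually tokenized.
    """
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return sorted(word for word, n in counts.items() if n >= min_count)


def write_word_list(words, path):
    """Write one word per line (assumed format, not confirmed by the thread)."""
    with open(path, "w", encoding="utf-8") as out:
        for word in words:
            out.write(word + "\n")
```

For example, opening the corpus file, passing the file object to `frequent_words`, and passing the result to `write_word_list` would produce a word list that could then be supplied to the filter option.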

Also, please use the mailing list for these types of questions, rather
than opening a new issue on GitHub for each question. The mailing list
helps others see the answers in case they have the same question.

Thanks,
David

@jiangfeng1124
Author

I see, thanks very much!

Jiang
