Can I limit the vocabulary? #51

jiangfeng1124 · 2014-04-02T16:14:25Z

Dear developers,

I could not find an option to limit the vocabulary. For example, I don't want to learn representations for words that occur fewer than 50 times in my corpus.
The reason is that if I use all the words (or only exclude the stop words), the vocabulary becomes very large, which is undesirable.

Is there a convenient way to do this?
Thanks very much,
Jiang

davidjurgens · 2014-04-02T16:51:32Z

Hi Jiang,

You'll need to compute the words you want to use first and then use the
--token-filter option to restrict which words are retained.
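
For the first step (computing the words to keep), a minimal sketch in Python is below. It assumes whitespace tokenization and writes one word per line. The file format that --token-filter actually expects is not stated in this thread, so check the tool's documentation before using the output. The function names and the file names corpus.txt and keep-words.txt are hypothetical.

```python
import sys
from collections import Counter


def frequent_words(lines, min_count=50):
    """Return the sorted words that occur at least min_count times.

    Assumption: tokens are separated by whitespace; adjust the split
    if your corpus is tokenized differently.
    """
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return sorted(word for word, n in counts.items() if n >= min_count)


def write_word_list(words, path):
    """Write one word per line (an assumed format, not confirmed here)."""
    with open(path, "w", encoding="utf-8") as out:
        for word in words:
            out.write(word + "\n")


if __name__ == "__main__":
    # Hypothetical usage: python make_word_list.py corpus.txt keep-words.txt
    corpus_path, output_path = sys.argv[1], sys.argv[2]
    with open(corpus_path, encoding="utf-8") as corpus:
        write_word_list(frequent_words(corpus, min_count=50), output_path)
```

The resulting word list can then be supplied through the --token-filter option so that only those words are retained.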

Also, please use the mailing list for these types of questions, rather
than opening a new issue on GitHub for each question. The mailing list
helps others see the answers in case they have the same question.

Thanks,
David

jiangfeng1124 · 2014-04-03T02:57:35Z

I see, thanks very much!

Jiang
