How to use other data? How to create vocab.txt file? #2

wrapperband · 2018-04-24T10:26:33Z

How to use other data? How to create vocab.txt file?

The program crashed / stalled my PC after about 8 hours of creating the training data. However, it was using the CPU, so I tried to create a smaller data set.

I assumed https://github.com/lancopku/DPGAN/blob/master/review_generation_dataset/generate_review.py is what formats the data.

I've been trying to read this program. I was / am hoping it formats the data in some way, but there aren't any comments for a "non coder" to follow. I assumed I had to change the path? I'm on Linux.

generate_review.py
L52 : file_path = "F:\dataset\yelp_dataset\sorted_data"

johndpope · 2018-06-04T14:44:35Z

Pretty sure this is just a word2vec model. See here for training:

The script demo-word.sh downloads a small (100MB) text corpus from the web, and trains a small word vector model. After the training is finished, the user can interactively explore the similarity of the words.

More information about the scripts is provided at https://code.google.com/p/word2vec/

https://github.com/dav/word2vec

akhileshkumargangwar · 2018-08-20T04:33:38Z

Hi, I still don't understand how vocab.txt is created. Why are many words assigned the same integer value?

jklj077 · 2018-08-20T06:29:17Z

@wrapperband

Yes, changing the path should work. The path should point to a directory that contains all the review files, which should be json files.
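As a rough illustration of the path change on Linux, the sketch below checks that a directory exists and lists the json files in it. The helper name `list_review_files`, the example path, and the `.json` file extension are assumptions made for this example; they are not taken from generate_review.py.

```python
from pathlib import Path


def list_review_files(data_dir):
    """Return the sorted json files found directly inside data_dir.

    Hypothetical helper: it only checks that the directory you plan to
    assign to file_path exists and holds files ending in .json.
    """
    root = Path(data_dir).expanduser()
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")
    return sorted(root.glob("*.json"))


if __name__ == "__main__":
    # Placeholder Linux-style path; replace it with your own review directory.
    example_dir = "~/dataset/yelp_dataset/sorted_data"
    try:
        for path in list_review_files(example_dir):
            print(path)
    except NotADirectoryError as err:
        print(err)
```

On Linux the path uses forward slashes and no drive letter, so a Windows-style value such as the one on L52 has to be replaced rather than reused.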

The script for generating vocab.txt is not released, but the format is quite simple. vocab.txt contains the word list for indexing. It is not an embedding file. Each line of vocab.txt contains (1) the lowercased word and (2) its frequency in the training text, i.e., how many times it appears in the training text. The words are ranked by frequency, so the common words are at the front and the rare words are at the back.
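Since the original script is not released, here is a minimal sketch of how a file in this format could be produced. It is not the authors' script. The function names `build_vocab` and `write_vocab`, the whitespace tokenization, the single space between word and count, and the alphabetical tie-breaking are all assumptions; check them against the vocab.txt shipped with the repository before relying on the output.

```python
from collections import Counter


def build_vocab(lines):
    """Count lowercased, whitespace-separated tokens over all lines.

    Returns (word, frequency) pairs, most frequent first. Ties are broken
    alphabetically so the result is deterministic (an assumption, since
    the thread does not say how ties are ordered).
    """
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def write_vocab(entries, path):
    """Write one 'word frequency' pair per line (separator is assumed)."""
    with open(path, "w", encoding="utf-8") as handle:
        for word, freq in entries:
            handle.write(f"{word} {freq}\n")


if __name__ == "__main__":
    sample = ["The food was good", "the service was slow"]
    write_vocab(build_vocab(sample), "vocab.txt")
```

For real data, pass an iterable over the training text, for example an open file object, instead of the two sample sentences.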

Best regards

akhileshkumargangwar · 2018-08-20T12:13:18Z

Thank You.

johndpope · 2018-08-20T12:14:21Z

This unrelated code may be worth cherry-picking from. See the Python code at https://github.com/johndpope/vocab-mashup. The way it smashes text together is pretty impressive, and it can help augment training sets.

akhileshkumargangwar · 2018-08-20T12:16:26Z

Thanks, I will check.

akhileshkumargangwar · 2018-08-22T06:22:41Z

Hi,
The DP-GAN code is throwing a lot of errors. No reviews are generated in discriminator_test/negative/*.txt; the files contain empty reviews. I want to learn the flow of the GAN by debugging it, but fixing the errors is taking a lot of time. Is there an updated version of the code? I also tried SeqGAN, but it uses synthetic data. Please help; there are some errors I am unable to fix.
Thanks

jingjingxupku · 2018-08-23T12:54:17Z

I did not run into this problem on my local datasets. I suspect it is mainly caused by the small training data. With the current code I had released only a small subset of the dataset, to illustrate the data format. Since the default number of generator training epochs is set to 1, the generator learns nothing on this small dataset. I increased the number of training epochs, and that fixed the problem. I have pushed the updated code, so please download it again. I have also released the whole dataset on Google Drive; the download link is in readme.md.

akhileshkumargangwar · 2018-08-23T13:08:13Z

Thanks a lot.
