
IMDB sentiment analysis, Transformer, ULMFiT, attention, TextCNN.


PrideLee/sentiment-analysis


Sentiment analysis

1. Introduction

  This project performs sentiment analysis of movie reviews based on the Transformer and ULMFiT models.

You can browse the full report here.

2. Transformer

  To address the long-term dependency problem and reduce computation, Google designed a new model for machine-learning tasks, named the Transformer. For a detailed introduction to this model, please refer to my reports.
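The core operation of the Transformer is scaled dot-product self-attention. The following is a minimal NumPy sketch of that formula for illustration only; it is not the repository's model.py implementation, and the toy input is made up:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                            # weighted sum of values

# Toy example: 3 tokens with 4-dimensional embeddings; self-attention uses Q = K = V.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (3, 4)
```

Because each output row is a convex combination of the value rows, every token can attend directly to every other token in one step, which is what removes the long-range dependency bottleneck of recurrent models.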

2.1 Environment

  Python 3.6.0 + PyTorch 1.0.1. (Other Python and PyTorch versions should also work, but some functions and libraries may differ slightly; the code was written for Python 3.6 and PyTorch 1.0.1.)

2.2 Usage

  • Please run train.py.

    You can also change the parameters, e.g. batch_size=64, learning_rate=0.001, epoches=50, etc. The results, the trained model, and the processed data will be saved in the output folder; you can assign the path by changing the saye_path parameter.

  • The IMDB dataset and GloVe word vectors will be downloaded to the "root" path.

  • dataload.py processes the raw data (embedding, tokenization, etc.).

  • model.py defines the Transformer network.
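The preprocessing step above can be sketched as follows. This is a hedged illustration of what a dataload.py-style step typically does when pairing a vocabulary with pretrained GloVe vectors; the `pretrained` dictionary and the vocabulary here are toy stand-ins, not the actual GloVe download:

```python
import numpy as np

# Hypothetical miniature GloVe-style lookup table; the real script downloads the
# full pretrained vectors instead.
pretrained = {"the": np.array([0.1, 0.2]), "movie": np.array([0.3, 0.4])}

def build_embedding_matrix(vocab, pretrained, dim=2):
    """Build a (vocab_size, dim) matrix whose rows follow vocab order.

    Words missing from the pretrained table (and the <pad> token) keep a
    zero vector, a common default before fine-tuning.
    """
    mat = np.zeros((len(vocab), dim))
    for word, idx in vocab.items():
        if word in pretrained:
            mat[idx] = pretrained[word]
    return mat

vocab = {"<pad>": 0, "the": 1, "movie": 2, "unknownword": 3}
emb = build_embedding_matrix(vocab, pretrained)
print(emb.shape)  # (4, 2)
```

The resulting matrix is what an embedding layer is initialized with, so token indices produced during tokenization map directly to pretrained vectors.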

2.3 Results

  The mean loss at each epoch (50 epochs total) on the training data and the accuracy on the validation data (every 5 epochs) will be saved in the "root/results" path. The best trained model will also be saved.

Figure 1. Training loss and validation accuracy

  After 50 epochs, the training loss is 0.161125 and the validation accuracy is 88.036%.

3. ULMFiT

  The ULMFiT model introduces the pre-training and fine-tuning strategy to text classification tasks: we pre-train a general language model on the WikiText-103 dataset, fine-tune it on the IMDB dataset, and then train a classifier for sentiment analysis. Several fine-tuning tricks are presented in the paper. For more detail about ULMFiT, please refer to my notebook.
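One of the paper's fine-tuning tricks is the slanted triangular learning rate (STLR): the learning rate rises linearly over a short warm-up, then decays linearly for the rest of training. A minimal sketch of the schedule from Howard & Ruder (2018); the defaults cut_frac=0.1 and ratio=32 are the paper's values, not necessarily those used in this repository:

```python
def slanted_triangular_lr(t, T, lr_max=0.01, cut_frac=0.1, ratio=32):
    """STLR schedule: short linear warm-up to lr_max, then long linear decay.

    t: current iteration, T: total iterations, ratio: lr_max / lr_min.
    """
    cut = int(T * cut_frac)                # iteration where the peak occurs
    if t < cut:
        p = t / cut                        # warm-up: p rises 0 -> 1
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))  # decay: p falls 1 -> 0
    return lr_max * (1 + p * (ratio - 1)) / ratio

T = 1000
lrs = [slanted_triangular_lr(t, T) for t in range(T)]
print(max(lrs))  # peaks at lr_max = 0.01, reached around t = T * cut_frac
```

The intuition is that the model should converge quickly to a suitable region of parameter space (steep warm-up) and then refine its parameters slowly (long decay), which helps when adapting a pretrained language model without catastrophic forgetting.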

3.1 Environment

  Python 3.6.0 + PyTorch 1.0.1 + fastai 1.0.51.

3.2 Usage

  • Run ULMFiT_slim.py; you can assign the path where the trained model and the IMDB dataset are saved. The processed data.csv (train and test sets are created by random split) will also be saved. You can also assign the batch_size, learning_rate, dropout, etc.
  • There is another version implementing this model, which you can refer to here.

3.3 Results

| epoch | train_loss | valid_loss | accuracy | time |
| --- | --- | --- | --- | --- |
| 10 | 0.882431 | 0.765422 | 0.901345 | 4:21:53 |

More experiment results

4. Reference

[1] Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems. 2017.

[2] Howard, Jeremy, and Sebastian Ruder. "Universal language model fine-tuning for text classification." arXiv preprint arXiv:1801.06146 (2018).
