Sentiment analysis

1. Introduction

This project is about movive reviews sentiment analysis based on Transformer and ULMFiT model.

You can browse the full report from here.

2. Transformer

To solve the long-term independence and reduce the computation, Google designs a new model in ML tasks, named Transformer. About this model's detalied introduction please refer my reporters.

2.1 Enviroment

Python3.6.0 + Pytorch 1.0.1. (Some other python and pytorch version should also be used, but some fuctions and libraries may have a little bit difference, we coding in python 3.6 and pytorch 1.0.1)

2.2 Usage

Please run train.py.

You can also change the parameters, e.g. batch_size=64, learning_rate=0.001, epoches=50, etc. The resultes, training model and processing data will be saved in the folder, you can assign the path by changing the saye_path parameter.
IMDB dataset and GloVe wording vectory will be download in the "root" path.
dataload.py will processing (wmbedding, tokenize, etc.) raw data.
model.py define and design the transformer netwoek.

2.3 Results

The loss mean at each epoch (50 epoches total) in training data and the accuracy of verification data (every 5 epoch) will be saved in "root/results" path. The best training model will also be saved.

图1.Training loss and vertification accuray

After 50 epoches the training loss=0.161125, the varcification accuracy=88.036%.

3. ULMFiT

ULMFiT model is introduced the pre-training and fine-tuning strategy to text classification tasks, we pre-train a general model in wiki-103 dataset and fine-tuning it on IMDB dataset, then training a classfication about sentiment analysis. There are some pre-training tricks presented in this paper. More detail about ULMFiT please refer my notebook.

3.1 Enviroment

Python 3.6.0 + Pytorch 1.0.1 + Fastai 1.0.51.

3.2 Usage

Run ULMFiT_slim.py, you can assign the path to save the trained model and IMDB dataset. The processing data.csv (train and test set will be created randomly) will also be saved. You also can assign the batch_size, learning_rate, dropout etc.
There is an other version to reliaze this model we can refere here.

3.3 Results

epoch	train_loss	valid_loss	accuracy	time
10	0.882431	0.765422	0.901345	4:21:53

More expirement results

4. Reference

[1] Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems. 2017.

[2] Howard, Jeremy, and Sebastian Ruder. "Universal language model fine-tuning for text classification." arXiv preprint arXiv:1801.06146 (2018).

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
fast_ai_ULMFiT		fast_ai_ULMFiT
transformer		transformer
Different Deep Learning Models Applied to Sentiment Analysis.pdf		Different Deep Learning Models Applied to Sentiment Analysis.pdf
TextClassification.zip		TextClassification.zip
readme.md		readme.md
results.png		results.png
results_more.png		results_more.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Sentiment analysis

1. Introduction

2. Transformer

2.1 Enviroment

2.2 Usage

2.3 Results

3. ULMFiT

3.1 Enviroment

3.2 Usage

3.3 Results

More expirement results

4. Reference

About

Releases

Packages

Languages

PrideLee/sentiment-analysis

Folders and files

Latest commit

History

Repository files navigation

Sentiment analysis

1. Introduction

2. Transformer

2.1 Enviroment

2.2 Usage

2.3 Results

3. ULMFiT

3.1 Enviroment

3.2 Usage

3.3 Results

More expirement results

4. Reference

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages