Skip to content

A sentiment analysis model trained on the IMDb Movie Reviews Dataset to classify reviews as positive or negative. This project uses a Bidirectional LSTM with GloVe embeddings, batch normalization, and regularization to improve accuracy and generalization. Includes data preprocessing, model training, and evaluation.

License

Notifications You must be signed in to change notification settings

sminerport/IMDbSentimentClassifier

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

21 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

IMDb Sentiment Classifier

This project is a sentiment classifier for IMDb movie reviews. It uses a pre-trained GloVe word embedding model and a Bidirectional LSTM network to classify reviews as positive or negative.

Features

Setup

  1. Clone this repository:

    git clone https://github.com/sminerport/IMDbSentimentClassifier.git
    cd IMDbSentimentClassifier
  2. Install dependencies:

    pip install -r requirements.txt
  3. Run the model:

    python src/main.py

Data

The model uses IMDb review data split into training, validation, and test sets. These files are stored in the data/ directory and are managed with Git Large File Storage (Git LFS) to optimize storage and download efficiency.

To ensure access to the data files, please install Git LFS if you haven’t already. You can download Git LFS here.

# Install Git LFS
git lfs install

Then, clone the repository as usual:

git clone https://github.com/sminerport/IMDbSentimentClassifier.git
cd IMDbSentimentClassifier

If you’ve already cloned the repository without Git LFS, run the following command to pull the LFS files:

git lfs pull

Usage

To train the model:

python src/main.py

After running, the script will automatically download and clean up GloVe embeddings to save space.

Model Training Output

Below is a snapshot of the model's training and validation accuracy and loss across epochs:

Model Training Output

This image provides a visual summary of the training process. Each epoch displays the model's accuracy and loss on both the training and validation sets, showing the progression as the model improves over time.

Cleanup

The script will delete the GloVe embeddings and the saved model (best_model.keras) after evaluation to conserve storage. If you'd like to keep these files, set the cleanup variable to False in the script.

Notes

  • To adjust storage usage, toggle the cleanup variable in the script.
  • requirements.txt is generated by running pip freeze > requirements.txt in a Colab environment or your local environment.

License

This project is licensed under the MIT License. See the LICENSE file for more details.

About

A sentiment analysis model trained on the IMDb Movie Reviews Dataset to classify reviews as positive or negative. This project uses a Bidirectional LSTM with GloVe embeddings, batch normalization, and regularization to improve accuracy and generalization. Includes data preprocessing, model training, and evaluation.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages