Kirill-Kravtsov/kaggle-tweet-sentiment-extraction
Introduction

This is a PyTorch training pipeline for a text span selection task, built on the Catalyst deep learning framework.
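For orientation, span selection is commonly framed as predicting start and end token positions over the encoder output. The sketch below illustrates only that framing and is not the repository's model code; the actual models are defined in src/transformer_models.py.

import torch
import torch.nn as nn

class SpanHead(nn.Module):
    """Toy span-selection head: maps token embeddings to start/end logits.
    Illustrative only; see src/transformer_models.py for the real models."""
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.qa_outputs = nn.Linear(hidden_size, 2)  # one logit each for start and end

    def forward(self, hidden_states: torch.Tensor):
        logits = self.qa_outputs(hidden_states)           # (batch, seq_len, 2)
        start_logits, end_logits = logits.split(1, dim=-1)
        return start_logits.squeeze(-1), end_logits.squeeze(-1)

# Example: score a fake batch of 8 sequences of 96 token embeddings
head = SpanHead()
start, end = head(torch.randn(8, 96, 768))
print(start.shape, end.shape)  # torch.Size([8, 96]) torch.Size([8, 96])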

Installation

  1. Make sure you have Anaconda installed
  2. Clone the repo
git clone https://github.com/Kirill-Kravtsov/kaggle-tweet-sentiment-extraction
  3. Create and activate the provided Anaconda environment
conda env create -f tweet_env.yml
conda activate tweet_env
  4. Download the competition data and place it in the data directory at the root of the project
  5. Create folds by running (a sketch of what this step typically does follows below)
python create_folds.py
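create_folds.py is expected to assign each training row to a cross-validation fold. A typical way to do this (an assumption for illustration; the file and column names below are guesses, check create_folds.py for the actual logic) is stratified K-fold splitting on the sentiment label:

import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Hypothetical fold creation; paths and column names are assumptions.
df = pd.read_csv("data/train.csv")
df["kfold"] = -1
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (_, val_idx) in enumerate(skf.split(df, df["sentiment"])):
    df.loc[val_idx, "kfold"] = fold
df.to_csv("data/train_folds.csv", index=False)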

Project structure:

├── configs
│   ├── best_bertweet.yml
│   ├── best_roberta.yml
│   ├── experiments
│   └── optimization
├── create_folds.py
├── data
├── logs
├── scripts
├── src
│   ├── callbacks.py
│   ├── collators.py
│   ├── datasets.py
│   ├── data_utils.py
│   ├── hooks.py
│   ├── losses.py
│   ├── optimize_experiment.py
│   ├── tokenization.py
│   ├── train.py
│   ├── transformer_models.py
│   └── utils.py
└── tweet_env.yml

Running pipeline

To train the basic RoBERTa and BERTweet models, run (from the src directory):

python train.py --cv --config ../configs/best_roberta.yml
python train.py --cv --config ../configs/best_bertweet.yml

Note: the code is designed to run on a single GPU, so on a multi-GPU system remember to set the CUDA_VISIBLE_DEVICES variable, e.g.:

CUDA_VISIBLE_DEVICES=0 python train.py --cv --config ../configs/best_roberta.yml
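For reference, a minimal sketch of how a YAML config such as best_roberta.yml might be loaded and used to drive a cross-validation loop. The flag names and config keys here are assumptions for illustration only; the real interface lives in src/train.py.

import argparse
import yaml

# Hypothetical argument parsing; see src/train.py for the actual flags.
parser = argparse.ArgumentParser()
parser.add_argument("--config", required=True, help="path to a YAML experiment config")
parser.add_argument("--cv", action="store_true", help="train on all cross-validation folds")
args = parser.parse_args()

with open(args.config) as f:
    config = yaml.safe_load(f)

# Train every fold when --cv is given, otherwise just fold 0.
folds = range(config.get("n_folds", 5)) if args.cv else [0]
for fold in folds:
    print(f"Training fold {fold} with config {args.config}")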
