A PyTorch pipeline for the Tweet Sentiment Extraction Kaggle competition

Introduction

This is a PyTorch training pipeline for a text span selection task. It is built on the Catalyst deep learning framework.
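
To make "text span selection" concrete, here is a minimal sketch of how a training target could be derived: the character offsets of the selected phrase inside the tweet. The function name `find_span` and the example row are hypothetical, and the repository's own preprocessing (see src/data_utils.py and src/tokenization.py) may work differently.

```python
from typing import Tuple


def find_span(text: str, selected_text: str) -> Tuple[int, int]:
    """Return (start, end) character offsets of selected_text inside text.

    The end offset is exclusive. Raises ValueError if the selection
    is not a substring of the tweet.
    """
    start = text.find(selected_text)
    if start < 0:
        raise ValueError("selected_text is not a substring of text")
    return start, start + len(selected_text)


# Hypothetical example row, not taken from the competition data.
text = "I really love this sunny weather"
selected = "love this sunny weather"
start, end = find_span(text, selected)
print(start, end, repr(text[start:end]))
```

A model for this task then typically predicts the start and end positions of the span, usually at token level rather than character level.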

Installation

You need to have Anaconda installed
Clone the repo

git clone https://github.com/Kirill-Kravtsov/kaggle-tweet-sentiment-extraction

Create and activate the provided Anaconda environment

conda env create -f tweet_env.yml
conda activate tweet_env

Download the competition data and put it in the data directory at the root of the project
Create folds by running

python create_folds.py
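
For readers unfamiliar with cross-validation folds, the sketch below shows one common way to assign them: rows are grouped by label and dealt round-robin so every fold gets a similar label mix. This is an illustration under assumptions, not the logic of create_folds.py; the function name `assign_folds`, the choice of five folds, and stratifying by sentiment are all hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def assign_folds(labels: Sequence[str], n_folds: int = 5, seed: int = 42) -> List[int]:
    """Assign a fold index to every row, balancing each label across folds."""
    rng = random.Random(seed)

    # Group row indices by their label.
    by_label: Dict[str, List[int]] = defaultdict(list)
    for row_idx, label in enumerate(labels):
        by_label[label].append(row_idx)

    folds = [-1] * len(labels)
    for label in sorted(by_label):
        row_indices = by_label[label]
        rng.shuffle(row_indices)
        # Deal shuffled rows round-robin, so for each label the fold
        # sizes differ by at most one row.
        for position, row_idx in enumerate(row_indices):
            folds[row_idx] = position % n_folds
    return folds


# Hypothetical toy labels: 10 rows per sentiment class.
sentiments = ["positive", "negative", "neutral"] * 10
folds = assign_folds(sentiments, n_folds=5)
print(folds)
```

With fold indices stored next to each row, a cross-validated run trains one model per fold, holding that fold out for validation.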

Project structure:

├── configs
│   ├── best_bertweet.yml
│   ├── best_roberta.yml
│   ├── experiments
│   └── optimization
├── create_folds.py
├── data
├── logs
├── scripts
├── src
│   ├── callbacks.py
│   ├── collators.py
│   ├── datasets.py
│   ├── data_utils.py
│   ├── hooks.py
│   ├── losses.py
│   ├── optimize_experiment.py
│   ├── tokenization.py
│   ├── train.py
│   ├── transformer_models.py
│   └── utils.py
└── tweet_env.yml

Running the pipeline

To train the basic RoBERTa and BERTweet models, run the following from the src directory:

python train.py --cv --config ../configs/best_roberta.yml
python train.py --cv --config ../configs/best_bertweet.yml

Note: the code is designed to run on a single GPU, so on a multi-GPU system remember to set the CUDA_VISIBLE_DEVICES variable, e.g.:

CUDA_VISIBLE_DEVICES=0 python train.py --cv --config ../configs/best_roberta.yml
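
If you prefer to launch training from a Python script, the same GPU restriction can be applied through the child process environment. The sketch below is one possible wrapper, not part of the repository: the helper names `single_gpu_env`, `train_command`, and `run_training` are hypothetical, and it assumes the working directory is the one containing train.py (src in the project structure above).

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional


def single_gpu_env(gpu_id: int, base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment that exposes only one GPU to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


def train_command(config_path: str) -> List[str]:
    """Build the training command with the flags shown in this README."""
    return [sys.executable, "train.py", "--cv", "--config", config_path]


def run_training(config_path: str, gpu_id: int = 0) -> None:
    """Run one training job pinned to a single GPU; raises if it fails."""
    subprocess.run(train_command(config_path), env=single_gpu_env(gpu_id), check=True)


# Show the command that would be launched, without starting training.
print(" ".join(train_command("../configs/best_roberta.yml")[1:]))
```

Calling `run_training("../configs/best_roberta.yml", gpu_id=0)` would then behave like the shell command above. The variable has to be set before the training process starts, which is why it is passed through `env` rather than set inside train.py.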

