A PyTorch pipeline for the Tweet Sentiment Extraction Kaggle competition
This is a PyTorch training pipeline for a text span selection task. It also uses the Catalyst deep learning framework.
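To illustrate what "span selection" means here, the following is a minimal sketch of one common way to decode a span from per-token start and end scores. It is not taken from this repo: the function name `best_span`, the toy tokens, and the scores are all hypothetical, and the actual decoding logic in `src` may differ.

```python
from typing import List, Tuple


def best_span(start_scores: List[float], end_scores: List[float]) -> Tuple[int, int]:
    """Return the (start, end) token indices with start <= end that
    maximize start_scores[start] + end_scores[end].

    Hypothetical helper for illustration only, not part of this repo.
    """
    if not start_scores or len(start_scores) != len(end_scores):
        raise ValueError("score lists must be non-empty and of equal length")

    best = (0, 0)
    best_score = float("-inf")
    for start, start_score in enumerate(start_scores):
        # Only consider ends at or after the start so the span is valid.
        for end in range(start, len(end_scores)):
            score = start_score + end_scores[end]
            if score > best_score:
                best_score = score
                best = (start, end)
    return best


# Toy example with made-up scores (assumption: one score per token).
tokens = ["what", "a", "great", "day", "today"]
start_scores = [0.1, 0.2, 2.5, 0.3, 0.0]
end_scores = [0.0, 0.1, 0.4, 2.0, 0.5]

start, end = best_span(start_scores, end_scores)
selected = " ".join(tokens[start:end + 1])
print(selected)
```

The exhaustive search is quadratic in the number of tokens, which is usually acceptable for tweet-length inputs.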
- You need to have Anaconda installed
- Clone the repo
git clone https://github.com/Kirill-Kravtsov/kaggle-tweet-sentiment-extraction
- Create and activate the provided Anaconda environment
conda env create -f tweet_env.yml
conda activate tweet_env
- Download the competition data and put it in the data dir in the root of the project
- Create folds by running
python create_folds.py
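For readers unfamiliar with fold creation, here is a rough sketch of how cross-validation folds can be assigned so that each sentiment class is spread evenly across folds. This is an assumption about the general technique, not a description of what `create_folds.py` actually does: the function `assign_stratified_folds`, the fold count, and the seed are hypothetical.

```python
import random
from collections import defaultdict
from typing import List, Sequence


def assign_stratified_folds(labels: Sequence[str], n_folds: int = 5, seed: int = 42) -> List[int]:
    """Assign a fold index to every sample, balancing each label across folds.

    Hypothetical helper for illustration; the real script may work differently.
    """
    # Group sample indices by their label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    folds = [-1] * len(labels)
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Deal shuffled indices round-robin so fold sizes differ by at most one per label.
        for position, idx in enumerate(indices):
            folds[idx] = position % n_folds
    return folds


# Toy labels standing in for the sentiment column of the competition data.
sentiments = ["positive", "negative", "neutral"] * 10
folds = assign_stratified_folds(sentiments, n_folds=5)
print(folds)
```

Fixing the seed keeps the split reproducible across runs, which matters when comparing experiments on the same folds.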
├── configs
│ ├── best_bertweet.yml
│ ├── best_roberta.yml
│ ├── experiments
│ └── optimization
├── create_folds.py
├── data
├── logs
├── scripts
├── src
│ ├── callbacks.py
│ ├── collators.py
│ ├── datasets.py
│ ├── data_utils.py
│ ├── hooks.py
│ ├── losses.py
│ ├── optimize_experiment.py
│ ├── tokenization.py
│ ├── train.py
│ ├── transformer_models.py
│ └── utils.py
└── tweet_env.yml
To train the basic RoBERTa and BERTweet models, run the following from the src directory:
python train.py --cv --config ../configs/best_roberta.yml
python train.py --cv --config ../configs/best_bertweet.yml
Note: the code is designed to run on a single GPU, so on a multi-GPU system do not forget to set the CUDA_VISIBLE_DEVICES variable, e.g.:
CUDA_VISIBLE_DEVICES=0 python train.py --cv --config ../configs/best_roberta.yml
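If you prefer to launch training from a Python script rather than the shell, the same restriction can be applied by passing a modified environment to the child process. The sketch below is only an illustration: `single_gpu_env` and `training_command` are hypothetical helpers, not part of this repo, and the launch line is left commented out.

```python
import os
from typing import Dict, List


def single_gpu_env(gpu_index: int = 0) -> Dict[str, str]:
    """Return a copy of the current environment that exposes only one GPU.

    Hypothetical helper; CUDA reads CUDA_VISIBLE_DEVICES at startup,
    so it must be set before the training process begins.
    """
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def training_command(config_path: str) -> List[str]:
    """Build the training command shown above for a given config file."""
    return ["python", "train.py", "--cv", "--config", config_path]


env = single_gpu_env(0)
cmd = training_command("../configs/best_roberta.yml")
print(cmd, env["CUDA_VISIBLE_DEVICES"])

# To actually launch (assumed to be run from the directory containing train.py):
# import subprocess
# subprocess.run(cmd, env=env, check=True)
```

Copying the environment instead of mutating `os.environ` keeps the GPU restriction local to the training subprocess.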