Vietnamese handwritten text recognition system
Install dependencies with conda:
conda env create -f environment.yml
Note: PyTorch Lightning's default --max_epochs is 100 and training runs on CPU unless GPUs are requested, so the two following options should usually be set explicitly.
To train the Transformer model (see config params in model/model_tf.py):
python train.py tf config/base.yaml --gpus -1 --max_epochs 50 --deterministic True
To train the RNN model (see config params in model/model_rnn.py):
python train.py rnn config/base.yaml --gpus -1 --max_epochs 50 --deterministic True
Example: training the Transformer model
python train.py tf config/base.yaml --gpus -1 --max_epochs 50 --deterministic True --attn_size 512 --dim_feedforward 4096 --encoder_nlayers 2 --decoder_nlayers 2 --seed 9498 --stn --pe_text --pe_image
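For readability, the Transformer-specific options in the command above can be grouped into a single config object. The sketch below is illustrative only: the real options live in config/base.yaml and model/model_tf.py, the class name is hypothetical, and the comments on --stn are an assumption.

```python
from dataclasses import dataclass

# Hypothetical config mirroring the Transformer CLI flags above;
# the project's actual config lives in config/base.yaml and model/model_tf.py.
@dataclass
class TransformerConfig:
    attn_size: int = 512          # attention / model dimension
    dim_feedforward: int = 4096   # feed-forward layer size
    encoder_nlayers: int = 2      # number of encoder layers
    decoder_nlayers: int = 2      # number of decoder layers
    seed: int = 9498              # random seed for reproducibility
    stn: bool = True              # presumably a spatial-transformer preprocessing toggle
    pe_text: bool = True          # positional encoding on text tokens
    pe_image: bool = True         # positional encoding on image features

cfg = TransformerConfig()
print(cfg.attn_size, cfg.encoder_nlayers)  # 512 2
```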
See the PyTorch Lightning Trainer configuration in the PyTorch Lightning documentation. Some useful flags:
--fast_dev_run True # Run 1 batch on train, 1 batch on val to debug
--profiler True # Profile execution time (may slow training)
--max_epochs 50 # Train for 50 epochs
--resume_from_checkpoint [PATH_TO_CKPT_FILE] # Resume training from checkpoint
--deterministic True # Reproducible
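These flags follow PyTorch Lightning's Trainer argument conventions. As a rough, self-contained illustration (not the project's actual parser, which likely delegates to Lightning's own argument handling), here is how such string flags might be parsed into Trainer keyword arguments:

```python
import argparse

def str2bool(s: str) -> bool:
    # Lightning-style CLIs pass booleans as the strings "True"/"False"
    return s == "True"

# Illustrative parser only; the real train.py wires these flags to
# PyTorch Lightning's Trainer itself.
parser = argparse.ArgumentParser()
parser.add_argument("--gpus", type=int, default=None)               # -1 = use all available GPUs
parser.add_argument("--max_epochs", type=int, default=100)          # Lightning's default is 100
parser.add_argument("--deterministic", type=str2bool, default=False)
parser.add_argument("--fast_dev_run", type=str2bool, default=False)
parser.add_argument("--resume_from_checkpoint", type=str, default=None)

args = parser.parse_args(
    ["--gpus", "-1", "--max_epochs", "50", "--deterministic", "True"]
)
trainer_kwargs = vars(args)  # these would be passed as Trainer(**trainer_kwargs)
print(trainer_kwargs["max_epochs"])  # 50
```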
To evaluate a trained checkpoint:
python test.py {tf, rnn} CKPT_FILE
Run jupyter:
jupyter lab
Open the respective notebooks for further visualization.
References:
- Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
- Image Captioning
- DeepSpeech
- Seq2Seq-Pytorch
If you find this code useful, please cite our paper:
@INPROCEEDINGS{9335877,
author={Ly, Vinh-Loi and Doan, Tuan and Ly, Ngoc Quoc},
booktitle={2020 7th NAFOSTED Conference on Information and Computer Science (NICS)},
title={Transformer-based model for Vietnamese Handwritten Word Image Recognition},
year={2020},
volume={},
number={},
pages={163-168},
doi={10.1109/NICS51282.2020.9335877}}