Fine-Tuning the Multilingual Text-To-Text Transfer Transformer (mT5) for Predicting the Language of a Given Text
This self-contained, reproducible Jupyter notebook contains the code and a description of how to fine-tune and evaluate the small pre-trained mT5 model (gs://t5-data/pretrained_models/mt5/small) on a new task: predicting the language a given text is written in, using the 15 languages of the XNLI dataset.
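To illustrate the task format, here is a minimal sketch of how XNLI sentences can be turned into text-to-text pairs for mT5. The task prefix, the use of language codes as targets, and the helper name `make_example` are illustrative assumptions; the notebook defines its own preprocessing later.

```python
# Minimal sketch: turning XNLI sentences into text-to-text pairs.
# The "predict language:" prefix and the language-code targets are
# assumptions for illustration, not necessarily the notebook's exact format.

XNLI_LANGUAGES = [
    "ar", "bg", "de", "el", "en", "es", "fr", "hi",
    "ru", "sw", "th", "tr", "ur", "vi", "zh",
]

def make_example(sentence: str, language: str) -> dict:
    """Builds one (input, target) pair for the language-identification task."""
    assert language in XNLI_LANGUAGES
    return {
        "inputs": f"predict language: {sentence}",  # hypothetical task prefix
        "targets": language,                        # e.g. "de"
    }

print(make_example("Wie geht es dir?", "de"))
# {'inputs': 'predict language: Wie geht es dir?', 'targets': 'de'}
```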
Please note that this notebook uses code from the sample Jupyter notebook provided by the T5 authors, along with several insights and ideas from the notebook by Stephen Mayhew.
There are two ways to use this work; both rely heavily on Google Colaboratory.
- Running the whole notebook as-is: loading the model I fine-tuned from Google Storage, checking it at inference and, optionally, verifying the reported metrics.
- Setting up your personal Google Storage folder and changing BASE_CLOUD_DIR accordingly (see the sketch below).
Additionally, I tried to save the model-related files to the local Colab runtime, but this is not possible due to TPU specifics: the TPU cannot access the Colab VM's local file system, so model files have to live in Google Storage.
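Below is a minimal sketch of what the BASE_CLOUD_DIR configuration might look like when using your own bucket. The bucket name and the derived MODELS_DIR / DATA_DIR names are placeholders I introduce here for illustration; only BASE_CLOUD_DIR itself is the variable referred to above.

```python
import os

# Minimal sketch, assuming you have created your own GCS bucket.
# Replace the bucket name with yours; MODELS_DIR / DATA_DIR are
# illustrative names, not necessarily the ones the notebook uses.
BASE_CLOUD_DIR = "gs://your-bucket-name/mt5-language-id"

MODELS_DIR = os.path.join(BASE_CLOUD_DIR, "models")
DATA_DIR = os.path.join(BASE_CLOUD_DIR, "data")

print(MODELS_DIR)  # gs://your-bucket-name/mt5-language-id/models
print(DATA_DIR)    # gs://your-bucket-name/mt5-language-id/data
```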
- Google Chrome is recommended if you want to explore TensorBoard interactively during fine-tuning; this browser also seems to work fastest with Colab.
- Please log in to a Google Account when the notebook requests it. This is needed to work with Google Storage, which serves as the data directory for faster loading (see the sketch after this list).
- Google Colab sometimes fails to provide a TPU. There is nothing I can do about this, since the T5 codebase is built around TPUs; the only advice is to try again later.
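For reference, here is a minimal sketch of the Colab-specific setup mentioned in the last two points: authenticating with your Google Account and checking whether a TPU was actually assigned. The COLAB_TPU_ADDR check applies to the classic Colab TPU runtime; the notebook's own setup cells may differ.

```python
# Minimal sketch of the Colab setup steps mentioned above.
# Run inside Google Colab with a TPU runtime selected.
import os
from google.colab import auth

# Log in to your Google Account so the notebook can read/write Google Storage.
auth.authenticate_user()

# Check whether Colab actually assigned a TPU to this runtime
# (COLAB_TPU_ADDR is set by the classic Colab TPU runtime).
tpu_address = os.environ.get("COLAB_TPU_ADDR")
if tpu_address is None:
    print("No TPU found -- try again later or re-select the TPU runtime.")
else:
    print("TPU available at grpc://" + tpu_address)
```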