This repository contains a step-by-step implementation and explanation of a transformer, based on the paper "Attention Is All You Need" (https://arxiv.org/abs/1706.03762).
The explanation is in notebooks/vit.ipynb. To train and evaluate the transformer on the MNIST dataset, run app.py inside a virtual environment created from the requirements in environment.yml.
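
A minimal sketch of the run steps, assuming the environment is managed with conda (environment.yml is the usual conda format, but this README does not say which tool is used). ENV_NAME is a placeholder, not a name taken from this repository: use the value of the `name:` field in environment.yml.

```shell
# Create the virtual environment from the requirements file (assumes conda is installed)
conda env create -f environment.yml

# Activate it; replace ENV_NAME with the `name:` value from environment.yml
conda activate ENV_NAME

# Train and evaluate on MNIST
python app.py
```

To read the explanation notebook, open notebooks/vit.ipynb in Jupyter from the same environment, provided Jupyter is among the listed requirements.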