This repository contains a minimalist implementation of the Vision Transformer (ViT) model using tinygrad.
ViT, introduced by Google Research, applies the transformer architecture to image classification. Unlike a convolutional neural network (CNN), ViT splits an image into fixed-size, non-overlapping patches and processes them as a sequence of tokens, much as a transformer processes the words of a sentence in natural language processing.
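To make the patching step concrete, here is a minimal, dependency-free sketch in plain Python rather than tinygrad. The functions `patchify` and `embed` are hypothetical illustrations and are not taken from this repository. The sketch assumes the image is stored as nested lists indexed `image[row][col][channel]` and that its height and width are divisible by the patch size. In the standard ViT formulation, a learnable class token and position embeddings are also added before the transformer encoder. They are omitted here to keep the example short.

```python
from typing import List

# Hypothetical representation: image[row][col][channel] as nested lists.
Image = List[List[List[float]]]


def patchify(image: Image, patch_size: int) -> List[List[float]]:
    """Split an H x W x C image into non-overlapping patch_size x patch_size
    patches and flatten each one, returning the patches in row-major
    (left-to-right, top-to-bottom) order."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch: List[float] = []
            for r in range(top, top + patch_size):
                for c in range(left, left + patch_size):
                    patch.extend(image[r][c])  # append every channel of this pixel
            patches.append(patch)
    return patches


def embed(patches: List[List[float]], weights: List[List[float]]) -> List[List[float]]:
    """Linearly project each flattened patch with a (patch_dim x d_model)
    weight matrix, producing one d_model-sized token per patch."""
    d_model = len(weights[0])
    return [
        [sum(p[i] * weights[i][j] for i in range(len(p))) for j in range(d_model)]
        for p in patches
    ]


if __name__ == "__main__":
    # Hypothetical 4x4 RGB image whose pixel values encode their position.
    img = [[[float(r * 4 + c)] * 3 for c in range(4)] for r in range(4)]
    tokens = patchify(img, patch_size=2)
    print(len(tokens), len(tokens[0]))  # prints "4 12": 4 patches of 2*2*3 values
```

A 4x4 image with 2x2 patches becomes a sequence of (4/2) * (4/2) = 4 tokens. In a real ViT the projection weights in `embed` are learned during training rather than supplied by hand.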
To get started, clone this repository and install the required dependencies:
git clone https://github.com/EthanBnntt/tinygrad-vit.git
cd tinygrad-vit
pip install -r requirements.txt
Then run the training script:
python train.py