The makemore-tr project aims to mimic the style and structure of authentic Turkish names with a deep learning model.
The model structure is based on the paper A Neural Probabilistic Language Model (Bengio et al., 2003).
Andrej Karpathy's YouTube video 'Building makemore Part 2: MLP' also influenced this project and taught me a lot.
Lastly, thanks to Kamil Toraman for the raw data.
The project involves three main notebooks:
- Data Cleaning Notebook (`data-cleaning.ipynb`): This notebook cleans the dataset of Turkish names. It removes duplicates and unwanted characters, then prepares a list of cleaned names.
- Model Training Notebook (`makemore-tr.ipynb`): This notebook builds a character-level language model using PyTorch. It sets up the vocabulary, creates datasets, and trains a neural network model to generate plausible Turkish names.
- Model Training Notebook with Manual Backpropagation (`manual_backprop_tr.ipynb`): In this notebook I implemented backpropagation manually (without using `loss.backward()`) to gain a deeper understanding of backpropagation and gradients.
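The cleaning and dataset steps described in the first two notebooks can be sketched roughly as follows. This is a minimal illustration in plain Python, not the notebooks' actual code: the function names, the sample `raw_names`, the alphabet filter, and the context length `block_size=3` are all assumptions. The `.` end-of-name token matches the trailing dot seen in the generated names.

```python
# Hypothetical raw input; the real dataset is not reproduced here.
raw_names = ["Ayşe", "ayşe ", "Mehmet2", "Gökhan", "", "Işıl"]

# Assumption: only lowercase letters of the Turkish alphabet are kept.
TURKISH_ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz"


def turkish_lower(text):
    # str.lower() maps "I" to "i", but Turkish pairs I/ı and İ/i,
    # so the two capital letters are handled before lowercasing.
    return text.replace("I", "ı").replace("İ", "i").lower()


def clean_names(names):
    """Lowercase, drop unwanted characters, and remove duplicates."""
    cleaned = set()
    for name in names:
        name = turkish_lower(name.strip())
        name = "".join(ch for ch in name if ch in TURKISH_ALPHABET)
        if name:
            cleaned.add(name)
    return sorted(cleaned)


def build_vocab(names):
    """Map each character to an integer; index 0 is the '.' end token."""
    chars = sorted(set("".join(names)))
    stoi = {ch: i + 1 for i, ch in enumerate(chars)}
    stoi["."] = 0
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def build_dataset(names, stoi, block_size=3):
    """Create (context, next character) training pairs."""
    X, Y = [], []
    for name in names:
        context = [0] * block_size  # start with a context of '.' tokens
        for ch in name + ".":
            ix = stoi[ch]
            X.append(list(context))
            Y.append(ix)
            context = context[1:] + [ix]  # slide the window by one
    return X, Y


names = clean_names(raw_names)
stoi, itos = build_vocab(names)
X, Y = build_dataset(names, stoi)
```

In a PyTorch setting, `X` and `Y` would typically be converted to tensors before training.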
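To give a flavor of what deriving gradients by hand involves, here is a small sketch for a single softmax cross-entropy step. It is not taken from the notebook: it uses plain Python instead of PyTorch, the function names are hypothetical, and a finite-difference estimate stands in for the autograd result as the reference to compare against.

```python
import math


def softmax(logits):
    # Subtract the maximum for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    # Negative log-probability of the correct class.
    return -math.log(softmax(logits)[target])


def cross_entropy_grad(logits, target):
    # Hand-derived gradient: d loss / d logits = softmax(logits) - one_hot(target)
    probs = softmax(logits)
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]


def numerical_grad(logits, target, eps=1e-6):
    # Central finite differences, used only as a reference check.
    grads = []
    for i in range(len(logits)):
        up = list(logits)
        up[i] += eps
        down = list(logits)
        down[i] -= eps
        diff = cross_entropy(up, target) - cross_entropy(down, target)
        grads.append(diff / (2 * eps))
    return grads


logits = [2.0, -1.0, 0.5]  # illustrative values
target = 0
manual = cross_entropy_grad(logits, target)
numeric = numerical_grad(logits, target)
max_diff = max(abs(a - b) for a, b in zip(manual, numeric))
print("largest difference between manual and numerical gradient:", max_diff)
```

The same comparison pattern applies to every layer: derive the gradient by hand, then check it against a trusted reference.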
Ensure the following libraries are installed in your environment so the notebooks run properly:
- torch: For building and training the neural network model.
- matplotlib: For plotting and visualizing data during training.
To generate new Turkish names after training the model, simply execute the sampling cell in the `manual_backprop_tr` notebook. The model will output a list of new names based on the learned patterns. Sample output may include names like:
cant.
süze.
ergin.
topvar.
erk.
can.
say.
ker.
yıldıralp.
evi.
kara.
dorulhan.
gökmeter.
ağatarakan.
aslan.
serkoç.
nur.
tapdsel.
salkuşa.
yurdu.
These names are generated by the model and aim to mimic the style and structure of authentic Turkish names.
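The sampling step can be sketched as an autoregressive loop: start from a context of `.` tokens, draw the next character from the model's predicted distribution, slide the context, and stop once `.` is drawn. The code below is only an illustration in plain Python. `sample_name`, `toy_probs`, the four-character vocabulary, and `block_size=3` are hypothetical, and `toy_probs` is a stand-in for the trained network so that the example runs without PyTorch.

```python
import random

# Hypothetical tiny vocabulary; index 0 is the '.' end token.
itos = {0: ".", 1: "a", 2: "c", 3: "n"}


def toy_probs(context):
    # Stand-in for the trained model: puts all probability mass on a
    # single next character so that the loop spells "can.".
    next_ix = {0: 2, 2: 1, 1: 3, 3: 0}[context[-1]]
    return [1.0 if i == next_ix else 0.0 for i in range(len(itos))]


def sample_name(next_char_probs, itos, block_size=3, max_len=20, rng=random):
    """Draw characters one at a time until the '.' token appears."""
    context = [0] * block_size
    out = []
    for _ in range(max_len):
        probs = next_char_probs(context)
        # Sample an index in proportion to its predicted probability.
        ix = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        context = context[1:] + [ix]
        out.append(itos[ix])
        if ix == 0:
            break
    return "".join(out)


print(sample_name(toy_probs, itos))
```

With a real model, `next_char_probs` would run the network's forward pass on the context and return the softmax over the vocabulary, so each call could produce a different name.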