Skip to content

Latest commit

 

History

History
44 lines (28 loc) · 2.11 KB

README.md

File metadata and controls

44 lines (28 loc) · 2.11 KB

Generative pre-trained transformer

GPT architecture

Source

Model Information

GPT is built of a multi-head attention architecture. We offer here a very small instance based on Andrej Karpathy's nanoGPT. The default parameters give a model much smaller than nanoGPT, tuned for fastest convergence on a very small data set (Shakespeare).

This model takes as input a sequence of existing text (context) and produces as output the predicted next character. Actually, it produces the predicted next character for each initial sub-sequence of the input, in effect giving an extra degree of parallelism for the purposes of training.

For the attention mechanism, we use Flux.MultiHeadAttention.

Training

cd text/gpt
julia --project gpt.jl

Example output

After one epoch:

generate(model, "_", 50) = "_me, but plept fairs, And heards, verchean my word"
generate(model, "_", 50) = "_ows know yought, This alce! totether him. weliest"
generate(model, "The", 50) = "These prurd passtion?  CINCESSIT: He eloucy I must"
generate(model, "The", 50) = "The bitherse dresic in to so shall with a his the "

After 20 epochs:

generate(model, "_", 50) = "_ething a calling do me diseases Of, on he's to th"
generate(model, "_", 50) = "_ ragg Thou flatters all in wators the selfsarut o"
generate(model, "The", 50) = "The Mirtouggake Go: For my mischance lords his sea"
generate(model, "The", 50) = "The oll-gakemoremo his dead: All this man make gen"

References