Concepts:
- Transformers
- Shallow heuristics
- Narrow training distribution
Points of interest:
- Ability to 'reason'
- Different steps of training
- Transfer learning
LMs: factual knowledge + reasoning + contextual tracking.
- useful to train and evaluate Language Models (LMs) that are much smaller than state-of-the-art models (below 10 million total parameters: Small Language Models (SLMs)) or that have much simpler architectures (e.g. only one transformer block).
- demonstrates reasoning capabilities.
- multidimensional score for the model (grammar, creativity, instruction-following), unlike very structured standard benchmarks.
- useful for teams working on low-resource / specialized domains + provides a new perspective on the capabilities of LMs
- generative models trained on TinyStories show behaviors similar to Large Language Models (LLMs).
- conducting extensive experiments on different hyperparameters, architectures, and training methods reveals insights into the performance and quality of these models even with limited computational resources.
- models trained on TinyStories appear to be substantially more interpretable than larger ones, with clear attention patterns and meaningful neuron activations.
- visualization and analysis of attention and activation maps provide insights into the generation process and story content, enhancing our understanding of how these models operate.
- models trained on TinyStories can produce results comparable to much larger models like GPT2-XL, demonstrating the effectiveness of this approach in generating high-quality text.
- introduced in "Attention Is All You Need" (Vaswani et al., 2017)
- neural network architecture primarily used for natural language processing.
- key feature: the attention mechanism, which lets it capture complex relationships in sequential data (see the sketch after this list).
- excel in tasks like machine translation and text generation.
- famous models such as GPT and BERT employ transformer architectures.
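To make the attention mechanism concrete, a minimal PyTorch sketch of scaled dot-product attention; the tensor shapes and toy usage are arbitrary and not tied to any model in this project:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)     # (batch, seq, seq)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)                # attention pattern
    return weights @ v                                     # weighted sum of values

# Toy usage: batch of 2 sequences, 5 tokens, 16-dim vectors (arbitrary sizes).
q = k = v = torch.randn(2, 5, 16)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([2, 5, 16])
```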
- simple rules or low-complexity methods used to solve problems or make decisions.
- can be quick to apply but may also yield approximate results.
- Data collection.
- Preprocessing: cleaning and formatting the text data, including tokenization (breaking text into individual words or subwords), handling special characters, and potentially applying techniques like stemming or lemmatization (see the sketch after this list).
  a. stemming: reducing words to their root by removing suffixes, e.g. "eating", "eats", "eaten" become "eat". Fast, but may produce imperfect results as it does not consider context.
  b. lemmatization: reducing words to their canonical form, e.g. "better" becomes "good", "running" becomes "run".
- Token embedding: converting tokens into vectors that can be understood by the model. Involves word embeddings (e.g., Word2Vec, GloVe) or subword tokenization with learned embeddings (e.g., Byte Pair Encoding, SentencePiece).
- Model training: involves feeding the tokenized and embedded text into the model and adjusting its parameters (e.g., weights in neural networks) iteratively to minimize prediction errors.
  a. optimization algorithms: improve the model's ability to generate coherent and contextually relevant text, while minimizing errors and maximizing language understanding.
- Evaluation: assessing the performance of the trained model using various metrics such as perplexity, accuracy, or BLEU score (Bilingual Evaluation Understudy).
- Fine-tuning (optional): fine-tuning the pre-trained model on a specific task or domain to improve its performance for a particular application.
- Deployment.
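To illustrate the preprocessing and tokenization steps above, a minimal sketch using NLTK for stemming/lemmatization and a Hugging Face tokenizer for subword tokenization; the "gpt2" checkpoint and the sample words/sentence are only examples:

```python
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer
from transformers import AutoTokenizer

nltk.download("wordnet", quiet=True)  # lexicon needed by the lemmatizer

# a. stemming: strip suffixes without looking at context
stemmer = PorterStemmer()
print([stemmer.stem(w) for w in ["eating", "eats"]])  # ['eat', 'eat']

# b. lemmatization: map words to a canonical dictionary form (needs a POS hint)
lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("running", pos="v"))  # 'run'
print(lemmatizer.lemmatize("better", pos="a"))   # 'good'

# Subword tokenization (Byte Pair Encoding in GPT-2's case)
tokenizer = AutoTokenizer.from_pretrained("gpt2")
ids = tokenizer("Once upon a time")["input_ids"]
print(ids, tokenizer.convert_ids_to_tokens(ids))
```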
- transferring a pre-trained model's learned knowledge to a new related task or domain.
- pre-trained model is used as a starting point.
- allows saving time + resources.
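A minimal sketch of the transfer-learning starting point with the Hugging Face API; the "gpt2" checkpoint is just an example of a pre-trained model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Start from a pre-trained checkpoint instead of random weights ("gpt2" is an example).
checkpoint = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# The pre-trained weights are then fine-tuned on the new task/domain,
# which saves time and compute compared to training from scratch.
```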
Generate a train dataset (size: 70) & a test dataset (size: 30):
$D_{\text{train}} = \{ (x_i, y_i) : 1 \le i \le 100 \wedge (i \bmod 10) \notin \{1, 3, 7\} \}$
$D_{\text{test}} = \{ (x_i, y_i) : 1 \le i \le 100 \wedge (i \bmod 10) \in \{1, 3, 7\} \}$
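A minimal sketch of this split, assuming the 100 (x_i, y_i) pairs are held in a Python list; the placeholder pairs below are hypothetical:

```python
# Placeholder pairs standing in for the real (x_i, y_i), 1 <= i <= 100.
pairs = [(f"x{i}", f"y{i}") for i in range(1, 101)]

test_residues = {1, 3, 7}
d_train = [p for i, p in enumerate(pairs, start=1) if i % 10 not in test_residues]
d_test = [p for i, p in enumerate(pairs, start=1) if i % 10 in test_residues]

assert len(d_train) == 70 and len(d_test) == 30
```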
Difficulty: ⭐ Duration: 30 minutes
! Decode the tokenizer.pad_token to add it to our synthetic completions.
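A minimal sketch of that note, assuming a GPT-2-style tokenizer whose pad token is mapped to the EOS token; the completions list is a hypothetical stand-in for the synthetic completions:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # example checkpoint
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no pad token by default

# Decode the pad token id back to text and append it to each synthetic completion.
pad_text = tokenizer.decode([tokenizer.pad_token_id])
completions = ["The cat sat", "Once upon a time"]  # hypothetical completions
padded = [c + pad_text for c in completions]
```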
📚 Doc:
- Hugging Face 🤗 Create dataset
- stackoverflow
Reasons for avoiding the generate() function (see the decoding sketch after this list):
- Customization of the generation process
- Performance
- Flexibility: ability to add extra features or custom preprocessing steps to the generation process
- Control over model + behavior
- Needed in both training and inference stages (generate() can only be used at inference time)
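A minimal sketch of generation through direct forward calls instead of generate(); greedy decoding and the "gpt2" checkpoint are assumptions, the point is a fully customizable loop usable at both training and inference time:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "gpt2"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

def greedy_complete(prompt, max_new_tokens=20):
    """Greedy decoding with explicit forward passes (no generate())."""
    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
    for _ in range(max_new_tokens):
        logits = model(input_ids).logits                      # (1, seq_len, vocab_size)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=-1)   # append the chosen token
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(input_ids[0], skip_special_tokens=True)

print(greedy_complete("Once upon a time"))
```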
- Create dataset_batches by stepping through the dataset with step=batch_size. Advantages: optimization + parallelism, since PyTorch/TensorFlow are optimized for processing batches of data in parallel (see the sketch below).
- Iterate over the batches and, for each one, get the prompts/completions.
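A minimal sketch of this batching, assuming a Hugging Face Dataset with "prompt"/"completion" columns; the toy dataset and column names are assumptions:

```python
from datasets import Dataset

# Toy dataset standing in for the real one.
dataset = Dataset.from_dict({
    "prompt": [f"prompt {i}" for i in range(20)],
    "completion": [f"completion {i}" for i in range(20)],
})

batch_size = 8  # example value

# Step through the dataset with step=batch_size: each slice is one batch,
# which PyTorch/TensorFlow can then process in parallel.
for start in range(0, len(dataset), batch_size):
    batch = dataset[start : start + batch_size]   # dict of column -> list
    prompts, completions = batch["prompt"], batch["completion"]
    # ... tokenize the batch and run it through the model here
```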
- Activation:
- Training stage -> learn model params
- Calculated + used to adjust model params to minimize the loss on the training data
- Deactivation:
- Inference / evaluation stages
- No parameter adjustments are made because the parameters have already been learned during training.
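Assuming activation/deactivation here refers to gradient computation, a minimal PyTorch sketch with a toy model standing in for the LM:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # toy model standing in for the LM
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
x, y = torch.randn(4, 10), torch.randint(0, 2, (4,))

# Training stage: gradients are activated, computed from the loss,
# and used to adjust the parameters.
model.train()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()

# Inference / evaluation stage: gradients are deactivated because the
# parameters have already been learned; no adjustment happens here.
model.eval()
with torch.no_grad():
    predictions = model(x).argmax(dim=-1)
```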
Difficulty: ⭐⭐ Duration: 1h
📚 Doc:
- Hugging Face 🤗 Evaluate predictions
- Hugging Face 🤗 Utilities for Tokenizers: understand PreTrainedTokenizerBase, params + returns
- Hugging Face 🤗 Inference
- Inference PyTorch Models
- Init DummyModel class constructor
- Implement a customized forward method
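A minimal sketch of what the DummyModel constructor and customized forward could look like; the vocab_size argument and the all-zero (uniform) logits are assumptions, not the assignment's actual solution:

```python
import torch
import torch.nn as nn

class DummyModel(nn.Module):
    """Baseline model with a constructor and a customized forward (illustrative only)."""

    def __init__(self, vocab_size):
        super().__init__()            # init the nn.Module machinery
        self.vocab_size = vocab_size  # assumed constructor argument

    def forward(self, input_ids, attention_mask=None):
        # Return uniform (all-zero) logits over the vocabulary for every position:
        # shape (batch, seq_len, vocab_size), like a causal LM would.
        batch, seq_len = input_ids.shape
        return torch.zeros(batch, seq_len, self.vocab_size)

# Usage with arbitrary shapes.
model = DummyModel(vocab_size=50257)
out = model(torch.randint(0, 50257, (2, 5)))
print(out.shape)  # torch.Size([2, 5, 50257])
```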
Difficulty: ⭐⭐⭐⭐ Duration: 2h
📚 Doc:
Difficulty: ⭐⭐⭐ Duration: 1h
Hyperparameters: settings that control the training process and can influence the performance of the model. They can include:
- learning rate
- batch size
- number of epochs
- optimizer
- regularization parameters
- dropout rate
- model architecture choices: number of layers in a neural network, number of neurons/layer, ...
What I tried:
- changed the learning rate, number of epochs, and batch_size (see the sketch below)
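A minimal sketch of where those hyperparameters plug into a Hugging Face Trainer run; the "gpt2" checkpoint, the toy corpus, and the concrete values are assumptions:

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

checkpoint = "gpt2"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Toy corpus standing in for the real training data.
train_data = Dataset.from_dict({"text": ["Once upon a time there was a cat.",
                                         "The sun was shining brightly."]})
tokenized_train = train_data.map(lambda b: tokenizer(b["text"]),
                                 batched=True, remove_columns=["text"])

# Hyperparameters under test (values are examples, not the ones actually used).
args = TrainingArguments(
    output_dir="out",
    learning_rate=5e-5,
    num_train_epochs=3,
    per_device_train_batch_size=8,
    weight_decay=0.01,  # regularization parameter
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_train,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
```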
📚 Doc:
- Hugging Face 🤗 Causal language modeling | Train
- Hugging Face 🤗 Evaluate - A library for easily evaluating machine learning models and datasets
- Hugging Face 🤗 Evaluate - transformers
- Stackoverflow - Using huggingface transformers trainer method for hugging face datasets
- Error using transformers Trainer - remove_unused_columns=False
- PyTorch 🔥 torch.optim
- Exploring a task that requires a balance between different kinds of words: nouns, verbs, adjectives, numbers.
- Task 1: OK
- Task 2: truncate completions to improve DummyModel's accuracy
- Task 3:
- constructor
- investigate further whether the forward parameters are useful
- logits instantiation / initialization / storage
- Task 4: develop training stage
- See the impact of a different dataset (size & quality)
- Test the hypothesis with different SLMs
📚 Doc:
Difficulty: ⭐ Duration: 30 minutes