nlp 2022

Code-first, hands-on approach

presenting specific NLP topics in a notebook environment (e.g., Colab)
using the most popular NLP libraries and tools

Topics

First Session (Nov 4 - 5)

Introduction (1h)
- Popular use cases
- Philosophical debate about how to model and improve language learning
Sentiment Analysis (scikit-learn + fast.ai) (2.5h)
- Dataset - IMDB reviews
- Cloud Service example - AWS Comprehend
- bag-of-words approach
- Naive Bayes - using frequency counts
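
The bag-of-words and Naive Bayes items above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the course notebook: the four training sentences are made up (they are not IMDB reviews), the tokenizer is a plain whitespace split, and the helper names `train_naive_bayes` and `predict` are hypothetical. It uses add-one (Laplace) smoothing on the word frequency counts.

```python
import math
from collections import Counter

# Toy labelled reviews, made up for illustration (not taken from IMDB).
TRAIN = [
    ("a great and moving film", "pos"),
    ("great acting and a great story", "pos"),
    ("a boring and terrible film", "neg"),
    ("terrible acting and a dull story", "neg"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train_naive_bayes(examples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for token in tokenize(text):
            counts[token] += 1
            vocab.add(token)
    return class_counts, word_counts, vocab


def predict(text, class_counts, word_counts, vocab):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train_naive_bayes(TRAIN)
print(predict("a great story", *model))
print(predict("dull and boring", *model))
```

Word order is ignored entirely here: only the per-class frequency of each word matters, which is what makes this a bag-of-words model.
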
Topic Modeling - LSA/LDA

Second Session (Nov 18 - 19)

Block 1

Introductions
Recap 
	- use cases of NLP (Notion AI, Aircall voicemail transcription)
	- exercise: sentiment analysis of imdb movie reviews
	- solution tiers: 
		- bag-of-words models (Naive Bayes, Logistic Regression)
		- sequential models (Language Models, RNN, Transformers)
Evaluation - theory
	- accuracy, False Positives/False Negatives
	- F1 score, ROC AUC
	- sklearn DummyClassifier
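
The evaluation items above can be illustrated with a small pure-Python sketch. The labels and predictions are made up, and the helper names (`confusion_counts`, `f1_score`, `majority_baseline`) are hypothetical; `majority_baseline` only mimics the idea behind scikit-learn's `DummyClassifier(strategy="most_frequent")` and is not that class. ROC AUC is left out because it needs predicted scores rather than hard labels.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1  # false positive: predicted positive, actually negative
        elif truth == positive:
            fn += 1  # false negative: predicted negative, actually positive
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall (0.0 when undefined)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def majority_baseline(y_train, n_predictions):
    """Always predict the most frequent training label."""
    most_common_label = Counter(y_train).most_common(1)[0][0]
    return [most_common_label] * n_predictions


# Imbalanced toy data: 8 negatives, 2 positives (made up).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_model = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
y_dummy = majority_baseline(y_true, len(y_true))

print("model:", accuracy(y_true, y_model), f1_score(y_true, y_model))
print("dummy:", accuracy(y_true, y_dummy), f1_score(y_true, y_dummy))
```

On this toy data the model and the majority baseline reach the same accuracy, while the F1 score separates them. That is the reason for comparing against a dummy baseline on imbalanced data.
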
Recap
	- Naive Bayes method for sentiment analysis

Block 2

Vectorization and GPUs - theory
Linear regression - theory
	- interactive website
Logistic Regression - notebook
	- sigmoid, log-loss
	- term-document matrix, sparse matrix
	- tokenization and vocabulary
	- N-Grams
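
The building blocks listed above can be sketched without any library. This is an illustrative sketch under stated assumptions: the three documents are made up, the tokenizer is a whitespace split, and the sparse term-document matrix is stored as one `{column: count}` dictionary per document instead of a SciPy sparse matrix. All function names are hypothetical. No training loop is shown, only the forward pass and the loss.

```python
import math
from collections import Counter

DOCS = ["not good", "very good film", "not a good film"]  # made-up examples


def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def ngrams(tokens, n):
    """Return all contiguous n-grams as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(text):
    """Unigrams plus bigrams, so that 'not good' becomes its own feature."""
    tokens = tokenize(text)
    return tokens + ngrams(tokens, 2)


# Vocabulary: map each distinct feature to a column index.
vocab = {}
for doc in DOCS:
    for feat in features(doc):
        vocab.setdefault(feat, len(vocab))

# Sparse term-document matrix: each row keeps only its non-zero columns.
rows = [dict(Counter(vocab[f] for f in features(doc))) for doc in DOCS]


def sigmoid(z):
    """Squash a real-valued score into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(row, weights, bias):
    """Logistic regression forward pass on one sparse row."""
    z = bias + sum(weights[col] * count for col, count in row.items())
    return sigmoid(z)


def log_loss(y_true, probs, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


weights = [0.0] * len(vocab)  # untrained model: every weight is zero
probs = [predict_proba(row, weights, 0.0) for row in rows]
print(probs)
print(log_loss([0, 1, 0], probs))
```

With all weights at zero, every prediction is 0.5 and the log-loss equals ln 2, which is a useful sanity check before training starts.
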
Talk to Books (by Google) - NLP use case

Block 3

Recap 
	- Logistic Regression method for sentiment analysis
Sequential models - theory
	- RNN
	- Attention mechanism
	- Transformers
Word Vectors/Word Embeddings - theory and intuition
Word Embeddings - notebook
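
The intuition behind word vectors can be shown with a toy example. The 3-dimensional vectors below are invented by hand for illustration; real embeddings (word2vec, fastText and similar) are learned from large corpora and have hundreds of dimensions. The helper names `cosine` and `analogy` are hypothetical.

```python
import math

# Hand-written toy vectors (invented, not learned from data).
VECTORS = {
    "king": [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.3, 0.3],
}


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors):
    """Solve 'a is to b as c is to ?' via the vector b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(cosine(VECTORS["king"], VECTORS["queen"]))
print(analogy("man", "king", "woman", VECTORS))
```

The analogy works here only because the toy vectors were constructed to make it work; with learned embeddings the same arithmetic holds approximately for some word pairs and fails for others.
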

Block 4

Language Models - theory
Pre-trained transformer fine-tuning - notebook
	- tokenization
	- hidden state extraction
	- feature matrix and Logistic Regression
	- HuggingFace pipelines - high-level library

Whisper - notebook
	- open source ASR
	- record voice - transcribe - translate
	- translation
	- Why is this model so good?
		- trained on multiple related tasks (translation, transcription)
		- trained on multiple languages

Third Session (Dec 9 - 10)

Zoltán and Orsolya

NLP libraries

NLTK - released in 2001, covers a very broad range of NLP tasks
spaCy - opinionated, parse trees, tokenizers
gensim - topic modeling, similarity detection
huggingface - transformer-based models
fastText - text classification and representation learning
scikit-learn - general-purpose ML library
PyText - deep learning framework for NLP (based on PyTorch)
fast.ai - deep learning library (based on PyTorch)

Might be useful

https://towardsdatascience.com/how-to-compute-sentence-similarity-using-bert-and-word2vec-ab0663a5d64
https://towardsdatascience.com/semantic-textual-similarity-83b3ca4a840e
https://www.kdnuggets.com/2022/11/getting-started-spacy-nlp.html

Drive folder - notebook & text files

https://drive.google.com/drive/folders/11FdU2aY3atYFES2gSnukyKBLEwPYeRAo?usp=sharing

Links

AntConc / MSNLP / Magyar Nemzeti Szovegtar / MazSola

