BasicTopicModelic

Basic topic modeling using Latent Dirichlet Allocation (LDA) in Python.

In this repository I use LDA for topic modeling on the 20 Newsgroups dataset from sklearn, which contains 20 target categories.

Before applying LDA, it is important to clean and preprocess the data. In the notebook I compare two models: one trained without removing emails or punctuation, and another trained with full cleaning. In both cases I lemmatize the data rather than stem it.
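As a rough illustration of the two preprocessing variants, here is a minimal sketch using only the standard library. The function names, the email regex, and the `lemmatize` callback are assumptions made for this example, not the notebook's actual code; in practice the callback could wrap a real lemmatizer such as NLTK's `WordNetLemmatizer`.

```python
import re
import string

# Assumed pattern: any whitespace-delimited token containing "@".
EMAIL_RE = re.compile(r"\S+@\S+")
# Map every ASCII punctuation character to a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def _apply_lemmatizer(tokens, lemmatize):
    """Lemmatize each token if a callback is given, else return tokens as-is."""
    if lemmatize is None:
        return tokens
    return [lemmatize(token) for token in tokens]


def minimal_tokens(text, lemmatize=None):
    """Variant 1: lowercase and split only; emails and punctuation are kept."""
    return _apply_lemmatizer(text.lower().split(), lemmatize)


def clean_tokens(text, lemmatize=None):
    """Variant 2: strip emails and punctuation, then lowercase and split."""
    text = EMAIL_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    return _apply_lemmatizer(text.lower().split(), lemmatize)


# Toy lookup standing in for a real lemmatizer (hypothetical, for the demo only).
toy_lemmas = {"gpus": "gpu"}
toy_lemmatize = lambda token: toy_lemmas.get(token, token)

sample = "Contact bob@example.com about the GPUs, please!"
print(minimal_tokens(sample, toy_lemmatize))
print(clean_tokens(sample, toy_lemmatize))
```

Passing the lemmatizer as a callback keeps both variants identical except for the cleaning step, which is the only thing the comparison is meant to vary.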

I also explore which number of topics best fits the problem.
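One common way to compare candidate topic counts is to fit one model per candidate and compare a score such as perplexity (lower is better). The sketch below assumes scikit-learn's `LatentDirichletAllocation` and a tiny made-up corpus; the notebook may use a different library or metric, and the helper name and candidate values are chosen only for illustration.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer


def perplexity_by_topic_count(docs, candidates, seed=0):
    """Fit one LDA model per candidate topic count and return its perplexity."""
    counts = CountVectorizer(stop_words="english").fit_transform(docs)
    scores = {}
    for k in candidates:
        lda = LatentDirichletAllocation(n_components=k, random_state=seed)
        lda.fit(counts)
        # Scored on the training matrix for brevity; a held-out split
        # gives a fairer comparison on real data.
        scores[k] = lda.perplexity(counts)
    return scores


# Toy corpus (made up for this example, not the 20 Newsgroups data).
toy_docs = [
    "the rocket launch reached orbit",
    "nasa plans a new rocket launch",
    "the goalie saved the penalty shot",
    "the hockey team won the final game",
    "the gpu driver crashed the system",
    "install the new graphics driver",
]

scores = perplexity_by_topic_count(toy_docs, candidates=[2, 3, 4])
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

On the full dataset the same loop would run over a wider range of candidates, and the lowest score is only a guide: inspecting the top words per topic is still needed to judge whether the topics make sense.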

The topic distribution across the documents still needs to be fixed to work as intended.
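The README does not say what is wrong with this step, so the following is only a sketch of what a well-formed document-topic distribution looks like, assuming scikit-learn's `LatentDirichletAllocation`. Each row should be a probability distribution over topics, and the dominant topic of a document is the column with the largest value. The corpus and variable names are made up for the example.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus (made up for this example, not the 20 Newsgroups data).
toy_docs = [
    "the rocket launch reached orbit",
    "nasa plans a new rocket launch",
    "the goalie saved the penalty shot",
    "the hockey team won the final game",
]

counts = CountVectorizer(stop_words="english").fit_transform(toy_docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)

# Shape (n_documents, n_topics); one row per document.
doc_topic = lda.fit_transform(counts)

# Defensive renormalization so every row sums to 1, in case the library
# or version in use returns unnormalized weights.
doc_topic = doc_topic / doc_topic.sum(axis=1, keepdims=True)

# Index of the most probable topic for each document.
dominant_topic = np.argmax(doc_topic, axis=1)

for text, row, topic in zip(toy_docs, doc_topic, dominant_topic):
    print(f"topic {topic}  {np.round(row, 3)}  {text}")
```

Checking that the rows sum to 1 and that the matrix has one row per input document is a quick sanity test before plotting or aggregating topics per newsgroup.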

AlArgente/BasicTopicModelic
