Web implementation : https://dr-dash.herokuapp.com/
This project is part of the Data Science Nanodegree Program by Udacity in collaboration with Figure Eight. The initial dataset provided by Figure Eight contains real messages sent during disaster events, together with their respective categories. The aim of the project is to build a Natural Language Processing tool that categorizes messages.
1. Data processing: an ETL pipeline that extracts data from the source, cleans it, and saves it in a proper database structure.
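The ETL step can be sketched roughly as follows. This is a minimal illustration that uses only Python's standard library; it is not the project's 'process_data.py'. The column names (`id`, `message`, `genre`, `categories`), the packed `name-flag;name-flag` category encoding, the sample rows, and the function names are all assumptions made for the example. Only the table name DisasterMessageETL is taken from this README.

```python
import csv
import io
import sqlite3

# Hypothetical raw data: one row per message, with all category labels
# packed into a single "name-flag;name-flag" string (assumed format).
RAW_CSV = """id,message,genre,categories
1,We need water and food,direct,related-1;water-1;food-1
2,Nice weather today,social,related-0;water-0;food-0
1,We need water and food,direct,related-1;water-1;food-1
"""


def extract(csv_text):
    """Read CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Split the packed category string into 0/1 columns and drop duplicate ids."""
    seen, cleaned = set(), []
    for row in rows:
        if row["id"] in seen:
            continue  # skip duplicated messages
        seen.add(row["id"])
        record = {"id": int(row["id"]), "message": row["message"], "genre": row["genre"]}
        for item in row["categories"].split(";"):
            name, flag = item.rsplit("-", 1)
            record[name] = int(flag)
        cleaned.append(record)
    return cleaned


def load(records, conn, table="DisasterMessageETL"):
    """Write the cleaned records into a SQLite table."""
    columns = list(records[0])
    col_sql = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" ({col_sql})')
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(r[c] for c in columns) for r in records],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database for the example
load(transform(extract(RAW_CSV)), conn)
rows = conn.execute(
    'SELECT id, message, water FROM "DisasterMessageETL" ORDER BY id'
).fetchall()
print(rows)
```

Keeping extract, transform, and load as separate functions makes each stage easy to check on its own; a real script would read the source files from disk and write to a database file instead of memory.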
- train_classifier.py : trains the classifier (LinearSVC). Run from the root directory: 'python train_classifier.py data/DisasterResponse.db models/classifier.pkl'.
- dash_app.py : Dash/Plotly web visualization. Run with 'python dash_app.py'.
- nltk.txt : NLTK downloads for the Heroku deployment.
- Procfile : process file for the Heroku deployment.
- requirements.txt : required libraries.
- DisasterResponse.db : SQLite database with the cleaned data, created by the 'process_data.py' script.
- process_data.py : cleans the raw .csv data and saves it into the SQLite database (DisasterResponse.db, table DisasterMessageETL).
- classifier.pkl : trained classifier created by the 'train_classifier.py' script.
- custom_tokens.py : contains the custom tokenization function.
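To illustrate how a custom tokenizer and a LinearSVC classifier can fit together for multi-label message classification, here is a minimal sketch using scikit-learn. It is not the code in 'train_classifier.py' or 'custom_tokens.py': the regex-based `tokenize` function, the TF-IDF features, the `MultiOutputClassifier` wrapper, the two category names, and the toy messages are assumptions chosen for the example.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def tokenize(text):
    """Hypothetical stand-in for the custom tokenizer: lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Toy training data (invented for illustration): columns are [water, shelter].
messages = [
    "we need clean water",
    "no drinking water here",
    "our house collapsed we need shelter",
    "families are sleeping outside without shelter",
    "water and shelter needed urgently",
    "thank you for the concert",
]
labels = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [0, 0]]

model = Pipeline([
    # token_pattern=None because the custom tokenizer replaces the default pattern.
    ("tfidf", TfidfVectorizer(tokenizer=tokenize, token_pattern=None)),
    # One LinearSVC per category, since a message can carry several labels.
    ("clf", MultiOutputClassifier(LinearSVC())),
])
model.fit(messages, labels)

prediction = model.predict(["please send water"])
print(prediction.shape)  # one row per message, one column per category
```

Because pickle stores functions by reference to their module, a tokenizer used inside a pickled pipeline has to be importable when the model is loaded. That is one plausible reason to keep the tokenizer in its own module.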
Heroku deployment: https://dr-dash.herokuapp.com/
Text input control: Enter message for classification
Fig. 1 : Bar chart with predicted categories.
Chart 1 : Most common words associated with disaster messages.
Chart 2 : Most common words associated with non-disaster messages.
Chart 3 : Percentage of positive labels per category.
Chart 4 : Pie chart. Message distribution by genre and by whether the message is related to a disaster.
Slider : Select how many top words to include in Charts 1 and 2.
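The data behind the word charts and the positive-label chart could be computed along these lines. This is a sketch under stated assumptions, not the logic in 'dash_app.py': the function names `top_words` and `positive_share`, the plain regex tokenization, and the sample data are hypothetical, and the real charts may filter stop words or normalize tokens differently.

```python
from collections import Counter
import re


def top_words(messages, n):
    """Return the n most frequent lowercase word tokens with their counts."""
    counts = Counter()
    for text in messages:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts.most_common(n)


def positive_share(label_rows):
    """Percentage of rows with a positive (1) label, per category."""
    total = len(label_rows)
    categories = label_rows[0].keys()
    return {c: 100.0 * sum(r[c] for r in label_rows) / total for c in categories}


# Invented sample data for illustration.
disaster_messages = ["We need water", "Water is running out", "Need shelter and water"]
labels = [
    {"water": 1, "shelter": 0},
    {"water": 1, "shelter": 0},
    {"water": 1, "shelter": 1},
    {"water": 0, "shelter": 0},
]

slider_value = 2  # stands in for the value chosen on the slider
print(top_words(disaster_messages, slider_value))  # [('water', 3), ('need', 2)]
print(positive_share(labels))  # {'water': 75.0, 'shelter': 25.0}
```

In a Dash app, a function like `top_words` would typically be called from a callback that takes the slider value as input and returns an updated figure.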
Todor Mishinev - [email protected]
Project link - https://github.com/tmishinev/dr_dash.git