"An existential problem for any major website today is how to handle toxic and divisive content"
The goal is to weed out insincere questions, i.e. those founded upon false premises or intended to make a statement rather than to look for helpful answers. The competition states that submissions are evaluated on the F1 score between predicted and observed targets.
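As a quick, illustrative check of that metric with scikit-learn (toy values only; scikit-learn is an assumption here, not a stated dependency of this project):

```python
# Illustrative only: F1 between observed targets and thresholded predictions.
from sklearn.metrics import f1_score

y_true = [0, 1, 1, 0, 1]         # observed targets (1 = insincere)
y_pred = [0, 1, 0, 0, 1]         # binary predictions after thresholding
print(f1_score(y_true, y_pred))  # 0.8, the harmonic mean of precision and recall
```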
Results from training and testing different model configurations can be found in the Jupyter notebook /notebooks/Quora_InsincereQuestionDetection.ipynb.
My goal here is simply to test different methods for designing and training models, as well as to familiarize myself with the TensorFlow framework.
I investigate the following topics:
- Resampling methods for minority-class classification (see the sketch after this list)
- Word embeddings and transfer learning (also illustrated in the sketch below)
- Text pre-processing in the context of word embeddings
- Cost function
- Model architecture
- Model hyperparameters
- Training specifications
- Overfitting considerations
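For concreteness, here is a minimal, hypothetical sketch (not the code in this repository) of two of these topics: naive random oversampling of the minority class and a frozen Keras Embedding layer initialized from pretrained vectors. The vocabulary size, embedding dimension, and layer choices are placeholder assumptions.

```python
# Hypothetical sketch, not the repository's implementation.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)

def oversample(X, y, minority_label=1):
    """Duplicate minority-class rows (NumPy arrays) until both classes are the same size."""
    minority_idx = np.flatnonzero(y == minority_label)
    majority_idx = np.flatnonzero(y != minority_label)
    resampled = rng.choice(minority_idx, size=len(majority_idx), replace=True)
    idx = np.concatenate([majority_idx, resampled])
    rng.shuffle(idx)
    return X[idx], y[idx]

# Stand-in for pretrained vectors (e.g. GloVe) aligned with a tokenizer's word index.
vocab_size, embed_dim = 50_000, 300
embedding_matrix = rng.normal(size=(vocab_size, embed_dim))

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(
        vocab_size, embed_dim,
        embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
        trainable=False),                                  # keep pretrained vectors frozen
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dropout(0.3),                          # one simple overfitting guard
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

Oversampling duplicates insincere examples so the classifier sees a balanced label distribution, while freezing the embedding layer keeps the pretrained vectors intact and trains only the recurrent and dense layers.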
The repository is organized as follows:

├── main.py
├── README.md
├── requirements.txt
├── notebooks
│   └── Quora_InsincereQuestionDetection.ipynb
├── data
│   ├── saved_models
│   │   └── mriv_model0_exp0.h5
│   ├── tensorboard_output
│   └── train_log
│       └── mriv_model0_exp0
└── src
    ├── configs
    │   └── configs.py
    ├── data
    │   └── process_data.py
    ├── models
    │   ├── architecture.py
    │   ├── compile.py
    │   ├── evaluate.py
    │   └── train.py
    └── preprocess
        ├── resample.py
        └── text_process.py