- Adult Data Set: Predict whether income exceeds $50K/yr based on census data. Also known as the "Census Income" dataset.
Link: https://archive.ics.uci.edu/ml/datasets/Adult
A good reference for this dataset: https://github.com/PAIR-code/facets (see Facets Dive).
- THE MNIST DATABASE of handwritten digits.
Link: http://yann.lecun.com/exdb/mnist/
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. "Gradient-based learning applied to document recognition." Proceedings of the IEEE, 86(11):2278-2324, November 1998.
DTCPY: Decision tree classifier written from scratch in Python 3 using Jupyter Notebook.
RFCPY: Random forest classifier written from scratch in Python 3 using Jupyter Notebook.
RFRPY: Random forest regressor, implemented both with scikit-learn and from scratch in Python 3 using Jupyter Notebook. Based on "Random Forest in Python: A Practical End-to-End Machine Learning Example" (William Koehrsen). Link: https://towardsdatascience.com/random-forest-in-python-24d0893d51c0
RFCLP: Random forest classifier written from scratch in Lisp, with a pruning step and quantization of the numeric features.
RFRLP: Random forest regressor written from scratch in Lisp, using an improved version of the algorithm from RFRPY.
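The notebooks listed above are not reproduced here. As a rough illustration of what a from-scratch random forest classifier involves, the following is a minimal pure-Python sketch, assuming numeric feature rows and hashable class labels: CART-style trees split on Gini impurity, each tree is grown on a bootstrap sample and searches only a random subset of sqrt(p) features per node, and predictions are combined by majority vote. All function names (`fit_forest`, `predict_forest`, etc.), defaults and the toy data are hypothetical, not taken from DTCPY/RFCPY, and the sketch omits the pruning and quantization steps mentioned for RFCLP.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, feature_ids):
    """Return the (feature, threshold) pair with the lowest weighted child Gini,
    or None if no candidate split improves on the parent node."""
    best, best_score = None, gini(y)
    for f in feature_ids:
        values = sorted({row[f] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values,
        # so both children are always non-empty.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, max_depth, n_features, rng):
    """Grow a tree recursively. Internal nodes are dicts; leaves are class labels."""
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return majority
    # Random-forest twist: only a random subset of features is searched at each node.
    feature_ids = rng.sample(range(len(X[0])), n_features)
    split = best_split(X, y, feature_ids)
    if split is None:
        return majority
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], max_depth - 1, n_features, rng),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], max_depth - 1, n_features, rng),
    }


def predict_tree(node, row):
    """Follow threshold tests from the root down to a leaf label."""
    while isinstance(node, dict):
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node


def fit_forest(X, y, n_trees=25, max_depth=5, n_features=None, seed=0):
    """Bagging: each tree is grown on a bootstrap sample of the training rows."""
    rng = random.Random(seed)
    if n_features is None:
        n_features = max(1, int(len(X[0]) ** 0.5))  # sqrt(p), a common default for classification
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample rows with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 max_depth, n_features, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over the individual trees' predictions."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy, linearly separable data made up for illustration.
    X = [[0.0, 1.0], [0.2, 0.9], [0.1, 0.8], [0.9, 0.1], [1.0, 0.2], [0.8, 0.0]]
    y = [0, 0, 0, 1, 1, 1]
    forest = fit_forest(X, y, n_trees=25, max_depth=3, seed=42)
    print(predict_forest(forest, [0.05, 0.95]), predict_forest(forest, [0.95, 0.05]))
```

Using midpoints between distinct values keeps the split search simple; a quantized variant would instead restrict candidate thresholds to a fixed set of bin edges per feature.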
Understanding Random Forests: From Theory to Practice. PhD dissertation, Gilles Louppe, July 2014. Defended on October 9, 2014.
arXiv: http://arxiv.org/abs/1407.7502
A good article about random forests and feature importance: "How not to use random forest".
"An Implementation and Explanation of the Random Forest in Python".
I strongly recommend reading this one first: "The Simple Math behind 3 Decision Tree Splitting criterions".
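The splitting criteria themselves fit in a few lines. The sketch below is written in Python and assumes three common choices: Gini impurity, entropy (whose decrease is the information gain) and variance (whose decrease is the variance reduction used by regression trees). These may not be exactly the three the article covers, and the function names are illustrative, not taken from the article or the notebooks.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 (0 for a pure node). Assumes a non-empty list."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits, -sum_k p_k log2 p_k. Assumes a non-empty list."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def variance(values):
    """Population variance of numeric targets, the impurity used for regression."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def impurity_decrease(parent, left, right, criterion):
    """Parent impurity minus the size-weighted impurity of the two children.

    With criterion=entropy this is the information gain; with
    criterion=variance it is the variance reduction.
    """
    n = len(parent)
    weighted = (len(left) * criterion(left) + len(right) * criterion(right)) / n
    return criterion(parent) - weighted


if __name__ == "__main__":
    parent = ["a", "a", "b", "b"]
    left, right = ["a", "a"], ["b", "b"]
    print(impurity_decrease(parent, left, right, gini))     # 0.5
    print(impurity_decrease(parent, left, right, entropy))  # 1.0
    print(impurity_decrease([1.0, 2.0, 9.0, 10.0], [1.0, 2.0], [9.0, 10.0], variance))  # 16.0
```

A tree builder evaluates `impurity_decrease` for every candidate (feature, threshold) pair and keeps the largest; for a perfectly separating split, as in the example, the decrease equals the parent's impurity.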