Machine Learning Mastery courses (complete Python files) combined with Kaggle courses (Jupyter notebooks).
Kaggle is the most popular platform for data science. It offers free datasets, practice projects, and competitions, plus a helpful community where you can share your thoughts and learn new things. But its best feature is Kaggle Learn: even if you don’t know anything about data science, you can learn all the basics from the Kaggle courses and then sharpen your skills by doing projects.
In this repository you will find the Kaggle Learn tutorial and exercise notebooks (.ipynb), which I have completed and earned completion certificates for. The Kaggle datasets are available in the inputKaggle folder and the Mastery datasets in the inputMastery folder. The course structure is as follows:
pip install -r requirements.txt
P01. Python Basics
Functions, Lists, Strings and Dictionaries
P02. Guessing
Guess the number - the user has to guess a number picked by the computer
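A minimal sketch of the game loop; the bounds and prompts are illustrative, not the repo's exact code:

```python
# Minimal sketch of the guessing loop; bounds and prompts are illustrative
import random

secret = random.randint(1, 100)   # the computer picks a number
while True:
    guess = int(input("Your guess: "))
    if guess < secret:
        print("Too low")
    elif guess > secret:
        print("Too high")
    else:
        print("Correct!")
        break
```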
P03. Age
The user enters their age (decimals allowed) and gets it back in seconds
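A sketch of the conversion, assuming an average year of 365.25 days:

```python
# Sketch of the conversion; assumes an average year of 365.25 days
age_years = float(input("Your age: "))           # decimals are allowed
age_seconds = age_years * 365.25 * 24 * 60 * 60
print(f"That is about {age_seconds:,.0f} seconds")
```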
P04. PriceOfAChair
Downloads a page and uses BeautifulSoup to extract individual pieces of data from it
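A sketch of the scraping pattern; the URL and the selector below are placeholders, not the actual page used in the project:

```python
# Sketch of the scraping pattern; URL and selector are placeholders
import requests
from bs4 import BeautifulSoup

page = requests.get("https://example.com/product")  # download the page
soup = BeautifulSoup(page.content, "html.parser")   # parse the HTML
price = soup.find("span", class_="price")           # locate one element
print(price.get_text(strip=True) if price else "price not found")
```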
P05. RandomNumbers
Uses numpy's pseudorandom number generator to generate random numbers in the ranges 1...45 and 1...20
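A sketch with numpy's Generator API; the seed and draw counts are illustrative:

```python
# Sketch with numpy's Generator API; seed and draw counts are illustrative
import numpy as np

rng = np.random.default_rng(seed=42)
main = rng.integers(1, 46, size=6)    # six draws in 1...45 (upper bound is exclusive)
extra = rng.integers(1, 21, size=1)   # one draw in 1...20
print(main, extra)
```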
P06. Dictionary
Interactive dictionary - uses data.json and displays information about the words entered (see the sketch below).
It has similarities with how Large Language Models (LLMs) work: data lookup, fuzzy matching, and user interaction.
LLMs are of course more powerful: they use neural networks to represent text (not a JSON file), can find patterns and reason over the input (not just retrieve data), are scalable, and can generalize beyond their knowledge base.
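A minimal sketch of the lookup-plus-fuzzy-matching pattern, with difflib standing in for the project's actual matcher:

```python
# Sketch of lookup plus fuzzy matching; difflib stands in for the project's matcher
import json
from difflib import get_close_matches

with open("data.json") as f:
    data = json.load(f)

word = input("Enter a word: ").lower()
if word in data:
    print(data[word])                 # exact lookup
else:
    close = get_close_matches(word, data.keys(), n=1, cutoff=0.8)
    if close:
        print(f"Did you mean {close[0]}?", data[close[0]])   # fuzzy match
    else:
        print("Word not found")
```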
K02. Intro To Machine Learning
Starts with DecisionTree models, then moves to RandomForest, which has the best performance
K03. Pandas
Uses pandas to read Wine data, describe it, fillna and work with columns
K04_z0. Intermediate Machine Learning
Trains the 4 RandomForest models from point 2, finds the best one, then generates a submission
K04_z1. Housing Prices Competition
Compares a DecisionTreeRegressor with a RandomForest model (RandomForest wins), then generates a submission
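A sketch of the comparison on synthetic data; the notebooks run the same pattern on the Kaggle housing data:

```python
# Sketch of the comparison on synthetic data; the notebooks use the Kaggle housing set
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=10, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

for model in (DecisionTreeRegressor(random_state=0),
              RandomForestRegressor(n_estimators=100, random_state=0)):
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_valid, model.predict(X_valid))
    print(type(model).__name__, "MAE:", round(mae, 1))
```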
K04_z2. Pipelines
Pipelines are a simple way to keep your data preprocessing and modeling code organized. Specifically, a pipeline bundles preprocessing and modeling steps so you can use the whole bundle as if it were a single step.
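A minimal scikit-learn sketch of the pattern on toy data (the notebooks apply it to the housing data):

```python
# Minimal sketch on toy data; the notebooks apply the same pattern to the housing data
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

X = pd.DataFrame({"rooms": [3, 4, None, 5], "city": ["A", "B", "A", "B"]})
y = [200, 250, 180, 300]

preprocessor = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), ["rooms"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])
pipeline = Pipeline([
    ("preprocess", preprocessor),
    ("model", RandomForestRegressor(n_estimators=50, random_state=0)),
])
pipeline.fit(X, y)           # one call runs preprocessing and training
print(pipeline.predict(X))   # the same preprocessing is reapplied before predicting
```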
K04_z3. XGBoost
Gradient boosting is a method that goes through cycles to iteratively add models into an ensemble. We use the loss function to fit a new model that will be added to the ensemble. Specifically, we determine model parameters so that adding this new model to the ensemble will reduce the loss.
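A sketch with the XGBoost scikit-learn wrapper on synthetic data, assuming a recent xgboost (>= 1.6) where early_stopping_rounds is a constructor argument:

```python
# Sketch with the XGBoost sklearn wrapper; assumes xgboost >= 1.6,
# where early_stopping_rounds is a constructor argument
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=10, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

# each boosting round fits a new tree chosen to reduce the loss of the current ensemble
model = XGBRegressor(n_estimators=500, learning_rate=0.05, early_stopping_rounds=10)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], verbose=False)
print("MAE:", mean_absolute_error(y_valid, model.predict(X_valid)))
```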
K05_z0. Data Visualization
Seaborn, Line Charts, Custom Styles, Heat Maps
K05_z1. Breast Cancer Detection
Histograms for benign and malignant tumors, KDE plots
K06. Feature Engineering
Features, Clustering with K-Means, Principal Component Analysis
K07. Data Cleaning
Minmax_scaling, Normalization, Remove trailing white spaces, fuzzywuzzy closest match
K08. Intro to Deep Learning
Activation Layer, relu, Plot
K09. KerasGradient
Preprocessor, Transformer, Added loss and optimizer, Plot
K10. KerasUnderfitOverfit
Do a "Grouped" split to keep all of an artist's songs in one split or the other - prevents signal leakage. Simple Network - linear model underfit. Added three hidden layers - overfit. Added early stopping callback.
K11. BinaryClassification
In regression, MAE is the distance between the expected outcome and the predicted outcome. In classification, cross-entropy is the distance between probability distributions. A sigmoid activation converts the real-valued outputs produced by a dense layer into probabilities.
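A minimal Keras sketch of a sigmoid output trained with binary cross-entropy, on toy data:

```python
# Minimal sketch: sigmoid output trained with binary cross-entropy, on toy data
import numpy as np
from tensorflow import keras

X = np.random.rand(200, 8)
y = np.random.randint(0, 2, size=200)   # binary labels

model = keras.Sequential([
    keras.layers.Input(shape=(8,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),  # squashes output into (0, 1): a probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["binary_accuracy"])
model.fit(X, y, epochs=5, verbose=0)
```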
K12. IntroToSQL
BigQuery, Stackoverflow, posts_questions INNER JOIN posts_answers
K13. AdvancedSQL
BigQuery UNION, Analytic Functions, Nested and Repeated Data, Efficient Queries
M000. Notes
M001. Probability
Gaussian Distribution, Bayes, cross entropy H(P, Q), Naive classifier, Log Loss, Brier score
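A quick numpy sketch of log loss (average cross-entropy H(P, Q)) and the Brier score, on hand-picked probabilities:

```python
# Quick numpy sketch of the two metrics on hand-picked probabilities
import numpy as np

y_true = np.array([1, 0, 1, 1])           # actual classes
y_prob = np.array([0.9, 0.2, 0.7, 0.6])   # predicted P(class = 1)

# log loss = average cross-entropy H(P, Q) between labels and predictions
log_loss = -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

# Brier score = mean squared difference between probabilities and labels
brier = np.mean((y_prob - y_true) ** 2)

print(round(log_loss, 4), round(brier, 4))
```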
M002. Statistics
Gaussian Distribution and Descriptive Stats, Pearson's correlation,
Statistical Hypothesis Tests, Nonparametric Statistics
M003. Linear Algebra
Vectors, Multiply vectors, Matrix, Transpose Matrix, Invert Matrix,
Matrix decomposition, Singular-value decomposition, Eigen decomposition
M004. Optimization
Basin Hopping Optimization, Multimodal Optimization With Multiple Global Optima, Gradient Descent, Gradient Descent Graph, Grid Search
M005. Python Machine Learning
Classification and regression trees, DecisionTreeClassifier, line plot, bar chart, histogram, box and whisker plot, scatter plot
M006. Python Project Iris
Box and whisker plots, Histograms, Scatter plot matrix, Split-out validation dataset, Spot Check Algorithms, Make predictions and evaluate them
M007. Machine Learning Mini Course
Pima Indians diabetes, Scatter Plot Matrix, Standardize data (0 mean, 1 stdev), Cross Validation - Evaluate and LogLoss, KNN Regression, Grid Search for Algorithm Tuning, Random Forest Classification, Save Model Using Pickle
M008. Time Series Forecasting
Data Visualization, Persistence Forecast Model, Autoregressive Forecast Model, ARIMA Forecast Model
M009. Time Series End To End
Test Harness, Persistence, Data Analysis, ARIMA Models, Model Validation, Make Prediction, Validate Model
M010. Time Series End To End Joker
Test Harness, Persistence, Data Analysis, ARIMA Models, Model Validation, Make Prediction, Validate Model
M011. Data Preparation
Fill Missing Values With Imputation, Select Features With RFE, Scale Data With Normalization, Transform Categories With One-Hot Encoding, Transform Numbers to Categories With kBins, Dimensionality Reduction With PCA
M012. Gradient Boosting
Monitor Performance and Early Stopping, Feature Importance with XGBoost, XGBoost Hyperparameter Tuning