
Machine Learning Mastery + Kaggle

Complete Python files from Machine Learning Mastery courses, combined with Jupyter notebooks from Kaggle Learn courses.

Kaggle is one of the most popular platforms for data science. It has many free datasets, projects you can use for practice, and competitions. It also has a helpful community where you can share your thoughts and learn new things. But the best feature of Kaggle is Kaggle Learn: even if you don’t know anything about data science, you can learn all the basics from the Kaggle courses and then move on to sharpening your skills by doing projects.

This repository contains the Kaggle Learn course tutorial and exercise notebooks (.ipynb) that I have completed and earned completion certificates for. The Kaggle datasets are available in the inputKaggle folder, and the Mastery datasets are available in the inputMastery folder. The course structure is as follows:

Libraries

```
pip install -r requirements.txt
```

Complete Courses

Python

P01. Python Basics
Functions, Lists, Strings and Dictionaries

P02. Guessing
Guess the number: the user has to guess a number picked by the computer
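A minimal sketch of the game loop (the range and prompts are assumptions, not the repo's exact code):

```python
import random

def guessing_game(low=1, high=100):
    """The computer picks a number; the user guesses until correct."""
    secret = random.randint(low, high)
    while True:
        guess = int(input(f"Guess a number between {low} and {high}: "))
        if guess < secret:
            print("Too low!")
        elif guess > secret:
            print("Too high!")
        else:
            print("Correct!")
            return

guessing_game()
```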

P03. Age
The user enters an age (which can be decimal) and gets that age in seconds
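The conversion itself is one line; a sketch assuming an average year of 365.25 days:

```python
def age_in_seconds(age_years: float) -> float:
    # 365.25 days per year accounts for leap years on average
    return age_years * 365.25 * 24 * 60 * 60

age = float(input("Enter your age in years: "))
print(f"You are about {age_in_seconds(age):,.0f} seconds old.")
```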

P04. PriceOfAChair
Downloads a page and parses it with BeautifulSoup to extract individual data points (such as the price) from the page
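A sketch of the scraping pattern; the URL and CSS selector are placeholders, not the actual page the project targets:

```python
import requests
from bs4 import BeautifulSoup

url = "https://example.com/chair"           # placeholder URL
html = requests.get(url, timeout=10).text   # requests downloads the page

soup = BeautifulSoup(html, "html.parser")   # BeautifulSoup parses it
price = soup.find("span", class_="price")   # placeholder selector
print(price.get_text(strip=True) if price else "Price element not found")
```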

P05. RandomNumbers
Uses NumPy's pseudorandom number generator to draw random numbers in the ranges 1 to 45 and 1 to 20
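A minimal sketch with NumPy's generator API (the draw counts are assumptions):

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seeded for reproducibility

# rng.integers treats `high` as exclusive, so 46 and 21 give 1..45 and 1..20
main_numbers = rng.integers(low=1, high=46, size=6)
bonus_number = rng.integers(low=1, high=21, size=1)
print(main_numbers, bonus_number)
```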

P06. Dictionary
Interactive dictionary: uses data.json and displays information about the words entered. It has similarities with how Large Language Models (LLMs) work: data lookup, fuzzy matching, and user interaction. LLMs are of course more powerful: they use a neural network to represent text (not a JSON file), can find patterns and reason over the input (rather than just retrieving data), are scalable, and can generalize beyond their knowledge base.
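A sketch of the lookup logic, assuming the standard-library difflib handles the fuzzy matching (the repo may use a different matcher):

```python
import json
from difflib import get_close_matches

with open("data.json") as f:
    data = json.load(f)  # maps each word to a list of definitions

def lookup(word: str):
    word = word.lower()
    if word in data:
        return data[word]
    # Fuzzy matching: suggest the closest known word instead of failing
    matches = get_close_matches(word, data.keys(), n=1, cutoff=0.8)
    return f"Did you mean '{matches[0]}'?" if matches else "Word not found."

print(lookup(input("Enter a word: ")))
```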

Kaggle

K02. Intro To Machine Learning
Starts with decision trees, then moves on to a RandomForest, which gives the best performance

K03. Pandas
Uses pandas to read the wine data, describe it, fill missing values with fillna, and work with columns
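A sketch of those steps; the file path mirrors the Kaggle wine reviews dataset but is an assumption about this repo's layout:

```python
import pandas as pd

reviews = pd.read_csv("inputKaggle/winemag-data-130k-v2.csv", index_col=0)

print(reviews.describe())  # summary statistics for the numeric columns

# Fill missing prices with the median, then derive a new column
reviews["price"] = reviews["price"].fillna(reviews["price"].median())
reviews["points_per_price"] = reviews["points"] / reviews["price"]
```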

K04_z0. Intermediate Machine Learning
Trains four RandomForest models (building on K02), finds the best one, then generates a submission

K04_z1. Housing Prices Competition
Compares a DecisionTreeRegressor with a RandomForest model (the RandomForest wins), then generates a submission
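A self-contained sketch of the comparison, using synthetic data as a stand-in for the competition's housing features:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the competition's training data
X, y = make_regression(n_samples=1000, n_features=10, noise=10, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

for model in (DecisionTreeRegressor(random_state=0),
              RandomForestRegressor(n_estimators=100, random_state=0)):
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_valid, model.predict(X_valid))
    print(f"{type(model).__name__}: MAE = {mae:.1f}")
```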

K04_z2. Pipelines
Pipelines are a simple way to keep your data preprocessing and modeling code organized. Specifically, a pipeline bundles preprocessing and modeling steps so you can use the whole bundle as if it were a single step.
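A minimal sketch of that idea with scikit-learn's Pipeline, reusing the train/validation split from the housing sketch above (the steps are illustrative; the course notebook also uses a ColumnTransformer for categorical columns):

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Preprocessing and model bundled into a single estimator
pipe = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", RandomForestRegressor(n_estimators=100, random_state=0)),
])

pipe.fit(X_train, y_train)     # transforms are fit on the training data only
preds = pipe.predict(X_valid)  # the same transforms are reapplied automatically
```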

K04_z3. XGBoost
Gradient boosting is a method that goes through cycles to iteratively add models into an ensemble. We use the loss function to fit a new model that will be added to the ensemble. Specifically, we determine model parameters so that adding this new model to the ensemble will reduce the loss.
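A sketch of the boosting loop with XGBoost's scikit-learn wrapper, again reusing the split above (note that older xgboost versions pass early_stopping_rounds to fit() rather than the constructor):

```python
from sklearn.metrics import mean_absolute_error
from xgboost import XGBRegressor

# Each boosting round fits a new tree that reduces the current loss
model = XGBRegressor(n_estimators=500, learning_rate=0.05,
                     early_stopping_rounds=5, random_state=0)
model.fit(X_train, y_train,
          eval_set=[(X_valid, y_valid)],  # monitored to stop adding trees
          verbose=False)
print("MAE:", mean_absolute_error(y_valid, model.predict(X_valid)))
```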

K05_z0. Data Visualization
Seaborn, Line Charts, Custom Styles, Heat Maps

K05_z1. Breast Cancer Detection
Histograms for benign and malignant tumors, KDE plots

K06. Feature Engineering
Features, Clustering with K-Means, Principal Component Analysis

K07. Data Cleaning
Minmax_scaling, Normalization, Remove trailing white spaces, fuzzywuzzy closest match

K08. Intro to Deep Learning
Activation Layer, relu, Plot

K09. KerasGradient
Preprocessor, Transformer, Added loss and optimizer, Plot

K10. KerasUnderfitOverfit
Does a "grouped" split that keeps all of an artist's songs in one split or the other, preventing signal leakage. A simple linear network underfits; adding three hidden layers overfits; an early stopping callback fixes that.
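A sketch of the two key pieces, grouped splitting and early stopping, on synthetic stand-ins for the song features and artist ids:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit
from tensorflow import keras

# Synthetic stand-ins: features, target, and a group id (artist) per row
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = rng.normal(size=1000)
groups = rng.integers(0, 50, size=1000)

# Grouped split: every row with the same group id lands on one side only,
# so no artist's songs leak between training and validation
splitter = GroupShuffleSplit(train_size=0.75, n_splits=1, random_state=0)
train_idx, valid_idx = next(splitter.split(X, y, groups=groups))

# Early stopping halts training once validation loss stops improving
early_stopping = keras.callbacks.EarlyStopping(
    min_delta=0.001, patience=20, restore_best_weights=True)

model = keras.Sequential([
    keras.layers.Input(shape=(8,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mae")
model.fit(X[train_idx], y[train_idx],
          validation_data=(X[valid_idx], y[valid_idx]),
          epochs=500, callbacks=[early_stopping], verbose=0)
```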

K11. BinaryClassification
In regression, MAE is the distance between the expected outcome and the predicted outcome; in classification, cross-entropy is the distance between probability distributions. A sigmoid activation converts the real-valued outputs produced by a dense layer into probabilities.
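A small NumPy illustration of both ideas (toy numbers, not the notebook's model):

```python
import numpy as np

def sigmoid(z):
    """Squash a real-valued score into a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, p):
    """Average 'distance' between true labels and predicted probabilities."""
    eps = 1e-12                   # keep log() away from zero
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

scores = np.array([-2.0, 0.0, 3.0])   # raw outputs of a dense layer
probs = sigmoid(scores)               # about [0.12, 0.50, 0.95]
print(binary_cross_entropy(np.array([0, 0, 1]), probs))
```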

K12. IntroToSQL
BigQuery, Stackoverflow, posts_questions INNER JOIN posts_answers

K13. AdvancedSQL
BigQuery UNION, Analytic Functions, Nested and Repeated Data, Efficient Queries

Machine Learning Mastery

M000. Notes

M001. Probability
Gaussian Distribution, Bayes, cross entropy H(P, Q), Naive classifier, Log Loss, Brier score

M002. Statistics
Gaussian Distribution and Descriptive Stats, Pearson's correlation, Statistical Hypothesis Tests, Nonparametric Statistics

M003. Linear Algebra
Vectors, Multiply vectors, Matrix, Transpose Matrix, Invert Matrix, Matrix decomposition, Singular-value decomposition, Eigen decomposition

M004. Optimization
Basin Hopping Optimization, Multimodal Optimization With Multiple Global Optima, Gradient Descent, Gradient Descent Graph, Grid Search

M005. Python Machine Learning
Classification and regression trees, DecisionTreeClassifier, line plot, bar chart, histogram, box and whisker plot, scatter plot

M006. Python Project Iris
Box and whisker plots, Histograms, Scatter plot matrix, Split-out validation dataset, Spot Check Algorithms, Make predictions and evaluate them

M007. Machine Learning Mini Course
Pima Indians diabetes, Scatter Plot Matrix, Standardize data (0 mean, 1 stdev), Cross Validation - Evaluate and LogLoss, KNN Regression, Grid Search for Algorithm Tuning, Random Forest Classification, Save Model Using Pickle

M008. Time Series Forecasting
Data Visualization, Persistence Forecast Model, Autoregressive Forecast Model, ARIMA Forecast Model

M009. Time Series End To End
Test Harness, Persistence, Data Analysis, ARIMA Models, Model Validation, Make Prediction, Validate Model

M010. Time Series End To End Joker
Test Harness, Persistence, Data Analysis, ARIMA Models, Model Validation, Make Prediction, Validate Model

M011. Data Preparation
Fill Missing Values With Imputation, Select Features With RFE, Scale Data With Normalization, Transform Categories With One-Hot Encoding, Transform Numbers to Categories With kBins, Dimensionality Reduction With PCA

M012. Gradient Boosting
Monitor Performance and Early Stopping, Feature Importance with XGBoost, XGBoost Hyperparameter Tuning
