Azad77/Machine-Learning

 
 

Machine-Learning

Exploratory Data Analysis

This project analyzes the "Countries of the World" dataset by Fernando Lasso, which covers 227 countries. The main focus is GDP (Gross Domestic Product): identifying the factors that affect GDP per capita and building a model based on those effects. The project also briefly explains how total GDP relates to these factors. The key analysis methods are correlation and linear regression. The key finding is that GDP per capita is highly correlated with Literacy, Phones, Service, Infant mortality, Birthrate and Agriculture. This project is good practice for EDA and visualization. Exploratory Data Analysis (EDA) is the first step in the data analysis process: we take a broad look at patterns, trends, outliers, unexpected results and so on in the dataset, using visual and quantitative methods to get a sense of the story it tells.

Air Quality Index - Analysis

▪ The main focus of this kernel is the AQI (Air Quality Index) and the factors that affect it, i.e. so2, no2, spm, rspm and pm2_5.
▪ The dataset contains pollutant concentrations. Calculating the air quality index requires an index for each pollutant, so these indices are computed first and then used in the analysis.
▪ Briefly explains how combinations of independent variables (interaction effects) affect the dependent variable and change the accuracy of the model, and how interdependence/correlation between independent variables (multicollinearity) has an adverse effect on the dependent variable and the fitted model.
▪ Also discusses solutions to the problem of multicollinearity, i.e. regularization and stepwise regression. Both give an enhanced model, with better predictors and estimators for the dependent variable.
▪ Includes EDA (Exploratory Data Analysis), which is usually the first step in the data analysis process. We take a broad look at patterns, trends, outliers, unexpected results and so on in the dataset, using visual and quantitative methods to get a sense of the story it tells.

Kaggle Competition - Mercedes Benz Greener Manufacturing

In this competition, Daimler is challenging Kagglers to tackle the curse of dimensionality and reduce the time that cars spend on the test bench. Competitors will work with a dataset representing different permutations of Mercedes-Benz car features to predict the time it takes to pass testing. Winning algorithms will contribute to speedier testing, resulting in lower carbon dioxide emissions without reducing Daimler’s standards.
Most of the code was available in public kernels for the competition, which made it clear what had to be done. XGBoost was applied to the dataset with the data modelled in different ways; PCA, ICA and truncated SVD were then applied and the data was modelled again. What is unique to this project is the use of H2O to calculate metrics for the dataset, using the raw dataset with its useful features. Finally, a stacking algorithm is used.
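The pipeline described above (decomposition features followed by stacking) might look roughly like the sketch below. It is an assumption-laden illustration rather than the notebook's code: the data is synthetic, scikit-learn's GradientBoostingRegressor stands in for XGBoost so the example depends only on NumPy and scikit-learn, the number of components is arbitrary, and the H2O step is not shown.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA, TruncatedSVD
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the competition data: 200 rows of 30 binary features.
X = rng.integers(0, 2, size=(200, 30)).astype(float)
weights = np.array([5.0, -3.0, 2.0, 4.0, -1.0])
y = 100.0 + X[:, :5] @ weights + rng.normal(0.0, 0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit each decomposition on the training split only, then append its
# components to the original features as extra columns.
n_comp = 5  # arbitrary choice for this sketch
reducers = [
    PCA(n_components=n_comp, random_state=0),
    FastICA(n_components=n_comp, random_state=0, max_iter=1000),
    TruncatedSVD(n_components=n_comp, random_state=0),
]
train_parts, test_parts = [X_train], [X_test]
for reducer in reducers:
    train_parts.append(reducer.fit_transform(X_train))
    test_parts.append(reducer.transform(X_test))
X_train_aug = np.hstack(train_parts)
X_test_aug = np.hstack(test_parts)

# Stack a boosted-tree model and a linear model; a ridge regressor combines
# their cross-validated predictions.
stack = StackingRegressor(
    estimators=[
        ("gbr", GradientBoostingRegressor(random_state=0)),  # stand-in for XGBoost
        ("ridge", Ridge(alpha=1.0)),
    ],
    final_estimator=Ridge(alpha=1.0),
)
stack.fit(X_train_aug, y_train)
r2 = stack.score(X_test_aug, y_test)
print(f"augmented feature count: {X_train_aug.shape[1]}, test R^2: {r2:.3f}")
```

Fitting the reducers on the training split only avoids leaking test information into the features; the competition is scored on R^2, which is why `score` is used here.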


Languages

  • Jupyter Notebook 100.0%