Bank credit analysis #699

Merged
merged 4 commits into from
Jul 12, 2024
11,163 changes: 11,163 additions & 0 deletions Bank Credit Analysis/Dataset/bank.csv

Large diffs are not rendered by default.

Binary file added Bank Credit Analysis/Images/__results___10_0.png
Binary file added Bank Credit Analysis/Images/__results___11_1.png
Binary file added Bank Credit Analysis/Images/__results___12_1.png
Binary file added Bank Credit Analysis/Images/__results___13_0.png
Binary file added Bank Credit Analysis/Images/__results___13_1.png
Binary file added Bank Credit Analysis/Images/__results___14_0.png
Binary file added Bank Credit Analysis/Images/__results___15_0.png
Binary file added Bank Credit Analysis/Images/__results___24_1.png
Binary file added Bank Credit Analysis/Images/__results___25_1.png
Binary file added Bank Credit Analysis/Images/__results___27_0.png
Binary file added Bank Credit Analysis/Images/__results___28_0.png
Binary file added Bank Credit Analysis/Images/__results___29_0.png
Binary file added Bank Credit Analysis/Images/__results___30_0.png
Binary file added Bank Credit Analysis/Images/__results___31_0.png
33 changes: 33 additions & 0 deletions Bank Credit Analysis/Model/README.md
@@ -0,0 +1,33 @@
## 🚀 Models Implemented
- **Random Forest**: Chosen for its robustness; averaging many trees trained on bootstrap samples reduces variance, and it scales well to large datasets.
- **XGBoost**: Known for its speed and predictive performance, making it suitable for complex datasets.
- **Decision Tree**: Simple to interpret and visualize, though prone to overfitting.
- **AdaBoost**: Boosts the performance of weak classifiers by reweighting misclassified samples at each round.
- **CatBoost**: Handles categorical features well and provides high accuracy.
- **Logistic Regression**: Baseline linear model for classification tasks.
- **Extra Trees**: Similar to Random Forest, but split thresholds are drawn at random for each candidate feature instead of being searched for the best cut point.
- **Gaussian Naive Bayes**: Simple and effective, especially for smaller datasets.
- **K-Nearest Neighbors**: Simple and easy to implement, but prediction can be computationally expensive on large datasets.
- **Support Vector Machine**: Effective in high-dimensional spaces and suitable for classification tasks.

## 📈 Performance of the Models based on the Accuracy Scores
| Model | Train Accuracy | CV Mean Accuracy | Test Accuracy |
|-------------------------|----------------|------------------|---------------|
| K Nearest Neighbors | 81.81% | 75.38% | 75.19% |
| Support Vector Machine | 83.37% | 82.92% | 81.59% |
| Random Forest | 99.40% | 85.79% | 83.70% |
| XGBoost | 100.00% | 85.47% | 84.42% |
| Decision Tree | 87.51% | 81.92% | 80.25% |
| AdaBoost | 84.04% | 82.91% | 82.58% |
| CatBoost | 90.36% | 86.58% | 85.89% |
| Logistic Regression | 82.55% | 82.10% | 81.68% |
| Extra Trees | 98.76% | 83.38% | 82.22% |
| Gaussian Naive Bayes | 73.92% | 73.58% | 74.56% |

## ✒️ Your Signature
Aditya D

GitHub: [https://www.github.com/adi271001](https://www.github.com/adi271001)
LinkedIn: [https://www.linkedin.com/in/aditya-d-23453a179/](https://www.linkedin.com/in/aditya-d-23453a179/)
Topmate: [https://topmate.io/aditya_d/](https://topmate.io/aditya_d/)
Twitter: [https://x.com/ADITYAD29257528](https://x.com/ADITYAD29257528)
1 change: 1 addition & 0 deletions Bank Credit Analysis/Model/bank-credit-analysis.ipynb

Large diffs are not rendered by default.

80 changes: 80 additions & 0 deletions Bank Credit Analysis/README.md
@@ -0,0 +1,80 @@
# Bank Credit Analysis

## 🎯 Goal
The main goal of this project is to develop machine learning models to accurately predict the likelihood of a customer subscribing to a term deposit based on their banking information and demographic details.

## 🧵 Dataset
The dataset for this project is sourced from [Kaggle's Bank Marketing Dataset](https://www.kaggle.com/datasets/janiobachmann/bank-marketing-dataset/data).

## 🧾 Description
This project involves analyzing various features of bank customers and building machine learning models to predict whether a customer will subscribe to a term deposit. The project includes data preprocessing, exploratory data analysis (EDA), model development, and evaluation to find the most accurate predictive model.
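
The preprocessing described here might look like the following minimal sketch. It is not taken from the notebook: the small in-memory frame stands in for `Dataset/bank.csv`, and the column names (`job`, `housing`, `deposit`, and so on) and the 75/25 split are assumptions made for illustration. Missing-value handling is not shown.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Tiny stand-in for Dataset/bank.csv; column names and values are assumptions.
df = pd.DataFrame({
    "age": [59, 56, 41, 55, 54, 42, 56, 60],
    "job": ["admin.", "admin.", "technician", "services",
            "admin.", "management", "management", "retired"],
    "balance": [2343, 45, 1270, 2476, 184, 0, 830, 545],
    "housing": ["yes", "no", "yes", "yes", "no", "yes", "yes", "yes"],
    "deposit": ["yes", "yes", "yes", "yes", "no", "no", "no", "no"],
})

# Binary target: 1 if the customer subscribed to a term deposit, else 0.
y = (df["deposit"] == "yes").astype(int)

# One-hot encode the categorical columns; numeric columns pass through unchanged.
X = pd.get_dummies(df.drop(columns="deposit"), columns=["job", "housing"])

# Stratify so the training and testing sets keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)
```

With the real file, the frame would instead come from `pd.read_csv` on the dataset path, and the list of categorical columns would need to match the actual schema.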

## 🧮 What I have done!
1. **Data Collection and Preprocessing**:
- Collected the dataset from Kaggle.
- Preprocessed the data: handled missing values, encoded categorical variables, and split the dataset into training and testing sets.

2. **Exploratory Data Analysis (EDA)**:
- Performed EDA to understand the distribution of data and identify any patterns or anomalies.
- ![pair plot 1](https://github.com/adi271001/ML-Crate/blob/Bank-Credit-Analysis/Bank%20Credit%20Analysis/Images/__results___11_1.png?raw=true)
- ![distribution graph](https://github.com/adi271001/ML-Crate/blob/Bank-Credit-Analysis/Bank%20Credit%20Analysis/Images/__results___13_0.png?raw=true)
- ![boxplot](https://github.com/adi271001/ML-Crate/blob/Bank-Credit-Analysis/Bank%20Credit%20Analysis/Images/__results___15_0.png?raw=true)
- ![waveplot](https://github.com/adi271001/ML-Crate/assets/67856422/f6e50edc-6cc9-475b-b3bb-82869b1cba8f)
- ![bar plot](https://github.com/adi271001/ML-Crate/assets/67856422/55cebd86-4eec-4829-85d1-091f0ebfbc3d)

3. **Model Development**:
- Implemented several machine learning models including Random Forest, XGBoost, Decision Tree, AdaBoost, CatBoost, Logistic Regression, Extra Trees, Gaussian Naive Bayes, K-Nearest Neighbors, and Support Vector Machine.
- Used grid search for hyperparameter tuning and nested cross-validation to evaluate model performance.

4. **Model Evaluation**:
- Evaluated the models based on accuracy scores on the training and testing datasets, along with the mean cross-validation accuracy.

5. **Conclusion**:
- Identified the best-performing model based on accuracy scores.
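
Grid search combined with nested cross-validation, as mentioned in step 3, can be sketched as follows. This is an illustration under stated assumptions rather than the notebook's code: the data is synthetic (`make_classification`), and the parameter grid and fold counts are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in for the encoded bank features (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Inner folds pick hyperparameters; outer folds estimate generalization.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# Hypothetical grid, kept small so the sketch runs quickly.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=inner_cv,
    scoring="accuracy",
)

# Each outer fold reruns the full grid search on its training part and
# scores the tuned model on the held-out part, so tuning never sees test data.
nested_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="accuracy")
print(f"Nested CV accuracy: {nested_scores.mean():.3f} +/- {nested_scores.std():.3f}")
```

Keeping the tuning inside the outer loop is the point of nesting: the reported score is not biased by the hyperparameter search itself.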

## 🚀 Models Implemented
- **Random Forest**: Chosen for its robustness; averaging many trees trained on bootstrap samples reduces variance, and it scales well to large datasets.
- **XGBoost**: Known for its speed and predictive performance, making it suitable for complex datasets.
- **Decision Tree**: Simple to interpret and visualize, though prone to overfitting.
- **AdaBoost**: Boosts the performance of weak classifiers by reweighting misclassified samples at each round.
- **CatBoost**: Handles categorical features well and provides high accuracy.
- **Logistic Regression**: Baseline linear model for classification tasks.
- **Extra Trees**: Similar to Random Forest, but split thresholds are drawn at random for each candidate feature instead of being searched for the best cut point.
- **Gaussian Naive Bayes**: Simple and effective, especially for smaller datasets.
- **K-Nearest Neighbors**: Simple and easy to implement, but prediction can be computationally expensive on large datasets.
- **Support Vector Machine**: Effective in high-dimensional spaces and suitable for classification tasks.
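
A comparison across the models listed above could be organized roughly as in the sketch below, which records train, cross-validation, and test accuracy for each estimator. It is a hedged illustration: the data is synthetic, all hyperparameters are library defaults (apart from `max_iter` and `random_state`), and XGBoost and CatBoost are left out only to keep the example limited to scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded bank features (assumption).
X, y = make_classification(n_samples=400, n_features=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "K Nearest Neighbors": KNeighborsClassifier(),
    "Support Vector Machine": SVC(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Extra Trees": ExtraTreesClassifier(random_state=0),
    "Gaussian Naive Bayes": GaussianNB(),
}

results = {}
for name, model in models.items():
    # Mean 5-fold CV accuracy on the training split only.
    cv_mean = cross_val_score(model, X_train, y_train, cv=5).mean()
    model.fit(X_train, y_train)
    results[name] = {
        "train": model.score(X_train, y_train),
        "cv_mean": cv_mean,
        "test": model.score(X_test, y_test),
    }

for name, r in results.items():
    print(f"{name:<24} train={r['train']:.3f} cv={r['cv_mean']:.3f} test={r['test']:.3f}")
```

A large gap between the train and CV columns for a model is the usual sign of overfitting, which is why all three figures are worth reporting side by side.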

## 📚 Libraries Needed
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
- xgboost
- catboost
- pdpbox
- shap
- yellowbrick

## 📊 Exploratory Data Analysis Results
The EDA plots are shown under **Exploratory Data Analysis (EDA)** above; the image files are stored in the `Images/` folder.

## 📈 Performance of the Models based on the Accuracy Scores
| Model | Train Accuracy | CV Mean Accuracy | Test Accuracy |
|-------------------------|----------------|------------------|---------------|
| K Nearest Neighbors | 81.81% | 75.38% | 75.19% |
| Support Vector Machine | 83.37% | 82.92% | 81.59% |
| Random Forest | 99.40% | 85.79% | 83.70% |
| XGBoost | 100.00% | 85.47% | 84.42% |
| Decision Tree | 87.51% | 81.92% | 80.25% |
| AdaBoost | 84.04% | 82.91% | 82.58% |
| CatBoost | 90.36% | 86.58% | 85.89% |
| Logistic Regression | 82.55% | 82.10% | 81.68% |
| Extra Trees | 98.76% | 83.38% | 82.22% |
| Gaussian Naive Bayes | 73.92% | 73.58% | 74.56% |

## 📢 Conclusion
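The best-performing model in this project is CatBoost, with a CV mean accuracy of 86.58% and a test accuracy of 85.89%, the highest of the ten models on both measures. Its train-test gap (90.36% vs. 85.89%) is also much smaller than that of Random Forest and XGBoost, which reach 99.40% and 100.00% training accuracy but lower test accuracy. CatBoost therefore offers the best balance between fit and generalization for predicting customer subscription to a term deposit.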
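
The ranking behind this conclusion can be reproduced with a few lines of standard-library Python. The sketch below uses the rounded percentages from the accuracy table above; the variable names are illustrative only.

```python
import csv
import io

# Rounded percentages copied from the accuracy table in this README.
RESULTS_CSV = """Model,Train Accuracy,CV Mean Accuracy,Test Accuracy
K Nearest Neighbors,81.81,75.38,75.19
Support Vector Machine,83.37,82.92,81.59
Random Forest,99.40,85.79,83.70
XGBoost,100.00,85.47,84.42
Decision Tree,87.51,81.92,80.25
AdaBoost,84.04,82.91,82.58
CatBoost,90.36,86.58,85.89
Logistic Regression,82.55,82.10,81.68
Extra Trees,98.76,83.38,82.22
Gaussian Naive Bayes,73.92,73.58,74.56
"""

rows = []
for row in csv.DictReader(io.StringIO(RESULTS_CSV)):
    train = float(row["Train Accuracy"])
    test = float(row["Test Accuracy"])
    # The train-test gap (in percentage points) is a rough overfitting indicator.
    rows.append({"model": row["Model"], "test": test, "gap": round(train - test, 2)})

# Rank by test accuracy, best first.
rows.sort(key=lambda r: r["test"], reverse=True)
best = rows[0]

for r in rows:
    print(f"{r['model']:<24} test={r['test']:.2f}%  train-test gap={r['gap']:.2f} pts")
```

The same calculation works on `Results/models_results.csv` by opening that file in place of the inline string.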

## ✒️ Your Signature
Aditya D

GitHub: [https://www.github.com/adi271001](https://www.github.com/adi271001)
LinkedIn: [https://www.linkedin.com/in/aditya-d-23453a179/](https://www.linkedin.com/in/aditya-d-23453a179/)
Topmate: [https://topmate.io/aditya_d/](https://topmate.io/aditya_d/)
Twitter: [https://x.com/ADITYAD29257528](https://x.com/ADITYAD29257528)
11 changes: 11 additions & 0 deletions Bank Credit Analysis/Results/models_results.csv
@@ -0,0 +1,11 @@
Model,Train Accuracy,CV Mean Accuracy,Test Accuracy
K Nearest Neighbors,81.812073020495,75.38349628765279,75.19032691446485
Support Vector Machine,83.3687982976817,82.92084403750302,81.59426780116435
Random Forest,99.3952290290066,85.78782375212123,83.69905956112854
XG Boost,100.0,85.47421745853995,84.4155844155844
Decision Tree,87.51259939522903,81.92401529480773,80.25078369905955
AdaBoost,84.04076604322992,82.90964582921634,82.5794894760412
CatBoost,90.35726285138314,86.58297809605365,85.8934169278997
Logistic Regression,82.55123754059805,82.10325563596099,81.68383340797133
Extra Trees,98.7568596707358,83.37999567128082,82.22122704881325
Gaussian Naive Bayes,73.9164520103035,73.58034008676259,74.56336766681594
10 changes: 10 additions & 0 deletions Bank Credit Analysis/requirements.txt
@@ -0,0 +1,10 @@
numpy==1.24.3
pandas==2.0.3
matplotlib==3.7.2
seaborn==0.12.2
scikit-learn==1.2.2
xgboost==1.7.6
catboost==1.1
pdpbox==0.3.0
shap==0.42.1
yellowbrick==1.5