Fraudulent_payments

Overview

The aim of the project was to build several classification models for detecting fraudulent payments and select the best-performing one. The best model reached an AUC of 0.99 and an accuracy of 99.4%, which was deemed a success. The primary stages of the script are as follows:

Data Cleaning and Pre-processing: The script starts by cleaning and pre-processing the dataset to ensure data quality and consistency.

Feature Selection: Feature selection is performed using the Mutual Information method. This step aims to identify the most relevant features for the classification task.

Model Comparison: The script evaluates and compares the performance of different classification models. The following models are included:

Logistic Regression
Random Forest
XGBoost
Artificial Neural Network (ANN)

Details

Data Cleaning and Pre-processing

The initial phase cleans and pre-processes the data: it handles missing values and outliers and checks the data for consistency. A clean dataset is an essential foundation for accurate model building.
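As an illustration of the cleaning steps described above, here is a minimal sketch using pandas. It is not the code from Full_Code.ipynb: the helper `clean_frame`, the quantile thresholds, the choice of median imputation, and the toy column names are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def clean_frame(df: pd.DataFrame, lower_q: float = 0.01, upper_q: float = 0.99) -> pd.DataFrame:
    """Drop duplicate rows, median-impute numeric gaps, and clip extreme values.

    Hypothetical helper: the thresholds and imputation strategy are
    illustrative choices, not the ones used in the original notebook.
    """
    out = df.drop_duplicates().copy()
    numeric_cols = out.select_dtypes(include="number").columns
    for col in numeric_cols:
        # Fill missing values with the column median.
        out[col] = out[col].fillna(out[col].median())
        # Clip outliers to the chosen lower and upper quantiles.
        lo, hi = out[col].quantile([lower_q, upper_q])
        out[col] = out[col].clip(lower=lo, upper=hi)
    return out.reset_index(drop=True)


# Toy data with hypothetical column names: one missing value,
# one duplicated row, and one extreme value.
raw = pd.DataFrame({
    "income": [0.3, 0.5, np.nan, 0.5, 0.9, 40.0],
    "customer_age": [30, 40, 50, 40, 20, 60],
})
cleaned = clean_frame(raw)
print(cleaned)
```

On a real dataset the target column would normally be excluded from imputation and clipping.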

Feature Selection with Mutual Information

Mutual Information is used to assess the importance of each feature with respect to the target variable. Features with higher mutual information are considered more relevant and are retained, while less informative features are discarded.
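The selection step can be sketched with scikit-learn's `mutual_info_classif` and `SelectKBest`. This is a sketch under stated assumptions rather than the notebook's own code: the synthetic data, the value `k=2`, and the fixed `random_state` are chosen purely for illustration.

```python
from functools import partial

import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Synthetic data (assumption): column 0 depends on the label,
# columns 1-3 are pure noise.
rng = np.random.default_rng(0)
n_samples = 2000
y = rng.integers(0, 2, size=n_samples)
informative = y + rng.normal(scale=0.5, size=n_samples)
noise = rng.normal(size=(n_samples, 3))
X = np.column_stack([informative, noise])

# Fix random_state so the nearest-neighbour MI estimate is reproducible.
score_func = partial(mutual_info_classif, random_state=0)

# Keep the k features with the highest estimated mutual information.
selector = SelectKBest(score_func=score_func, k=2).fit(X, y)
scores = selector.scores_
kept = selector.get_support(indices=True)

print("MI scores:", np.round(scores, 3))
print("Kept feature indices:", kept)
X_selected = selector.transform(X)
```

The informative column should receive a clearly higher score than the noise columns, so it is always among the retained features.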

Model Comparison

The script then builds and evaluates four distinct classification models, each with its own strengths and characteristics. The models are benchmarked and compared using Area Under the Curve (AUC) to determine which one performs best for the classification task.
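A comparison loop of this kind might look like the sketch below. Several things here are assumptions: the data is synthetic and imbalanced, AUC is taken to mean ROC AUC (`roc_auc_score`), and, to keep the example dependent on scikit-learn only, `GradientBoostingClassifier` stands in for XGBoost and `MLPClassifier` stands in for the ANN. Hyperparameters are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced binary data (assumption): about 5% positives.
X, y = make_classification(
    n_samples=3000, n_features=10, n_informative=5,
    weights=[0.95, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # Stand-in for XGBoost (assumption, avoids the extra dependency).
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    # Stand-in for the ANN (assumption).
    "Neural Network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=300, random_state=0),
    ),
}

# Fit each model and score it on the held-out set using ROC AUC.
auc_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    fraud_probability = model.predict_proba(X_test)[:, 1]
    auc_by_model[name] = roc_auc_score(y_test, fraud_probability)

best_model = max(auc_by_model, key=auc_by_model.get)
for name, auc in auc_by_model.items():
    print(f"{name}: AUC = {auc:.3f}")
print("Best model:", best_model)
```

The split is stratified so that the rare positive class appears in both the training and test sets in the same proportion.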

Data used: https://www.kaggle.com/datasets/sgpjesus/bank-account-fraud-dataset-neurips-2022/

References:

1. Banking error rate - https://assets.teradata.com/resourceCenter/downloads/CaseStudies/CaseStudy_EB9821_Danske_Bank_Fights_Fraud.pdf

2. Fraud Classification Principles
   - Fraud Detection Methods - https://journalofbigdata.springeropen.com/articles/10.1186/s40537-022-00573-8
   - Fraud Detection Methods - https://www.kaggle.com/code/juanjosmorenogiraldo/bank-fraud-detection-using-gbm#3-%7C-Data-Preprocessing
   - Fraud Detection Methods - https://trenton3983.github.io/files/projects/2019-07-19_fraud_detection_python/2019-07-19_fraud_detection_python.html

3. Feature Selection
   - Information Gain Method - https://jovian.com/poduguvenu/feature-selection-using-information-gain

4. Modelling Methods
   - Random Forest - https://www.kaggle.com/code/hassanamin/credit-card-fraud-detection-using-random-forest#Using-Scikit-learn-to-split-data-into-training-and-testing-sets
   - XGBoost and feature selection - https://domino.ai/blog/credit-card-fraud-detection-using-xgboost-smote-and-threshold-moving
   - ANN Node Optimization - https://www.analyticsvidhya.com/blog/2021/09/a-comprehensive-guide-on-neural-networks-performance-optimization/

