Classification Pipeline for Birth Outcome Prediction

This repository contains a machine learning pipeline for birth outcome prediction using maternal and neonatal health datasets. The code focuses on preprocessing, model training, evaluation, and hyperparameter tuning for classification tasks.

Samiah-Kanwar/ML_birthOutcome

This repository contains a modular and scalable machine learning pipeline designed for classification tasks. The code is highly configurable, allowing it to be used with various datasets by simply updating file paths and column configurations. It supports preprocessing, training, evaluation, and visualization, making it a robust solution for data analysis and predictive modeling.

Key Features

Dataset Imputation:
    Supports mean imputation for numerical columns.
    Implements LOCF (Last Observation Carried Forward) for categorical data.
Data Preprocessing:
    Scales features using StandardScaler.
    Splits data into training and testing sets.
Model Training and Evaluation:
    Includes Logistic Regression, SVM, and Random Forest models.
    Performs cross-validation and computes evaluation metrics (accuracy, ROC curves, etc.).
Hyperparameter Tuning:
    Ready to integrate RandomizedSearchCV or GridSearchCV for optimization.
Visualization:
    Generates ROC curves for model performance comparison.
    Plots feature importance for tree-based models.
Flexible Configuration:
    Parameterized file paths, target column, and imputation strategies make the pipeline adaptable for any dataset.

How to Use

Clone the Repository:

git clone 
cd your-repository

Set Up Your Environment: Install the required Python libraries:

pip install -r requirements.txt

Update Configuration:

Edit the script to include your dataset's file paths and column names:
    input_filepath for the input data file.
    output_filepath for saving the processed dataset.
    mean_impute_cols for numerical columns requiring mean imputation.
    locf_impute_cols for categorical columns requiring LOCF imputation.
    target_col for the name of the target variable.
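As a rough illustration of how these settings might drive the imputation step, here is a minimal sketch. It is not the code in main.py. The column names, the toy data, and the `impute` helper are all hypothetical, and a small in-memory frame stands in for the file at input_filepath.

```python
import numpy as np
import pandas as pd

# Hypothetical configuration values; replace them with your own column names.
mean_impute_cols = ["maternal_age", "birth_weight"]
locf_impute_cols = ["delivery_mode"]
target_col = "outcome"


def impute(df, mean_cols, locf_cols):
    """Return a copy of df with mean imputation on mean_cols and LOCF on locf_cols."""
    out = df.copy()
    for col in mean_cols:
        # Replace missing numeric values with the mean of the observed values.
        out[col] = out[col].fillna(out[col].mean())
    # LOCF: carry the last observed value forward, column by column.
    out[locf_cols] = out[locf_cols].ffill()
    return out


# Toy data for illustration only (assumption, not the project's dataset).
raw = pd.DataFrame({
    "maternal_age": [25.0, np.nan, 35.0],
    "birth_weight": [3.0, 3.4, np.nan],
    "delivery_mode": ["vaginal", None, "cesarean"],
    "outcome": [0, 1, 0],
})

processed = impute(raw, mean_impute_cols, locf_impute_cols)
print(processed)
```

Note that LOCF cannot fill a value that is missing in the first row, since there is no earlier observation to carry forward. With a real file, `raw` would come from `pd.read_csv(input_filepath)` and the result could be saved with `processed.to_csv(output_filepath, index=False)`.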

Run the Script: Execute the script to preprocess your data, train models, and generate visualizations:

python main.py
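The training and evaluation stage that this command runs could look roughly like the sketch below. It assumes a synthetic dataset from `make_classification` in place of the imputed data. The split ratio, fold count, and model settings are illustrative choices, not values taken from main.py.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the imputed dataset (assumption, not project data).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Hold out a stratified test set; the 80/20 ratio is an illustrative choice.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Wrapping StandardScaler in a pipeline fits it on training folds only,
# so no statistics leak from validation or test data.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "SVM": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    "Random Forest": make_pipeline(
        StandardScaler(), RandomForestClassifier(random_state=0)
    ),
}

results = {}
for name, model in models.items():
    # 5-fold cross-validated accuracy on the training portion.
    cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    model.fit(X_train, y_train)
    # Probability of the positive class, needed for ROC AUC.
    proba = model.predict_proba(X_test)[:, 1]
    results[name] = {
        "cv_accuracy": float(cv_scores.mean()),
        "test_accuracy": float(accuracy_score(y_test, model.predict(X_test))),
        "test_roc_auc": float(roc_auc_score(y_test, proba)),
    }

for name, metrics in results.items():
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

To draw the ROC curves, the same `proba` values can be passed to `sklearn.metrics.roc_curve` and the resulting false and true positive rates plotted with matplotlib.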

Requirements

Python 3.7+
Required Libraries:
    pandas
    numpy
    scikit-learn
    matplotlib
    scipy

Install all dependencies via:

pip install -r requirements.txt

Folder Structure

├── data/
│   ├── input_data.csv   # Sample input dataset
│   ├── output_data.csv  # Processed dataset after imputation
├── main.py              # Main script for running the pipeline
├── requirements.txt     # Required libraries
└── README.md            # Project documentation

Applications

This pipeline is ideal for:

    Healthcare data analysis.
    General predictive modeling.
    Rapid prototyping of classification tasks.
    Feature selection and model comparison.
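For feature selection and model comparison, the hyperparameter tuning hook listed under Key Features might be wired in as sketched below. This assumes a Random Forest tuned with RandomizedSearchCV on synthetic data. The search space, iteration count, and scoring metric are illustrative assumptions, not settings from the repository.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the processed dataset (assumption, not project data).
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Hypothetical search space; adjust the ranges to your data and time budget.
param_distributions = {
    "n_estimators": randint(20, 60),
    "max_depth": [None, 3, 5],
    "min_samples_leaf": randint(1, 5),
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,           # number of sampled parameter combinations
    cv=3,               # 3-fold cross-validation per combination
    scoring="roc_auc",
    random_state=0,
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best CV ROC AUC:", round(float(search.best_score_), 3))

# Rank features by the impurity-based importance of the best forest.
importances = search.best_estimator_.feature_importances_
ranked = sorted(enumerate(importances), key=lambda pair: pair[1], reverse=True)
for index, importance in ranked:
    print(f"feature_{index}: {importance:.3f}")
```

GridSearchCV accepts the same estimator with a `param_grid` of explicit value lists if an exhaustive search is preferred. The ranked importances can be passed to a matplotlib bar chart to produce a feature importance plot.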
