asparmar14/Classification-task

Wine Quality Classification

This repository contains a machine learning project that classifies wine quality from physicochemical properties. The dataset originates from the University of California, Irvine (UCI), and is available on Kaggle.

Dataset

The dataset is sourced from Kaggle:

The dataset includes features such as acidity, alcohol content, and more, with a target variable indicating wine quality.

Project Overview

The project involves several key steps:

  1. Data Exploration: Visualize and understand the relationships between features.

    • Use sns.pairplot to visualize feature relationships and determine which features might be informative for classification.
  2. Train-Test Split: Divide the dataset into training and testing sets to evaluate model performance.

  3. Scaling: Standardize the dataset using StandardScaler to ensure all features are on a similar scale.

  4. Modeling: Apply various classifiers including K-Nearest Neighbors (KNN) and Random Forest.

    • Use KNeighborsClassifier for KNN classification.
    • Use RandomForestClassifier for Random Forest classification.
  5. Hyperparameter Optimization: Optimize model parameters using GridSearchCV to find the best combination of hyperparameters.

    • Test different values for n_estimators, max_features, and bootstrap parameters for the Random Forest model.
  6. Multi-Class Classification: Classify wine into categories: "Bad", "Normal", and "Good".

    • Create a mapping from quality scores to these categories.
  7. Multi-Label Classification: Create a multi-label classifier for alcohol content and wine quality.

    • Use binary classification for each label and compute confusion matrices for evaluation.
  8. Evaluation: Use various metrics to evaluate model performance:

    • Accuracy Metrics: Compute accuracy, recall, precision, and confusion matrices for classification models.
    • ROC Curve: Plot ROC curves to evaluate binary classification performance.
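
As a rough illustration of steps 2 to 5, the sketch below chains a train-test split, StandardScaler, KNeighborsClassifier, and a GridSearchCV search over a RandomForestClassifier. It is not the notebook's code: the data is a synthetic stand-in generated with make_classification (11 features and 3 classes are assumptions), and the split ratio, n_neighbors, the grid values, and cv=3 are all illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the wine data (assumption: 11 numeric features, 3 classes).
X, y = make_classification(
    n_samples=300, n_features=11, n_informative=5, n_classes=3, random_state=0
)

# Step 2: hold out 20% of the rows for testing (ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Step 3: fit the scaler on the training rows only, then apply it to both sets.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Step 4: a KNN baseline (n_neighbors=5 is an illustrative value).
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train_s, y_train)
knn_acc = knn.score(X_test_s, y_test)

# Step 5: grid search over the three Random Forest parameters named above.
param_grid = {
    "n_estimators": [10, 30],
    "max_features": [2, 4],
    "bootstrap": [True, False],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X_train_s, y_train)
rf_acc = search.score(X_test_s, y_test)

print("KNN test accuracy:", knn_acc)
print("Best Random Forest parameters:", search.best_params_)
print("Random Forest test accuracy:", rf_acc)
```

Fitting the scaler on the training split alone keeps test-set statistics out of the model; GridSearchCV then refits the best parameter combination on the full training set, so search.score reports that refitted model's accuracy.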

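Steps 6 to 8 can be sketched in plain Python as well. The example below maps quality scores to "Bad", "Normal", and "Good", builds one binary label each for alcohol content and wine quality, and computes a confusion matrix, precision, and recall per label. The cut-offs (quality up to 4 is "Bad", 7 and above is "Good", alcohol of 11.0 or more is "high"), the helper names, the sample wines, and the predictions are all hypothetical; the notebook may use different thresholds and would more likely rely on sklearn.metrics (for example confusion_matrix, precision_score, recall_score) than on hand-rolled counts.

```python
from collections import Counter


def quality_to_category(score, bad_max=4, good_min=7):
    """Map a numeric quality score to a category (cut-offs are assumptions)."""
    if score <= bad_max:
        return "Bad"
    if score >= good_min:
        return "Good"
    return "Normal"


def binary_confusion(y_true, y_pred):
    """Return (tn, fp, fn, tp) counts for 0/1 labels."""
    counts = Counter(zip(y_true, y_pred))
    return counts[(0, 0)], counts[(0, 1)], counts[(1, 0)], counts[(1, 1)]


def precision_recall(y_true, y_pred):
    """Return (precision, recall), using 0.0 when a denominator is zero."""
    tn, fp, fn, tp = binary_confusion(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up (alcohol %, quality score) pairs, for illustration only.
wines = [(9.4, 5), (12.8, 7), (10.1, 3), (13.0, 8), (11.2, 6)]

# Step 6: multi-class target.
categories = [quality_to_category(q) for _, q in wines]

# Step 7: multi-label targets, one binary label per question.
high_alcohol = [int(a >= 11.0) for a, _ in wines]
good_quality = [int(q >= 7) for _, q in wines]

# Hypothetical classifier outputs for the two labels.
pred_high_alcohol = [0, 1, 0, 1, 0]
pred_good_quality = [0, 1, 0, 0, 1]

# Step 8: evaluate each label separately.
for name, truth, pred in [
    ("high_alcohol", high_alcohol, pred_high_alcohol),
    ("good_quality", good_quality, pred_good_quality),
]:
    print(name, binary_confusion(truth, pred), precision_recall(truth, pred))
```

Evaluating each label with its own confusion matrix mirrors the "binary classification for each label" approach described in step 7.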
Files

  • data_import.ipynb: Jupyter notebook for data exploration, preprocessing, and modeling.
  • requirements.txt: List of Python dependencies required to run the project.
  • README.md: Documentation for the project.

Usage

  1. Clone the repository:
    git clone https://github.com/asparmar14/Classification-task
    
  2. Navigate to the project directory:
    cd Classification-task
  3. Run the Jupyter notebook:
    jupyter notebook data_import.ipynb
    

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments
