This repository contains a machine learning project for classifying wine quality based on various physicochemical properties. The dataset used is from the University of California, Irvine, and can be found on Kaggle.
The dataset, sourced from Kaggle, includes physicochemical features such as acidity and alcohol content, with a target variable indicating wine quality.
The project involves several key steps:
- Data Exploration: Visualize and understand the relationships between features.
  - Use `sns.pairplot` to visualize feature relationships and determine which features might be informative for classification.
- Train-Test Split: Divide the dataset into training and testing sets to evaluate model performance.
- Scaling: Standardize the features using `StandardScaler` so that all features are on a similar scale.
- Modeling: Apply several classifiers, including K-Nearest Neighbors (KNN) and Random Forest.
  - Use `KNeighborsClassifier` for KNN classification.
  - Use `RandomForestClassifier` and perform hyperparameter optimization with `GridSearchCV`.
- Hyperparameter Optimization: Optimize model parameters using `GridSearchCV` to find the best combination of hyperparameters.
  - Test different values of the `n_estimators`, `max_features`, and `bootstrap` parameters for the Random Forest model.
- Multi-Class Classification: Classify wine into three categories: "Bad", "Normal", and "Good".
  - Create a mapping from quality scores to these categories.
- Multi-Label Classification: Create a multi-label classifier for alcohol content and wine quality.
  - Use binary classification for each label and compute confusion matrices for evaluation.
- Evaluation: Use several metrics to evaluate model performance:
  - Accuracy Metrics: Compute accuracy, recall, precision, and confusion matrices for the classification models.
  - ROC Curve: Plot ROC curves to evaluate binary classification performance.
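The multi-class part of the steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the data is synthetic, and the feature count, the quality thresholds in `quality_to_category`, and the grid values are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the wine data (assumption: three numeric features
# and integer quality scores from 3 to 8).
rng = np.random.default_rng(0)
n_samples = 300
X = rng.normal(size=(n_samples, 3))
quality = rng.integers(3, 9, n_samples)


def quality_to_category(score):
    """Map a quality score to a category (hypothetical thresholds)."""
    if score <= 4:
        return "Bad"
    if score <= 6:
        return "Normal"
    return "Good"


y = np.array([quality_to_category(q) for q in quality])

# Train-test split, stratified so each category appears in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the scaler on the training set only, then apply it to both sets.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# K-Nearest Neighbors baseline.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_scaled, y_train)
knn_accuracy = accuracy_score(y_test, knn.predict(X_test_scaled))

# Random Forest with a small grid over the three parameters named above.
param_grid = {
    "n_estimators": [10, 30],
    "max_features": [1, 2, 3],
    "bootstrap": [True, False],
}
grid = GridSearchCV(
    RandomForestClassifier(random_state=42), param_grid, cv=3, scoring="accuracy"
)
grid.fit(X_train_scaled, y_train)
rf_pred = grid.best_estimator_.predict(X_test_scaled)

labels = ["Bad", "Normal", "Good"]
rf_accuracy = accuracy_score(y_test, rf_pred)
rf_confusion = confusion_matrix(y_test, rf_pred, labels=labels)

print("KNN accuracy:", knn_accuracy)
print("Best Random Forest parameters:", grid.best_params_)
print("Random Forest accuracy:", rf_accuracy)
print(rf_confusion)
```

Because the synthetic features carry no signal, the accuracies here are not meaningful; with the real dataset, the same structure applies once `X` and `quality` are loaded from the CSV file.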
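The multi-label and ROC steps could look something like the sketch below. It assumes two binary labels, "high alcohol" (above the median) and "good quality" (score of 7 or more); both definitions, the synthetic data, and the choice of `KNeighborsClassifier` as the multi-label model are illustrative assumptions rather than the notebook's confirmed approach.

```python
import numpy as np
from sklearn.metrics import multilabel_confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: alcohol and quality depend loosely on the features.
rng = np.random.default_rng(1)
n_samples = 400
X = rng.normal(size=(n_samples, 3))
alcohol = 10.4 + 0.8 * X[:, 0] + rng.normal(0, 0.5, n_samples)
quality = np.clip(np.round(5.6 + 0.9 * X[:, 1] + rng.normal(0, 0.6, n_samples)), 3, 8)

# Two binary labels per wine (hypothetical definitions):
# column 0 = high alcohol, column 1 = good quality.
y_multilabel = np.column_stack(
    [alcohol > np.median(alcohol), quality >= 7]
).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y_multilabel, test_size=0.25, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# KNeighborsClassifier accepts a 2D label array and predicts all labels at once.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_scaled, y_train)
y_pred = knn.predict(X_test_scaled)

# One 2x2 confusion matrix per label.
per_label_confusion = multilabel_confusion_matrix(y_test, y_pred)

# ROC curve for the "good quality" label, using predicted probabilities.
# For multi-label targets, predict_proba returns one array per label.
good_scores = knn.predict_proba(X_test_scaled)[1][:, 1]
fpr, tpr, thresholds = roc_curve(y_test[:, 1], good_scores)
auc = roc_auc_score(y_test[:, 1], good_scores)

print(per_label_confusion)
print("ROC AUC (good quality):", auc)
```

Plotting `fpr` against `tpr` (for example with `matplotlib.pyplot.plot`) draws the ROC curve for that label.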
- `data_import.ipynb`: Jupyter notebook for data exploration, preprocessing, and modeling.
- `requirements.txt`: List of Python dependencies required to run the project.
- `README.md`: Documentation for the project.
- Clone the repository: `git clone https://github.com/asparmar14/Classification-task`
- Navigate to the project directory: `cd Classification-task`
- Run the Jupyter notebook: `jupyter notebook data_import.ipynb`
This project is licensed under the MIT License. See the LICENSE file for details.
- UCI Machine Learning Repository
- Kaggle
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron