Machine Learning Project: Coffee Data Analysis

Overview

This project implements various machine learning algorithms from scratch and compares them with Scikit-learn implementations. The goal is to analyze and classify coffee data (Arabica vs Robusta) based on physical attributes. The project includes tasks such as similarity analysis, feature extraction, clustering, and classification.

Project Structure

ClassSimilarity/
- Contains scripts and notebooks for calculating intra-class and inter-class similarity metrics (e.g., Euclidean, Manhattan, Cosine).
- Outputs include detailed metrics for analyzing the separability of Arabica and Robusta classes.
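As a rough illustration of the intra-class versus inter-class comparison described above, the following sketch averages a metric over all pairs within one class and over all cross-class pairs. It is not the repository's code: the function names (`mean_intra`, `mean_inter`) and the toy values are assumptions made up for this example, and it uses only the standard library.

```python
import math
from itertools import combinations, product


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (a similarity, not a distance)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_intra(samples, metric):
    """Average metric value over all unordered pairs inside one class."""
    pairs = list(combinations(samples, 2))
    return sum(metric(a, b) for a, b in pairs) / len(pairs)


def mean_inter(samples_a, samples_b, metric):
    """Average metric value over all pairs drawn from two different classes."""
    pairs = list(product(samples_a, samples_b))
    return sum(metric(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy (width, height) values, not taken from the dataset.
arabica = [(1.0, 2.0), (1.0, 3.0), (2.0, 2.0)]
robusta = [(5.0, 6.0), (6.0, 6.0)]

print("intra (Arabica):", mean_intra(arabica, euclidean))
print("inter (Arabica vs Robusta):", mean_inter(arabica, robusta, euclidean))
```

For the distance metrics, a mean inter-class value well above the mean intra-class value suggests the classes are separable; for cosine similarity the interpretation is reversed.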
data/
- Contains the data synthesis scripts used to complete the dataset.
- These scripts have already been run; you do not need to run them again.
featureExtraction/
- Scripts for extracting relevant features (e.g., width, height, depth, weight) from the dataset.
- Implements feature selection techniques to improve model performance.
kMeansClass/
- Contains the implementation of K-means clustering from scratch.
- Includes visualizations (PCA-reduced plots) of clustering results and analysis of cluster quality.
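For readers unfamiliar with the algorithm, a minimal sketch of K-means (Lloyd's algorithm) is shown here. It is an assumption-laden illustration, not the implementation in this folder: the function name `kmeans` is hypothetical, the initial centroids are simply the first `k` points (a simplification; random or k-means++ initialization is more common), and the toy points are made up.

```python
import math


def kmeans(points, k, max_iters=100):
    """Cluster points with Lloyd's algorithm; returns (centroids, labels)."""
    # Simplified, deterministic initialization: use the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # Converged: centroids no longer change.
        centroids = new_centroids
    return centroids, labels


# Hypothetical toy points forming two well-separated groups.
toy_points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
toy_centroids, toy_labels = kmeans(toy_points, k=2)
print(toy_labels)
```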
KNN/
- Scripts for implementing the k-Nearest Neighbors (KNN) algorithm from scratch.
- Includes performance evaluation and comparison with Scikit-learn’s KNN implementation.
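The core idea of KNN can be sketched in a few lines: find the `k` training samples closest to a query and take a majority vote over their labels. The sketch below assumes Euclidean distance and uses a hypothetical function name (`knn_predict`) and made-up toy data; the scripts in this folder may differ in distance metric, tie handling, and data layout.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Sort training samples by Euclidean distance to the query and keep the k closest.
    neighbors = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical toy (width, height) samples, not taken from the dataset.
train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
train_y = ["Arabica", "Arabica", "Arabica", "Robusta", "Robusta", "Robusta"]

print(knn_predict(train_X, train_y, (1.5, 1.5)))
print(knn_predict(train_X, train_y, (6.5, 6.5)))
```

Using an odd `k` with two classes avoids tied votes.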
LogisticRegression/
- Implementation of logistic regression from scratch, with support for L2 regularization.
- Tasks include:
  - Monitoring training for overfitting.
  - Comparison of runtime and accuracy with Scikit-learn’s LogisticRegression.
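To make the role of L2 regularization concrete, here is a minimal sketch of logistic regression trained with batch gradient descent, where the penalty adds `lam * w` to the weight gradient. The function names, the hyperparameters (`lr`, `lam`, `epochs`), and the one-feature toy data are all assumptions for illustration; this is not the code in this folder.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_l2(X, y, lr=0.1, lam=0.01, epochs=1000):
    """Fit weights and bias by batch gradient descent with an L2 penalty on the weights."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample: p(y=1 | x) - y.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Mean log-loss gradient plus the L2 term lam * w (bias is not penalized).
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    """Return 0/1 labels using a 0.5 probability threshold."""
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
        for xi in X
    ]


# Hypothetical one-feature toy data: negative values are class 0, positive are class 1.
toy_X = [(-2.0,), (-1.0,), (1.0,), (2.0,)]
toy_y = [0, 0, 1, 1]

w_weak, b_weak = train_logistic_l2(toy_X, toy_y, lam=0.01)
w_strong, b_strong = train_logistic_l2(toy_X, toy_y, lam=1.0)
print(predict(toy_X, w_weak, b_weak))
```

A larger `lam` shrinks the weights toward zero, which is the mechanism that limits overfitting; comparing training and validation loss across epochs is one way to monitor for it.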
randomForest/
- Implementation of a Random Forest classifier.
- Includes performance evaluation.
supportVector/
- Scripts for implementing Support Vector Machines (SVM) from scratch.
- Comparison of performance with Scikit-learn’s SVM implementation.
svm_vs_logistic/
- Side-by-side comparison of SVM and logistic regression models.
- Evaluates accuracy, runtime, and decision boundaries for both models.
analysis/
- Compares all trained models on accuracy, precision, recall, F1 score, AUROC, and runtime.
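For reference, accuracy, precision, recall, and F1 score can all be derived from the four confusion-matrix counts, as in the sketch below. The function name `classification_metrics` and the toy labels are hypothetical, and AUROC is omitted because it needs predicted scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = positive class, 0 = negative class.
toy_true = [1, 1, 1, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 1]
toy_metrics = classification_metrics(toy_true, toy_pred)
print(toy_metrics)
```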

coffeeDataSynthesized.xlsx
- The main dataset containing attributes for Arabica and Robusta coffee samples.
- Attributes include width, height, depth, weight, country, origin, altitude, variety, process, flavor, acidity, and type (target variable).

How to Run

Set up Environment:
- Ensure Python 3.x is installed.
- Install the required libraries: pip install numpy pandas matplotlib scikit-learn xgboost scipy seaborn cvxopt
Explore Specific Tasks:
- Navigate to the folder corresponding to the task (e.g., LogisticRegression/).
- Run the Jupyter notebooks inside the folder.
- Results (metrics, plots) will be saved in the corresponding files.

gulsensabak/Machine-Learning-Algorithms-from-Scratch

Folders and files

Latest commit

History

Repository files navigation

Machine Learning Project: Coffee Data Analysis

Overview

Project Structure

How to Run

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages