- Murat Can Önder – 119200089
- Uğur Çelik – 119200045
This project applies data mining techniques to cybersecurity. The objective is to analyze network traffic data and distinguish normal traffic from attack traffic using a machine learning classifier (logistic regression).
The project utilizes two datasets:
- CTU13_Normal_Traffic.csv: Dataset containing normal network traffic.
- CTU13_Attack_Traffic.csv: Dataset containing attack network traffic.
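Since the two classes live in separate files, they need to be combined into one labeled table before a classifier can be trained. The following is a minimal sketch of one way to do that, not the project's actual code. The function name `load_labeled_traffic`, the column name `Label`, and the 0/1 encoding are assumptions made for illustration; if the CSV files already carry a label column, this step would overwrite it and should be adapted.

```python
import pandas as pd


def load_labeled_traffic(normal_path, attack_path, label_col="Label"):
    """Read both CSV files and stack them into one labeled DataFrame.

    Hypothetical helper: the label column name and the 0/1 encoding
    are assumptions, not taken from the project script.
    """
    normal = pd.read_csv(normal_path)
    attack = pd.read_csv(attack_path)

    # Assumed encoding: 0 = normal traffic, 1 = attack traffic.
    normal[label_col] = 0
    attack[label_col] = 1

    # ignore_index gives the combined frame a fresh 0..n-1 index.
    return pd.concat([normal, attack], ignore_index=True)
```

With the files in the working directory, a call might look like `load_labeled_traffic("CTU13_Normal_Traffic.csv", "CTU13_Attack_Traffic.csv")`.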
The main script (data_mining_cybersecurity.py) performs the following tasks:
- Data Loading and Preprocessing:
  - Loads the attack and normal traffic datasets.
  - Handles missing values by dropping rows with missing data.
  - Selects relevant features for analysis.
- Model Training and Evaluation:
  - Splits the data into training and testing sets.
  - Utilizes logistic regression for classification.
  - Evaluates model performance using confusion matrix, classification report, and accuracy score.
  - Generates histograms and box plots to visualize feature distributions and their relationship with labels.
  - Calculates summary statistics and correlation matrix to understand data characteristics.
- Model Performance:
  - Trains a logistic regression model and evaluates its performance on both training and test sets.
  - Displays training and test accuracies.
- Data Visualization:
  - Utilizes pair plots to visualize relationships between features and labels.
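The preprocessing, training, and evaluation steps listed above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the contents of data_mining_cybersecurity.py: the helper names `train_and_evaluate` and `make_synthetic_traffic`, the feature names `feat_a` and `feat_b`, the `Label` column, the 70/30 stratified split, and `max_iter=1000` are all hypothetical choices. Synthetic data stands in for the CTU-13 files so the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split


def train_and_evaluate(df, feature_cols, label_col="Label", test_size=0.3, seed=42):
    """Drop incomplete rows, fit logistic regression, and collect metrics."""
    # Keep only the chosen columns, then drop rows with any missing value.
    data = df[feature_cols + [label_col]].dropna()
    X, y = data[feature_cols], data[label_col]

    # Assumption: a stratified split, so both sets keep the class ratio.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )

    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    return {
        "train_accuracy": accuracy_score(y_train, model.predict(X_train)),
        "test_accuracy": accuracy_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
        "report": classification_report(y_test, y_pred),
        "summary": data[feature_cols].describe(),
        "correlation": data[feature_cols].corr(),
    }


def make_synthetic_traffic(n=200, seed=0):
    """Build a toy stand-in for the real CSV data (not CTU-13 values)."""
    rng = np.random.default_rng(seed)
    label = np.repeat([0, 1], n // 2)
    df = pd.DataFrame({
        # feat_a is shifted for the attack class; feat_b is pure noise.
        "feat_a": rng.normal(loc=label * 3.0, scale=1.0),
        "feat_b": rng.normal(loc=0.0, scale=1.0, size=n),
        "Label": label,
    })
    df.loc[0, "feat_a"] = np.nan  # one missing value to exercise dropna
    return df


results = train_and_evaluate(make_synthetic_traffic(), ["feat_a", "feat_b"])
print(results["confusion_matrix"])
print(results["report"])
print("train accuracy:", results["train_accuracy"])
print("test accuracy:", results["test_accuracy"])
```

Comparing the training and test accuracies side by side, as the script is described as doing, is a quick check for overfitting: a training score far above the test score suggests the model does not generalize well.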
- Clone the Repository: git clone https://github.com/Onder-MuratCan/Midterm_MTH410
- Install Dependencies: pip install -r requirements.txt
- Run the Script: python data_mining_cybersecurity.py
- Python 3.x
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
Contributions are welcome! Feel free to submit issues or pull requests.