Project for Machine Learning for Networking Exam @ Polito - SSH Shell Attacks Analysis: a project to classify attacker tactics and identify patterns in 230,000 honeypot-captured Unix shell attacks using MITRE ATT&CK framework and ML techniques.

SSH Shell Attacks

Table of Contents

 • Overview
 • Dataset
 • Project Report
 • Project Structure
 • Tools and Technologies
 • How to Run the Project
 • Detailed Documentation
     • Data Directory
     • Notebooks Directory
     • Results Directory
     • Scripts Directory
     • Tests Directory
 • Authors
 • License
 • Acknowledgments

Last updated: January 2025

Overview

This project is part of the Machine Learning for Networking course at Politecnico di Torino. It focuses on analyzing SSH shell attack sessions recorded from honeypot deployments to classify attacker intents and explore underlying patterns.

Navigation Tip: This README provides a general overview of the project. For detailed documentation, check the specific README files in each directory (see Table of Contents above). Each subdirectory contains in-depth information about its specific components.

Objectives

  1. Classification: Automatically identify and assign attacker intents (e.g., Persistence, Discovery) to each SSH attack session.
  2. Clustering: Group similar attack sessions to uncover attack patterns and fine-grained categories.
  3. Language Models: Explore advanced NLP techniques like BERT and Doc2Vec for improved classification performance.

Dataset

The dataset consists of approximately 230,000 Unix shell attack sessions recorded from honeypots. It includes:

  • Session Commands: Malicious commands executed in an SSH session.
  • Timestamps: The exact time each attack started.
  • Labels: Pre-assigned intents based on the MITRE ATT&CK framework.
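As a rough illustration of the three fields listed above, the sketch below models one session as a small record and splits a raw session string into individual commands. The `Session` class, the `split_commands` helper, and the sample values are hypothetical and invented for illustration; the real column names and preprocessing live in the notebooks and may differ. The splitting is deliberately naive and ignores shell quoting.

```python
import re
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Session:
    """Hypothetical in-memory view of one honeypot session."""
    commands: list[str]    # commands executed in the SSH session
    started_at: datetime   # time the attack started
    intents: set[str]      # MITRE ATT&CK-based intent labels


def split_commands(raw: str) -> list[str]:
    """Split a raw session string on ';', '|', '&&' and '||'.

    Naive on purpose: separators inside quotes are not handled.
    """
    parts = re.split(r"\s*(?:;|\|\||&&|\|)\s*", raw)
    return [part.strip() for part in parts if part.strip()]


# Made-up example values, not taken from the dataset.
example = Session(
    commands=split_commands("cat /proc/cpuinfo | grep name ; uname -a"),
    started_at=datetime(2020, 1, 1, 12, 0, 0),
    intents={"Discovery"},
)
```

Keeping the labels as a set reflects the assumption that one session may be tagged with several intents.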

Intents (Classes)

The dataset uses 7 main intent classes:

  1. Persistence
  2. Discovery
  3. Defense Evasion
  4. Execution
  5. Impact
  6. Other (Miscellaneous intents)
  7. Harmless (Non-malicious commands)
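If a session can carry more than one intent (an assumption here, suggested by the plural "intents" in the objectives), classifiers usually need the labels as a fixed-length 0/1 vector. The sketch below shows one way to do that with plain Python; `encode_intents` and `decode_intents` are hypothetical helper names, not functions from this repository. scikit-learn's `MultiLabelBinarizer` offers the same idea as a ready-made transformer.

```python
# The seven intent classes, in the order listed above.
INTENTS = [
    "Persistence",
    "Discovery",
    "Defense Evasion",
    "Execution",
    "Impact",
    "Other",
    "Harmless",
]


def encode_intents(labels):
    """Turn a collection of intent names into a 0/1 indicator vector."""
    present = set(labels)
    unknown = present - set(INTENTS)
    if unknown:
        raise ValueError(f"Unknown intents: {sorted(unknown)}")
    return [1 if intent in present else 0 for intent in INTENTS]


def decode_intents(vector):
    """Inverse of encode_intents: recover the intent names from a vector."""
    if len(vector) != len(INTENTS):
        raise ValueError("Vector length must match the number of intents")
    return [intent for intent, flag in zip(INTENTS, vector) if flag]
```

For example, `encode_intents({"Discovery", "Execution"})` sets only the second and fourth positions.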

Project Report

The project report is a comprehensive document detailing the methodologies, experiments, and findings of the SSH Shell Attacks project.

  • Format: PDF
  • Template: ACM format single column (acmlarge)

The report is named SSH-Shell-Attacks-report.pdf and can be found in the root directory of the repository.

An appendix with extra plots and additional information, named SSH-Shell-Attacks-appendix.pdf, is also in the root directory. It is a PDF in the same single-column ACM format.

The LaTeX source of the report is in the latex-report repository.

Project Structure

SSH-Shell-Attacks/
│
├── data/                           # Dataset and related resources
│   ├── raw/                        # Original dataset files (e.g., ssh_attacks.parquet)
│   └── processed/                  # Pre-processed and feature-engineered files
│
├── notebooks/                      # Jupyter notebooks
│
├── scripts/                        # Python scripts for algorithms and utilities
│
├── results/                        # Outputs from the models and analysis
│   ├── figures/                    # Plots and visualizations
│   ├── models/                     # Saved models (e.g., .pkl, .h5)
│   └── metrics/                    # Evaluation metrics and reports
│
├── README.md                       # High-level overview of the project
├── SSH-Shell-Attacks-report.pdf    # Report of the project
├── SSH-Shell-Attacks-appendix.pdf  # Appendix of the report
├── requirements.txt                # Python dependencies
├── .gitignore                      # Ignore unnecessary files for versioning
└── LICENSE                         # Licensing information (optional)
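Notebooks and scripts that read or write files can avoid hard-coded strings by deriving every location from the repository root. The following is a minimal sketch based only on the tree above; `project_paths` and `missing_dirs` are hypothetical helpers, not part of the repository's scripts.

```python
from pathlib import Path


def project_paths(root):
    """Map short names to the directories shown in the project tree."""
    root = Path(root)
    return {
        "raw": root / "data" / "raw",
        "processed": root / "data" / "processed",
        "notebooks": root / "notebooks",
        "scripts": root / "scripts",
        "figures": root / "results" / "figures",
        "models": root / "results" / "models",
        "metrics": root / "results" / "metrics",
    }


def missing_dirs(root):
    """Return the short names of expected directories that do not exist."""
    paths = project_paths(root)
    return sorted(name for name, path in paths.items() if not path.is_dir())
```

Calling `missing_dirs(".")` from the repository root is a quick way to confirm the layout before running a notebook.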

Tools and Technologies

  • Programming Language: Python
  • Libraries:
    • Data Processing: pandas, numpy, pyarrow
    • Visualization: matplotlib, seaborn
    • Machine Learning: scikit-learn
    • Clustering: scikit-learn, wordcloud
    • Language Models: scikit-learn, transformers, torch

How to Run the Project

  1. Clone the Repository:

    git clone https://github.com/ML4Net/SSH-Shell-Attacks.git
    cd SSH-Shell-Attacks

  2. Install Dependencies:

    pip install -r requirements.txt

  3. Execute the Notebooks: Open the relevant notebook for each section and follow the instructions to:

    • Load the dataset.
    • Perform data exploration.
    • Train and evaluate machine learning models.

    Notebooks:

    • section0_data_preprocessing_and_cleaning.ipynb
    • section1_data_exploration_and_preprocessing.ipynb
    • section2_supervised_learning_classification.ipynb
    • section3_unsupervised_learning_clustering.ipynb
    • section4_language_model_exploration.ipynb

  4. Explore Scripts: Run modular scripts in the scripts/ directory for specific tasks like preprocessing or model training.
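Before opening the notebooks, it can help to confirm that the install step actually made the libraries importable. The sketch below checks the packages named under Tools and Technologies using only the standard library. The mapping from package name to import name is an assumption on my part (for instance, scikit-learn is imported as `sklearn`); requirements.txt remains the authoritative list.

```python
from importlib.util import find_spec

# Package name -> top-level import name (assumed, see note above).
IMPORT_NAMES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "pyarrow": "pyarrow",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "wordcloud": "wordcloud",
    "transformers": "transformers",
    "torch": "torch",
}


def missing_packages(import_names=IMPORT_NAMES):
    """Return the package names whose import module cannot be found."""
    return sorted(
        package
        for package, module in import_names.items()
        if find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy libraries such as torch.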


Authors

  • Andrea Botticella
  • Elia Innocenti
  • Renato Mignone
  • Simone Romano

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Luca Vassio: the professor supervising our work and the primary point of reference for the project.
  • Matteo Boffa: the creator and organizer of this project.
  • Team Members: Andrea Botticella, Elia Innocenti, Renato Mignone, and Simone Romano.

Please cite us if this project is copied, used for inspiration, or if any material is taken from it.
