This project detects and analyzes anomalies in diesel train data to support the safe and reliable operation of railway systems. It uses data mining and machine learning techniques to identify and categorize anomalies, providing insights that can help prevent train incidents.
- Overview
- Data Description
- Features
- Project Structure
- Getting Started
- Data Preprocessing
- Anomaly Detection
- Dashboard
- Contributing
- License
The National Railway Company of Belgium (SNCB) operates the rail service in Belgium. This project is sponsored by SNCB's rolling stock team and analyzes real-life time-series data from diesel trains to detect and categorize anomalies. The data includes temperatures, pressures, engine speeds (RPM), and GPS locations.
The dataset contains real-life errors and inconsistencies, such as GPS coordinates reported as zero and temperature readings of exactly zero. Because the data spans January to September 2023, it covers several seasons, which makes it possible to explore how weather conditions affect train operations.
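The two error patterns named above can be flagged per reading before any modelling. The sketch below is illustrative only: the `Reading` record, its field names, and the `artifact_flags` helper are hypothetical, and treating an exact 0.0 temperature as an error is an assumption that depends on which sensor produced it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reading:
    # Hypothetical record layout; the real SNCB column names may differ.
    lat: Optional[float]
    lon: Optional[float]
    temp_c: Optional[float]


def artifact_flags(r: Reading) -> List[str]:
    """Return labels for the known sensor artifacts present in one reading."""
    flags = []
    # Belgium lies roughly between 49.5 and 51.5 degrees N and 2.5 and 6.5
    # degrees E, so a zero in either coordinate cannot be a real position there.
    if r.lat == 0.0 or r.lon == 0.0:
        flags.append("gps_zero")
    # An exact 0.0 is treated as a likely placeholder value (assumption:
    # whether this is an error depends on the sensor).
    if r.temp_c == 0.0:
        flags.append("temp_exact_zero")
    return flags


print(artifact_flags(Reading(0.0, 0.0, 85.2)))    # ['gps_zero']
print(artifact_flags(Reading(50.84, 4.33, 0.0)))  # ['temp_exact_zero']
```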
- Anomaly detection using various data mining methods.
- Enrichment of data with external weather data.
- Development of a comprehensive anomaly dashboard.
- Support for deployment in streaming mode.
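Streaming mode implies scoring each reading as it arrives, without re-reading history. One way to do that, shown here as a sketch rather than the project's actual detector, is a running z-score whose mean and variance are updated with Welford's online algorithm. The class name, the threshold of 3, and the warm-up length are assumptions.

```python
import math


class StreamingZScore:
    """Online anomaly detector for one sensor channel (hypothetical example).

    Keeps a running mean and variance with Welford's algorithm, so each new
    reading is scored in O(1) time and memory.
    """

    def __init__(self, threshold: float = 3.0, warmup: int = 30):
        self.threshold = threshold  # |z| above this is flagged (assumed value)
        self.warmup = warmup        # readings to observe before flagging anything
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations

    def update(self, x: float) -> bool:
        """Score x against the statistics so far, then absorb it."""
        is_anomaly = False
        # At least two prior readings are needed for a sample standard deviation.
        if self.n >= max(self.warmup, 2):
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford update of the running mean and squared deviations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly


det = StreamingZScore(threshold=3.0, warmup=5)
stream = [80.0, 81.0, 79.5, 80.5, 80.2, 80.1, 120.0]
print([det.update(x) for x in stream])  # only the last reading is flagged
```

This sketch absorbs every reading, anomalous or not, into the statistics; a production detector might skip flagged values or use a sliding window so that slow drift is tracked.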
The project is organized into the following sections:
- Data Loading and Preprocessing: The raw data is loaded and preprocessed to make it suitable for analysis.
- Enrichment with Weather Data: External weather data is integrated to better understand the impact of weather conditions on anomalies.
- Anomaly Detection Methods: Multiple anomaly detection methods are implemented and compared.
- Anomaly Dashboard: A dashboard is developed to visualize anomalies and their relationship with weather, time, and location.
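Enriching train readings with weather usually comes down to matching each reading to the closest weather observation in time. The sketch below assumes hourly observations from a single station; the weather source, the station choice, and the one-hour tolerance are all assumptions, and a real join would also pick the station nearest to the train's GPS position. `nearest_weather` is a hypothetical helper.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def nearest_weather(ts, weather_times, weather_values, max_gap=timedelta(hours=1)):
    """Return the weather value closest in time to ts, or None if none is within max_gap.

    weather_times must be sorted ascending and aligned with weather_values.
    """
    i = bisect_left(weather_times, ts)
    # The closest observation is either just before or just after ts.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(weather_times)]
    if not candidates:
        return None
    best = min(candidates, key=lambda j: abs(weather_times[j] - ts))
    if abs(weather_times[best] - ts) > max_gap:
        return None
    return weather_values[best]


# Illustrative values only, not real observations.
hours = [datetime(2023, 1, 10, h) for h in range(24)]
temps = [float(h) - 5 for h in range(24)]
print(nearest_weather(datetime(2023, 1, 10, 8, 40), hours, temps))  # 4.0 (09:00 is closer)
```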
To get started with the project, follow these steps:
- Clone the repository to your local machine.
- Install the required libraries and dependencies.
- Preprocess the data using the provided code.
- Run the anomaly detection methods.
- Explore the anomaly dashboard for insights.
Data preprocessing converts columns to appropriate data types, handles missing values, and cleans invalid readings, such as GPS coordinates reported as zero and exact-zero temperatures, so that the dataset is ready for analysis.
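The three preprocessing steps can be sketched with the standard library alone. This is not the project's code: the column names (`timestamp`, `lat`, `lon`, `oil_temp`) are hypothetical, and forward-filling is only one possible way to handle missing values (it can be misleading across long gaps).

```python
import csv
import io
import math
from datetime import datetime


def parse_float(text):
    """Type conversion: empty or non-numeric fields become NaN."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return math.nan


def preprocess(raw_csv: str):
    rows = []
    for rec in csv.DictReader(io.StringIO(raw_csv)):
        rows.append({
            "timestamp": datetime.fromisoformat(rec["timestamp"]),
            "lat": parse_float(rec["lat"]),
            "lon": parse_float(rec["lon"]),
            "oil_temp": parse_float(rec["oil_temp"]),  # hypothetical sensor column
        })
    rows.sort(key=lambda r: r["timestamp"])
    # Cleaning: zeroed coordinates and exact-zero temperatures become missing.
    for r in rows:
        if r["lat"] == 0.0 or r["lon"] == 0.0:
            r["lat"] = r["lon"] = math.nan
        if r["oil_temp"] == 0.0:
            r["oil_temp"] = math.nan
    # Missing values: forward-fill from the last valid reading of each column.
    last = {}
    for r in rows:
        for col in ("lat", "lon", "oil_temp"):
            if math.isnan(r[col]):
                if col in last:
                    r[col] = last[col]
            else:
                last[col] = r[col]
    return rows


raw = (
    "timestamp,lat,lon,oil_temp\n"
    "2023-01-10T08:00:00,50.85,4.35,80.5\n"
    "2023-01-10T08:01:00,0,0,0\n"
    "2023-01-10T08:02:00,50.86,4.36,\n"
)
for row in preprocess(raw):
    print(row)
```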
The project applies several anomaly detection methods to identify and categorize anomalies, and compares them to determine how effectively each one captures anomalies in the train data.
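The methods being compared are not named here. Purely as an illustration of such a comparison, the sketch below contrasts two common statistical baselines, a global z-score rule and Tukey's interquartile-range (IQR) fences, and measures how much their flags overlap with a Jaccard score. In the example the single 120 reading is missed by the z-score rule: with only ten points, the outlier inflates the standard deviation enough to mask itself (population z-scores cannot exceed the square root of n - 1, here exactly 3), whereas the quartile-based IQR fences still flag it.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Indices whose population z-score exceeds threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return set()
    return {i for i, v in enumerate(values) if abs(v - mean) / std > threshold}


def iqr_outliers(values, k=1.5):
    """Indices outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return {i for i, v in enumerate(values) if v < lo or v > hi}


def agreement(a, b):
    """Jaccard overlap of two sets of flagged indices (1.0 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


values = [80, 81, 79, 80, 82, 80, 81, 79, 80, 120]  # illustrative readings
z, q = zscore_outliers(values), iqr_outliers(values)
print(z, q, agreement(z, q))  # set() {9} 0.0
```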
The dashboard visualizes anomalies so that the SNCB rolling stock team can spot patterns related to weather, time, and location, which makes anomalies easier to interpret and supports decision-making.
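Behind a dashboard like this there is typically an aggregation step that counts anomalies along each dimension it displays. The sketch below groups flagged readings by hour of day, by air-temperature band, and by a coarse location grid cell. The input tuple layout, the 5-degree band width, and the 0.1-degree grid are assumptions, and `summarize` is a hypothetical helper, not part of the project.

```python
import math
from collections import Counter
from datetime import datetime


def temperature_band(temp_c, width=5):
    """Bucket an air temperature into a label such as '-5..0'."""
    low = math.floor(temp_c / width) * width
    return f"{low}..{low + width}"


def summarize(anomalies):
    """anomalies: iterable of (timestamp, air_temp_c, lat, lon) for flagged readings."""
    by_hour, by_band, by_cell = Counter(), Counter(), Counter()
    for ts, temp, lat, lon in anomalies:
        by_hour[ts.hour] += 1                        # time dimension
        by_band[temperature_band(temp)] += 1         # weather dimension
        by_cell[(round(lat, 1), round(lon, 1))] += 1  # location, ~0.1 degree cells
    return by_hour, by_band, by_cell


flagged = [  # illustrative values only
    (datetime(2023, 1, 10, 8, 5), -3.2, 50.84, 4.33),
    (datetime(2023, 1, 10, 8, 50), -1.0, 50.81, 4.31),
    (datetime(2023, 6, 1, 14, 0), 24.0, 51.2, 4.4),
]
for table in summarize(flagged):
    print(dict(table))
```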
We welcome contributions to this project. If you'd like to contribute, please follow our contribution guidelines.
This project is licensed under the MIT License. Feel free to use, modify, and distribute the code for your own projects.
Disclaimer: This project is for educational and research purposes. It is not intended for production use in railway systems.