This package identifies outlier(s) for a given time-series dataset in simple steps. It supports day, week, month and quarter level time-series data.
pip install pycatcher
- PyCatcher expects a Pandas DataFrame as an input for various outlier detection methods. It can convert Spark DataFrame to Pandas DataFrame at the data processing stage.
- First column in the dataframe must be a time period column (date in 'YYYY-MM-DD'; month in 'YYYY-MM'; year in 'YYYY' format) and the last column a numeric column (sum or total count for the time period) to detect outliers using Seasonal Decomposition algorithms.
- Last column must be a numeric column to detect outliers using Interquartile Range (IQR) and Moving Average algorithms.
- At present, PyCatcher does not depend on labeled observations (ground truth). Outliers are detected solely through underlying algorithms (for example, seasonal-trend decomposition and dispersion methods like MAD or Z-Score).
PyCatcher provides an efficient solution for detecting anomalies in time-series data using various statistical methods. Below are the available techniques for anomaly detection, each optimized for different data characteristics.
Seasonal decomposition algorithms (Classical; STL; MSTL) requires at least 2 years of data, otherwise we can use simpler methods (Inter Quartile Range (IQR); Moving Average method) to detect outliers.
For datasets with at least two years of data, PyCatcher automatically determines whether the data follows an additive or multiplicative model to detect anomalies.
- Method:
detect_outliers_classic(df)
- Output: DataFrame of detected anomalies or a message indicating no anomalies.
Quickly identify if there are any anomalies specifically for the current date.
- Method:
detect_outliers_today_classic(df)
- Output: Anomaly details for today or a message indicating no outliers.
Retrieve the most recent anomalies identified in your time-series data.
- Method:
detect_outliers_latest_classic(df)
- Output: Details of the latest detected anomalies.
Show outliers in your data through classical seasonal decomposition.
- Method:
build_outliers_plot_classic(df)
- Output: Outlier plot generated using classical seasonal decomposition.
Understand seasonality in your data by visualizing classical seasonal decomposition.
- Method:
build_seasonal_plot_classic(df)
- Output: Seasonal plots displaying additive or multiplicative trends.
Show month-wise box plot
- Method:
build_monthwise_plot(df)
- Output: Month-wise box plots showing spread and skewness of data.
Use the Seasonal-Trend Decomposition method (STL) to detect anomalies.
- Method:
detect_outliers_stl(df)
- Output: Rows flagged as outliers using STL.
Quickly identify if there are any anomalies specifically for the current date.
- Method:
detect_outliers_today_stl(df)
- Output: Anomaly details for today or a message indicating no outliers.
Retrieve the most recent anomalies identified in your time-series data.
- Method:
detect_outliers_latest_stl(df)
- Output: Details of the latest detected anomalies.
Show outliers using the Seasonal-Trend Decomposition using LOESS (STL).
- Method:
build_outliers_plot_stl(df)
- Output: Outlier plot generated using STL.
Understand seasonality in your data by visualizing Seasonal-Trend Decomposition using LOESS (STL).
- Method:
build_seasonal_plot_stl(df)
- Output: Seasonal plot to decompose a time series into a trend component, seasonal components, and a residual component.
Use the Multiple Seasonal-Trend Decomposition method (MSTL) to detect anomalies.
- Method:
detect_outliers_mstl(df)
- Output: Rows flagged as outliers using MSTL.
Quickly identify if there are any anomalies specifically for the current date.
- Method:
detect_outliers_today_mstl(df)
- Output: Anomaly details for today or a message indicating no outliers.
Retrieve the most recent anomalies identified in your time-series data.
- Method:
detect_outliers_latest_mstl(df)
- Output: Details of the latest detected anomalies.
Show outliers using the Multiple Seasonal-Trend Decomposition using LOESS (MSTL).
- Method:
build_outliers_plot_mstl(df)
- Output: Outlier plot generated using MSTL.
Understand seasonality in your data by visualizing Multiple Seasonal-Trend Decomposition using LOESS (MSTL).
- Method:
build_seasonal_plot_mstl(df)
- Output: Seasonal plot to decompose a time series into a trend component, multiple seasonal components, and a residual component.
Detect anomalies in time-series data using the ESD algorithm.
- Method:
detect_outliers_esd(df)
- Output: Rows flagged as outliers using the Generalized ESD or Seasonal ESD algorithm.
Show outliers using the Generalized ESD or Seasonal ESD algorithm.
- Method:
build_outliers_plot_esd(df)
- Output: Outlier plot generated using Generalized ESD or Seasonal ESD algorithm.
Detect anomalies in time-series data using the Moving Average method.
- Method:
detect_outliers_moving_average(df)
- Output: Rows flagged as outliers using Moving Average and Z-score algorithm.
Show outliers using the Moving Average and Z-score algorithm.
- Method:
build_outliers_plot_moving_average(df)
- Output: Outlier plot generated using Moving Average method.
For datasets spanning less than two years, the IQR method is employed.
- Method:
find_outliers_iqr(df)
- Output: Rows flagged as outliers based on IQR.
Build an IQR plot for a given dataframe (for less than 2 years of data).
- Method:
build_iqr_plot(df)
- Output: IQR plot for the time-series data.
To see an example of how to use the pycatcher
package for outlier detection in time-series data, check out the Example Notebook.
The notebook provides step-by-step guidance and demonstrates the key features of the library.