
Add historical data workbench in anomaly detection #214

Closed
sean-zheng-amazon opened this issue Jun 10, 2020 · 2 comments · Fixed by #359
Labels: AnomalyDetection, feature

Comments

@sean-zheng-amazon (Contributor)

Currently, the only way users can look into historical data before a detector is created is the preview function in the middle of detector creation. However, the preview function is quite limited and displays only sampled data. We want to create a dedicated workbench that lets users explore historical data and apply models to see the results.

sean-zheng-amazon added the AnomalyDetection and feature labels on Jun 10, 2020
anirudha changed the title from "Add Historical Data workbench in AD" to "Add historical data workbench in anomaly detection" on Jul 8, 2020
@ylwu-amzn (Contributor) commented Jul 10, 2020

[RFC] Anomaly detection on historical data

This RFC discusses how to enhance the AD plugin to support detecting anomalies in historical data, so users can evaluate a model against historical data and tune it. Users can also detect anomalies in historical data to find opportunities to improve business strategies/processes, reduce cost, etc.

Problem statement

We plan to support these use cases.

Case 1: Train model with historical data

A user’s business has seasonal trends, and they want to feed historical data into the model so it learns the seasonality before detecting on streaming data. Otherwise, the model needs to wait for weeks to learn, say, a weekly pattern.

Case 2: Verify model with historical data

Users can’t foresee whether a detector will work well on future data, so they should be able to verify the model by detecting anomalies in historical data. Currently, the only way to feed historical data into a detector is to preview anomaly results. The preview is based on sampled data points, so it cannot produce anomaly results for all data points, and evaluating a model’s performance on partial data is not solid. For example, a preview may show no anomalies simply because the sampling logic skipped the truly anomalous data points. The preview function also doesn’t store anomaly results, and a rerun may generate different results, since a small difference in time range can change the sampled data points entirely.

Users can create multiple detectors to try different configurations. By reviewing the AD results of various detector configurations, users can learn how the model performs under each one. This helps them build confidence in AD and learn how to use it effectively.

Case 3: Evaluate model

Show users recall, precision, and F1 score, so they know exactly how the model performs and can tune it (a minimal sketch of the metric computation follows the note below). That requires labeled data, which can come in two ways:

1. The user provides labeled data.
2. We provide a way for users to label data, e.g., by labeling anomalies directly on the chart.

Note: take care not to overfit the model to the labeled data.
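
To make the metric computation concrete, here is a minimal sketch (not plugin code): it assumes labeled and detected anomalies are reduced to sets of timestamps, which glosses over the interval matching a real result comparison would need.

```python
# Minimal, hypothetical sketch of the evaluation metrics; timestamps are
# simplified to epoch seconds, while real anomaly results cover time ranges.

def evaluate(labeled: set[int], detected: set[int]) -> dict:
    true_positives = len(labeled & detected)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(labeled) if labeled else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

# Example: 3 labeled anomalies; the detector finds 2 of them plus 1 false positive.
print(evaluate({100, 200, 300}, {100, 300, 400}))
# -> precision 0.67, recall 0.67, f1 0.67
```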

Case 4: Detect historical anomalies

Users can detect anomalies in historical data to help analyze and improve business strategies and processes. For example, finding anomalies in credit card transactions can help a bank identify fraud; by analyzing anomalies in delivery times, an online store can tune its process to make delivery estimates more accurate.

Proposed solution

Currently, a user can configure a detector and preview anomalies on sampled historical data. Starting a detector starts a backend job that detects on streaming data. The job is hidden behind the detector: users can only start/stop/configure the job by starting/stopping/configuring the detector. To support anomaly detection on historical data, we should give users more knobs to control how the job runs. We propose a “detection task”, which represents one detector run; users can then create detection tasks that run on historical data.

From the schedule perspective, a detection task can be a one-time task or a periodic task. From the payload perspective, a detection task can be a batch task (feed multiple historical data points and generate multiple anomaly results) or a realtime/non-batch task (feed one streaming data point and generate one anomaly result). The current detector job becomes a periodic realtime/non-batch task. Users can define a batch task to detect anomalies in historical data and pre-train the model. A batch task is an async job at the backend, and we will show its progress in AD Kibana. Based on the tasks’ results, users can verify/evaluate models, compare different models, analyze trend/distribution changes, etc.
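
As a rough illustration of the two dimensions above (schedule × payload), here is a hypothetical sketch; the type and field names are illustrative, not the plugin’s actual task schema:

```python
# Hypothetical model of the "detection task" concept; names are illustrative.
from dataclasses import dataclass
from enum import Enum

class Schedule(Enum):
    ONE_TIME = "one_time"
    PERIODIC = "periodic"

class Payload(Enum):
    BATCH = "batch"        # many historical points -> many anomaly results
    REALTIME = "realtime"  # one streaming point -> one anomaly result

@dataclass
class DetectionTask:
    detector_id: str
    schedule: Schedule
    payload: Payload
    start_time: int  # epoch millis; bounds the history for batch tasks
    end_time: int

# The current detector job maps to a periodic realtime task:
current_job = DetectionTask("my-detector", Schedule.PERIODIC, Payload.REALTIME, 0, 0)

# A historical run maps to a one-time batch task over an explicit range:
historical_run = DetectionTask("my-detector", Schedule.ONE_TIME, Payload.BATCH,
                               1577836800000, 1591747200000)
```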

Currently we can’t feed historical data into the AD model: the model takes in streaming data, training and predicting at the same time. For example, a user may have historical data with a known weekly pattern, such as a CPU spike at 10:00 PM every Saturday when a batch job runs. The model should not flag these spikes as anomalies, but users currently have no way to pre-train the model to learn such a weekly pattern. To solve this, one option is to let users create a detection task on historical data. If a task has good results, the user can save its model as a reusable checkpoint. We save a checkpoint for every task by default, but users need to explicitly mark which ones are reusable, to make checkpoints easier to search/select in Kibana. If a user chooses a checkpoint when running a task, we restore the model from that checkpoint and serve anomaly detection requests from it.
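
An illustrative sketch of that checkpoint flow (save by default, mark reusable explicitly, restore on demand); `CheckpointStore` and its in-memory storage are hypothetical stand-ins for wherever the plugin would actually persist checkpoints:

```python
# Hypothetical checkpoint store illustrating the flow described above.

class CheckpointStore:
    def __init__(self):
        self._checkpoints: dict[str, dict] = {}

    def save(self, task_id: str, model_state: bytes, reusable: bool = False):
        # Saved by default for every task; reusable must be set explicitly.
        self._checkpoints[task_id] = {"state": model_state, "reusable": reusable}

    def mark_reusable(self, task_id: str):
        self._checkpoints[task_id]["reusable"] = True

    def list_reusable(self) -> list[str]:
        # What a Kibana picker would show when starting a new task.
        return [tid for tid, cp in self._checkpoints.items() if cp["reusable"]]

    def restore(self, task_id: str) -> bytes:
        # Restore model state so the task starts from a pre-trained model.
        return self._checkpoints[task_id]["state"]
```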

For the preview function, the current implementation detects anomalies on sampled data, so the preview result is not very accurate and may miss anomalies whenever the anomalous data points are not sampled. We can give users the option to preview on the full historical data, which takes longer but generates more accurate results. If the full historical data is used, we will create a one-time batch task at the backend and refresh the preview Kibana page periodically to show detection progress and results.
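
The periodic refresh could look roughly like this sketch; `get_task` is a hypothetical backend status call, stubbed here so the loop runs as written:

```python
# Hypothetical polling loop for a one-time batch preview task.
import time

def get_task(task_id: str, _state={"progress": 0.0}) -> dict:
    # Stub standing in for a backend status call; the mutable default
    # simulates progress advancing on each poll.
    _state["progress"] = min(1.0, _state["progress"] + 0.25)
    return {"progress": _state["progress"], "results": []}

def poll_preview(task_id: str, interval_sec: float = 1.0):
    while True:
        task = get_task(task_id)
        print(f"detection progress: {task['progress']:.0%}")  # refresh UI here
        if task["progress"] >= 1.0:
            return task["results"]
        time.sleep(interval_sec)

poll_preview("preview-task-1")
```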

@ylwu-amzn (Contributor)

Any comments/suggestions are welcome. For example:

  1. What use cases of yours are not covered in this RFC?
  2. Do you have suggestions for the proposed solution, or any new options?
  3. Which feature/enhancement do you want most?

@ohltyler ohltyler linked a pull request Jan 22, 2021 that will close this issue