
Add historical data workbench in anomaly detection #214

Closed
sean-zheng-amazon opened this issue Jun 10, 2020 · 2 comments · Fixed by #359
Labels: AnomalyDetection, feature

Comments

@sean-zheng-amazon (Contributor)

Currently, the only way users can look into historical data before a detector is created is the preview function in the middle of detector creation. However, the preview function is quite limited and displays only sampled data. We want to create a dedicated workbench that lets users explore historical data and apply models to see the results.

sean-zheng-amazon added the AnomalyDetection and feature labels on Jun 10, 2020
anirudha changed the title from "Add Historical Data workbench in AD" to "Add historical data workbench in anomaly detection" on Jul 8, 2020
@ylwu-amzn (Contributor) commented Jul 10, 2020

[RFC] Anomaly detection on historical data

This RFC discusses how to enhance the AD plugin to support detecting anomalies in historical data, so users can evaluate a model against historical data and tune it. Users can also detect anomalies in historical data to find opportunities to improve business strategies/processes, reduce cost, etc.

Problem statement

We plan to support these use cases.

Case 1: Train model with historical data

A user’s business has seasonal trends, and they want to feed historical data into the model so it learns the seasonality before detecting on streaming data. Otherwise, the model needs to wait for weeks to learn, say, a weekly pattern.

Case 2: Verify model with historical data

Users can’t foresee whether a detector will work well on future data, so they should be able to verify the model by detecting anomalies in historical data. Currently, the only way to feed historical data into a detector is to preview anomaly results. The preview is based on sampled data points, so it cannot produce anomaly results for all data points, and evaluating a model’s performance on partial data is not solid. For example, a preview may show no anomalies simply because the sampling logic skipped the truly anomalous data points. The preview function also doesn’t store anomaly results, and a rerun may generate different results, since a small difference in time range can change the sampled data points entirely.

Users can create multiple detectors to try different configurations. By reviewing the AD results of various detector configurations, users can learn how the model performs under each one. This helps them build confidence in AD and learn how to use it effectively.

Case 3: Evaluate model

Show users recall, precision, and F1 score, so they know exactly how the model performs and can tune it (a minimal sketch of the metric computation follows the note below). That requires labeled data, which can come in two ways:

1. The user provides labeled data.
2. We provide a way for users to label data, e.g., by labeling anomalies directly on the chart.

Note: take care not to overfit the model to the labeled data.
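
To make the metric computation concrete, here is a minimal sketch (not plugin code): it assumes labeled and detected anomalies are reduced to sets of timestamps, which glosses over the interval matching a real result comparison would need.

```python
# Minimal, hypothetical sketch of the evaluation metrics; timestamps are
# simplified to epoch seconds, while real anomaly results cover time ranges.

def evaluate(labeled: set[int], detected: set[int]) -> dict:
    true_positives = len(labeled & detected)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(labeled) if labeled else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

# Example: 3 labeled anomalies; the detector finds 2 of them plus 1 false positive.
print(evaluate({100, 200, 300}, {100, 300, 400}))
# -> precision 0.67, recall 0.67, f1 0.67
```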

Case 4: Detect historical anomalies

Users can detect anomalies in historical data to help analyze and improve business strategies and processes. For example, finding anomalies in credit card transactions can help a bank identify fraud; by analyzing anomalies in delivery times, an online store can tune its process to make delivery estimates more accurate.

Proposed solution

Currently, a user can configure a detector and preview anomalies on sampled historical data. Starting a detector starts a backend job that detects on streaming data. The job is hidden behind the detector: users can only start/stop/configure the job by starting/stopping/configuring the detector. To support anomaly detection on historical data, we should give users more knobs to control how the job runs. We propose a “detection task”, which represents one detector run; users can then create detection tasks that run on historical data.

From the schedule perspective, a detection task can be a one-time task or a periodic task. From the payload perspective, a detection task can be a batch task (feed multiple historical data points and generate multiple anomaly results) or a realtime/non-batch task (feed one streaming data point and generate one anomaly result). The current detector job becomes a periodic realtime/non-batch task. Users can define a batch task to detect anomalies in historical data and pre-train the model. A batch task is an async job at the backend, and we will show its progress in AD Kibana. Based on the tasks’ results, users can verify/evaluate models, compare different models, analyze trend/distribution changes, etc.
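
As a rough illustration of the two dimensions above (schedule × payload), here is a hypothetical sketch; the type and field names are illustrative, not the plugin’s actual task schema:

```python
# Hypothetical model of the "detection task" concept; names are illustrative.
from dataclasses import dataclass
from enum import Enum

class Schedule(Enum):
    ONE_TIME = "one_time"
    PERIODIC = "periodic"

class Payload(Enum):
    BATCH = "batch"        # many historical points -> many anomaly results
    REALTIME = "realtime"  # one streaming point -> one anomaly result

@dataclass
class DetectionTask:
    detector_id: str
    schedule: Schedule
    payload: Payload
    start_time: int  # epoch millis; bounds the history for batch tasks
    end_time: int

# The current detector job maps to a periodic realtime task:
current_job = DetectionTask("my-detector", Schedule.PERIODIC, Payload.REALTIME, 0, 0)

# A historical run maps to a one-time batch task over an explicit range:
historical_run = DetectionTask("my-detector", Schedule.ONE_TIME, Payload.BATCH,
                               1577836800000, 1591747200000)
```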

Currently we can’t feed historical data into the AD model: the model takes in streaming data, training and predicting at the same time. For example, a user may have historical data with a known weekly pattern, such as a CPU spike at 10:00 PM every Saturday when a batch job runs. The model should not flag these spikes as anomalies, but users currently have no way to pre-train the model to learn such a weekly pattern. To solve this, one option is to let users create a detection task on historical data. If a task has good results, the user can save its model as a reusable checkpoint. We save a checkpoint for every task by default, but users need to explicitly mark which ones are reusable, to make checkpoints easier to search/select in Kibana. If a user chooses a checkpoint when running a task, we restore the model from that checkpoint and serve anomaly detection requests from it.
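
An illustrative sketch of that checkpoint flow (save by default, mark reusable explicitly, restore on demand); `CheckpointStore` and its in-memory storage are hypothetical stand-ins for wherever the plugin would actually persist checkpoints:

```python
# Hypothetical checkpoint store illustrating the flow described above.

class CheckpointStore:
    def __init__(self):
        self._checkpoints: dict[str, dict] = {}

    def save(self, task_id: str, model_state: bytes, reusable: bool = False):
        # Saved by default for every task; reusable must be set explicitly.
        self._checkpoints[task_id] = {"state": model_state, "reusable": reusable}

    def mark_reusable(self, task_id: str):
        self._checkpoints[task_id]["reusable"] = True

    def list_reusable(self) -> list[str]:
        # What a Kibana picker would show when starting a new task.
        return [tid for tid, cp in self._checkpoints.items() if cp["reusable"]]

    def restore(self, task_id: str) -> bytes:
        # Restore model state so the task starts from a pre-trained model.
        return self._checkpoints[task_id]["state"]
```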

For the preview function, the current implementation detects anomalies on sampled data, so the preview result is not very accurate and may miss anomalies whenever the anomalous data points are not sampled. We can give users the option to preview on the full historical data, which takes longer but generates more accurate results. If the full historical data is used, we will create a one-time batch task at the backend and refresh the preview Kibana page periodically to show detection progress and results.
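
The periodic refresh could look roughly like this sketch; `get_task` is a hypothetical backend status call, stubbed here so the loop runs as written:

```python
# Hypothetical polling loop for a one-time batch preview task.
import time

def get_task(task_id: str, _state={"progress": 0.0}) -> dict:
    # Stub standing in for a backend status call; the mutable default
    # simulates progress advancing on each poll.
    _state["progress"] = min(1.0, _state["progress"] + 0.25)
    return {"progress": _state["progress"], "results": []}

def poll_preview(task_id: str, interval_sec: float = 1.0):
    while True:
        task = get_task(task_id)
        print(f"detection progress: {task['progress']:.0%}")  # refresh UI here
        if task["progress"] >= 1.0:
            return task["results"]
        time.sleep(interval_sec)

poll_preview("preview-task-1")
```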

@ylwu-amzn (Contributor)

Any comments/suggestions are welcome. For example:

  1. What use cases of yours are not covered in this RFC?
  2. Do you have suggestions for the proposed solution, or any new options?
  3. Which feature/enhancement do you want most?

@ohltyler ohltyler linked a pull request Jan 22, 2021 that will close this issue