# Add historical data workbench in anomaly detection #214
## [RFC] Anomaly detection on historical data

This RFC discusses how to enhance the AD plugin to support detecting anomalies on historical data. Users can then evaluate a model with historical data and tune it. Users can also detect anomalies in historical data to find opportunities to improve business strategy/process, reduce cost, etc.

### Problem statement

We plan to support these use cases.

#### Case 1. Train model with historical data

A user's business has seasonal trends, and they want to feed historical data into the model so it learns the seasonality before detecting streaming data. Otherwise, the model needs to wait for weeks to learn a weekly pattern.

#### Case 2. Verify model with historical data

Users can't foresee whether a detector will work well on future data. They should be able to verify the model by detecting anomalies in historical data.

Currently, the only way users can feed historical data into detectors is by previewing anomaly results. The preview is based on sampled data points, so we can't get anomaly results for all data points, and it is not solid to evaluate a model's performance on partial data alone. For example, you may find no anomalies in the preview result because the sampling logic skipped the truly anomalous data points. The preview function doesn't store anomaly results, and rerunning it may generate different results, as the sampled data points can change completely with a small difference in time range.

Users can create multiple detectors to try different configurations. By reviewing the AD results of various detector configurations, users can learn how the model performs differently. This helps them build confidence in AD and learn how to use it effectively.

#### Case 3. Evaluate model

Show users recall, precision, and F1 score, so they know exactly how the model performs and can tune it. That requires labeled data. There are two ways to get labeled data:

Note: Don't overfit the model.

#### Case 4. Detect historical anomalies

Users can detect anomalies in historical data to help analyze and improve business strategy and processes.
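The evaluation described in Case 3 boils down to comparing detector output against labeled ground truth. A minimal sketch, assuming anomalies are identified by timestamps and comparing two sets; the function name and shape are illustrative, not part of the AD plugin:

```python
def evaluate(predicted: set, labeled: set) -> dict:
    """Compare predicted anomalous timestamps with labeled ground truth."""
    tp = len(predicted & labeled)   # true positives: flagged and labeled
    fp = len(predicted - labeled)   # false positives: flagged but not labeled
    fn = len(labeled - predicted)   # false negatives: labeled but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

# Example: the detector flags 3 points; 2 of them match the 4 labeled anomalies.
metrics = evaluate({10, 20, 30}, {10, 20, 40, 50})
# precision = 2/3, recall = 2/4 = 0.5, f1 = 4/7
```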
For example, finding anomalies in credit card transactions can help a bank identify fraud. By analyzing anomalies in delivery time, an online store can tune its process to make delivery time more accurate.

### Proposed solution

Currently, users can configure a detector and preview anomalies on sampled historical data. Starting a detector starts a job in the backend to detect streaming data. The job is hidden behind the detector: users can only start/stop/configure the detector job by starting/stopping/configuring the detector. To support anomaly detection on historical data, we should give users more knobs to control how the job runs.

We propose a "detection task", which represents one detector run. Users can then create detection tasks running on historical data. From the schedule perspective, a detection task can be a one-time task or a periodic task. From the payload perspective, a detection task can be a batch task (feed multiple historical data points and generate multiple anomaly results) or a realtime/non-batch task (feed one streaming data point and generate one anomaly result). The current detector job becomes a periodic realtime/non-batch task.

Users can define a batch task to detect anomalies in historical data and pre-train a model. A batch task is an async job in the backend, and we will show its progress in the AD Kibana page. Based on the tasks' results, users can verify/evaluate models, compare different models, analyze trending/distribution changes, etc.

Currently we can't feed historical data into an AD model. The model has to take in streaming data, training and predicting at the same time. For example, a user may have historical data with a known weekly pattern, such as a CPU spike at 10:00 PM every Saturday when a batch job runs. The model should not flag these spikes as anomalies, but currently users have no way to pre-train the model to learn such a weekly pattern. To solve this, one option is to let users create a detection task on historical data; if a task produces a good result, the user can save it as a reusable checkpoint.
We save a checkpoint for every task by default, but the user needs to explicitly mark which one is reusable, to make it easier to search and select in Kibana. If the user chooses a checkpoint to run a task, we will restore the model from that checkpoint and serve anomaly detection requests from it.

For the preview function, the current implementation detects anomalies on sampled data, so the preview result is not very accurate and may miss anomalies if the anomalous data points are not sampled. We can give users the option to preview on the full historical data, which takes longer but generates more accurate results. If the full historical data is used, we will create a one-time batch task in the backend and refresh the preview Kibana page periodically to show detection progress and results.
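The two task dimensions described above (one-time vs. periodic schedule, batch vs. realtime payload) and the optional checkpoint reference can be sketched as a small data model. This is a hypothetical illustration of the proposal, not the plugin's actual classes or field names:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Schedule(Enum):
    ONE_TIME = "one_time"
    PERIODIC = "periodic"

class Payload(Enum):
    BATCH = "batch"        # feed many historical points, emit many results
    REALTIME = "realtime"  # feed one streaming point, emit one result

@dataclass
class DetectionTask:
    detector_id: str
    schedule: Schedule
    payload: Payload
    checkpoint_id: Optional[str] = None  # model checkpoint to restore, if any

# The current detector job maps to a periodic realtime/non-batch task:
streaming_job = DetectionTask("my-detector", Schedule.PERIODIC, Payload.REALTIME)

# Historical detection / pre-training maps to a one-time batch task,
# optionally restoring a previously saved checkpoint:
historical_run = DetectionTask("my-detector", Schedule.ONE_TIME,
                               Payload.BATCH, checkpoint_id="ckpt-1")
```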
Any comments/suggestions are welcome.
Currently, the only way users can look into historical data before a detector is created is to use the preview function in the middle of detector creation. However, the preview function is quite limited and displays only sampled data. We want to create a dedicated workbench that lets users play around with historical data and apply models to see results.