[FEATURE] Enhance result index memory estimation #1381

Open
amitgalitz opened this issue Dec 4, 2024 · 0 comments

Labels: enhancement (New feature or request)

amitgalitz commented Dec 4, 2024

Is your feature request related to a problem?
We need to provide users with more accurate and accessible metrics for estimating the size and document count of their Anomaly Detection result index. This will help users better plan their storage requirements and manage their cluster resources more effectively.

Current solution

Users can manually estimate storage requirements based on the formula provided in the documentation, or they can run their detector on a test cluster for a while to see how much storage the results actually consume.
For example:

  • Default result index: The size depends on the number of result documents (both anomalous and non-anomalous), their size (approximately 1 KB each), the retention period (default 30 days), and the number of shard replicas. Example (see the sketch after this list): a detector with a 10-minute interval and 1 million entities can generate roughly 144 GB/day, resulting in approximately 4,320 GB over 30 days. Adjusting the primary shard and replica settings changes the total disk requirements.

  • Custom result index: Users have more control over index settings, such as the number of shards and replicas, and they can even configure their own ISM policy.
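
A minimal sketch of that back-of-the-envelope math in Python (the ~1 KB per document, 30-day retention, and one-result-per-entity-per-interval assumptions come from the example above; the helper name is just illustrative):

```python
def estimate_result_index_size_gb(
    num_entities: int,
    interval_minutes: int,
    retention_days: int = 30,
    doc_size_kb: float = 1.0,
    replicas: int = 0,
) -> float:
    """Rough disk estimate (decimal GB) for the AD result index.

    Assumes one ~1 KB result document per entity per detector interval
    and counts each replica as a full extra copy of the primary data.
    """
    results_per_day = num_entities * (24 * 60 // interval_minutes)
    gb_per_day = results_per_day * doc_size_kb / 1_000_000  # KB -> decimal GB
    return gb_per_day * retention_days * (1 + replicas)


# Example from above: 1M entities, 10-minute interval, 30-day retention
# -> ~144 GB/day, ~4,320 GB of primary data; replicas multiply this further.
print(estimate_result_index_size_gb(1_000_000, 10))
```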

What solution would you like?

Option 1: Add an estimation API on the backend, or a frontend-only calculator feature, to make it easier for users to estimate how much storage they will use.

Option 2: Add result index memory estimates to our current validation API and warn the user if there won't be enough disk space for the results. This would not always surface the estimate itself back to the user unless we make slight changes to the validation API response.
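
For option 2, one way the validation-time disk check could look (a rough sketch using the opensearch-py client and the nodes stats API; the threshold logic and the numbers plugged in are hypothetical, not an existing plugin API):

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])


def disk_available_bytes() -> int:
    """Sum the free filesystem bytes reported by the nodes stats API."""
    stats = client.nodes.stats(metric="fs")
    return sum(
        node["fs"]["total"]["available_in_bytes"]
        for node in stats["nodes"].values()
        if "fs" in node
    )


# Hypothetical validation-time check: compare the projected result index
# size (e.g. ~4,320 GB from the estimate above) against free disk space.
estimated_bytes = int(4_320 * 1e9)
if estimated_bytes > disk_available_bytes():
    print("WARNING: projected result index size exceeds available disk space")
```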

Additional notes:

  • For HC detectors, a large part of estimating how much storage the results will need comes down to the number of entities. We can query the historical data to get a better sense of this (see the sketch below), but that might not be very accurate if there isn't enough historical data or if the set of active entities changes a lot over time.
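
One way to gauge the active-entity count from historical data is a cardinality aggregation on the detector's category field over a recent window. A minimal sketch (the index name "my-metrics", field name "host", and 7-day window are placeholders, and cardinality is an approximate count):

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Approximate number of distinct entities seen over the last 7 days
# in the detector's source index (names below are placeholders).
resp = client.search(
    index="my-metrics",
    body={
        "size": 0,
        "query": {"range": {"@timestamp": {"gte": "now-7d"}}},
        "aggs": {"entity_count": {"cardinality": {"field": "host"}}},
    },
)
print(resp["aggregations"]["entity_count"]["value"])
```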

What alternatives have you considered?
If we believe this is a simple enough task, we may just be missing more documentation and examples showing how users can easily estimate the disk space they will need.
