[FEATURE] Enhance result index memory estimation #1381

Open
amitgalitz opened this issue Dec 4, 2024 · 0 comments

Labels: enhancement (New feature or request)

amitgalitz commented Dec 4, 2024

Is your feature request related to a problem?
We need to provide users with more accurate and accessible metrics for estimating the size and document count of their Anomaly Detection result index. This will help users better plan their storage requirements and manage their cluster resources more effectively.

Current solution

Users can manually estimate storage requirements based on the formula provided in the documentation, or they can run their detector on a test cluster for a while to see how much storage the results actually consume.
For example:

  • Default result index: The size depends on the number of result documents (both anomalous and non-anomalous), their size (approximately 1 KB each), the retention period (default 30 days), and the number of shard replicas. Example (see the sketch after this list): a detector with a 10-minute interval and 1 million entities can generate roughly 144 GB/day, resulting in approximately 4,320 GB over 30 days. Adjusting the primary shard and replica settings changes the total disk requirements.

  • Custom result index: Users have more control over index settings, such as the number of shards and replicas, and they can even configure their own ISM policy.
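
A minimal sketch of that back-of-the-envelope math in Python (the ~1 KB per document, 30-day retention, and one-result-per-entity-per-interval assumptions come from the example above; the helper name is just illustrative):

```python
def estimate_result_index_size_gb(
    num_entities: int,
    interval_minutes: int,
    retention_days: int = 30,
    doc_size_kb: float = 1.0,
    replicas: int = 0,
) -> float:
    """Rough disk estimate (decimal GB) for the AD result index.

    Assumes one ~1 KB result document per entity per detector interval
    and counts each replica as a full extra copy of the primary data.
    """
    results_per_day = num_entities * (24 * 60 // interval_minutes)
    gb_per_day = results_per_day * doc_size_kb / 1_000_000  # KB -> decimal GB
    return gb_per_day * retention_days * (1 + replicas)


# Example from above: 1M entities, 10-minute interval, 30-day retention
# -> ~144 GB/day, ~4,320 GB of primary data; replicas multiply this further.
print(estimate_result_index_size_gb(1_000_000, 10))
```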

What solution would you like?

Option 1: Add an estimation API on the backend, or a frontend-only calculator feature, to make it easier for users to estimate how much storage they will use.

Option 2: Add result index memory estimates to our current validation API and warn the user if there won't be enough disk space for the results. This would not always surface the estimate itself back to the user unless we make slight changes to the validation API response.
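
For option 2, one way the validation-time disk check could look (a rough sketch using the opensearch-py client and the nodes stats API; the threshold logic and the numbers plugged in are hypothetical, not an existing plugin API):

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])


def disk_available_bytes() -> int:
    """Sum the free filesystem bytes reported by the nodes stats API."""
    stats = client.nodes.stats(metric="fs")
    return sum(
        node["fs"]["total"]["available_in_bytes"]
        for node in stats["nodes"].values()
        if "fs" in node
    )


# Hypothetical validation-time check: compare the projected result index
# size (e.g. ~4,320 GB from the estimate above) against free disk space.
estimated_bytes = int(4_320 * 1e9)
if estimated_bytes > disk_available_bytes():
    print("WARNING: projected result index size exceeds available disk space")
```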

Additional notes:

  • For HC detectors, a large part of estimating how much storage the results will need comes down to the number of entities. We can query the historical data to get a better sense of this (see the sketch below), but that might not be very accurate if there isn't enough historical data or if the set of active entities changes a lot over time.
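
One way to gauge the active-entity count from historical data is a cardinality aggregation on the detector's category field over a recent window. A minimal sketch (the index name "my-metrics", field name "host", and 7-day window are placeholders, and cardinality is an approximate count):

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Approximate number of distinct entities seen over the last 7 days
# in the detector's source index (names below are placeholders).
resp = client.search(
    index="my-metrics",
    body={
        "size": 0,
        "query": {"range": {"@timestamp": {"gte": "now-7d"}}},
        "aggs": {"entity_count": {"cardinality": {"field": "host"}}},
    },
)
print(resp["aggregations"]["entity_count"]["value"])
```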

What alternatives have you considered?
If we believe this is a simple enough task, we may just be missing more documentation and examples showing how users can easily estimate the disk space they will need.
