Metric groups #36
Replies: 7 comments 44 replies
-
I think this could be super useful! In my head I'm thinking of logical groupings for things like:
To me, having such an ability really gets to the heart of making the metrics useful, so yeah, great idea!
-
I also mentioned this idea. For instance, at Fiberplane we have several functions responsible for our real-time component. Grouping these together (and later attaching an SLO to them) lets you reason about their performance and availability as a single unit of work.
-
A little detail on how we could implement this: I think we'd want to support arbitrary nesting and overlapping of groups. The way to do this with Prometheus labels is pretty fun -- and, amazingly but maybe not surprisingly, built on another idea I got from Brian Brazil's blog: Negative lookahead assertions in PromQL selectors. When you're attaching multiple groups to a metric, you would join all of them into a single label value with some separator like a space. PromQL regexes intentionally don't support lookaheads, so we need Brian Brazil's trick to query for metrics that belong to multiple groups: we would use multiple regex matchers on the same label in the query, and Prometheus ANDs them together. Another detail that I think is kind of neat is that this adds labels without adding cardinality. The metrics for a given function would always be produced with the group labels of all of the groups it's part of, so in Prometheus there would only be a single time series with all of the different labels attached. The only time it would need to start a new time series is if you changed the group membership for a function, and then the old time series would be removed from memory after a little while.
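A minimal Python sketch of the encoding and query side, assuming a hypothetical `group` label, a hypothetical `function_calls_total` metric, a space separator, and group names restricted to letters, digits and underscores (so nothing needs regex or string escaping). Each name is padded with separators so that `realtime` cannot accidentally match `realtime2`. The selector uses one `=~` matcher per required group and one `!~` matcher per excluded group, which is how the missing negative lookahead can be emulated. PromQL regex matchers are fully anchored, hence the `.*` around each pattern. `series_matches` only mimics how Prometheus would evaluate such a selector against one label value, using Python's `re.fullmatch` as a stand-in for RE2.

```python
import re

SEP = " "  # hypothetical separator between group names
VALID_GROUP = re.compile(r"[A-Za-z0-9_]+")  # keeps names regex- and PromQL-safe


def encode_groups(groups):
    """Join a function's groups into one label value, e.g. " db realtime "."""
    names = sorted(set(groups))  # stable order, so the series identity is stable
    for name in names:
        if not VALID_GROUP.fullmatch(name):
            raise ValueError(f"invalid group name: {name!r}")
    return SEP + SEP.join(names) + SEP


def group_pattern(name):
    # PromQL regex matchers are fully anchored, so allow anything around the name
    return f".*{SEP}{name}{SEP}.*"


def selector(metric, include, exclude=(), label="group"):
    """PromQL selector for series in every `include` group and no `exclude` group."""
    matchers = [f'{label}=~"{group_pattern(g)}"' for g in include]
    matchers += [f'{label}!~"{group_pattern(g)}"' for g in exclude]
    return metric + "{" + ", ".join(matchers) + "}"


def series_matches(label_value, include, exclude=()):
    """Mimic Prometheus ANDing every matcher on the same label (re approximates RE2)."""
    return (all(re.fullmatch(group_pattern(g), label_value) for g in include)
            and not any(re.fullmatch(group_pattern(g), label_value) for g in exclude))


if __name__ == "__main__":
    value = encode_groups(["realtime", "db"])  # " db realtime "
    print(selector("function_calls_total", ["realtime"], exclude=["db"]))
    # function_calls_total{group=~".* realtime .*", group!~".* db .*"}
```

Because the full group list lives in a single label value, a function in three groups still produces one series, which matches the point about not adding cardinality.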
-
All three options that @IvanMerrill pointed out stand out as great candidates. A fourth that I haven't made up my mind about would be a grouping for functionality that depends on some flaky or external resource. Almost like a troubleshooting hint: "If this is broken, always look here first." Some of those dependencies are so much clearer when you're coding than when the code is in production. The great thing about groups being flexible is that users may start to find interesting reasons to group calls that we haven't thought of.
-
What should the relationship between groups and SLO Objectives be? It would make sense if you could create some groups, attach those to functions (similar to how you do that with SLOs now), and then later attach an objective to a group. One question is whether we care about retroactively including groups in SLOs once you've added them.
-
I find the idea of a Right now, you can use Prometheus relabeling rules to add such a label. However, we can only use such a label in the queries we generate if we standardize the name of the label. One argument that seems pretty strong for having a Now, if we were going to add a
-
This is a very fruitful discussion so far, though it seems like many of the potential use cases we've discussed would be best left out of scope for a groups feature specifically. I'm wondering if we should go back to the original use case @akesling suggested for groups:
If we think of groups as pre-SLOs and leave other use cases out of scope, this has some implications for how we might implement it. One big one I can think of is that we may not need to support functions being members of multiple groups. Each function could belong to one group, and then you could add a whole group to an SLO when you create it. This would be a bit simpler to implement, as we wouldn't need the tricks I described above. One reason for this limitation is that we don't currently support one function being part of multiple SLOs. If you could add a function to multiple groups and then add those groups to different SLOs, we'd need to decide which SLO the function is part of. We could theoretically make it possible to add functions to multiple SLOs, but this would definitely complicate things (particularly because an SLO is defined not by one label but by several: the name, percentile, and latency). What do folks think about that way of looking at it? I wonder if there's a more specific name than "groups" that would make the scope of such a feature clearer.
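To illustrate why overlapping groups clash with SLOs, here is a small Python sketch. The `Objective` fields and the label names `objective_name`, `objective_percentile` and `objective_latency_threshold` are assumptions for illustration only; the point is that an SLO is identified by several labels at once, and a time series can carry only one value per label. If group membership would route a function to two different objectives, there is no single set of labels to emit, so the hypothetical `objective_labels` helper raises an error. Under a one-group-per-function rule, that branch can never be reached.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Objective:
    # Hypothetical shape: an SLO is identified by several values together
    name: str
    percentile: str
    latency_threshold: Optional[str] = None


def objective_labels(function: str,
                     memberships: Dict[str, Set[str]],
                     group_objectives: Dict[str, Objective]) -> Dict[str, str]:
    """Labels to attach to `function`'s metrics, derived from its groups."""
    objectives = {group_objectives[g]
                  for g in memberships.get(function, set())
                  if g in group_objectives}
    if len(objectives) > 1:
        # One series has one value per label, so it cannot carry two SLOs
        raise ValueError(f"{function} would belong to {len(objectives)} SLOs")
    if not objectives:
        return {}
    obj = objectives.pop()
    labels = {"objective_name": obj.name, "objective_percentile": obj.percentile}
    if obj.latency_threshold is not None:
        labels["objective_latency_threshold"] = obj.latency_threshold
    return labels
```

Two groups that point at the same objective collapse into one entry in the set, so only genuinely different SLOs are treated as a conflict.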
-
@akesling suggested that we might want to generalize how we're handling SLOs now to support metric groups.
His point was that you might want to group a set of function-level metrics together first so that you can monitor them. Then, you might later want to decide on an SLO for them and attach the SLO to the group (or vice versa).
The main way that we would use these types of groups right now would probably be to have a Grafana dashboard that shows you a row for each group, similar to what we have for SLOs but without the targets attached.
What do folks think? Does this sound useful? Should we do this now or hold off until later?