Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include sum value for summary type as per OpenMetrics / Prometheus format #621

Merged
merged 5 commits into from
Nov 2, 2020

Conversation

Channyboy
Copy link
Contributor

@Channyboy Channyboy commented Oct 27, 2020

Copy link
Contributor

@donbourne donbourne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timer also uses summary type in output and needs a _sum as shown at https://prometheus.io/docs/instrumenting/exposition_formats/

# Finally a summary, which has a complex representation, too:
# HELP rpc_duration_seconds A summary of the RPC duration in seconds.
# TYPE rpc_duration_seconds summary
rpc_duration_seconds{quantile="0.01"} 3102
rpc_duration_seconds{quantile="0.05"} 3272
rpc_duration_seconds{quantile="0.5"} 4773
rpc_duration_seconds{quantile="0.9"} 9001
rpc_duration_seconds{quantile="0.99"} 76656
rpc_duration_seconds_sum 1.7560473e+07
rpc_duration_seconds_count 2693

spec/src/main/asciidoc/changelog.adoc Outdated Show resolved Hide resolved
spec/src/main/asciidoc/changelog.adoc Outdated Show resolved Hide resolved
@donbourne
Copy link
Contributor

To sum up discussion yesterday with @Channyboy , there's no way to be 100% consistent in our naming of methods and names used in export format to describe this new sum across histograms / timers / simpleTimers without somewhat needlessly breaking backwards compatibility. So here's what we're suggesting is about as good as we can do for consistency:

                histogram           timer                       simpleTimer
        api
                extends Summable    extends Summable            extends Summable
                long getSum()       long getSum()               long getSum()    
                                    Duration getElapsedTime()   Duration getElapsedTime()

        json field
                sum                 sum                         elapsedTime

        om suffix
                <units>_sum         seconds_sum                 elapsedTime_seconds

One part that will feel inconsistent, IMO, is that, in JSON format, timers have sum and simpleTimers have elapsedTime for the same concept. The alternative would have been to make histograms and timers have different json field names which seemed odd since histograms and timers both have the same set of metrics coming from what OM format calls a summary.

The other part that will feel inconsistent is that, in OM format, timers have a metric with a suffix of seconds_sum (as dictated by OM's summary expo format), whereas simpleTimers have a metric with a suffix of elapsedTime_seconds (which isn't part of a summary) for the same concept. Again, we could have changed simpleTimer's suffix, but that seems like a needless break of backward compatibility.

spec/src/main/asciidoc/changelog.adoc Outdated Show resolved Hide resolved
spec/src/main/asciidoc/changelog.adoc Outdated Show resolved Hide resolved
spec/src/main/asciidoc/changelog.adoc Outdated Show resolved Hide resolved
@Channyboy Channyboy requested a review from donbourne October 29, 2020 01:34
Copy link
Contributor

@jmartisk jmartisk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're wrong in applying the same interface (Summing) both to histograms and to timers, because a "sum" has a bit different meaning in the concept arbitrary numbers (in case of histograms) and in time durations. In particular I don't like having a getSum() method in timers which returns a number rather than a Duration, even worse if we keep Duration getElapsedTime() there as well, which does the same thing but returns a different type. That's confusing.

We're doing a breaking change release and I personally think it's better to make use of it and break compatibility, if it helps the API from getting confusing. Otherwise we'll have to keep using that confusing API for a long time.

So my suggestion would be to not use a Summable interface at all, and instead we would have

                histogram           timer                       simpleTimer
        api
                long getSum()       Duration getSum()           Duration getSum()    

        json field
                sum                 sum                         sum

        om suffix
                <units>_sum         seconds_sum                 seconds_sum

It feels much more concise, and as I said I wouldn't care that much about breakage..

If we indeed do want to have a Summable interface, then I'd say we should have two, one for histograms (with long getSum()) and one for (simple)timers with Duration getSum().

@donbourne
Copy link
Contributor

We're already breaking the API's backward compatibility in a few places, and API breakage is easy to spot when you compile, so I'm less concerned about one more small break there.

We haven't broken grafana dashboard backward compatibility yet in MP Metrics 3.0, and it feels like doing it just so SimpleTimer and Timer can use the same suffix on sums might be annoying to users.

Would it make sense to break backwards compatibility in the API but keep the export formats unchanged, as in the following?

                histogram           timer                       simpleTimer
        api
                long getSum()       Duration getSum()           Duration getSum()

        json field
                sum                 sum                         elapsedTime

        om suffix
                <units>_sum         seconds_sum                 elapsedTime_seconds

@donbourne
Copy link
Contributor

donbourne commented Oct 30, 2020

Looking at the code changes it occurs to me that there isn't really much motivation (now that we've eliminated Summable) to change SimpleTimer and Timer from getElapsedTime to getSum. IMO, avoiding that change doesn't affect understandability of those APIs and it would avoid the breaking change to the SimpleTimer API. So I'll refine my previous suggestion to...

                histogram           timer                       simpleTimer
        api
                long getSum()       Duration elapsedTime()      Duration elapsedTime()

        json field
                sum                 elapsedTime                 elapsedTime

        om suffix
                <units>_sum         seconds_sum                 elapsedTime_seconds

spec/src/main/asciidoc/changelog.adoc Outdated Show resolved Hide resolved
spec/src/main/asciidoc/changelog.adoc Outdated Show resolved Hide resolved
spec/src/main/asciidoc/rest-endpoints.adoc Outdated Show resolved Hide resolved
spec/src/main/asciidoc/rest-endpoints.adoc Outdated Show resolved Hide resolved
spec/src/main/asciidoc/rest-endpoints.adoc Outdated Show resolved Hide resolved
spec/src/main/asciidoc/rest-endpoints.adoc Outdated Show resolved Hide resolved
spec/src/main/asciidoc/rest-endpoints.adoc Outdated Show resolved Hide resolved
spec/src/main/asciidoc/rest-endpoints.adoc Outdated Show resolved Hide resolved
@jmartisk
Copy link
Contributor

Looking at the code changes it occurs to me that there isn't really much motivation (now that we've eliminated Summable) to change SimpleTimer and Timer from getElapsedTime to getSum. IMO, avoiding that change doesn't affect understandability of those APIs and it would avoid the breaking change to the SimpleTimer API. So I'll refine my previous suggestion to...

                histogram           timer                       simpleTimer
        api
                long getSum()       Duration elapsedTime()      Duration elapsedTime()

        json field
                sum                 elapsedTime                 elapsedTime

        om suffix
                <units>_sum         seconds_sum                 elapsedTime_seconds

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants