Move runtime metrics reporting to a separate timer routine. #294

cory-stripe · 2017-10-19T15:33:53Z

Summary

Move runtime metric reporting to a separate ticker and goroutine.

Motivation

At present, our emission of runtime metrics (GC, heap, etc.) is tied directly to the flusher. This seems like a bad mix of concerns. It also means that our runtime metrics are tied directly to our flush "tick". In a future where that tick is adjustable or non-existent, we need a plan!

Notes

I also added the current goroutine count, which seems like good information in case we leak goroutines.
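For readers following along, here is a minimal sketch of the pattern described above: a dedicated ticker and goroutine that samples runtime statistics independently of the flush loop. It is not the code from this PR. The names `runtimeSample`, `collectRuntimeSample`, and `reportRuntimeMetrics`, the `emit` callback, and the stop channel are all hypothetical, chosen for illustration. Only `runtime.ReadMemStats`, `runtime.NumGoroutine`, and `time.NewTicker` are real standard-library calls.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// runtimeSample is a hypothetical container for the values read on each tick.
type runtimeSample struct {
	HeapAllocBytes uint64 // bytes of allocated heap objects
	NumGC          uint32 // completed GC cycles
	PauseTotalNs   uint64 // cumulative GC pause time in nanoseconds
	Goroutines     int    // current goroutine count, useful for spotting leaks
}

// collectRuntimeSample reads the Go runtime statistics once.
func collectRuntimeSample() runtimeSample {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return runtimeSample{
		HeapAllocBytes: m.HeapAlloc,
		NumGC:          m.NumGC,
		PauseTotalNs:   m.PauseTotalNs,
		Goroutines:     runtime.NumGoroutine(),
	}
}

// reportRuntimeMetrics calls emit once per interval until stop is closed.
// It owns its own ticker, so it does not depend on any flush loop.
func reportRuntimeMetrics(stop <-chan struct{}, interval time.Duration, emit func(runtimeSample)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			emit(collectRuntimeSample())
		}
	}
}

func main() {
	stop := make(chan struct{})
	done := make(chan struct{})

	// Run the reporter in its own goroutine, separate from any flusher.
	go func() {
		defer close(done)
		reportRuntimeMetrics(stop, 100*time.Millisecond, func(s runtimeSample) {
			fmt.Printf("heap=%d gc=%d pause_ns=%d goroutines=%d\n",
				s.HeapAllocBytes, s.NumGC, s.PauseTotalNs, s.Goroutines)
		})
	}()

	time.Sleep(350 * time.Millisecond)
	close(stop) // signal the reporter to exit
	<-done
}
```

In a real server, `emit` would presumably hand the values to whatever stats client is in use, and the interval would come from configuration rather than a constant.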

Test plan

Existing tests. We don't test these now and this doesn't change that.

r? @stripe/observability

stripe-ci · 2017-10-19T15:35:03Z

Gerald Rule: Copy Observability on Veneur and Unilog pull requests

cc @stripe/observability
cc @stripe/observability-stripe

an-stripe · 2017-10-19T18:53:23Z

r? @stripe/observability

I don't think I have the state + focus to do this during /dev/start, sorry!

an-stripe · 2017-10-19T18:55:37Z

r? @stripe/observability

cory-stripe · 2017-10-20T22:12:42Z

Hey @aditya-stripe, this one is ready. Note that the intervals for this and the flush are ~synchronized, so it should do what you suggested in a separate PR with regard to being at a high-water mark. I would argue that it's important that it be separated, so that it doesn't just give us the skewed, flush-time numbers. Having a separate ticker without the "bucketing" behavior should give us something that has a bit of periodicity to it, which I think is an improvement.

ChimeraCoder · 2017-10-23T20:40:31Z

At the moment, we're collecting these metrics immediately after each flush is initiated. This may or may not actually be the exact peak (or nadir) for metrics like memory usage, because of timing jitter as goroutines are scheduled, though it is likely to be close and, at the very least, highly correlated with the higher end of the distribution.

In reality, what we want is a measurement of the full range of the distribution, because these values will change over the span of the 10-second flush cycle. We'll want to choose a sampling interval that's less than 5 seconds, like 3s or 4s.

And if we're emitting these metrics multiple times per flush cycle, we'll also want to change them to be histograms, not gauges.

So, lgtm for now, but let's file a ticket to update this as well.
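To illustrate the gauge-versus-histogram point above, here is a small sketch under stated assumptions. The `gauge`, `histogram`, and `summary` types are hypothetical stand-ins, not Veneur's metric types, and the heap values are made up for the example. With a 10-second flush and a sampling interval of about 3 seconds, roughly three samples land in each flush window. A gauge keeps only the last of them, while a histogram can report the range.

```go
package main

import (
	"fmt"
	"sort"
)

// gauge is a hypothetical last-write-wins metric.
type gauge struct{ value float64 }

// Set overwrites whatever was recorded earlier in the flush window.
func (g *gauge) Set(v float64) { g.value = v }

// histogram is a hypothetical metric that keeps every sample until flushed.
type histogram struct{ samples []float64 }

// Observe records one sample.
func (h *histogram) Observe(v float64) { h.samples = append(h.samples, v) }

// summary describes the samples seen during one flush window.
type summary struct {
	Count            int
	Min, Max, Median float64
}

// Flush summarizes the recorded samples and resets the histogram.
func (h *histogram) Flush() summary {
	if len(h.samples) == 0 {
		return summary{}
	}
	sorted := append([]float64(nil), h.samples...)
	sort.Float64s(sorted)
	mid := len(sorted) / 2
	median := sorted[mid]
	if len(sorted)%2 == 0 {
		median = (sorted[mid-1] + sorted[mid]) / 2
	}
	h.samples = h.samples[:0]
	return summary{
		Count:  len(sorted),
		Min:    sorted[0],
		Max:    sorted[len(sorted)-1],
		Median: median,
	}
}

func main() {
	// Made-up heap sizes (MB) sampled three times within one flush window.
	heapMB := []float64{120, 310, 150}

	var g gauge
	var h histogram
	for _, v := range heapMB {
		g.Set(v)
		h.Observe(v)
	}

	// The gauge reports only the final sample and hides the mid-window peak.
	fmt.Printf("gauge: %.0f\n", g.value)
	// The histogram summary preserves the minimum, maximum, and median.
	fmt.Printf("histogram: %+v\n", h.Flush())
}
```

A production histogram would more likely use a streaming sketch than an in-memory slice. The slice is used here only to keep the example short.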

stripe-ci assigned an-stripe Oct 19, 2017

cory-stripe force-pushed the cory-runtime-reporting-thread branch from b374b33 to 80b93e0 Compare October 19, 2017 15:34

an-stripe removed their assignment Oct 19, 2017

cory-stripe mentioned this pull request Oct 19, 2017

Combine global/local logic into one unified flush and parallelize sinks. #292

Merged

Move runtime metrics reporting to a seprate timer routine.

3316668

cory-stripe force-pushed the cory-runtime-reporting-thread branch from 80b93e0 to 3316668 Compare October 20, 2017 21:11

cory-stripe assigned aditya-stripe Oct 20, 2017

cory-stripe changed the title ~~Move runtime metrics reporting to a seprate timer routine.~~ Move runtime metrics reporting to a separate timer routine. Oct 23, 2017

aditya-stripe assigned cory-stripe and unassigned aditya-stripe Dec 19, 2017
