-
Notifications
You must be signed in to change notification settings - Fork 174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move runtime metrics reporting to a separate timer routine. #294
base: master
Are you sure you want to change the base?
Conversation
b374b33
to
80b93e0
Compare
Gerald Rule: Copy Observability on Veneur and Unilog pull requestscc @stripe/observability |
r? @stripe/observability I don't think I have the state + focus to do this during /dev/start, sorry! |
r? @stripe/observability |
80b93e0
to
3316668
Compare
Hey @aditya-stripe, this one is ready. Note that the intervals for this and the flush are ~synchronized and it should do what you suggested in a separate PR with regards to being at a high water mark. I would argue that it's important it be separated so as not to just give us the skewed, flush-time numbers. Having a separate ticker without the "bucketing" behavior should give us something that has a bit of periodicity to it which I think is an improvement. |
At the moment, we're collecting these metrics immediately after each flush is initiated. This may or may not actually be the exact peak (or nadir) for metrics like memory usage, because of timing jitter as goroutines are scheduled. Though it is likely to be close, and at the very least, highly-correlated with the higher end of the distribution. In reality, what we want is a measurement of the full range of the distribution, because these will change over the span of the 10-second flush cycle. We'll want to choose something that's less than 5 seconds, like 3s or 4s. And if we're emitting these metrics multiple times per flush cycle, we'll also want to change them to be histograms, not gauges. So, lgtm for now, but let's file a ticket to update this as well. |
Summary
Move runtime metric reporting to a separate ticker and goroutine.
Motivation
At present our emission of runtime metrics — gc, heap etc — is tied directly to the flusher. This seems like a bad mix of concerns. It also means that our runtime metrics are tied directly to our flush "tick". In a future where this is adjustable or non-existent we need a plan!
Notes
I also added the current goroutine count, which seems like good information in case we leak goroutines.
Test plan
Existing tests. We don't test these now and this doesn't change that.
r? @stripe/observability