-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement telemetry API v2 #62
Conversation
- mention new files in bazel build - don't store FinalizedTracerConfig - consistent spacing_style forMemberFunctions - make config_json() available to send_app_started() - fix unrelated pet peeve in use of log_startup - remove dev noise, which fixed all but one of the broken unit tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent!
Here's my first pass through the changes. See my comments inline. Overall:
- good documentation
- design and style are consistent with the rest of the library
- test coverage is almost perfect on a line-by-line basis. You missed a spot.
I'll do another pass where I look at the unit tests in more detail, and the serialization code. And maybe poke around with an Agent proxy.
This is pretty much ready to merge, once we come to an agreement on ideas raised in the comments.
…ry is making Curl requests.
I've made the requested changes / fixes, and also merged in #61 because it was flaking for this PR as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
Thinking a bit more about the (remote?) possibility that this code gets deployed to a system with a Datadog Agent that's old enough not to have telemetry, do consider silencing or perhaps one-timing the error in telemetry_on_response_
when the response status is 404 ("error: Datadog Agent does not look like it supports telemetry", then subsequent silence), or something like that. Also reasonable would be to leave it as is.
Please see my comments about the type signatures of Metric
's constructor parameters.
Aside from those ideas, ship it.
Metric::Metric(const std::string name, std::string type, | ||
const std::vector<std::string> tags, bool common) | ||
Metric::Metric(std::string name, std::string type, | ||
std::vector<std::string> tags, bool common) | ||
: name_(name), type_(type), tags_(tags), common_(common) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're accepting these parameters by value, might as well std::move
them into these initializers.
Alternatively, use const T&
instead.
Alternatively, use const vector&
and StringView
.
Alternatively, use std::move(vector)
and StringView
.
It's true that this code runs on tracer startup only, so it doesn't matter. But, style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might also just leave it like this. Less ink.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I'm looking at the individual commits again, I wonder if the original const
that you had on these parameters was a typo where you were missing the &
. Anyway, figure out what you want the signatures and initializers to look like, based on how the constructor is called.
http_client_(config.http_client), | ||
event_scheduler_(config.event_scheduler), | ||
cancel_scheduled_flush_(event_scheduler_->schedule_recurring_event( | ||
config.flush_interval, [this]() { flush(); })), | ||
flush_interval_(config.flush_interval) { | ||
assert(logger_); | ||
assert(tracer_telemetry_); | ||
if (tracer_telemetry_->enabled()) { | ||
// Only schedule this if telemetry is enabled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Details but the line above is self explanatory.
std::string name(); | ||
std::string type(); | ||
std::vector<std::string> tags(); | ||
bool common(); | ||
uint64_t value(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will probably be controversial here but we could get rid of those accessors and put those members public 👀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value()
hides the atomicity of the underlying member, for better or for worse.
My vote is to keep the trivial methods, but I thought the same as you when I initially read it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No strong opinion about this, but I like the consistency and simplicity.
Have a go with the suggestion and see if you like it at the end.
if (!points.empty()) { | ||
auto type = metric.type(); | ||
if (type == "count") { | ||
metrics.emplace_back(nlohmann::json::object({ | ||
{"metric", metric.name()}, | ||
{"tags", metric.tags()}, | ||
{"type", metric.type()}, | ||
{"points", points}, | ||
{"common", metric.common()}, | ||
})); | ||
} else if (type == "gauge") { | ||
// gauge metrics have a interval | ||
metrics.emplace_back(nlohmann::json::object({ | ||
{"metric", metric.name()}, | ||
{"tags", metric.tags()}, | ||
{"type", metric.type()}, | ||
{"interval", 10}, | ||
{"points", points}, | ||
{"common", metric.common()}, | ||
})); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That could be done while capturing metrics so we avoid iterating twice every 60s and it will make heartbeat_and_telemetry
easier to read and maintain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could, though I do like the simplicity of capture_metrics
right now.
In the future if rate metrics, or distributions are added, it'll probably need refactoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't even use gauge currently. I assume Caleb left it in anticipating it would be used soon. Technically dead code, though, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This implements an internal API that measures activity within the tracer and reports it to the Datadog Agent.
Users are able to disable this via the environment variable
DD_INSTRUMENTATION_TELEMETRY_ENABLED
or thereport_telemetry
config option inTracerConfig
.Additional metrics may be implemented in the future.