Implement telemetry API v2 #62

Merged: cgilmour merged 40 commits into main from cgilmour/telemetry-api on Oct 17, 2023

cgilmour
Contributor

This implements an internal API that measures activity within the tracer and reports it to the Datadog Agent.
Users are able to disable this via the environment variable DD_INSTRUMENTATION_TELEMETRY_ENABLED or the report_telemetry config option in TracerConfig.

Additional metrics may be implemented in the future.
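
A minimal sketch of how the two switches described above might be resolved into a single on/off decision. Only the names DD_INSTRUMENTATION_TELEMETRY_ENABLED and report_telemetry come from this PR description; the SketchTracerConfig type, the telemetry_enabled helper, the accepted "0"/"false" spellings, the enabled-by-default behaviour, and the precedence of the environment variable over the config option are all assumptions made for illustration, not the library's confirmed behaviour.

```cpp
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical stand-in for the relevant part of TracerConfig.
struct SketchTracerConfig {
  // Unset means "use the default" (assumed here to be enabled).
  std::optional<bool> report_telemetry;
};

// Decide whether telemetry is on, given the raw environment value
// (nullptr when the variable is not set).
// Assumption: the environment variable wins over the config option.
bool telemetry_enabled(const SketchTracerConfig& config, const char* env_value) {
  if (env_value != nullptr) {
    const std::string value(env_value);
    // Assumed falsy spellings; any other value counts as enabled.
    return !(value == "0" || value == "false");
  }
  return config.report_telemetry.value_or(true);
}

// Convenience overload that reads the real process environment.
bool telemetry_enabled(const SketchTracerConfig& config) {
  return telemetry_enabled(
      config, std::getenv("DD_INSTRUMENTATION_TELEMETRY_ENABLED"));
}
```

Passing the environment value as a parameter keeps the decision testable without mutating the process environment.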

cgilmour and others added 27 commits October 3, 2023 03:48
- mention new files in bazel build
- don't store FinalizedTracerConfig
- consistent spacing_style forMemberFunctions
- make config_json() available to send_app_started()
- fix unrelated pet peeve in use of log_startup
- remove dev noise, which fixed all but one of the broken unit tests
@cgilmour cgilmour requested a review from dgoffredo October 12, 2023 19:51
Contributor

@dgoffredo dgoffredo left a comment

Excellent!

Here's my first pass through the changes. See my comments inline. Overall:

  • good documentation
  • design and style are consistent with the rest of the library
  • test coverage is almost perfect on a line-by-line basis. You missed a spot.

I'll do another pass where I look at the unit tests in more detail, and the serialization code. And maybe poke around with an Agent proxy.

This is pretty much ready to merge, once we come to an agreement on ideas raised in the comments.

src/datadog/datadog_agent.cpp (outdated, resolved)
src/datadog/datadog_agent.cpp (outdated, resolved)
src/datadog/metrics.h (outdated, resolved)
src/datadog/tracer_telemetry.h (outdated, resolved)
src/datadog/tracer_telemetry.cpp (outdated, resolved)
test/mocks/http_clients.h (outdated, resolved)
test/test_datadog_agent.cpp (resolved)

pr-commenter bot commented Oct 12, 2023

Benchmarks

Benchmark execution time: 2023-10-17 00:19:33

Comparing candidate commit c57a039 in PR branch cgilmour/telemetry-api with baseline commit 45c3c05 in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 1 metrics, 0 unstable metrics.

@cgilmour
Contributor Author

I've made the requested changes / fixes, and also merged in #61 because it was flaking for this PR as well.
Have another look when you have a chance.

Contributor

@dgoffredo dgoffredo left a comment

Looks good to me.

Thinking a bit more about the (remote?) possibility that this code gets deployed alongside a Datadog Agent too old to support telemetry: do consider silencing, or perhaps logging only once, the error in telemetry_on_response_ when the response status is 404 (for example, "error: Datadog Agent does not look like it supports telemetry", followed by silence). Leaving it as is would also be reasonable.
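
The "log once, then stay silent" idea could be sketched roughly as follows. This is not the code in datadog_agent.cpp: the TelemetryResponseHandler class, its on_response method, and the injected log_error callback are hypothetical names for illustration, and the treatment of other non-2xx statuses is an assumption.

```cpp
#include <atomic>
#include <functional>
#include <string>
#include <utility>

// Hypothetical handler illustrating one-time reporting of a 404 response.
class TelemetryResponseHandler {
 public:
  explicit TelemetryResponseHandler(
      std::function<void(const std::string&)> log_error)
      : log_error_(std::move(log_error)) {}

  void on_response(int status) {
    if (status == 404) {
      // exchange() returns the previous value, so only the first 404 logs.
      if (!reported_unsupported_.exchange(true)) {
        log_error_("Datadog Agent does not look like it supports telemetry");
      }
      return;
    }
    if (status < 200 || status > 299) {
      // Assumed behaviour: other failures are still logged every time.
      log_error_("Unexpected telemetry response status " +
                 std::to_string(status));
    }
  }

 private:
  std::function<void(const std::string&)> log_error_;
  std::atomic<bool> reported_unsupported_{false};
};
```

Using an atomic flag keeps the one-time guarantee even if response callbacks arrive on different threads.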

Please see my comments about the type signatures of Metric's constructor parameters.

Aside from those ideas, ship it.

src/datadog/datadog_agent.cpp (outdated, resolved)
src/datadog/datadog_agent.cpp (outdated, resolved)
src/datadog/datadog_agent.cpp (outdated, resolved)
- Metric::Metric(const std::string name, std::string type,
-                const std::vector<std::string> tags, bool common)
+ Metric::Metric(std::string name, std::string type,
+                std::vector<std::string> tags, bool common)
      : name_(name), type_(type), tags_(tags), common_(common) {}
Contributor

Since we're accepting these parameters by value, might as well std::move them into these initializers.

Alternatively, use const T& instead.

Alternatively, use const vector& and StringView.

Alternatively, use std::move(vector) and StringView.

It's true that this code runs on tracer startup only, so it doesn't matter. But, style.

Contributor

You might also just leave it like this. Less ink.

Contributor

Now that I'm looking at the individual commits again, I wonder if the original const that you had on these parameters was a typo where you were missing the &. Anyway, figure out what you want the signatures and initializers to look like, based on how the constructor is called.
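
For reference, the first option in this thread (take by value, then std::move into the initializers) might look like the sketch below. SketchMetric is a hypothetical stand-in for the Metric class, reduced to the constructor and a few accessors; the member names mirror the excerpt above, and everything else is assumed.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical reduced version of Metric, showing the "sink parameter" idiom.
class SketchMetric {
 public:
  // Parameters are taken by value, so callers passing temporaries pay for a
  // move rather than a copy, and each member is move-constructed from its
  // parameter instead of being copied a second time.
  SketchMetric(std::string name, std::string type,
               std::vector<std::string> tags, bool common)
      : name_(std::move(name)),
        type_(std::move(type)),
        tags_(std::move(tags)),
        common_(common) {}

  const std::string& name() const { return name_; }
  const std::string& type() const { return type_; }
  const std::vector<std::string>& tags() const { return tags_; }
  bool common() const { return common_; }

 private:
  std::string name_;
  std::string type_;
  std::vector<std::string> tags_;
  bool common_;
};
```

The const T& alternative avoids the moves but always copies into the members; which is preferable depends on how the constructor is called, as noted above.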

      http_client_(config.http_client),
      event_scheduler_(config.event_scheduler),
      cancel_scheduled_flush_(event_scheduler_->schedule_recurring_event(
          config.flush_interval, [this]() { flush(); })),
      flush_interval_(config.flush_interval) {
  assert(logger_);
  assert(tracer_telemetry_);
  if (tracer_telemetry_->enabled()) {
    // Only schedule this if telemetry is enabled.
Collaborator

A minor detail, but the line above is self-explanatory.

Comment on lines +38 to +42
std::string name();
std::string type();
std::vector<std::string> tags();
bool common();
uint64_t value();
Collaborator

@dmehala dmehala Oct 16, 2023

This will probably be controversial, but we could get rid of those accessors and make those members public 👀

Contributor

value() hides the atomicity of the underlying member, for better or for worse.

My vote is to keep the trivial methods, but I thought the same as you when I initially read it.

Contributor Author

No strong opinion about this, but I like the consistency and simplicity.
Have a go with the suggestion and see if you like it at the end.
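
To illustrate the point about value() hiding atomicity: with a trivial accessor, callers see a plain uint64_t and never touch the std::atomic directly. The sketch below is an assumption about the general shape, not the actual metrics.h; SketchCounter, inc, and capture_and_reset are hypothetical names.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical counter whose accessor hides the atomic member.
class SketchCounter {
 public:
  // Increment from any thread.
  void inc() { value_.fetch_add(1, std::memory_order_relaxed); }

  // Callers get an ordinary integer; the load is an implementation detail.
  std::uint64_t value() const {
    return value_.load(std::memory_order_relaxed);
  }

  // Read the current count and reset it to zero in one atomic step.
  std::uint64_t capture_and_reset() { return value_.exchange(0); }

 private:
  std::atomic<std::uint64_t> value_{0};
};
```

If the member were public instead, every call site would depend on it being a std::atomic, which is the trade-off weighed in this thread.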

Comment on lines +126 to +146
if (!points.empty()) {
  auto type = metric.type();
  if (type == "count") {
    metrics.emplace_back(nlohmann::json::object({
        {"metric", metric.name()},
        {"tags", metric.tags()},
        {"type", metric.type()},
        {"points", points},
        {"common", metric.common()},
    }));
  } else if (type == "gauge") {
    // gauge metrics have an interval
    metrics.emplace_back(nlohmann::json::object({
        {"metric", metric.name()},
        {"tags", metric.tags()},
        {"type", metric.type()},
        {"interval", 10},
        {"points", points},
        {"common", metric.common()},
    }));
  }
Collaborator

@dmehala dmehala Oct 16, 2023

That could be done while capturing metrics so we avoid iterating twice every 60s and it will make heartbeat_and_telemetry easier to read and maintain.

Contributor Author

It could, though I do like the simplicity of capture_metrics right now.
In the future, if rate metrics or distributions are added, it'll probably need refactoring.

Contributor

I think we don't even use gauge currently. I assume Caleb left it in anticipating it would be used soon. Technically it's dead code, though. What do you think?
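
One possible shape for a later refactoring, sketched without nlohmann::json so it stays self-contained: build the fields shared by both branches once and attach the interval only for gauges. MetricEntry, Point, and make_entry are hypothetical names, and the point representation (timestamp, value) is an assumption; the field names and the interval of 10 are taken from the excerpt above.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Assumed point shape: (timestamp, value).
using Point = std::pair<std::int64_t, std::uint64_t>;

// Hypothetical plain-struct stand-in for one serialized metric object.
struct MetricEntry {
  std::string metric;
  std::vector<std::string> tags;
  std::string type;
  std::optional<int> interval;  // only present for gauge metrics
  std::vector<Point> points;
  bool common = false;
};

// Returns an entry for "count" and "gauge" metrics that have points,
// and nothing otherwise, matching the behaviour of the excerpt above.
std::optional<MetricEntry> make_entry(const std::string& name,
                                      const std::string& type,
                                      const std::vector<std::string>& tags,
                                      bool common, std::vector<Point> points) {
  if (points.empty()) return std::nullopt;
  const bool is_gauge = (type == "gauge");
  if (type != "count" && !is_gauge) return std::nullopt;

  MetricEntry entry;
  entry.metric = name;
  entry.tags = tags;
  entry.type = type;
  entry.points = std::move(points);
  entry.common = common;
  if (is_gauge) entry.interval = 10;  // gauge metrics carry an interval
  return entry;
}
```

A helper like this could be called from either the capture step or the serialization step, so it does not settle the question of where the work belongs.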

Collaborator

@dmehala dmehala left a comment

LGTM :shipit:

@cgilmour cgilmour merged commit fd23d07 into main Oct 17, 2023
@cgilmour cgilmour deleted the cgilmour/telemetry-api branch October 17, 2023 21:07