Guide: Routing traces from the Datadog Agent to Vector within Kubernetes
-
Install minikube: https://minikube.sigs.k8s.io/docs/start/
-
Install kubectl: https://kubernetes.io/docs/tasks/tools/
-
Install helm: https://helm.sh/docs/intro/install/
-
Install docker: https://docs.docker.com/engine/install/
-
Add the datadog-agent and vector helm charts to your local repo with:
helm repo add datadog https://helm.datadoghq.com && helm repo add vector https://helm.vector.dev
-
Sync the local repo:
helm repo update
-
Verify:
helm repo list
-
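Start the minikube cluster:
minikube start
-
Verify it is running:
minikube status
-
!Important! Point your shell's docker client at the docker daemon inside minikube, so that images you build later are visible to the cluster. This requires a running cluster.
eval $(minikube -p minikube docker-env)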
- Create helm values file for the Datadog Agent
Note: For this example we will set up the agent to send traces to Vector, and configure the k8s namespace labels to be added as tags to the events.
Save the following in a file "agent.values.yaml", filling out <your_datadog_api_key> :
datadog:
  apiKey: <your_datadog_api_key>
  containerExclude: "name:vector"
  logs:
    enabled: true
    containerCollectAll: true
  apm:
    enabled: true
    ## datadog.apm.portEnabled -- Enable APM over TCP communication (port 8126 by default)
    ## ref: https://docs.datadoghq.com/agent/kubernetes/apm/
    portEnabled: true
clusterAgent:
  enabled: false
agents:
  useConfigMap: true
  customAgentConfig:
    kubelet_tls_verify: false
    vector:
    apm_config:
      apm_dd_url: "http://vector.default:8282"
      max_traces_per_second: 0
      errors_per_second: 0
    dogstatsd_non_local_traffic: true
    kubernetes_namespace_labels_as_tags:
      kubernetes.io/metadata.name: "kube_namespace"
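The apm_dd_url value above is what ties the agent to Vector: it names a Kubernetes service and a port. As a minimal sketch (the helper names split_service_url and source_port are made up for illustration, and the address 0.0.0.0:8282 is the datadog_agent source address used in "vector.values.yaml" later in this guide), the following Python checks that the two sides agree:

```python
from urllib.parse import urlsplit

# Value of agents.customAgentConfig.apm_config.apm_dd_url in agent.values.yaml
APM_DD_URL = "http://vector.default:8282"
# Address of the datadog_agent source in vector.values.yaml
VECTOR_SOURCE_ADDRESS = "0.0.0.0:8282"


def split_service_url(url):
    """Split 'http://<service>.<namespace>:<port>' into its three parts."""
    parts = urlsplit(url)
    service, _, namespace = parts.hostname.partition(".")
    return service, namespace, parts.port


def source_port(address):
    """Return the port of a 'host:port' listen address."""
    return int(address.rsplit(":", 1)[1])


service, namespace, port = split_service_url(APM_DD_URL)
print(f"service={service} namespace={namespace} port={port}")
print("ports match:", port == source_port(VECTOR_SOURCE_ADDRESS))
```

If Vector is installed under a different name or into a different namespace, the host part of apm_dd_url would presumably need to change to match.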
-
Install the Datadog Agent container into the cluster
helm install datadog-agent datadog/datadog -f agent.values.yaml
-
Verify it is running
kubectl get pods
It should look something like this:
NAME                                                READY   STATUS    RESTARTS   AGE
datadog-agent-cluster-agent-767d89c9c5-tbngs        1/1     Running   0          4m31s
datadog-agent-kube-state-metrics-658d989649-j7jt8   1/1     Running   0          4m31s
datadog-agent-rjqxg                                 3/3     Running   0          4m31s
-
Check logs for errors
kubectl logs datadog-agent-rjqxg
- Create helm values file for Vector
Save the following in a file "vector.values.yaml", filling out <your_datadog_api_key>:
## See Vector helm documentation to learn more:
## https://vector.dev/docs/setup/installation/package-managers/helm/

# nameOverride -- Override name of app
fullnameOverride: vector

image:
  tag: 0.23.3-debian

# resources -- Set Vector resource requests and limits.
resources:
  ## Required for HPA to function
  requests:
    cpu: 1000m
    memory: 512Mi
  # limits:
  #   cpu: 200m
  #   memory: 256Mi

# customConfig -- Override Vector's default configs, if used **all** options need to be specified
## This section supports using helm templates to populate dynamic values
## Ref: https://vector.dev/docs/reference/configuration/
customConfig:
  data_dir: /vector-data-dir
  api:
    enabled: true
    address: 0.0.0.0:8686
    playground: false
  sources:
    datadog_agent:
      address: 0.0.0.0:8282
      type: datadog_agent
      multiple_outputs: true
      trace_proto: v1v2
    internal_metrics:
      type: internal_metrics
  sinks:
    datadog_logs:
      type: datadog_logs
      inputs:
        - datadog_agent.logs
      default_api_key: <your_datadog_api_key>
      compression: gzip
    datadog_metrics:
      type: datadog_metrics
      inputs:
        - datadog_agent.metrics
        - internal_metrics
      default_api_key: <your_datadog_api_key>
    datadog_traces:
      type: datadog_traces
      inputs:
        - datadog_agent.traces
      default_api_key: <your_datadog_api_key>
    dbg:
      type: console
      encoding:
        codec: json
      inputs:
        - datadog_agent.traces

# livenessProbe -- Override default liveness probe settings, if customConfig is used requires customConfig.api.enabled true
## Requires Vector's API to be enabled
livenessProbe:
  httpGet:
    path: /health
    port: api

# readinessProbe -- Override default readiness probe settings, if customConfig is used requires customConfig.api.enabled true
## Requires Vector's API to be enabled
readinessProbe:
  httpGet:
    path: /health
    port: api
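Because the datadog_agent source sets multiple_outputs: true, the sinks above refer to its named outputs (datadog_agent.logs, datadog_agent.metrics, datadog_agent.traces) rather than to the source name itself. The following Python is only a sketch of that wiring, not something Vector provides: the dictionaries mirror the config by hand, and available_outputs and dangling_inputs are hypothetical helpers. It assumes that a source with multiple_outputs enabled exposes only the three named outputs.

```python
# Named outputs assumed for a datadog_agent source with multiple_outputs: true.
NAMED_OUTPUTS = ("logs", "metrics", "traces")

# Hand-written mirror of customConfig.sources in vector.values.yaml.
SOURCES = {
    "datadog_agent": {"type": "datadog_agent", "multiple_outputs": True},
    "internal_metrics": {"type": "internal_metrics"},
}

# Hand-written mirror of the `inputs` lists of customConfig.sinks.
SINKS = {
    "datadog_logs": ["datadog_agent.logs"],
    "datadog_metrics": ["datadog_agent.metrics", "internal_metrics"],
    "datadog_traces": ["datadog_agent.traces"],
    "dbg": ["datadog_agent.traces"],
}


def available_outputs(sources):
    """Return every name a sink may list under `inputs`."""
    outputs = set()
    for name, cfg in sources.items():
        if cfg["type"] == "datadog_agent" and cfg.get("multiple_outputs"):
            outputs.update(f"{name}.{kind}" for kind in NAMED_OUTPUTS)
        else:
            outputs.add(name)
    return outputs


def dangling_inputs(sources, sinks):
    """Map each sink to the inputs that no source provides."""
    known = available_outputs(sources)
    problems = {}
    for sink, inputs in sinks.items():
        missing = [i for i in inputs if i not in known]
        if missing:
            problems[sink] = missing
    return problems


print(dangling_inputs(SOURCES, SINKS))
```

An empty result means every sink input is matched by a source output. Note that the dbg sink reads the same datadog_agent.traces output as datadog_traces, which is why the trace events also appear in the Vector logs.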
-
Install Vector into the cluster:
helm install vector vector/vector -f ./vector.values.yaml
-
Verify vector is running:
kubectl get pods
It should look something like this:
$ kubectl get pods
NAME       READY   STATUS    RESTARTS   AGE
vector-0   1/1     Running   0          4m26s
-
Check logs for errors
kubectl logs vector-0
In the "vector.values.yaml" file above, we specified the image tag "0.23.3-debian". This corresponds to a Vector release that has a published image.
If you are developing changes locally that you would like to test, you can build a Vector container image containing your local changes with the following steps:
-
Build Vector with
make package-<...>
for your architecture, for example:
make package-deb-x86_64-unknown-linux-gnu
This creates ./target/artifacts.
-
!Important! Point your shell's docker client at the docker daemon inside minikube, so that the image you build is visible to the cluster.
eval $(minikube -p minikube docker-env)
-
Create the container image, replace <your_tag> with some meaningful name, perhaps the same name as your git branch.
docker build --tag "timber.io/vector:<your_tag>" target/artifacts -f ./distribution/docker/debian/Dockerfile
-
Import the image into minikube:
minikube image load timber.io/vector:<your_tag>
-
Verify the image presence:
minikube image ls
Should contain a line with your image: "timber.io/vector:<your_tag>"
-
Edit the helm values to use the local image:
image:
repository: timber.io/vector
tag: <your_tag>
-
Install the private vector image into the cluster:
helm install vector vector/vector -f ./vector.values.yaml
Note: If you have already installed Vector into the cluster via the prior steps, add the --replace option.
At this point the Datadog Agent and Vector should both be running in the minikube cluster.
For this example we will create a Python process that generates a trace event and runs inside the minikube cluster.
- Create the Python script. Add the following to "trace.py":
#!/usr/bin/python3
# trace.py: generates a dummy trace. Requires: pip install ddtrace
import time

from ddtrace import tracer

# Send traces to the Datadog Agent service over TCP (APM port 8126).
tracer.configure(
    hostname="datadog-agent",
    port=8126,
)

top_level_tags = {
    "foo": "bar",
    "env": "my-dev-env",
}
tracer.set_tags(top_level_tags)

# Top level span
span = tracer.trace("operations.of.interest", service="trace-test-app")
span.set_tag("env", "my-dev-env")
span.set_tag("numeric", 1.234)
span.context.sampling_priority = 10


def nest_span(tracer, n):
    # Create n nested child spans; each one sleeps before opening the next.
    if n == 0:
        return
    with tracer.trace("child_%d" % n, service="trace-test-app"):
        time.sleep(0.1)
        nest_span(tracer, n - 1)


# Two chains of ten nested spans under the top level span.
for i in range(2):
    nest_span(tracer, 10)

span.finish()
print("traces sent")
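To make the shape of the resulting trace easier to recognize later, here is a small back-of-the-envelope sketch in Python. It does not use ddtrace; it only reproduces the arithmetic of the script above, under the assumption that tracing overhead is negligible next to the 0.1 second sleeps. The function names are made up for illustration.

```python
SLEEP_SECONDS = 0.1  # time.sleep(0.1) inside nest_span
DEPTH = 10           # nest_span(tracer, 10)
REPEATS = 2          # for i in range(2)


def expected_child_seconds(n, sleep=SLEEP_SECONDS):
    """child_n sleeps once, then stays open while child_(n-1)..child_1 run."""
    return n * sleep


def expected_root_seconds(depth=DEPTH, repeats=REPEATS, sleep=SLEEP_SECONDS):
    """The top level span stays open for every chain of nested children."""
    return repeats * expected_child_seconds(depth, sleep)


for n in (10, 9, 1):
    print(f"child_{n}: ~{expected_child_seconds(n):.1f}s")
print(f"operations.of.interest: ~{expected_root_seconds():.1f}s")
```

So each child_10 span should last roughly one second and the top level span roughly two, and there should be two of each child_N span in the event.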
- Create the Dockerfile. Add the following to "Dockerfile":
FROM python:3
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir ddtrace
COPY trace.py ./trace.py
RUN chmod +x ./trace.py
ENTRYPOINT ["python3", "trace.py"]
-
!Important! Point your shell's docker client at the docker daemon inside minikube, so that the image you build is visible to the cluster.
eval $(minikube -p minikube docker-env)
-
Build the image:
docker build -t tracer .
-
Add the image to minikube:
minikube image load tracer:latest
-
Verify the image presence:
minikube image ls
Should contain an entry: "docker.io/library/tracer:latest"
-
Create a Kubernetes Job manifest for the tracer program. Add the following to "tracer.values.yaml":
apiVersion: batch/v1
kind: Job
metadata:
  name: tracer
spec:
  template:
    metadata:
      name: tracer-pod
    spec:
      containers:
        - name: tracer
          image: tracer:latest
          imagePullPolicy: Never
      restartPolicy: Never
-
In another terminal, tail the vector logs:
kubectl logs -f vector-0
This will display the trace event after the next step is completed.
-
Install the tracer program into the cluster:
kubectl create -f ./tracer.values.yaml
-
Verify the container ran:
kubectl get pods
Should output something like this, tracer should show "Completed":
NAME                                                READY   STATUS      RESTARTS   AGE
datadog-agent-cluster-agent-767d89c9c5-tbngs        1/1     Running     0          4m31s
datadog-agent-kube-state-metrics-658d989649-j7jt8   1/1     Running     0          4m31s
datadog-agent-rjqxg                                 3/3     Running     0          4m31s
tracer-pn4b4                                        0/1     Completed   0          4m18s   <<<<<<<<<<
vector-0                                            1/1     Running     0          4m26s
If it does not, check the pod logs with kubectl logs as outlined above, and confirm that step #3 was performed.
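The same check can be scripted. The Python below is a sketch under the assumption that you feed it the parsed output of kubectl get pods -o json; the SAMPLE dictionary is hand-written and trimmed to the fields used, and pod_phases and tracer_finished are hypothetical helper names. A pod shown as "Completed" in the table has the phase "Succeeded" in the JSON.

```python
def pod_phases(pod_list):
    """Map pod name to phase for a parsed `kubectl get pods -o json` document."""
    return {
        item["metadata"]["name"]: item["status"]["phase"]
        for item in pod_list["items"]
    }


def tracer_finished(pod_list, prefix="tracer-"):
    """True if a pod created by the tracer Job has run to completion."""
    return any(
        name.startswith(prefix) and phase == "Succeeded"
        for name, phase in pod_phases(pod_list).items()
    )


# Hand-written sample, trimmed to the fields read above.
SAMPLE = {
    "items": [
        {"metadata": {"name": "datadog-agent-rjqxg"}, "status": {"phase": "Running"}},
        {"metadata": {"name": "tracer-pn4b4"}, "status": {"phase": "Succeeded"}},
        {"metadata": {"name": "vector-0"}, "status": {"phase": "Running"}},
    ]
}

print(pod_phases(SAMPLE))
print("tracer finished:", tracer_finished(SAMPLE))
```

To use it against the real cluster, the sample would be replaced by json.load applied to the kubectl output.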
-
Check the vector logs in the terminal from step #8 to confirm the trace event was received.
Should show the healthcheck passed, and the trace event should look something like this:
{"agent_version":"7.38.2","app_version":"","container_id":"7415d81d7f4554f677ac989b85556b73e162c02fc9f6c8f7f339767acd8c2351","dropped":false,"env":"none","error_tps":0.0,"host":"minikube","language_name":"python","language_version":"3.10.6","origin":"","payload_version":"v2","priority":10,"runtime_id":"","source_type":"datadog_agent","spans":[{"duration":2003930490,"error":0,"meta":{"_dd.p.dm":"-0","env":"my-dev-env","foo":"bar","runtime-id":"1d0c8d97b50e45cca78d6a99ac8cfb18"},"meta_struct":{},"metrics":{"_dd.agent_psr":1.0,"_dd.top_level":1.0,"_dd.tracer_kr":1.0,"_sampling_priority_v1":10.0,"_top_level":1.0,"numeric":1.234,"system.pid":1.0},"name":"operations.of.interest","parent_id":0,"resource":"operations.of.interest","service":"trace-test-app","span_id":8013505727887431227,"start":"2022-08-24T17:08:28.722038709Z","trace_id":-2120733611673413870,"type":""},{"duration":1002066383,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_10","parent_id":8013505727887431227,"resource":"child_10","service":"trace-test-app","span_id":-6014030435588199007,"start":"2022-08-24T17:08:28.722095920Z","trace_id":-2120733611673413870,"type":""},{"duration":901902559,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_9","parent_id":-6014030435588199007,"resource":"child_9","service":"trace-test-app","span_id":4182555350259562811,"start":"2022-08-24T17:08:28.822257232Z","trace_id":-2120733611673413870,"type":""},{"duration":801700905,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_8","parent_id":4182555350259562811,"resource":"child_8","service":"trace-test-app","span_id":-1529159581253984277,"start":"2022-08-24T17:08:28.922456136Z","trace_id":-2120733611673413870,"type":""},{"duration":701496138,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_7","parent_id":-1529159581253984277,"resource":"child_7","service":"trace-test-app","span_id":2564879243480693181,"start":"2022-08-24T17:08:29.022658071Z","trace_id":-2120733611673413870,"type":""},{"duration":601291015,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_6","parent_id":2564879243480693181,"resource":"child_6","service":"trace-test-app","span_id":3923357355904970171,"start":"2022-08-24T17:08:29.122860246Z","trace_id":-2120733611673413870,"type":""},{"duration":501093972,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_5","parent_id":3923357355904970171,"resource":"child_5","service":"trace-test-app","span_id":1032045299762712103,"start":"2022-08-24T17:08:29.223054136Z","trace_id":-2120733611673413870,"type":""},{"duration":400881571,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_4","parent_id":1032045299762712103,"resource":"child_4","service":"trace-test-app","span_id":2152095997608724519,"start":"2022-08-24T17:08:29.323262885Z","trace_id":-2120733611673413870,"type":""},{"duration":300684508,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_3","parent_id":2152095997608724519,"resource":"child_3","service":"trace-test-app","span_id":-1785016045258492890,"start":"2022-08-24T17:08:29.423456109Z","trace_id":-2120733611673413870,"type":""},{"duration":200487939,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_2","parent_id":-1785016045258492890,"resource":"child_2","service":"trace-test-app","span_id":6657987248375815165,"start":"2022-08-24T17:08:29.523646774Z","trace_id":-2120733611673413870,"type":""},{"duration":100097826,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_1","parent_id":6657987248375815165,"resource":"child_1","service":"trace-test-app","span_id":-416479733790656545,"start":"2022-08-24T17:08:29.623978831Z","trace_id":-2120733611673413870,"type":""},{"duration":1001783781,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_10","parent_id":8013505727887431227,"resource":"child_10","service":"trace-test-app","span_id":4700929287043722318,"start":"2022-08-24T17:08:29.724181743Z","trace_id":-2120733611673413870,"type":""},{"duration":901693355,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_9","parent_id":4700929287043722318,"resource":"child_9","service":"trace-test-app","span_id":7091980570785734600,"start":"2022-08-24T17:08:29.824269273Z","trace_id":-2120733611673413870,"type":""},{"duration":801558600,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_8","parent_id":7091980570785734600,"resource":"child_8","service":"trace-test-app","span_id":-3441724011161494891,"start":"2022-08-24T17:08:29.924401068Z","trace_id":-2120733611673413870,"type":""},{"duration":701433174,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_7","parent_id":-3441724011161494891,"resource":"child_7","service":"trace-test-app","span_id":-4978423517783410612,"start":"2022-08-24T17:08:30.024523413Z","trace_id":-2120733611673413870,"type":""},{"duration":601224912,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_6","parent_id":-4978423517783410612,"resource":"child_6","service":"trace-test-app","span_id":-6058157343109761121,"start":"2022-08-24T17:08:30.124728708Z","trace_id":-2120733611673413870,"type":""},{"duration":500996288,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_5","parent_id":-6058157343109761121,"resource":"child_5","service":"trace-test-app","span_id":-4014924905939389512,"start":"2022-08-24T17:08:30.224954400Z","trace_id":-2120733611673413870,"type":""},{"duration":400766078,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_4","parent_id":-4014924905939389512,"resource":"child_4","service":"trace-test-app","span_id":-7259823233518481964,"start":"2022-08-24T17:08:30.325181315Z","trace_id":-2120733611673413870,"type":""},{"duration":300571391,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_3","parent_id":-7259823233518481964,"resource":"child_3","service":"trace-test-app","span_id":-2411037428787107394,"start":"2022-08-24T17:08:30.425372457Z","trace_id":-2120733611673413870,"type":""},{"duration":200361771,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_2","parent_id":-2411037428787107394,"resource":"child_2","service":"trace-test-app","span_id":1000856630072972936,"start":"2022-08-24T17:08:30.525575359Z","trace_id":-2120733611673413870,"type":""},{"duration":100166767,"error":0,"meta":{"env":"my-dev-env","foo":"bar"},"meta_struct":{},"metrics":{},"name":"child_1","parent_id":1000856630072972936,"resource":"child_1","service":"trace-test-app","span_id":-6545389230048719278,"start":"2022-08-24T17:08:30.625729113Z","trace_id":-2120733611673413870,"type":""}],"tags":{"_dd.tags.container":"pod_phase:running,kube_qos:BestEffort,kube_container_name:tracer,image_name:tracer,short_image:tracer,kube_ownerref_kind:job,kube_job:tracer,kube_namespace:default,image_tag:latest,image_id:docker://sha256:b82f19b98a77d3570162713e3d1e2909d66325ba5bfe01eff24cf6336dc0a969,docker_image:tracer:latest,pod_name:tracer-gpxcc,kube_ownerref_name:tracer,container_id:7415d81d7f4554f677ac989b85556b73e162c02fc9f6c8f7f339767acd8c2351,display_container_name:tracer_tracer-gpxcc,container_name:tracer"},"target_tps":0.0,"tracer_version":"1.4.1"}
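A single-line event like this is hard to read by eye. The following Python sketch shows one way to turn it into an indented span tree, using the span_id and parent_id fields to work out nesting. It is an illustration only: SAMPLE_EVENT is a trimmed copy of the first three spans above, span_rows is a hypothetical helper, and durations are assumed to be in nanoseconds.

```python
import json

# Trimmed copy of the event above: only the first three spans and the
# fields this sketch reads.
SAMPLE_EVENT = json.loads("""
{"spans": [
  {"name": "operations.of.interest", "span_id": 8013505727887431227,
   "parent_id": 0, "duration": 2003930490},
  {"name": "child_10", "span_id": -6014030435588199007,
   "parent_id": 8013505727887431227, "duration": 1002066383},
  {"name": "child_9", "span_id": 4182555350259562811,
   "parent_id": -6014030435588199007, "duration": 901902559}
]}
""")


def span_rows(event):
    """Return (depth, name, duration in ms) for every span in the event."""
    by_id = {span["span_id"]: span for span in event["spans"]}
    rows = []
    for span in event["spans"]:
        depth = 0
        parent = span["parent_id"]
        # Walk up the parent chain until we leave the set of known spans.
        while parent in by_id:
            depth += 1
            parent = by_id[parent]["parent_id"]
        rows.append((depth, span["name"], span["duration"] / 1e6))
    return rows


for depth, name, millis in span_rows(SAMPLE_EVENT):
    print(f"{'  ' * depth}{name}: {millis:.0f} ms")
```

To inspect a real event, a line copied from the kubectl logs -f vector-0 output could be passed through json.loads in place of the sample.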
- View all resources:
kubectl get -A all
- Uninstall a project:
helm uninstall <vector/datadog-agent/...>