
Help configuring Distributed Tracing using env var #8535

Open
alvarolop opened this issue Jan 28, 2025 · 12 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@alvarolop

Expected Behavior

I'm deploying Tekton on OpenShift using the operator, and I'm trying to configure Distributed Tracing for Tasks and Pipelines as specified in https://github.com/tektoncd/community/blob/main/teps/0124-distributed-tracing-for-tasks-and-pipelines.md

What I was expecting is that, after setting the OTEL_EXPORTER_JAEGER_ENDPOINT parameter in a pipeline, the traces and spans of that pipeline would automatically be sent to my OpenTelemetry Collector, which was also deployed using the Operator.

Also, I can't find the actual documentation for this feature, which I think is pretty cool! 😄

Actual Behavior

What I see in the logs of the pipelines controller and webhook are some references to Knative tracing, but I don't think this is actually doing anything.

{"severity":"error","timestamp":"2025-01-28T08:55:44.241Z","logger":"tekton-pipelines-webhook.ConversionWebhook","caller":"controller/controller.go:566","message":"Reconcile error","commit":"acb6211","knative.dev/traceid":"8b436021-453a-477b-a446-007ea979334c","knative.dev/key":"stepactions.tekton.dev","duration":0.000074122,"error":"custom resource \"stepactions.tekton.dev\" isn't configured for webhook conversion","stacktrace":"knative.dev/pkg/controller.(*Impl).handleErr\n\t/go/src/github.com/tektoncd/pipeline/vendor/knative.dev/pkg/controller/controller.go:566\nknative.dev/pkg/controller.(*Impl).processNextWorkItem\n\t/go/src/github.com/tektoncd/pipeline/vendor/knative.dev/pkg/controller/controller.go:543\nknative.dev/pkg/controller.(*Impl).RunContext.func3\n\t/go/src/github.com/tektoncd/pipeline/vendor/knative.dev/pkg/controller/controller.go:491"}

Steps to Reproduce the Problem

  1. Create a Sleep Task to simulate some operation
---
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: sleep-task
spec:
  params:
    - name: sleep-duration
      description: Duration to sleep (in seconds)
      type: string
      default: "1"
  steps:
    - name: sleep-step
      image: registry.access.redhat.com/ubi8/ubi-minimal
      script: |
        #!/bin/sh
        echo "Sleeping for $(params.sleep-duration) seconds..."
        sleep $(params.sleep-duration)
  2. Create a Pipeline with the Otel parameter:
---
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: sleep-pipeline
spec:
  params:
    - name: step1-sleep-duration
      description: Sleep duration for step 1 (in seconds)
      type: string
      default: "1"
    - name: step2-sleep-duration
      description: Sleep duration for step 2 (in seconds)
      type: string
      default: "1"

    # https://github.com/tektoncd/pipeline/pull/5746
    - name: OTEL_EXPORTER_JAEGER_ENDPOINT
      default: "http://app-to-kafka-collector.otel.svc.cluster.local:4317/v1/traces"
      type: string
  tasks:
    - name: step1
      taskRef:
        name: sleep-task
      params:
        - name: sleep-duration
          value: $(params.step1-sleep-duration)

    - name: step2
      taskRef:
        name: sleep-task
      params:
        - name: sleep-duration
          value: $(params.step2-sleep-duration)
      runAfter:
        - step1

Additional Info

  • Kubernetes version:

    Output of kubectl version:

v1.29.11+ef2a55c
  • Tekton Pipeline version:

    Output of tkn version or kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}'

tkn version
Client version: 0.35.2
Chains version: v0.23.0
Pipeline version: v0.65.4
Triggers version: v0.30.0
Operator version: v0.74.0
Hub version: v1.19.0
@alvarolop alvarolop added the kind/bug Categorizes issue or PR as related to a bug. label Jan 28, 2025
@vdemeester
Member

@alvarolop
Author

Hello Vincent,

Thank you for your help!! I just edited the ConfigMap, but the key looks suspicious: _example. I updated its value to

################################
#                              #
#    EXAMPLE CONFIGURATION     #
#                              #
################################
# This block is not actually functional configuration,
# but serves to illustrate the available configuration
# options and document them in a way that is accessible
# to users that `kubectl edit` this config map.
#
# These sample configuration options may be copied out of
# this example block and unindented to be in the data block
# to actually change the configuration.
#
# Enable sending traces to defined endpoint by setting this to true
enabled: true
#
# API endpoint to send the traces to
# (optional): The default value is given below
endpoint: "http://app-to-kafka-collector.otel.svc.cluster.local:14268/api/traces"
# (optional) Name of the k8s secret which contains basic auth credentials
# credentialsSecret: "jaeger-creds"

But the result looks the same. I don't see any references to app-to-kafka-collector in the logs of the pipeline controller or webhook (even after restarting the pods) when I execute a pipeline.

@vdemeester
Member

@alvarolop yeah, the fields need to not be under _example. The reason for that _example field is to make sure it's not just about uncommenting it (kind-of 😛 )

It should look a bit like the following.

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-tracing
  namespace: tekton-pipelines
  labels:
    app.kubernetes.io/instance: default
    app.kubernetes.io/part-of: tekton-pipelines
data:
  enabled: "true"
  credentialsSecret: "jaeger-creds"
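
For an external collector, the endpoint key from the _example block quoted earlier presumably also needs to be copied into data. A sketch combining the two snippets from this thread (the endpoint value is illustrative; the OTLP HTTP port 4318 with path /v1/traces is what ended up working later in this issue):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-tracing
  namespace: tekton-pipelines
data:
  enabled: "true"
  endpoint: "http://app-to-kafka-collector.otel.svc.cluster.local:4318/v1/traces"
  # (optional) Name of the k8s secret which contains basic auth credentials
  # credentialsSecret: "jaeger-creds"
```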

@alvarolop
Author

Hahahha ok, it was a nice challenge, but it was not clear to me how to configure it :)

So, now I see in the Tekton controller logs that it is using my OTEL endpoint, but it complains the following depending on which config I set:

http://app-to-kafka-collector.otel.svc.cluster.local:14268/api/traces: 400 Bad Request
http://app-to-kafka-collector.otel.svc.cluster.local:14268/v1/traces: 404 Not Found

Do you have any example of this configuration?

@afrittoli
Member

The host is parsed and the various parts are passed to otlptrace, which seems OK:

u, err := url.Parse(cfg.Endpoint)
if err != nil {
	return nil, err
}
opts := []otlptracehttp.Option{
	otlptracehttp.WithEndpoint(u.Host),
	otlptracehttp.WithURLPath(u.Path),
}
if u.Scheme == "http" {
	opts = append(opts, otlptracehttp.WithInsecure())
}
if user != "" && pass != "" {
	creds := fmt.Sprintf("%s:%s", user, pass)
	enc := base64.StdEncoding.EncodeToString([]byte(creds))
	o := otlptracehttp.WithHeaders(map[string]string{
		"Authorization": "Basic " + enc,
	})
	opts = append(opts, o)
}
ctx := context.Background()
exp, err := otlptracehttp.New(ctx, opts...)

Do you see the request reaching OTEL on your side? Any log there that may help?

@alvarolop
Author

alvarolop commented Jan 28, 2025

Oh, and this looks like the OTLP protocol? I saw the port, which looks like Thrift HTTP, so that is why I configured the collector like this:

    receivers:
      # Enable Thrift receiver for Tekton traces
      jaeger:
        protocols:
          grpc:
            endpoint: 0.0.0.0:14250
          thrift_binary:
            endpoint: 0.0.0.0:6832 
          thrift_compact:
            endpoint: 0.0.0.0:6831
          thrift_http:
            endpoint: 0.0.0.0:14268

Do you know if it is possible to use OTLP? Or that is not yet supported as stated here? #7175

Anyway, no, I don't see any traces reaching the OTel Collector.

@afrittoli
Member

The work on enabling a gRPC endpoint was started in #7721 but unfortunately never finished. Would that have helped?

@alvarolop
Author

For us, it would simplify the OpenTelemetry Collector configuration, as the rest of our applications send traces in OTLP format. Fewer dependencies, less configuration, and we definitely think that OTLP is the way forward :)

Also, this is my first app sending traces in Jaeger format, so maybe the collector config is not correct and that's why it is failing.

@alvarolop
Author

Could you confirm that the protocol is thrift_http and that it is compatible with the latest version of Jaeger? Do you have any working example of this configuration? I'm a little bit lost here and can't find any config example.

@alvarolop
Author

Just an update: I used the OTLP HTTP port and sent the traces to the OTLP HTTP receiver in OpenTelemetry, and it worked. Thank you all for your help. I was expecting something else (like clearly seeing all the traces and spans), but it works :)
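
For anyone hitting the same issue, the collector side that works here is presumably just the standard OTLP/HTTP receiver instead of the Jaeger receivers shown earlier (port 4318 is the OTLP/HTTP default; the exact value is an assumption, not quoted from the thread):

```yaml
receivers:
  # OTLP over HTTP; Tekton's exporter posts protobuf to the configured path,
  # so config-tracing's endpoint should end in /v1/traces
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
```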

Is it possible to see the traces and spans clearly without modifying the actual pipelines? Do you know what I mean?

@afrittoli
Member

The reconciler code is instrumented so that you can see all function invocations in each reconciliation cycle.
Spans are set to match the TaskRuns and PipelineRuns lifecycles, so you should be able to see those, too.
Because of how k8s works, the Run of a resource comprises multiple reconciliation cycles, which makes the traces a bit harder to read.

Unfortunately, we lost our contributor who was implementing tracing in Tekton, so implementation stopped at the level you can see today. The initial plan included more features that are not yet implemented (or fully designed):

  • Add Pod events to the traces to enable users to see Pod scheduling, init and execution time. The design for this is available in TEP-0136
  • Make tracing level configurable: today, traces include a level of detail that may be interesting to a Tekton developer but probably not to the end user. We should add a config setting to enable/disable tracing of the reconciler internal functions
  • Add the ability for users to inject custom traces. This would be something in the API/Step code that lets users inject custom events into the traces. Design is needed for this.

If you're interested in contributing or know someone who might be, please let me know; I'd be happy to mentor new contributors and help make this happen.

@alvarolop
Author

Wow, that would be amazing to see!! Sadly, I don't have enough knowledge to contribute to this.
Thank you very much for your help; I will keep an eye on this, as I love the Tekton project and observability!
