[JENKINS-72434] Add metrics for failure causes on builds #176

Draft
wants to merge 2 commits into
base: master
from


yachub commented Dec 18, 2023

Fixes https://issues.jenkins.io/browse/JENKINS-72434

This adds a new jenkins_bfa* metric for failure causes found in a specific job build, using the naming convention jenkins_bfa.job.@<JOB_NAME>@.number.@<JOB_BUILD_NUMBER>@.cause.@<FAILURE_CAUSE_NAME>.
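As a rough illustration of that convention, here is a minimal Python sketch (not the plugin's Java code). The helper names are hypothetical, and the sanitization rule is an assumption: it supposes the exporter replaces every character outside [a-zA-Z0-9:_] with an underscore, which is consistent with the double underscores matched by the relabel regexes in prometheus.yml below.

```python
import re


def bfa_metric_name(job_name: str, build_number: int, cause_name: str) -> str:
    """Build the dotted metric name following the convention described above.

    Hypothetical helper for illustration only; the plugin itself is written in Java.
    """
    return f"jenkins_bfa.job.@{job_name}@.number.@{build_number}@.cause.@{cause_name}"


def sanitize_for_prometheus(name: str) -> str:
    """ASSUMPTION: every character outside [a-zA-Z0-9:_] becomes '_' on export."""
    return re.sub(r"[^a-zA-Z0-9:_]", "_", name)


dotted = bfa_metric_name("jake_test_job", 14, "job_exits_1")
exported = sanitize_for_prometheus(dotted)
print(dotted)
print(exported)
```

Under that assumption, each ".@" or "@." pair turns into a double underscore, which gives the relabel rules a delimiter to split on.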

Used in conjunction with the prometheus metrics plugin, and a few additional metric_relabel_configs, this results in a metric like:

  • Found cause: jenkins_bfa{build_number="14", cause="job_exits_1", instance="host.docker.internal:8080", jenkins_job="jake_test_job", job="jenkins"}
  • No matching cause: jenkins_bfa{build_number="15", cause="no_matching_cause", instance="host.docker.internal:8080", jenkins_job="jake_test_job", job="jenkins"}

This allows jenkins_bfa metrics to be joined to other metrics such as default_jenkins_builds_build_result_ordinal via common labels jenkins_job and build_number/number.
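To make the join concrete, here is a small Python sketch that pairs series on the shared labels. It is only an in-memory illustration, not a Prometheus query. The sample values, the "value" and "result_ordinal" keys, and the function name are hypothetical; only the label names jenkins_job, build_number, and number come from the description above.

```python
def join_on_build(bfa_series, result_series):
    """Attach the build result to each jenkins_bfa series with the same job and build.

    bfa_series uses the label 'build_number'; result_series uses 'number'.
    """
    results = {(s["jenkins_job"], s["number"]): s["value"] for s in result_series}
    joined = []
    for s in bfa_series:
        key = (s["jenkins_job"], s["build_number"])
        if key in results:
            joined.append({**s, "result_ordinal": results[key]})
    return joined


# Hypothetical sample data shaped like the metrics shown above.
bfa = [
    {"jenkins_job": "jake_test_job", "build_number": "14", "cause": "job_exits_1"},
    {"jenkins_job": "jake_test_job", "build_number": "15", "cause": "no_matching_cause"},
]
build_results = [
    {"jenkins_job": "jake_test_job", "number": "14", "value": 2.0},
    {"jenkins_job": "jake_test_job", "number": "15", "value": 2.0},
]

joined = join_on_build(bfa, build_results)
for row in joined:
    print(row)
```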

I would have implemented this a bit differently, but unfortunately metrics-core does not seem to support "tags" (what Prometheus refers to as labels), hence the need to relabel the Prometheus metrics. My understanding, based on discussions in https://github.com/dropwizard/metrics, is that tags will be supported in 5.x, which is not officially released yet anyway.

Testing done

The above mentioned metrics and screenshots below were gathered via:

  1. Running Jenkins with mvn -DskipTests clean hpi:run in my Eclipse Run Configuration, and Prometheus with docker run --rm -p 9090:9090 -v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus (prometheus.yml is pasted below, along with a diff of pom.xml that adds 2 dependencies to pull in the prometheus plugin for testing prometheus metrics)

    • pom.xml
      diff --git a/pom.xml b/pom.xml
      index e4539ed..66f0fa8 100644
      --- a/pom.xml
      +++ b/pom.xml
      @@ -279,6 +279,16 @@
                   <artifactId>metrics</artifactId>
                   <version>4.2.18-439.v86a_20b_a_8318b_</version>
               </dependency>
      +        <dependency>
      +            <groupId>org.jenkins-ci.plugins</groupId>
      +            <artifactId>prometheus</artifactId>
      +            <version>2.2.3</version>
      +        </dependency>
      +        <dependency>
      +            <groupId>org.slf4j</groupId>
      +            <artifactId>slf4j-api</artifactId>
      +            <version>2.0.7</version>
      +        </dependency>
           </dependencies>
           <dependencyManagement>
               <dependencies>
    • prometheus.yml
      global:
        scrape_interval:     15s
        evaluation_interval: 15s
      
      scrape_configs:
        - job_name: jenkins
          metrics_path: /jenkins/prometheus/
          static_configs:
            - targets: ['host.docker.internal:8080']
          metric_relabel_configs:
            - source_labels: [__name__]
              regex: 'jenkins_bfa_category_(.*)'
              target_label: 'category'
            - source_labels: [__name__]
              regex: 'jenkins_bfa_cause_(.*)'
              target_label: 'cause'
            - source_labels: [__name__]
              regex: 'jenkins_bfa_job__(.*)__number__(.*)__cause__(.*)'
              replacement: '$1'
              target_label: 'jenkins_job'
            - source_labels: [__name__]
              regex: 'jenkins_bfa_job__(.*)__number__(.*)__cause__(.*)'
              replacement: '$2'
              target_label: 'build_number'
            - source_labels: [__name__]
              regex: 'jenkins_bfa_job__(.*)__number__(.*)__cause__(.*)'
              replacement: '$3'
              target_label: 'cause'
            - source_labels: [__name__]
              regex: 'jenkins_bfa_(.*)'
              replacement: 'jenkins_bfa'
              target_label: __name__
  2. Create a freestyle job that simply runs set +x and exit 1 to generate metrics for build failures with no matching causes.

  3. Create a new failure cause with an indication matching the "exit 1" in the log to generate metrics for build failures with the matching cause.
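
The metric_relabel_configs from step 1 can be sanity-checked offline. The sketch below is a simplified Python emulation, not Prometheus itself: it assumes the default replace action, fully anchored regexes, and $N capture-group references, and the function name relabel is hypothetical.

```python
import re

# (regex, replacement, target_label), mirroring the metric_relabel_configs above.
# Rules without an explicit replacement use the Prometheus default of "$1".
JOB_RE = r"jenkins_bfa_job__(.*)__number__(.*)__cause__(.*)"
RULES = [
    (r"jenkins_bfa_category_(.*)", "$1", "category"),
    (r"jenkins_bfa_cause_(.*)", "$1", "cause"),
    (JOB_RE, "$1", "jenkins_job"),
    (JOB_RE, "$2", "build_number"),
    (JOB_RE, "$3", "cause"),
    (r"jenkins_bfa_(.*)", "jenkins_bfa", "__name__"),
]


def relabel(labels):
    """Apply the rules in order; each rule reads __name__ and writes its target label."""
    labels = dict(labels)
    for regex, replacement, target in RULES:
        match = re.fullmatch(regex, labels["__name__"])  # anchored, as in Prometheus
        if match is None:
            continue  # a non-matching rule leaves the labels untouched
        # Translate "$1" style references into Python's "\g<1>" template syntax.
        template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
        labels[target] = match.expand(template)
    return labels


found = relabel({"__name__": "jenkins_bfa_job__jake_test_job__number__14__cause__job_exits_1"})
no_match = relabel({"__name__": "jenkins_bfa_job__jake_test_job__number__15__cause__no_matching_cause"})
print(found)
print(no_match)
```

The order matters: the rule that rewrites __name__ to jenkins_bfa comes last, because every earlier rule still needs the full original name to extract its label.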

Prometheus screenshot below:

Screenshot 2023-12-18 at 13-45-06 Prometheus Time Series Collection and Processing Server

Submitter checklist


@yachub yachub marked this pull request as ready for review December 18, 2023 20:25
@yachub yachub changed the title Add metrics for failure causes on builds (JENKINS-72434) Add metrics for failure causes on builds Dec 18, 2023

yachub commented Dec 20, 2023

Oops, I just realized that matrix job names are not correct, let me look into that.

@yachub yachub force-pushed the feature/add_build_failure_cause_metrics branch from cfbbe7c to 5f3d3d5 Compare December 20, 2023 14:20

yachub commented Dec 20, 2023

> Oops, I just realized that matrix job names are not correct, let me look into that.

Not sure how happy I am with this, since the jenkins_job label for matrix jobs ends up getting transformed by metrics-core (replacing "/", ",", and "=" with "_"). Example of a 4 cell matrix job:

image
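
To illustrate why this transformation is awkward, here is a small Python sketch. It assumes, as a stand-in for whichever layer actually performs the replacement, that every character outside [a-zA-Z0-9:_] becomes an underscore. The matrix cell names are hypothetical examples.

```python
import re


def sanitize(name: str) -> str:
    """ASSUMPTION: characters outside [a-zA-Z0-9:_] are replaced with '_'."""
    return re.sub(r"[^a-zA-Z0-9:_]", "_", name)


# Hypothetical matrix cell name: parent job, then axis=value pairs.
cell = "my_matrix/OS=linux,JDK=17"
flattened = sanitize(cell)
print(flattened)

# The separators are lost, so distinct names can collapse into the same label value.
collision = sanitize("my_matrix/OS=linux") == sanitize("my_matrix_OS_linux")
print(collision)
```

Once "/", ",", and "=" have all become "_", a relabel regex has no reliable way to tell axis boundaries apart from underscores that were already part of the job name.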

Also rebased my branch onto master.

@yachub yachub marked this pull request as draft December 21, 2023 16:25
@yachub yachub changed the title (JENKINS-72434) Add metrics for failure causes on builds [JENKINS-72434] Add metrics for failure causes on builds Dec 21, 2023

Waschndolos commented May 13, 2024

I think having the causes would be nice, is this PR still alive? :)


yachub commented May 15, 2024

> I think having the causes would be nice, is this PR still alive? :)

Agreed! Well, it turns out that I no longer use Jenkins on a daily basis, so unfortunately I have not had the time to dedicate to revisiting it. IIRC, due to the way metrics-core transforms the names, you'd have to get real creative with Prometheus relabeling when it comes to matrix jobs. The other idea off the top of my head was to completely rework this to not use metrics-core, and instead export different types of metrics natively, which would be configurable for BFA in the Jenkins Global Configuration UI.
