[JENKINS-72434] Add metrics for failure causes on builds #176
Oops, I just realized that matrix job names are not correct; let me look into that.
I think having the causes would be nice. Is this PR still alive? :)
Agreed! Well, it turns out that I no longer use Jenkins on a daily basis, so unfortunately I have not had the time to revisit it. IIRC, due to the way metrics-core transforms the metric names, you'd have to get really creative with Prometheus relabeling when it comes to matrix jobs. The other idea off the top of my head was to rework this completely so that it does not use metrics-core, and instead exports different types of metrics natively, which would be configurable for BFA in the Jenkins Global Configuration UI.
Fixes https://issues.jenkins.io/browse/JENKINS-72434
This adds a new jenkins_bfa* metric for failure causes found in a specific job build, following the convention jenkins_bfa.job.@<JOB_NAME>@.number.@<JOB_BUILD_NUMBER>@.cause.@<FAILURE_CAUSE_NAME>.

Used in conjunction with the Prometheus metrics plugin and a few additional metric_relabel_configs, this results in metrics like:

jenkins_bfa{build_number="14", cause="job_exits_1", instance="host.docker.internal:8080", jenkins_job="jake_test_job", job="jenkins"}
jenkins_bfa{build_number="15", cause="no_matching_cause", instance="host.docker.internal:8080", jenkins_job="jake_test_job", job="jenkins"}
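To make the mapping from the flat metric name to labels concrete, here is a minimal Python sketch that splits a name written in the convention above into the three labels shown in the example output. It only illustrates the idea behind the relabeling; it is not the plugin's code or the actual metric_relabel_configs. The function name split_bfa_name is hypothetical, the sketch assumes job and cause names contain no "@", and the name Prometheus actually scrapes may look different once the exporter has sanitized it.

```python
import re

# Pattern for the naming convention quoted in the PR description.
# Assumption: job names and cause names never contain '@'.
BFA_NAME = re.compile(
    r"^jenkins_bfa\.job\.@(?P<jenkins_job>[^@]+)@"
    r"\.number\.@(?P<build_number>\d+)@"
    r"\.cause\.@(?P<cause>[^@]+)$"
)


def split_bfa_name(name):
    """Return the labels encoded in a jenkins_bfa metric name, or None."""
    match = BFA_NAME.match(name)
    if match is None:
        return None
    return match.groupdict()


labels = split_bfa_name(
    "jenkins_bfa.job.@jake_test_job@.number.@14@.cause.@job_exits_1"
)
print(labels)
```

A relabel rule would do the same kind of capture-group extraction on the scraped metric name, writing each group to its own label and resetting the metric name to jenkins_bfa.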
This allows jenkins_bfa metrics to be joined to other metrics such as default_jenkins_builds_build_result_ordinal via the common labels jenkins_job and build_number/number.

I would have implemented this a bit differently, but unfortunately metrics-core does not support "tags" (what Prometheus refers to as labels), hence the need to relabel the Prometheus metrics. My understanding, based on discussions in https://github.com/dropwizard/metrics, is that tags will be supported in 5.x, which is not officially released yet anyway.
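As an illustration of the label-based join mentioned above, the following Python sketch pairs each failure-cause sample with the build-result sample for the same job and build. In practice this would be done in a Prometheus query rather than in application code. The helper names, the sample structure, and the result values are hypothetical; only the label names and the jenkins_bfa label values come from the description.

```python
def build_key(labels):
    """Return the (job, build number) join key for a sample's labels.

    Accepts either 'build_number' or 'number', since the description
    mentions both spellings for the build-number label.
    """
    number = labels.get("build_number", labels.get("number"))
    return (labels["jenkins_job"], str(number))


def join_on_build(bfa_samples, result_samples):
    """Attach each failure cause to the result sample of the same build."""
    results = {build_key(s["labels"]): s["value"] for s in result_samples}
    joined = []
    for sample in bfa_samples:
        key = build_key(sample["labels"])
        if key in results:
            joined.append({
                "jenkins_job": key[0],
                "build_number": key[1],
                "cause": sample["labels"]["cause"],
                "result_ordinal": results[key],
            })
    return joined


# Label values taken from the example metrics in the description.
bfa = [
    {"labels": {"jenkins_job": "jake_test_job", "build_number": "14",
                "cause": "job_exits_1"}},
    {"labels": {"jenkins_job": "jake_test_job", "build_number": "15",
                "cause": "no_matching_cause"}},
]
# Hypothetical build-result samples; the values are placeholders.
results = [
    {"labels": {"jenkins_job": "jake_test_job", "number": "14"}, "value": 2.0},
    {"labels": {"jenkins_job": "jake_test_job", "number": "15"}, "value": 2.0},
]

rows = join_on_build(bfa, results)
for row in rows:
    print(row)
```

Builds with no matching result sample are simply dropped, which mirrors how an inner join on shared labels behaves.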
Testing done
The above mentioned metrics and screenshots below were gathered via:
Running mvn -DskipTests clean hpi:run in my Eclipse Run Configuration, as well as
docker run --rm -p 9090:9090 -v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus
(prometheus.yml is pasted below, along with the diff of pom.xml that adds 2 dependencies to pull in the prometheus plugin for testing Prometheus metrics).
pom.xml
prometheus.yml
Create a freestyle job that simply runs set +x and exit 1 to generate metrics for build failures with no matching causes.
Create a new failure cause with an indication matching the "exit 1" in the log to generate metrics for build failures with the matching cause.
Prometheus screenshot below:
Submitter checklist