What happened?

I am trying to scrape the Spark Operator metrics from the metrics endpoint, but when I visit :8080/metrics I do not see any of the metrics listed on this page: https://www.kubeflow.org/docs/components/spark-operator/getting-started/#enable-metric-exporting-to-prometheus

My controller logs look like this (note that metrics are enabled and that there are no messages indicating that metric registration failed):
```
++ id -u
+ uid=185
++ id -g
+ gid=185
+ set +e
++ getent passwd 185
+ uidentry=spark:x:185:185::/home/spark:/bin/sh
+ set -e
+ [[ -z spark:x:185:185::/home/spark:/bin/sh ]]
+ exec /usr/bin/tini -s -- /usr/bin/spark-operator controller start --zap-log-level=info '--namespaces=""' --controller-threads=10 --enable-ui-service=true --enable-metrics=true --metrics-bind-address=:8080 --metrics-endpoint=/metrics --metrics-prefix= --metrics-labels=app_type --leader-election=true --leader-election-lock-name=spark-operator-dev-controller-lock --leader-election-lock-namespace=spark-operator --workqueue-ratelimiter-bucket-qps=50 --workqueue-ratelimiter-bucket-size=500 --workqueue-ratelimiter-max-delay=6h
Spark Operator Version: 2.0.2+HEAD+unknown
Build Date: 2024-10-11T01:46:23+00:00
Git Commit ID:
Git Tree State: clean
Go Version: go1.23.1
Compiler: gc
Platform: linux/amd64
2025-01-17T21:48:56.434Z INFO controller/start.go:298 Starting manager
2025-01-17T21:48:56.434Z INFO controller-runtime.metrics server/server.go:205 Starting metrics server
2025-01-17T21:48:56.434Z INFO manager/server.go:50 starting server {"kind": "health probe", "addr": "[::]:8081"}
2025-01-17T21:48:56.434Z INFO controller-runtime.metrics server/server.go:244 Serving metrics server {"bindAddress": ":8080", "secure": false}
I0117 21:48:56.434810 10 leaderelection.go:250] attempting to acquire leader lease spark-operator/spark-operator-dev-controller-lock...
I0117 21:49:16.255021 10 leaderelection.go:260] successfully acquired lease spark-operator/spark-operator-dev-controller-lock
2025-01-17T21:49:16.255Z INFO controller/controller.go:178 Starting EventSource {"controller": "spark-application-controller", "source": "kind source: *v1.Pod"}
2025-01-17T21:49:16.255Z INFO controller/controller.go:178 Starting EventSource {"controller": "spark-application-controller", "source": "kind source: *v1beta2.SparkApplication"}
2025-01-17T21:49:16.255Z INFO controller/controller.go:186 Starting Controller {"controller": "spark-application-controller"}
2025-01-17T21:49:16.255Z INFO controller/controller.go:178 Starting EventSource {"controller": "scheduled-spark-application-controller", "source": "kind source: *v1beta2.ScheduledSparkApplication"}
2025-01-17T21:49:16.255Z INFO controller/controller.go:186 Starting Controller {"controller": "scheduled-spark-application-controller"}
2025-01-17T21:49:16.356Z INFO controller/controller.go:220 Starting workers {"controller": "spark-application-controller", "worker count": 10}
2025-01-17T21:49:16.356Z INFO controller/controller.go:220 Starting workers {"controller": "scheduled-spark-application-controller", "worker count": 10}
```
Does anyone have any idea why these metrics might be missing? I can see other metrics, such as `controller_runtime_active_workers` and `workqueue_adds_total`.
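For reference, a quick way to see exactly which metric families the endpoint exposes is to strip the scrape output down to unique metric names; the pipeline below is a generic sketch (it assumes the metrics port is already reachable on localhost:8080, e.g. via a port-forward):

```shell
# List the unique metric family names exposed by the controller,
# so any missing application metrics stand out at a glance.
curl -s http://localhost:8080/metrics \
  | grep -v '^#' \
  | sed 's/[{ ].*//' \
  | sort -u
```

In my case this listing shows only the controller-runtime and workqueue families, with none of the application-level metrics from the documentation.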
Reproduction Code

Set up the Spark Operator, port-forward the metrics port to your local machine, and visit `localhost:8080/metrics` to see all of the Prometheus metrics that are exposed.
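Concretely, the steps I used look like the following; the deployment name and namespace are from my install (a Helm release named `spark-operator-dev` in the `spark-operator` namespace), so adjust them for yours:

```shell
# Forward the controller's metrics port (8080, per --metrics-bind-address)
# to the local machine. The deployment name is specific to my release.
kubectl -n spark-operator port-forward deploy/spark-operator-dev-controller 8080:8080 &

# Fetch the metrics endpoint (path per --metrics-endpoint=/metrics) and
# show the first few lines of the exposition output.
curl -s http://localhost:8080/metrics | head -n 20
```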
Expected behavior

The metrics defined on the aforementioned page should be visible.
Actual behavior

Those metrics are not visible.
Additional context

No response