Define prometheus metrics to k8s #1021

Merged · 48 commits merged into main on Jan 15, 2025

Conversation

@HaoYang0000 (Collaborator) commented Jan 8, 2025

This PR adds custom metrics when deploying the Prometheus adapter.

@roypaulin (Collaborator) left a comment

Each of these PromQL queries needs to be tested on Prometheus to make sure it represents what the user wants. You can easily set up Prometheus. You also need a basic understanding of Prometheus metric types (counter, gauge, ...) as well as functions like increase, sum, etc., and when to use them.

# name:
# matches: "^vertica_sessions_running_counter$"
# as: "vertica_sessions_running_counter"
metricsQuery: 'sum(increase(vertica_sessions_running_counter[60m])) by (namespace, pod)'
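For reference, a quick way to sanity-check a rule like this is to run the query directly against the Prometheus HTTP API before wiring it into the adapter. A minimal sketch, assuming a kube-prometheus-stack install whose Prometheus service is prometheus-kube-prometheus-prometheus in the prometheus namespace (both names are assumptions; adjust to your environment):

# Port-forward the Prometheus service locally (service name and namespace are assumed).
kubectl port-forward svc/prometheus-kube-prometheus-prometheus 9090:9090 -n prometheus &

# Evaluate the counter query from adapter.yaml and inspect the per-(namespace, pod) values it returns.
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum(increase(vertica_sessions_running_counter[60m])) by (namespace, pod)'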
Collaborator:

Have you played around with these queries on Prometheus to check whether they make sense?

@HaoYang0000 (Collaborator, Author):

Yeah, I ran the Vertica API to get the value and compared it with the Prometheus API result. For example, for percentage values like CPU/memory usage averaged over a time window, it returns something like 10174m, which means 10.174% on average over the hour, using the avg_over_time function. I left the example in the code.

Ref: https://github.com/kubernetes-sigs/prometheus-adapter/blob/master/docs/walkthrough.md#quantity-values

# name:
# matches: "^vertica_cpu_aggregate_usage_percentage$"
# as: "vertica_cpu_aggregate_usage_percentage" # If rename needed
metricsQuery: 'avg_over_time(vertica_cpu_aggregate_usage_percentage[60m])' # 10174m means 10.174% per hour in average Ref: https://github.com/kubernetes-sigs/prometheus-adapter/blob/master/docs/walkthrough.md#quantity-values
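As an additional cross-check once the adapter rule is loaded, the exposed value can be read back through the Kubernetes custom metrics API; a minimal sketch (the <namespace> placeholder is yours to fill, and the API version, v1beta1 vs v1beta2, depends on the adapter version deployed):

# List the metric names the adapter currently serves; the vertica_* rules above should appear here.
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1"

# Read the averaged CPU usage for the pods in a namespace; a value like "10174m" is the
# milli-quantity form of 10.174, i.e. ~10.174%, per the walkthrough linked above.
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/<namespace>/pods/*/vertica_cpu_aggregate_usage_percentage"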
Collaborator:

For each of these queries we need a detailed description.

@HaoYang0000 (Collaborator, Author):

For sure, but should we put the descriptions in the developer doc instead of here in adapter.yaml?
Do we expect users to use the metrics we provide, or do we provide examples here and expect them to customize on their own?

@roypaulin (Collaborator)

Did you discuss these with Cai?

@HaoYang0000 marked this pull request as ready for review on January 13, 2025 07:54.
@LiboYu2 (Collaborator) left a comment

This is an error from a local run of the test case:
logger.go:42: 16:50:45 | prometheus-sanity/10-deploy-prometheus | Error: INSTALLATION FAILED: rendered manifests contain a resource that already exists. Unable to continue with install: ClusterRole "prometheus-kube-prometheus-operator" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: key "meta.helm.sh/release-namespace" must equal "kuttl-test-talented-mongrel": current value is "prometheus"
logger.go:42: 16:50:45 | prometheus-sanity/10-deploy-prometheus | make: *** [Makefile:667: deploy-prometheus] Error 1
case.go:399: failed in step 10-deploy-prometheus
case.go:401: command "cd ../../.. && make deploy-prometheus PROMETHEUS_NAMESPACE=$NAMESPACE" failed, exit status 2

We should not create extra namespaces. All test-related resources must be installed in the kuttl namespace, and when the test is finished that namespace will be deleted. It may not be possible to use a single namespace; we can create extra ones, but with names that will not easily collide with existing ones. After the test case is ready, the extra namespaces can be deleted.

@roypaulin (Collaborator)

Something I think would be interesting is to show an example of a VerticaAutoscaler where the metrics are set, so it can serve as a reference and we know how to use them properly. It could be a file in the prometheus folder that contains an example for each metric.
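For illustration, a rough sketch of the shape such a reference file could take, written here as a plain autoscaling/v2 HorizontalPodAutoscaler consuming one of the custom metrics above (the target workload name and threshold are made up, and the actual VerticaAutoscaler spec may expose this differently):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: vertica-sessions-hpa            # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: vertica-subcluster-example    # placeholder workload
  minReplicas: 3
  maxReplicas: 6
  metrics:
  - type: Pods
    pods:
      metric:
        name: vertica_sessions_running_counter
      target:
        type: AverageValue
        averageValue: "50"              # scale out when the per-pod average exceeds 50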

@LiboYu2 (Collaborator) commented Jan 15, 2025

(quoting @roypaulin's suggestion above about adding a VerticaAutoscaler example to the prometheus folder)

There is one in config/samples. We can add another one for a custom metric.

@HaoYang0000 (Collaborator, Author) commented Jan 15, 2025

(quoting @LiboYu2's comment above about the INSTALLATION FAILED error and not creating extra namespaces)

The current Prometheus integration test uses the kuttl namespace; we didn't use an extra namespace there.
This error happens because you installed Prometheus in your local env, and kuttl shares the same env where Prometheus has already been installed. To run the test, you will need to undeploy Prometheus in your local env.
Sometimes the test will fail on the local env and Prometheus resources will be left over; you can use the following commands as a reference to clean up the resources:

#!/bin/bash
# Clean up leftover kube-prometheus-stack resources after a failed local test run.
NAMESPACE=$1

# ClusterRoles, ClusterRoleBindings and webhook configurations are cluster-scoped;
# the -n flag is effectively ignored for them.
kubectl delete clusterrole prometheus-kube-prometheus-operator -n $NAMESPACE
kubectl delete clusterrole prometheus-kube-prometheus-prometheus -n $NAMESPACE
kubectl delete clusterrolebinding prometheus-kube-prometheus-operator -n $NAMESPACE
kubectl delete clusterrolebinding prometheus-kube-prometheus-prometheus -n $NAMESPACE

# Services created by the chart live in kube-system.
kubectl delete svc prometheus-kube-prometheus-kube-proxy -n kube-system
kubectl delete svc prometheus-kube-prometheus-kubelet -n kube-system

kubectl delete MutatingWebhookConfiguration prometheus-kube-prometheus-admission -n $NAMESPACE
kubectl delete ValidatingWebhookConfiguration prometheus-kube-prometheus-admission -n $NAMESPACE

@HaoYang0000 (Collaborator, Author)

@roypaulin @LiboYu2 I added an example for the autoscaler with custom metrics in another PR: #1033

@HaoYang0000 merged commit 0854860 into main on Jan 15, 2025
40 of 41 checks passed
@HaoYang0000 deleted the define-prometheus-metrics-to-k8s branch on January 15, 2025 14:10