Define prometheus metrics to k8s #1021

Merged · 48 commits merged into main on Jan 15, 2025

Conversation

@HaoYang0000 (Collaborator) commented Jan 8, 2025

This PR adds custom metrics when deploying the Prometheus adapter.

@roypaulin (Collaborator) left a comment

Each of these PromQL queries needs to be tested on Prometheus to make sure it represents what the user wants. You can easily set up Prometheus. You also need a basic understanding of Prometheus metric types (counter, gauge, ...) as well as functions like increase, sum, etc., and when to use them.

# name:
# matches: "^vertica_sessions_running_counter$"
# as: "vertica_sessions_running_counter"
metricsQuery: 'sum(increase(vertica_sessions_running_counter[60m])) by (namespace, pod)'
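For reference, a quick way to sanity-check a rule like this is to run the query directly against the Prometheus HTTP API before wiring it into the adapter. A minimal sketch, assuming a kube-prometheus-stack install whose Prometheus service is prometheus-kube-prometheus-prometheus in the prometheus namespace (both names are assumptions; adjust to your environment):

# Port-forward the Prometheus service locally (service name and namespace are assumed).
kubectl port-forward svc/prometheus-kube-prometheus-prometheus 9090:9090 -n prometheus &

# Evaluate the counter query from adapter.yaml and inspect the per-(namespace, pod) values it returns.
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum(increase(vertica_sessions_running_counter[60m])) by (namespace, pod)'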
Collaborator:

Have you played around with these queries on Prometheus to check whether they make sense?

@HaoYang0000 (Collaborator, Author):

Yeah, I ran the Vertica API to get the value and compared it with the Prometheus API result. For example, for percentage values like CPU/memory usage averaged over a time window, it returns something like 10174m, which means 10.174% on average over the hour, using the avg_over_time function. I left the example in the code.

Ref: https://github.com/kubernetes-sigs/prometheus-adapter/blob/master/docs/walkthrough.md#quantity-values

# name:
# matches: "^vertica_cpu_aggregate_usage_percentage$"
# as: "vertica_cpu_aggregate_usage_percentage" # If rename needed
metricsQuery: 'avg_over_time(vertica_cpu_aggregate_usage_percentage[60m])' # 10174m means 10.174% per hour in average Ref: https://github.com/kubernetes-sigs/prometheus-adapter/blob/master/docs/walkthrough.md#quantity-values
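As an additional cross-check once the adapter rule is loaded, the exposed value can be read back through the Kubernetes custom metrics API; a minimal sketch (the <namespace> placeholder is yours to fill, and the API version, v1beta1 vs v1beta2, depends on the adapter version deployed):

# List the metric names the adapter currently serves; the vertica_* rules above should appear here.
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1"

# Read the averaged CPU usage for the pods in a namespace; a value like "10174m" is the
# milli-quantity form of 10.174, i.e. ~10.174%, per the walkthrough linked above.
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/<namespace>/pods/*/vertica_cpu_aggregate_usage_percentage"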
Collaborator:

For each of these queries we need a detailed description.

@HaoYang0000 (Collaborator, Author):

For sure, but should we put the descriptions in the developer doc instead of here in adapter.yaml?
Do we expect users to use the metrics we provide, or do we provide examples here and expect them to customize on their own?

@roypaulin (Collaborator)

Did you discuss these with Cai?

@HaoYang0000 marked this pull request as ready for review on January 13, 2025 07:54.
@LiboYu2 (Collaborator) left a comment

This is an error from a local run of the test case:
logger.go:42: 16:50:45 | prometheus-sanity/10-deploy-prometheus | Error: INSTALLATION FAILED: rendered manifests contain a resource that already exists. Unable to continue with install: ClusterRole "prometheus-kube-prometheus-operator" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: key "meta.helm.sh/release-namespace" must equal "kuttl-test-talented-mongrel": current value is "prometheus"
logger.go:42: 16:50:45 | prometheus-sanity/10-deploy-prometheus | make: *** [Makefile:667: deploy-prometheus] Error 1
case.go:399: failed in step 10-deploy-prometheus
case.go:401: command "cd ../../.. && make deploy-prometheus PROMETHEUS_NAMESPACE=$NAMESPACE" failed, exit status 2

We should not create extra namespaces. All test-related resources must be installed in the kuttl namespace, and when the test is finished that namespace will be deleted. It may not be possible to use a single namespace; we can create extra ones, but with names that will not easily collide with existing ones. After the test case is ready, the extra namespaces can be deleted.

@roypaulin (Collaborator)

Something I think would be interesting is to show an example of a VerticaAutoscaler where the metrics are set, so it can serve as a reference and we know how to use them properly. It could be a file in the prometheus folder that contains an example for each metric.
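For illustration, a rough sketch of the shape such a reference file could take, written here as a plain autoscaling/v2 HorizontalPodAutoscaler consuming one of the custom metrics above (the target workload name and threshold are made up, and the actual VerticaAutoscaler spec may expose this differently):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: vertica-sessions-hpa            # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: vertica-subcluster-example    # placeholder workload
  minReplicas: 3
  maxReplicas: 6
  metrics:
  - type: Pods
    pods:
      metric:
        name: vertica_sessions_running_counter
      target:
        type: AverageValue
        averageValue: "50"              # scale out when the per-pod average exceeds 50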

@LiboYu2 (Collaborator) commented Jan 15, 2025

(quoting @roypaulin's suggestion above about adding a VerticaAutoscaler example to the prometheus folder)

There is one in config/samples. We can add another one for a custom metric.

@HaoYang0000 (Collaborator, Author) commented Jan 15, 2025

(quoting @LiboYu2's comment above about the INSTALLATION FAILED error and not creating extra namespaces)

The current Prometheus integration test uses the kuttl namespace; we didn't use an extra namespace there.
This error happens because you installed Prometheus in your local env, and kuttl shares the same env where Prometheus has already been installed. To run the test, you will need to undeploy Prometheus in your local env.
Sometimes the test will fail on the local env and Prometheus resources will be left over; you can use the following commands as a reference to clean up the resources:

#!/bin/bash
# Clean up leftover kube-prometheus-stack resources after a failed local test run.
NAMESPACE=$1

# ClusterRoles, ClusterRoleBindings and webhook configurations are cluster-scoped;
# the -n flag is effectively ignored for them.
kubectl delete clusterrole prometheus-kube-prometheus-operator -n $NAMESPACE
kubectl delete clusterrole prometheus-kube-prometheus-prometheus -n $NAMESPACE
kubectl delete clusterrolebinding prometheus-kube-prometheus-operator -n $NAMESPACE
kubectl delete clusterrolebinding prometheus-kube-prometheus-prometheus -n $NAMESPACE

# Services created by the chart live in kube-system.
kubectl delete svc prometheus-kube-prometheus-kube-proxy -n kube-system
kubectl delete svc prometheus-kube-prometheus-kubelet -n kube-system

kubectl delete MutatingWebhookConfiguration prometheus-kube-prometheus-admission -n $NAMESPACE
kubectl delete ValidatingWebhookConfiguration prometheus-kube-prometheus-admission -n $NAMESPACE

@HaoYang0000 (Collaborator, Author)

@roypaulin @LiboYu2 I added an example for the autoscaler with custom metrics in another PR: #1033

@HaoYang0000 merged commit 0854860 into main on Jan 15, 2025
40 of 41 checks passed
@HaoYang0000 deleted the define-prometheus-metrics-to-k8s branch on January 15, 2025 14:10