Define prometheus metrics to k8s #1021

Merged
merged 48 commits into from
Jan 15, 2025
Changes from 39 commits
Commits
c2d7603
add setup-prometheus-adapter
qindotguan Dec 23, 2024
b9e2bc9
sample servicemonitor and values
HaoYang0000 Dec 24, 2024
f39bfb2
update make file
HaoYang0000 Dec 24, 2024
1c25379
update sample service monitor
HaoYang0000 Dec 25, 2024
6cf5e69
update example values
HaoYang0000 Dec 25, 2024
dd38f92
create script
HaoYang0000 Dec 26, 2024
b8b2110
update make file
HaoYang0000 Dec 26, 2024
1682e71
remove template file
HaoYang0000 Dec 26, 2024
5c3da03
remove unused content
HaoYang0000 Dec 26, 2024
4cd1754
address comments
HaoYang0000 Dec 26, 2024
593441a
add interval in scrpits and fix bug in namespace
HaoYang0000 Dec 26, 2024
b83799f
add namespace in undeploy
HaoYang0000 Dec 26, 2024
5d8c464
add vdb namespace customize
HaoYang0000 Dec 26, 2024
b0b6b88
revert change
HaoYang0000 Dec 26, 2024
aabad14
revert make file changes
HaoYang0000 Jan 2, 2025
1006b9a
update default values
HaoYang0000 Jan 2, 2025
e878ad6
move values file to prometheus directory
HaoYang0000 Jan 2, 2025
5e260c8
update file path
HaoYang0000 Jan 2, 2025
825cbbc
add e2e test
HaoYang0000 Jan 2, 2025
d1a9398
address comments
HaoYang0000 Jan 2, 2025
bbbe46b
Update Makefile
HaoYang0000 Jan 3, 2025
cabbde9
update e2e test
HaoYang0000 Jan 3, 2025
1e4d33b
rename dbname to vdbname in script
HaoYang0000 Jan 3, 2025
dd194a6
update script with cmd instead of cat f
HaoYang0000 Jan 3, 2025
4e9ca4e
add label for secret
HaoYang0000 Jan 3, 2025
9aa2428
assert service as well in e2e check
HaoYang0000 Jan 3, 2025
cfc65ca
add cr deletetion and fix namespace issue
HaoYang0000 Jan 3, 2025
1910af9
fix namespace issue
HaoYang0000 Jan 3, 2025
da6aea4
fix _
HaoYang0000 Jan 3, 2025
22e4958
add undeploy-prometheus-adapter
qindotguan Jan 3, 2025
e53300b
Merge branch 'deploy-prometheus' into qguan/setup-prometheus-adapter
qindotguan Jan 6, 2025
57ae07c
deploy with configuration file adapter.yaml
qindotguan Jan 6, 2025
7df6b27
correct the tabs
qindotguan Jan 6, 2025
22c2a63
Merge branch 'main' into qguan/setup-prometheus-adapter
qindotguan Jan 6, 2025
9756feb
add prometheus-adapter back after merge
qindotguan Jan 6, 2025
a2c953a
remove prometheustest as it renamed
qindotguan Jan 6, 2025
ef8d289
add PROMETHEUS_ADAPTER_HELM_OVERRIDES that we can use to set more fields
qindotguan Jan 6, 2025
b36b822
add prometheus adapter test
qindotguan Jan 8, 2025
54fd909
draft
HaoYang0000 Jan 8, 2025
f09b5af
Merge branch 'main' into define-prometheus-metrics-to-k8s
HaoYang0000 Jan 10, 2025
b968fc2
change time range for 1 min
HaoYang0000 Jan 10, 2025
4e2c141
remove notes from code
HaoYang0000 Jan 13, 2025
afcecdf
Merge branch 'main' into define-prometheus-metrics-to-k8s
HaoYang0000 Jan 13, 2025
d2b35d7
remove assert
HaoYang0000 Jan 13, 2025
9b35e24
add integration test
HaoYang0000 Jan 13, 2025
a498eba
Merge branch 'main' into define-prometheus-metrics-to-k8s
HaoYang0000 Jan 13, 2025
a62288b
Add description
HaoYang0000 Jan 13, 2025
a6befdb
comment out changes
HaoYang0000 Jan 14, 2025
33 changes: 26 additions & 7 deletions Makefile
@@ -142,22 +142,33 @@ OLM_CATALOG_IMG ?= olm-catalog:$(TAG)
 endif
 export OLM_CATALOG_IMG

+# Name of the namespace to deploy prometheus
+PROMETHEUS_NAMESPACE?=prometheus
+# Prometheus variables that will be used for deployment
+PROMETHEUS_HELM_NAME?=prometheus
+PROMETHEUS_INTERVAL?=5s
+# The Prometheus adapter name and namespace used in VerticaAutoscaler
+PROMETHEUS_ADAPTER_NAME ?= prometheus-adapter
+PROMETHEUS_ADAPTER_NAMESPACE ?= prometheus-adapter
+PROMETHEUS_ADAPTER_REPLICAS ?= 1
+# The Prometheus service URL and port for Prometheus adapter to connect to
+PROMETHEUS_URL ?= http://$(PROMETHEUS_HELM_NAME)-kube-prometheus-prometheus.$(PROMETHEUS_NAMESPACE).svc
+PROMETHEUS_PORT ?= 9090
+
 # Set this to YES if you want to create a vertica image of minimal size
 MINIMAL_VERTICA_IMG ?=
 # Name of the helm release that we will install/uninstall
 HELM_RELEASE_NAME?=vdb-op
-# Prometheus variables that will be used for deployment
-PROMETHEUS_HELM_NAME?=prometheus
-PROMETHEUS_INTERVAL?=5s
 DB_USER?=dbadmin
 DB_PASSWORD?=
 VDB_NAME?=verticadb-sample
 VDB_NAMESPACE?=default
 # Can be used to specify additional overrides when doing the helm install.
 # For example to specify a custom webhook tls cert when deploying use this command:
 # HELM_OVERRIDES="--set webhook.tlsSecret=custom-cert" make deploy-operator
-HELM_OVERRIDES?=
-PROMETHEUS_HELM_OVERRIDES?=
+HELM_OVERRIDES ?=
+PROMETHEUS_HELM_OVERRIDES ?=
+PROMETHEUS_ADAPTER_HELM_OVERRIDES ?=
 # Maximum number of tests to run at once. (default 2)
 # Set it to any value not greater than 8 to override the default one
 E2E_PARALLELISM?=2
@@ -254,8 +265,6 @@ DEPLOY_WAIT?=--wait
 OLM_TEST_CATALOG_SOURCE=e2e-test-catalog
 # Name of the namespace to deploy the operator in
 NAMESPACE?=verticadb-operator
-# Name of the namespace to deploy prometheus
-PROMETHEUS_NAMESPACE?=prometheus

 # The Go version that we will build the operator with
 GO_VERSION?=1.23.2
@@ -677,6 +686,16 @@ undeploy-prometheus-service-monitor:
 undeploy-prometheus-service-monitor-by-release:
 	scripts/deploy-prometheus.sh -l $(PROMETHEUS_HELM_NAME) -a undeploy_by_release

+.PHONY: deploy-prometheus-adapter
+deploy-prometheus-adapter: ## Setup prometheus adapter for VerticaAutoscaler
+	helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
+	helm repo update
+	helm install $(DEPLOY_WAIT) -n $(PROMETHEUS_ADAPTER_NAMESPACE) --create-namespace $(PROMETHEUS_ADAPTER_NAME) prometheus-community/prometheus-adapter --values prometheus/adapter.yaml --set prometheus.url=$(PROMETHEUS_URL) --set prometheus.port=$(PROMETHEUS_PORT) --set replicas=$(PROMETHEUS_ADAPTER_REPLICAS) $(PROMETHEUS_ADAPTER_HELM_OVERRIDES)
+
+.PHONY: undeploy-prometheus-adapter
+undeploy-prometheus-adapter: ## Remove prometheus adapter
+	helm uninstall $(PROMETHEUS_ADAPTER_NAME) -n $(PROMETHEUS_ADAPTER_NAMESPACE)
+
 .PHONY: undeploy-operator
 undeploy-operator: ## Undeploy operator that was previously deployed
 	scripts/undeploy.sh $(if $(filter false,$(ignore-not-found)),,-i)
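The e2e step later in this PR overrides both `PROMETHEUS_NAMESPACE` and `PROMETHEUS_ADAPTER_NAMESPACE`, which changes the URL the adapter is pointed at. As an illustration only, the shell sketch below mirrors how the `PROMETHEUS_URL` default is composed from the helm release name and the namespace. The function `prometheus_url` and the namespace `my-test-ns` are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR): rebuild the Makefile's
# PROMETHEUS_URL default from a helm release name and a namespace.
prometheus_url() (
  release=${1:-prometheus}
  namespace=${2:-prometheus}
  printf 'http://%s-kube-prometheus-prometheus.%s.svc\n' "$release" "$namespace"
)

# "my-test-ns" is a placeholder; the e2e step passes $NAMESPACE instead.
ns=${NAMESPACE:-my-test-ns}
echo "adapter would query $(prometheus_url prometheus "$ns"):9090"
```

If Prometheus and the adapter live in different namespaces, only `PROMETHEUS_NAMESPACE` affects this URL; `PROMETHEUS_ADAPTER_NAMESPACE` only decides where the adapter itself is installed.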
73 changes: 73 additions & 0 deletions prometheus/adapter.yaml
@@ -0,0 +1,73 @@
# Customize the adapter configuration to map Prometheus metrics to Kubernetes metrics
rules:
  default: false
  custom:
    - seriesQuery: 'vertica_query_requests_attempted_total{namespace!="", pod!=""}'
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pod
      name:
        matches: "^(.*)_total$"
        as: "${1}_rate_per_second"
      metricsQuery: 'sum(increase(vertica_query_requests_attempted_total[5m])) by (namespace, pod)'
    # curl -g 'http://localhost:9090/api/v1/series?' --data-urlencode 'match[]=vertica_cpu_aggregate_usage_percentage' | jq
    # curl --request GET -g 'http://localhost:9090/api/v1/query?query=avg_over_time(vertica_cpu_aggregate_usage_percentage[60m])' | jq
    # kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/verticadb-sample-sc1-0/vertica_cpu_aggregate_usage_percentage
    # curl --request GET -g --data-urlencode 'query=count(vertica_cpu_aggregate_usage_percentage[1m]) by (namespace, pod)' http://localhost:9090/api/v1/query? | jq -r '.data.result[]'

    - seriesQuery: 'vertica_cpu_aggregate_usage_percentage{namespace!="", pod!=""}'
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pod
      # name:
      #   matches: "^vertica_cpu_aggregate_usage_percentage$"
      #   as: "vertica_cpu_aggregate_usage_percentage" # If rename needed
      metricsQuery: 'avg_over_time(vertica_cpu_aggregate_usage_percentage[60m])' # e.g. 10174m means an average of 10.174% over the 60m window. Ref: https://github.com/kubernetes-sigs/prometheus-adapter/blob/master/docs/walkthrough.md#quantity-values
Collaborator

For each of these queries we need a detailed description.

Collaborator Author

For sure, but should the descriptions go in the developer doc instead of here in adapter.yaml?
Do we expect users to use the metrics we provide as-is, or are these examples that they are expected to customize on their own?

    # curl -g 'http://localhost:9090/api/v1/series?' --data-urlencode 'match[]=vertica_process_memory_usage_percent' | jq
    # curl --request GET -g 'http://localhost:9090/api/v1/query?query=avg_over_time(vertica_process_memory_usage_percent[60m])' | jq
    # kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/verticadb-sample-sc1-0/vertica_process_memory_usage_percent
    - seriesQuery: 'vertica_process_memory_usage_percent{namespace!="", pod!=""}'
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pod
      # name:
      #   matches: "^vertica_process_memory_usage_percent$"
      #   as: "vertica_process_memory_usage_percent"
      metricsQuery: 'avg_over_time(vertica_process_memory_usage_percent[60m])' # e.g. 2058m means an average of 2.058% over the 60m window
    # curl -g 'http://localhost:9090/api/v1/series?' --data-urlencode 'match[]=vertica_sessions_running_counter' | jq
    # curl --request GET http://localhost:9090/api/v1/query?query=vertica_sessions_running_counter | jq
    # kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/verticadb-sample-sc1-0/vertica_sessions_running_counter
    - seriesQuery: 'vertica_sessions_running_counter{namespace!="", pod!=""}'
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pod
      # name:
      #   matches: "^vertica_sessions_running_counter$"
      #   as: "vertica_sessions_running_counter"
      metricsQuery: 'sum(increase(vertica_sessions_running_counter[60m])) by (namespace, pod)'
Collaborator

Have you played around with these queries on prometheus to check if they make sense?

Collaborator Author

Yeah, I queried the Vertica API to get the values and compared them with the Prometheus API results. For percentage metrics such as CPU or memory usage, the adapter returns a quantity like 10174m, which means an average of 10.174% over the one-hour window passed to the avg_over_time function. I left the examples in the code.

Ref: https://github.com/kubernetes-sigs/prometheus-adapter/blob/master/docs/walkthrough.md#quantity-values
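
To make the quantity notation above concrete, here is a minimal shell sketch that turns the milli-suffixed values returned by the custom metrics API into plain decimals. The function name `quantity_to_decimal` is hypothetical and not part of this PR, and it only handles bare integers and the `m` suffix, which are the two forms discussed in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR): convert "10174m" to "10.174".
# Assumes canonical input without leading zeros; other suffixes are rejected.
quantity_to_decimal() (
  q=$1
  case $q in
    *m) n=${q%m}; milli=yes ;;
    *)  n=$q; milli= ;;
  esac
  # Reject anything that is not a non-empty run of digits.
  case $n in
    ''|*[!0-9]*) echo "unsupported quantity: $q" >&2; exit 1 ;;
  esac
  if [ -n "$milli" ]; then
    printf '%d.%03d\n' "$((n / 1000))" "$((n % 1000))"
  else
    printf '%d\n' "$n"
  fi
)

quantity_to_decimal 10174m   # CPU example from adapter.yaml, prints 10.174
quantity_to_decimal 2058m    # memory example from adapter.yaml, prints 2.058
```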

HaoYang0000 marked this conversation as resolved.
    # curl -g 'http://localhost:9090/api/v1/series?' --data-urlencode 'match[]=vertica_queued_requests_total' | jq
    # curl --request GET http://localhost:9090/api/v1/query?query=vertica_queued_requests_total | jq
    # kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/verticadb-sample-sc1-0/vertica_queued_requests_total
    - seriesQuery: 'vertica_queued_requests_total{namespace!="", pod!=""}'
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pod
      # name:
      #   matches: "^vertica_queued_requests_total$"
      #   as: "vertica_queued_requests_total"
      metricsQuery: 'sum(increase(vertica_queued_requests_total[60m])) by (namespace, pod)'
HaoYang0000 marked this conversation as resolved.
25 changes: 25 additions & 0 deletions tests/e2e-leg-11/prometheus-sanity/16-assert.yaml
@@ -0,0 +1,25 @@
# (c) Copyright [2021-2024] Open Text.
# Licensed under the Apache License, Version 2.0 (the "License");
# You may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-adapter
status:
  replicas: 1
  readyReplicas: 1
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-adapter
@@ -0,0 +1,17 @@
# (c) Copyright [2021-2024] Open Text.
# Licensed under the Apache License, Version 2.0 (the "License");
# You may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: kuttl.dev/v1beta1
kind: TestStep
commands:
  - script: cd ../../.. && make deploy-prometheus-adapter PROMETHEUS_NAMESPACE=$NAMESPACE PROMETHEUS_ADAPTER_NAMESPACE=$NAMESPACE
23 changes: 23 additions & 0 deletions tests/e2e-leg-11/prometheus-sanity/30-assert.yaml
HaoYang0000 marked this conversation as resolved.
@@ -0,0 +1,23 @@
# (c) Copyright [2021-2024] Open Text.
# Licensed under the Apache License, Version 2.0 (the "License");
# You may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: v1
kind: Pod
metadata:
  name: script-verity-prometheus-adapter
status:
  containerStatuses:
    - name: test
      state:
        terminated:
          exitCode: 0
@@ -0,0 +1,60 @@
# (c) Copyright [2021-2024] Open Text.
# Licensed under the Apache License, Version 2.0 (the "License");
# You may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: v1
kind: ConfigMap
metadata:
  name: script-verity-prometheus-adapter
data:
  entrypoint.sh: |-
    #!/bin/bash
    set -o errexit
    set -o xtrace

    for i in {1..60}; do
      kubectl exec v-prometheus-pri1-0 -it -c server -- vsql -w 'topsecret' -c "select count(*) from nodes;"
    done

    NAMESPACE=$(kubectl get pod v-prometheus-pri1-0 -o=jsonpath='{.metadata.namespace}')
    CMD="kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/${NAMESPACE}/pods/v-prometheus-pri1-0/vertica_query_requests_attempted_rate_per_second"
    RESULT=$(eval $CMD)
    VALUE=$(echo $RESULT | jq -r '.items[0].value')

    REQUESTS_COUNT=${VALUE::-4}
    if [[ "$REQUESTS_COUNT" == "" || "$REQUESTS_COUNT" -lt 60 ]]; then
      echo "Error: got $REQUESTS_COUNT. Expected requests no less than 60."
      exit 1
    fi
---
apiVersion: v1
kind: Pod
metadata:
  name: script-verity-prometheus-adapter
  labels:
    stern: include
spec:
  restartPolicy: Never
  containers:
    - name: test
      image: bitnami/kubectl:1.20.4
      command: ["/bin/entrypoint.sh"]
      volumeMounts:
        - name: entrypoint-volume
          mountPath: /bin/entrypoint.sh
          readOnly: true
          subPath: entrypoint.sh
  volumes:
    - name: entrypoint-volume
      configMap:
        defaultMode: 0777
        name: script-verity-prometheus-adapter
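
The `${VALUE::-4}` slice in the script above assumes the API returns a milli-suffixed value such as `60500m`; if the adapter ever returns a bare integer such as `60`, the slice would not parse as intended. As a hedged alternative, the sketch below normalizes the value to integer milli-units before comparing. The helper names `to_milli` and `at_least` are hypothetical and not part of this PR, and the sketch assumes the value is either a bare integer or an integer followed by `m`.

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR): print the value in milli-units.
# "60" becomes 60000, "60500m" becomes 60500; anything else fails.
to_milli() (
  v=$1
  case $v in
    *m) n=${v%m}; scale=1 ;;
    *)  n=$v; scale=1000 ;;
  esac
  # Reject empty strings, "null" from jq, and any non-digit input.
  case $n in
    ''|*[!0-9]*) exit 1 ;;
  esac
  echo "$((n * scale))"
)

# Succeeds when VALUE is at least MIN, where MIN is a whole number.
at_least() (
  milli=$(to_milli "$1") || exit 1
  [ "$milli" -ge "$(($2 * 1000))" ]
)

if at_least "60500m" 60; then
  echo "metric check passed"
fi
```

In the test script this would replace the slice and the `-lt` comparison with a single `at_least "$VALUE" 60` check, which also treats a missing value as a failure.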
@@ -0,0 +1,17 @@
# (c) Copyright [2021-2024] Open Text.
# Licensed under the Apache License, Version 2.0 (the "License");
# You may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: kuttl.dev/v1beta1
kind: TestStep
commands:
  - script: cd ../../.. && make undeploy-prometheus-adapter PROMETHEUS_ADAPTER_NAMESPACE=$NAMESPACE
22 changes: 22 additions & 0 deletions tests/e2e-leg-11/prometheus-sanity/85-errors.yaml
@@ -0,0 +1,22 @@
# (c) Copyright [2021-2024] Open Text.
# Licensed under the Apache License, Version 2.0 (the "License");
# You may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-adapter
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-adapter