KEDA operator fails with "unable to get unprocessedEventCount for metrics: unable to get checkpoint from storage: %!w(<nil>)" in v2.15.1 using Azure Event Hub trigger #6084
Is this due to a compatibility issue between the AKS Kubernetes version and the KEDA version? From the documentation here, it seems the KEDA add-on uses KEDA 2.14 on AKS Kubernetes 1.30, and KEDA 2.15 is meant for AKS Kubernetes 1.31. So when we deploy KEDA to AKS without the AKS add-on, should we use the same KEDA version that the add-on uses for our AKS Kubernetes version? For now, as a workaround, I am going to stay on KEDA 2.14 until I upgrade my AKS cluster to Kubernetes 1.31, and then retry KEDA 2.15.
Hello, see keda/tests/scalers/azure/azure_event_hub_blob_metadata_wi/azure_event_hub_blob_metadata_wi_test.go, lines 135 to 143 (commit fc002f0).
@JorTurFer, thank you for the response. I have moved on to the managed KEDA add-on for AKS, so I am currently on AKS with Kubernetes 1.30.3 and KEDA 2.14. However, I will try to create a test environment, test the fixed version of KEDA, and get back to you.
@JorTurFer, I tried deploying the fixed version. The KEDA operator goes into CrashLoopBackOff, with the following in the keda-operator pod logs:
Oh, sorry, we introduced a new CRD (it will ship with v2.16). You need to deploy this CRD into the cluster too: https://github.com/kedacore/keda/blob/main/config/crd/bases/eventing.keda.sh_clustercloudeventsources.yaml
@JorTurFer, with the CRD deployed, the KEDA operator now seems to need some additional permissions. "system:serviceaccount:keda:keda-operator" is the service account I use to enable workload identity. With this CRD, does workload identity require any additional permissions on Azure resources or on the AKS cluster?
@JorTurFer, with the changes you mentioned above, I managed to run the KEDA operator with your tag. The issue seems to be with … For example, here are the current checkpoint blobs for my two scaled jobs. largepreview-scaledjob: according to the scaled job log, it is looking for checkpoint blob 7.
largevideo-scaledjob: according to the scaled job log, it is looking for checkpoint blob 0.
Both scaled jobs show the same symptoms, only with 2.15.1, and fail because they look for a checkpoint blob name that does not exist. The KEDA operator (with tag …
When I deploy KEDA with … The issue above only happens with 2.15.1. I suspect KEDA 2.15.1 does not refresh the storage checkpoint blob list correctly before checking the checkpoint blob metadata; KEDA 2.14.x does not have this problem.
We introduced a bug when we upgraded the SDK, but I think this PR will solve the issue -> #6096. Are you willing to test the fix? This is the tag with the fix ->
@JorTurFer, PR #6096 seems to have fixed the issue. I deployed … Will you be releasing a fixed version for …
I missed this message, sorry :/
Thanks, @JorTurFer. I will close this issue once 2.16 is available and tested.
Hi team, how long will it take for this fix to be released? I've been using Azure Container Apps (ACA) and have been experiencing this issue for a while.
We are not related to the ACA team, so our release doesn't solve the ACA issue; they have their own release lifecycle.
@chamindac, version 2.16 was released 12 weeks ago. Can I close this issue?
Hi @JorTurFer, I get this same error when deploying KEDA via Helm chart version 2.16.0 with watchNamespace set. Without watchNamespace, KEDA deploys successfully. I see that clustercloudeventsources is allowed in the keda-operator cluster role. When watchNamespace is not used, the chart binds the keda-operator service account to that cluster role at the cluster level using ClusterRoleBindings:
In that scenario KEDA works. However, when watchNamespace is used, the keda ClusterRoleBinding is replaced by RoleBindings in the watched namespaces (plus the keda namespace):
This binds the keda-operator service account to the keda-operator cluster role only at the namespace level, which limits access to clustercloudeventsources to namespace scope, while the CRD is cluster-scoped. I tried adding the relevant permission block to the minimal cluster role:
With that, the keda-operator pod works and no longer fails to list the resource. If my understanding is correct and this makes sense design-wise, I suggest adding the clustercloudeventsources rules from the keda-operator cluster role template to the minimal cluster role template as well.
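As a sketch of the kind of rule described above (assuming it mirrors the `eventing.keda.sh` rule in the operator's full cluster role; the exact verbs in the chart may differ, and the object names below are hypothetical), the point is that `clustercloudeventsources` is cluster-scoped, so a RoleBinding in a watched namespace can never grant access to it. The permission only takes effect through a ClusterRole bound with a ClusterRoleBinding:

```yaml
# Hypothetical ClusterRole: name and verb list are assumptions modeled on
# the eventing.keda.sh rule in the KEDA operator's full cluster role.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: keda-operator-clustercloudeventsources   # hypothetical name
rules:
  - apiGroups:
      - eventing.keda.sh
    resources:
      - clustercloudeventsources
      - clustercloudeventsources/status
    verbs:
      - get
      - list
      - watch
      - patch
      - update
---
# Cluster-scoped resources can only be granted through a ClusterRoleBinding,
# not through a RoleBinding in a watched namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: keda-operator-clustercloudeventsources   # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: keda-operator-clustercloudeventsources
subjects:
  # Service account taken from the thread: system:serviceaccount:keda:keda-operator
  - kind: ServiceAccount
    name: keda-operator
    namespace: keda
```

In the chart itself the fix adds the rule to the existing minimal cluster role rather than creating a separate object; the standalone pair above just makes the scoping rule explicit.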
I'm trying to reproduce the issue with the Helm chart, but I can't. Could you share an example of the values file?
Here's mine, @JorTurFer:

    # values.yaml
    clusterName: example

    # Missing rbac perms upstream
    # ref: https://github.com/kedacore/keda/issues/6084
    #
    # watchNamespace: "\
    # example,\
    # default"

    permissions:
      metricServer:
        restrict:
          secret: true
      operator:
        restrict:
          # FIXME: true when watchNamespace above is addressed.
          secret: false
@JorTurFer, I will test my original issue soon and update here (sorry for the delay, I was on a rather long vacation :) )
Now I've reproduced the issue, thanks for the YAML!
The fix is merged in the Helm chart; the next Helm releases will include the RBAC fix, @cilindrox. Thanks!
@JorTurFer @cilindrox, does this address #716?
@fouadsemaan it should; the PR adds the same RBAC permissions mentioned in #716.
I have KEDA v2.15.1 deployed on AKS using workload identity. The AKS Kubernetes version is 1.29.7.
My scaled job is triggered by Azure Event Hub. The KEDA operator reports the error "unable to get unprocessedEventCount for metrics: unable to get checkpoint from storage: %!w(<nil>)".
The same setup was working fine with KEDA v2.14.2 on AKS using workload identity (AKS Kubernetes version 1.29.7).
The scaled job shows the following issues:
The keda-operator pod log shows the following:
If I deploy KEDA v2.14.2 or v2.14.3 over v2.15.1 without changing anything else in my setup, everything starts working again, and the status of my scaled job returns to normal, as the log below shows.
Below is more information about my setup.
I deployed KEDA using the following:
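For comparison, here is a generic sketch (not the reporter's actual values) of Helm values that enable Azure Workload Identity for the KEDA operator. It assumes the `podIdentity.azureWorkload` keys of the kedacore/keda chart; check them against the chart version you install. The client and tenant IDs are placeholders.

```yaml
# values.yaml (sketch): enable Azure Workload Identity for the KEDA operator.
# Key names assume the kedacore/keda Helm chart; IDs are placeholders.
# Applied with, e.g.:
#   helm install keda kedacore/keda --namespace keda --create-namespace --version 2.15.1 -f values.yaml
podIdentity:
  azureWorkload:
    enabled: true
    clientId: "<managed-identity-client-id>"
    tenantId: "<azure-tenant-id>"
```

The managed identity behind `clientId` also needs a federated credential for the keda-operator service account and read access to the Event Hub and checkpoint storage account.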
KEDA trigger authentication setup:
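A generic sketch of a TriggerAuthentication that uses Azure Workload Identity (not the reporter's actual manifest; the name and namespace are hypothetical placeholders):

```yaml
# Sketch: TriggerAuthentication relying on Azure Workload Identity.
# Name and namespace are hypothetical placeholders.
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: eventhub-workload-identity   # hypothetical
  namespace: my-app                  # hypothetical
spec:
  podIdentity:
    # The KEDA operator obtains tokens through its own workload identity.
    provider: azure-workload
```

With this provider, the scaler authenticates as the identity configured on the KEDA operator, so no connection strings need to be stored in secrets.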
My scaled job triggers:
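A generic sketch (not the reporter's actual configuration) of a ScaledJob with an `azure-eventhub` trigger that authenticates through the TriggerAuthentication sketched for the trigger authentication setup. The metadata keys follow the KEDA Azure Event Hubs scaler documentation as I understand it; `checkpointStrategy: blobMetadata` is an assumption chosen to match the blob-metadata workload identity test referenced earlier in the thread. All resource names are hypothetical.

```yaml
# Sketch: ScaledJob scaled on unprocessed Event Hub events.
# All names (namespace, hub, storage account, container, image) are hypothetical.
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: example-scaledjob
  namespace: my-app
spec:
  jobTargetRef:
    template:
      spec:
        containers:
          - name: worker
            image: example.azurecr.io/worker:latest
        restartPolicy: Never
  triggers:
    - type: azure-eventhub
      metadata:
        # With workload identity, namespace/hub/storage are given by name
        # instead of connection strings.
        eventHubNamespace: my-eventhub-namespace
        eventHubName: my-eventhub
        consumerGroup: $Default
        storageAccountName: mystorageaccount
        # Container holding the per-partition checkpoint blobs.
        blobContainer: checkpoints
        checkpointStrategy: blobMetadata
        unprocessedEventThreshold: "64"
      authenticationRef:
        name: eventhub-workload-identity
```

With `blobMetadata`, KEDA reads each partition's checkpoint from blob metadata in `blobContainer`; that lookup is the "unable to get checkpoint from storage" step that fails in v2.15.1.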
I can provide more information and logs if required.
In summary, this is what happens: