
Remove watching for existing underlying resources at init stage #97

Open
wants to merge 2 commits into master
Conversation

@cuttingedge1109 (Contributor) commented Apr 12, 2021

This PR mainly does 2 things:

Add environment variables to enable/disable each controller (see the sketch below):

  • If ENABLE_GROUPCONFIG_CONTROLLER is true or not defined, the GroupConfig controller is enabled.
  • If ENABLE_USERCONFIG_CONTROLLER is true or not defined, the UserConfig controller is enabled.
  • If ENABLE_NAMESPACECONFIG_CONTROLLER is true or not defined, the NamespaceConfig controller is enabled.

Do not queue pre-existing underlying resources at the initial/booting step:

  • In the namespaceConfig controller, do not queue pre-existing namespaces
  • In the groupConfig controller, do not queue pre-existing groups
  • In the userConfig controller, do not queue pre-existing users and identities

Closes #96
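
A minimal sketch of the env-var gating from the first part above, assuming a hypothetical helper name (controllerEnabled); an unset variable counts as enabled, matching the rules listed:

```go
package main

import (
	"os"
	"strings"
)

// controllerEnabled reports whether a controller should be registered:
// true when the variable is unset or set to "true" (case-insensitive).
func controllerEnabled(name string) bool {
	val, ok := os.LookupEnv(name)
	return !ok || strings.EqualFold(val, "true")
}

func main() {
	// ... manager and scheme setup elided ...
	if controllerEnabled("ENABLE_NAMESPACECONFIG_CONTROLLER") {
		// register the NamespaceConfig controller with the manager
	}
	if controllerEnabled("ENABLE_GROUPCONFIG_CONTROLLER") {
		// register the GroupConfig controller
	}
	if controllerEnabled("ENABLE_USERCONFIG_CONTROLLER") {
		// register the UserConfig controller
	}
}
```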

@raffaelespazzoli (Collaborator)

I'm good with adding the variable-based activation of the controllers.
I don't understand the point of the second part of the PR. Can you explain what the intention is?

@cuttingedge1109 (Contributor, Author)

To reduce kube API requests and remove duplicated queueing.
At the booting stage, the pre-existing resources are treated as new resources and the watcher queues them.
But the watcher on the target CR is enough at the booting stage (for example, namespaceconfig CRs are queued and reconciled; we don't need to fetch the namespaceconfig list per pre-existing namespace and reconcile again).
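
Something like this controller-runtime predicate sketches the idea (simplified, not the exact diff in this PR; skipPreExisting is an illustrative name):

```go
package controllers

import (
	"time"

	"sigs.k8s.io/controller-runtime/pkg/event"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

// skipPreExisting drops create events for objects that already existed
// when the controller booted. Unset predicate.Funcs fields default to
// true, so update/delete/generic events still pass through.
func skipPreExisting(startedAt time.Time) predicate.Funcs {
	return predicate.Funcs{
		CreateFunc: func(e event.CreateEvent) bool {
			return e.Object.GetCreationTimestamp().Time.After(startedAt)
		},
	}
}
```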

@raffaelespazzoli (Collaborator)

@cuttingedge1109 we do need pre-existing resources; this is what distinguishes level-based automatic control from edge-based automatic control. Operators are level-based. I talk about this a bit here [1], but you can find more about it on the internet.
[1]: https://www.openshift.com/blog/kubernetes-operators-best-practices

@rasheedamir

@raffaelespazzoli with this change we see that the API server isn't bombarded anymore with an insane amount of traffic; I have attached two screenshots, one for kube-api-server and the other for namespace-config-operator.

[Screenshot 2021-04-12 at 17:05:09 (kube-api-server)]

[Screenshot 2021-04-12 at 17:02:18 (namespace-config-operator)]

@raffaelespazzoli (Collaborator)

I looked at the code and I confirm that, in my view, this is not a good approach. The reason is that if the namespace operator pod dies and some objects are changed (updated/created/deleted) while the pod is down, they will never be reconciled.
You say that this phase creates a bunch of requests to the master API, but everything (here I mean, for example, namespaces and namespaceconfigs) should already be cached, so you should not really see much of a difference between the two approaches.
Would you be able to tell me exactly which calls are overwhelming the API server?
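
For example (a sketch, not the operator's exact code), a List issued through the manager's default client is served from the informer cache:

```go
package controllers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
)

// listNamespaces reads through the manager's default client, which
// serves Get/List from a shared informer cache, not the API server.
func listNamespaces(ctx context.Context, mgr ctrl.Manager) (*corev1.NamespaceList, error) {
	nsList := &corev1.NamespaceList{}
	if err := mgr.GetClient().List(ctx, nsList); err != nil {
		return nil, err
	}
	return nsList, nil
}
```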

@rasheedamir

> The reason is that if the namespace operator pod dies and some objects are changed (updated/created/deleted) while the pod is down, they will never be reconciled.

Ah! Very good point.

> So you should not really have seen much of a difference between the two approaches.

Hmmm; but you can see from the graphs that with this new version the api-server wasn't overwhelmed.

> Would you be able to tell me what calls exactly are overwhelming the api server?

The moment we turn on the namespace-config-operator it kills the API server; it would be hard to figure out, but let's see if we can.

@raffaelespazzoli (Collaborator) commented Apr 13, 2021

@rasheedamir I'd like to know which calls overwhelm the API server, GET vs POST, and for which resource type. There are metrics to see these things.
How many users do you have in that cluster?

@cuttingedge1109 (Contributor, Author)

Regarding the 2nd part of the changes: it does not disable the watchers.
There are 2 watchers:

  1. For(&redhatcopv1alpha1.NamespaceConfig{} queues the existing namespaceconfig CRs at startup, while

  2. Watches(&source.Kind{Type: &corev1.Namespace queues the existing namespaceconfig CRs multiple times.
    i.e. in the second watcher, the namespace list is fetched first, then the following logic is performed per namespace: get the namespaceconfig CRs that select the current namespace, then queue those namespaceconfig CRs.

In general, one namespaceConfig CR points to several namespaces (in our case, about 30 namespaces).
So in the second watcher, the same namespaceConfig CR is queued 30 times (I verified that).
The second watcher exists to queue the corresponding namespaceconfig CRs when a namespace is updated. But when the operator starts, all namespaceconfig CRs are already queued by the first watcher, so there is no need to queue them again in the second watcher at startup.

So I tried to skip the queueing only at startup time; after that, the second watcher works as usual.
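
A simplified sketch of the two watchers (the reconciler type and the namespaceToNamespaceConfigs mapper are illustrative stand-ins, not the operator's exact code):

```go
package controllers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
	"sigs.k8s.io/controller-runtime/pkg/source"

	redhatcopv1alpha1 "github.com/redhat-cop/namespace-configuration-operator/api/v1alpha1"
)

// NamespaceConfigReconciler stands in for the operator's reconciler.
type NamespaceConfigReconciler struct {
	client.Client
}

func (r *NamespaceConfigReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// ... reconcile the NamespaceConfig named in req ...
	return ctrl.Result{}, nil
}

func (r *NamespaceConfigReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		// Watcher 1: enqueues every NamespaceConfig CR, including all
		// pre-existing CRs at startup.
		For(&redhatcopv1alpha1.NamespaceConfig{}).
		// Watcher 2: on every namespace event, enqueues the
		// NamespaceConfig CRs selecting that namespace; at startup this
		// re-enqueues each CR once per matching namespace.
		Watches(&source.Kind{Type: &corev1.Namespace{}},
			handler.EnqueueRequestsFromMapFunc(r.namespaceToNamespaceConfigs)).
		Complete(r)
}

// namespaceToNamespaceConfigs lists the NamespaceConfig CRs whose
// selector matches the given namespace and returns one request per match.
func (r *NamespaceConfigReconciler) namespaceToNamespaceConfigs(obj client.Object) []reconcile.Request {
	// ... list NamespaceConfigs, match their selectors against obj ...
	return nil
}
```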

@cuttingedge1109 (Contributor, Author)

Regarding the "the object has been modified; please apply your changes to the latest version and try again" error:

It is caused by SetEnforcingReconcileStatus. Currently the CR's status is updated inside its own reconcile loop, which means the CR is updated before the current reconcile has finished. I think that is not good. If the reconcile has a somewhat long latency, it can end up in an infinite loop. (I have experienced such a loop; if there is a 3rd-party webhook in the reconcile, it obviously occurs.) So I think there should be a condition to skip a reconcile that was triggered by the previous reconcile.
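
One possible condition (a sketch, assuming controller-runtime's predicate package and the reconciler type from the watcher sketch above) is to reconcile only when metadata.generation changes, since status-only writes do not bump the generation:

```go
package controllers

import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/builder"
	"sigs.k8s.io/controller-runtime/pkg/predicate"

	redhatcopv1alpha1 "github.com/redhat-cop/namespace-configuration-operator/api/v1alpha1"
)

// setupWithGenerationFilter skips update events whose generation is
// unchanged, so a reconcile's own status write does not re-trigger it.
func setupWithGenerationFilter(mgr ctrl.Manager, r *NamespaceConfigReconciler) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&redhatcopv1alpha1.NamespaceConfig{},
			builder.WithPredicates(predicate.GenerationChangedPredicate{})).
		Complete(r)
}
```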

@rasheedamir

@raffaelespazzoli do you agree with @cuttingedge1109? We definitely see that this version is far more stable and consumes way less memory.

@raffaelespazzoli (Collaborator)

@rasheedamir I have explained why the create events at the start of the controller cannot be discarded. I am only good with the piece that selectively activates the controllers based on environment variables.

@cuttingedge1109 (Contributor, Author)

@raffaelespazzoli the events are not discarded, just added smoothly.
Can you explain in more detail why you disagree?
And please also give me feedback on the messages above: #97 (comment) #97 (comment)
Thanks.

@raffaelespazzoli (Collaborator)

@cuttingedge1109 please explain to me what you mean by "added smoothly": it seems to me that with your proposal we drop the create events during the initial start of the controller. This is not acceptable, and I have explained why.

Also, I asked you to provide proof of how the API server is overwhelmed, with some metrics that could indicate where the problem is. Would you be able to do that?

I'm interested in fixing that issue if at all possible, but I didn't understand your explanation. If you have a fix proposal, I'd be happy to look at it. The main requirement here is that when the internal controllers complete their reconcile cycle, the resulting state needs to be reflected in the external CR status.
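
For example (a sketch with illustrative names, assuming a reconciler that embeds client.Client; NamespaceConfigStatus is a stand-in for the actual status type), the status write can be skipped when nothing changed, so it does not feed a new event back into the controller:

```go
package controllers

import (
	"context"

	"k8s.io/apimachinery/pkg/api/equality"

	redhatcopv1alpha1 "github.com/redhat-cop/namespace-configuration-operator/api/v1alpha1"
)

// updateStatus reflects the internal controllers' result into the CR
// status, skipping the write when the status is already up to date.
func (r *NamespaceConfigReconciler) updateStatus(ctx context.Context, instance *redhatcopv1alpha1.NamespaceConfig, newStatus redhatcopv1alpha1.NamespaceConfigStatus) error {
	if equality.Semantic.DeepEqual(instance.Status, newStatus) {
		return nil // no change: avoid a self-triggering update event
	}
	instance.Status = newStatus
	return r.Status().Update(ctx, instance)
}
```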

@raffaelespazzoli (Collaborator)

May I close this PR?

Successfully merging this pull request may close: Namespace config operator is consuming too much memory