CP/DP Split: write configuration to agent #2999

Draft · sjberman wants to merge 7 commits into base: change/control-data-plane-split

Conversation

sjberman (Collaborator) commented Jan 9, 2025

This commit adds functionality to send nginx configuration to the agent. It also adds support for scaling the single nginx Deployment and for sending configuration to all replicas. This requires tracking all Subscriptions for a particular Deployment and receiving the responses from all of those replicas in order to determine the status to write to the Gateway.

Right now we do not watch for Pod creation events for the nginx Deployment, so when a Pod first starts up, it will not receive any configuration. The nginx Pods only receive an update when the config changes.

Testing: Created Gateway API resources and saw the configuration applied to the nginx deployment. Scaled nginx, made more changes, and saw those applied as well.

Closes #2842
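
To make the fan-out/aggregate pattern described above concrete, here is a minimal Go sketch. All names (`DeploymentSubscriptions`, `AgentResponse`, etc.) are hypothetical illustrations, not the PR's actual types: the idea is simply that each Deployment tracks one channel per connected replica, sends the same config to all of them, and waits for every response before deciding the Gateway status.

```go
package agent

import "sync"

// AgentResponse is a simplified stand-in for an agent's config-apply result.
type AgentResponse struct {
	PodName string
	Err     error
}

// DeploymentSubscriptions tracks the config channels of every agent
// Subscription belonging to one nginx Deployment.
type DeploymentSubscriptions struct {
	mu       sync.RWMutex
	channels map[string]chan []byte // pod name -> per-subscription config channel
	results  chan AgentResponse     // all subscriptions report back here
}

// Send fans the new config out to every connected replica, then waits for a
// response from each before returning the aggregated errors. Any error here
// would surface as a non-Programmed condition on the Gateway.
func (d *DeploymentSubscriptions) Send(cfg []byte) []error {
	d.mu.RLock()
	chans := make([]chan []byte, 0, len(d.channels))
	for _, ch := range d.channels {
		chans = append(chans, ch)
	}
	d.mu.RUnlock()

	for _, ch := range chans {
		ch <- cfg
	}

	var errs []error
	for range chans {
		if resp := <-d.results; resp.Err != nil {
			errs = append(errs, resp.Err)
		}
	}
	return errs
}
```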

internal/mode/static/nginx/agent/agent.go:

```go
	return fmt.Errorf("failed to register nginx deployment %q", nsName.Name)
}

// TODO(sberman): wait to send config until Deployment pods have all connected.
```
Contributor:

Why do we need to wait for all pods to connect? Can't we just send the config to all the pods that are running?

sjberman (Collaborator, Author):

Yeah, that's true.

sjberman (Collaborator, Author) commented Jan 10, 2025:

If no pods are running/connected, but we have a deployment, then we should wait until they're connected (or timeout).

Contributor:

This feels backwards to me. An agent Subscribe/Connect (or combo, I can't remember) should trigger a config apply. We know exactly when an agent is ready to receive config, so we shouldn't have to wait for it. I think building in waits is going to lead to issues down the line.

What if building and storing the config were done separately from updating the config? When a new agent subscribes, we grab the latest config stored for that deployment and send it to the agent. When a new config is built, we send a message on a channel saying "Config updated" and the listeners grab the config and push it to the agents. Status can be handled completely async.
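
A rough sketch of the decoupled flow suggested here, using hypothetical names (`Broadcaster`, `Subscribe`, `Publish`) rather than the PR's real API: the latest built config is stored per deployment, a new subscriber receives it immediately, and subsequent builds are announced to every listener.

```go
package broadcast

import "sync"

// Broadcaster decouples building/storing config from pushing it to agents.
type Broadcaster struct {
	mu        sync.Mutex
	latest    []byte // last built config for this deployment
	listeners map[chan []byte]struct{}
}

func NewBroadcaster() *Broadcaster {
	return &Broadcaster{listeners: map[chan []byte]struct{}{}}
}

// Subscribe registers a new listener and hands back the latest stored config
// immediately, so a freshly connected pod never waits for the next change.
func (b *Broadcaster) Subscribe() (initial []byte, updates <-chan []byte) {
	b.mu.Lock()
	defer b.mu.Unlock()
	ch := make(chan []byte, 1)
	b.listeners[ch] = struct{}{}
	return b.latest, ch
}

// Publish stores a newly built config and announces it to every listener.
func (b *Broadcaster) Publish(cfg []byte) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.latest = cfg
	for ch := range b.listeners {
		select {
		case ch <- cfg:
		default: // slow listener; it can re-read the latest when it catches up
		}
	}
}
```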

sjberman (Collaborator, Author):

I sort of had this built in originally, but if we get a connection and then grab the latest config, how does status get applied based on that outcome? The handler that processes a batch, builds, and sets status isn't involved in this application of the config.

Contributor:

I think I'm just hung up on this approach because it's the route I took when I was working with agent v2: 4c090c6#diff-d4bd8c3fac26361875c036e64559f3d3a96b95cd03729ccdd1cd6fe675277b81.

However, there are undoubtedly other approaches, and mine might not be the best one.

I think as long as we can avoid the problem you stated, where we wait forever for a pod to come up, it doesn't matter which approach we take.

> The status updater requires the Graph, which only the handler has. Unless we're now storing the graph on the Deployment object as well?

Most of the status can be generated before we apply the configuration. So we could generate the status with the graph in the handler and then store it alongside the config. Once we apply the config, we can update the status to reflect the outcome of the config apply. Just an idea.

> In my original implementation, the Subscriber got the current config from the DeploymentStore (no need to contact the broadcaster) and then once it applied it, it started its listen loop. I think this works, but there's still the pain point of the Subscriber having to update statuses. It would need the Graph and it would have to ensure it doesn't overwrite any previous error statuses that may have occurred.

I think we can solve the status problem with my suggestion above, but I haven't looked into it too deeply.
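
A minimal sketch of the "store the status alongside the config" idea, with hypothetical simplified types: the handler computes a provisional status while it still holds the Graph, and the apply path only amends that status with the apply outcome, so the Subscriber never needs the Graph itself.

```go
package agent

// GatewayStatus is a hypothetical, simplified stand-in for the real
// Gateway/Listener conditions.
type GatewayStatus struct {
	Programmed bool
	Message    string
}

// StoredConfig pairs the generated nginx config with the status that the
// handler pre-computed from the Graph.
type StoredConfig struct {
	Files             []byte
	ProvisionalStatus GatewayStatus
}

// applyAndStatus sends the stored config and amends the provisional status
// with the apply outcome.
func applyAndStatus(sc StoredConfig, send func([]byte) error) GatewayStatus {
	status := sc.ProvisionalStatus
	if err := send(sc.Files); err != nil {
		status.Programmed = false
		status.Message = err.Error()
	}
	return status
}
```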

sjberman (Collaborator, Author):

> Most of the status can be generated before we apply the configuration. So we could generate the status with the graph in the handler and then store it alongside the config. Once we apply the config, we can update the status to reflect the outcome of the config apply. Just an idea.

Yeah that's an interesting thought...

sjberman (Collaborator, Author):

I think if the Subscriber writes status, it's going to fall into the same trap as the leader election problem, where multiple entities want to write status.

I like having the Subscriber grab the current config when it first connects and applying it. But I can't nail down how to write that status... one idea is that the handler has a goroutine where it listens for Subscribers to send it a status to write. It would need to keep track of each Subscription's status and get the Gateway's current status to ensure that it doesn't overwrite any errors that were written by other Subscriptions.

Contributor:

> I think if the Subscriber writes status, it's going to fall into the same trap as the leader election problem, where multiple entities want to write status.

Agreed, you need one thread writing status that reads off a channel.

> It would need to keep track of each Subscription's status and get the Gateway's current status to ensure that it doesn't overwrite any errors that were written by other Subscriptions.

This is the tricky bit. Is the Gateway the only resource we need to worry about overwriting status on?

sjberman (Collaborator, Author):

> This is the tricky bit. Is the Gateway the only resource we need to worry about overwriting status on?

Yes, I believe so. Just the Programmed condition, for both the Gateway and Listener conditions.
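
Tying the thread together, here is a sketch (hypothetical names, not the PR's code) of the single-writer pattern agreed on above: every Subscription reports its apply result on one channel, and a lone goroutine merges the per-pod results so that one replica's success never clobbers another's error on the Programmed condition.

```go
package agent

// applyResult is what each Subscription reports after a config apply.
type applyResult struct {
	PodName string
	Err     error
}

// runStatusWriter is the single goroutine allowed to write Gateway status.
// It keeps the latest result per pod and recomputes the Programmed condition
// from the full set, so errors from other Subscriptions are preserved.
func runStatusWriter(results <-chan applyResult, write func(programmed bool, msg string)) {
	perPod := map[string]error{}
	for res := range results {
		perPod[res.PodName] = res.Err

		programmed, msg := true, "configuration applied to all replicas"
		for pod, err := range perPod {
			if err != nil {
				programmed, msg = false, "pod "+pod+": "+err.Error()
				break
			}
		}
		write(programmed, msg)
	}
}
```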
