CP/DP Split: write configuration to agent #2999

Draft · sjberman wants to merge 7 commits into base: change/control-data-plane-split

Conversation

sjberman (Collaborator) commented Jan 9, 2025

This commit adds functionality to send nginx configuration to the agent. It also adds support for scaling the single nginx Deployment and for sending configuration to all replicas. This requires tracking all Subscriptions for a particular Deployment and receiving the responses from all of those replicas in order to determine the status to write to the Gateway.

Right now we do not watch for Pod creation events for the nginx Deployment, so when a Pod first starts up, it will not receive any configuration. The nginx Pods only receive an update when the config changes.

Testing: Created Gateway API resources and saw the configuration applied to the nginx deployment. Scaled nginx, made more changes, and saw those applied as well.

Closes #2842
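
To make the fan-out/aggregate pattern described above concrete, here is a minimal Go sketch. All names (`DeploymentSubscriptions`, `AgentResponse`, etc.) are hypothetical illustrations, not the PR's actual types: the idea is simply that each Deployment tracks one channel per connected replica, sends the same config to all of them, and waits for every response before deciding the Gateway status.

```go
package agent

import "sync"

// AgentResponse is a simplified stand-in for an agent's config-apply result.
type AgentResponse struct {
	PodName string
	Err     error
}

// DeploymentSubscriptions tracks the config channels of every agent
// Subscription belonging to one nginx Deployment.
type DeploymentSubscriptions struct {
	mu       sync.RWMutex
	channels map[string]chan []byte // pod name -> per-subscription config channel
	results  chan AgentResponse     // all subscriptions report back here
}

// Send fans the new config out to every connected replica, then waits for a
// response from each before returning the aggregated errors. Any error here
// would surface as a non-Programmed condition on the Gateway.
func (d *DeploymentSubscriptions) Send(cfg []byte) []error {
	d.mu.RLock()
	chans := make([]chan []byte, 0, len(d.channels))
	for _, ch := range d.channels {
		chans = append(chans, ch)
	}
	d.mu.RUnlock()

	for _, ch := range chans {
		ch <- cfg
	}

	var errs []error
	for range chans {
		if resp := <-d.results; resp.Err != nil {
			errs = append(errs, resp.Err)
		}
	}
	return errs
}
```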

internal/mode/static/nginx/agent/agent.go:

```go
	return fmt.Errorf("failed to register nginx deployment %q", nsName.Name)
}

// TODO(sberman): wait to send config until Deployment pods have all connected.
```
Contributor:

Why do we need to wait for all pods to connect? Can't we just send the config to all the pods that are running?

sjberman (Collaborator, Author):

Yeah, that's true.

sjberman (Collaborator, Author) commented Jan 10, 2025:

If no pods are running/connected, but we have a deployment, then we should wait until they're connected (or timeout).

Contributor:

This feels backwards to me. An agent Subscribe/Connect (or combo, I can't remember) should trigger a config apply. We know exactly when an agent is ready to receive config, so we shouldn't have to wait for it. I think building in waits is going to lead to issues down the line.

What if building and storing the config were done separately from updating the config? When a new agent subscribes, we grab the latest config stored for that deployment and send it to the agent. When a new config is built, we send a message on a channel saying "Config updated" and the listeners grab the config and push it to the agents. Status can be handled completely async.
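
A rough sketch of the decoupled flow suggested here, using hypothetical names (`Broadcaster`, `Subscribe`, `Publish`) rather than the PR's real API: the latest built config is stored per deployment, a new subscriber receives it immediately, and subsequent builds are announced to every listener.

```go
package broadcast

import "sync"

// Broadcaster decouples building/storing config from pushing it to agents.
type Broadcaster struct {
	mu        sync.Mutex
	latest    []byte // last built config for this deployment
	listeners map[chan []byte]struct{}
}

func NewBroadcaster() *Broadcaster {
	return &Broadcaster{listeners: map[chan []byte]struct{}{}}
}

// Subscribe registers a new listener and hands back the latest stored config
// immediately, so a freshly connected pod never waits for the next change.
func (b *Broadcaster) Subscribe() (initial []byte, updates <-chan []byte) {
	b.mu.Lock()
	defer b.mu.Unlock()
	ch := make(chan []byte, 1)
	b.listeners[ch] = struct{}{}
	return b.latest, ch
}

// Publish stores a newly built config and announces it to every listener.
func (b *Broadcaster) Publish(cfg []byte) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.latest = cfg
	for ch := range b.listeners {
		select {
		case ch <- cfg:
		default: // slow listener; it can re-read the latest when it catches up
		}
	}
}
```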

sjberman (Collaborator, Author):

I sort of had this built in originally, but if we get a connection and then grab the latest config, how does status get applied based on that outcome? The handler that processes a batch, builds, and sets status isn't involved in this application of the config.

Contributor:

I think I'm just hung up on this approach because it's the route I took when I was working with agent v2: 4c090c6#diff-d4bd8c3fac26361875c036e64559f3d3a96b95cd03729ccdd1cd6fe675277b81.

However, there are undoubtedly other approaches, and mine might not be the best one.

I think as long as we can avoid the problem you stated, where we wait forever for a pod to come up, it doesn't matter which approach we take.

> The status updater requires the Graph, which only the handler has. Unless we're now storing the graph on the Deployment object as well?

Most of the status can be generated before we apply the configuration. So we could generate the status with the graph in the handler and then store it alongside the config. Once we apply the config, we can update the status to reflect the outcome of the config apply. Just an idea.

> In my original implementation, the Subscriber got the current config from the DeploymentStore (no need to contact the broadcaster) and then once it applied it, it started its listen loop. I think this works, but there's still the pain point of the Subscriber having to update statuses. It would need the Graph and it would have to ensure it doesn't overwrite any previous error statuses that may have occurred.

I think we can solve the status problem with my suggestion above, but I haven't looked into it too deeply.
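
A minimal sketch of the "store the status alongside the config" idea, with hypothetical simplified types: the handler computes a provisional status while it still holds the Graph, and the apply path only amends that status with the apply outcome, so the Subscriber never needs the Graph itself.

```go
package agent

// GatewayStatus is a hypothetical, simplified stand-in for the real
// Gateway/Listener conditions.
type GatewayStatus struct {
	Programmed bool
	Message    string
}

// StoredConfig pairs the generated nginx config with the status that the
// handler pre-computed from the Graph.
type StoredConfig struct {
	Files             []byte
	ProvisionalStatus GatewayStatus
}

// applyAndStatus sends the stored config and amends the provisional status
// with the apply outcome.
func applyAndStatus(sc StoredConfig, send func([]byte) error) GatewayStatus {
	status := sc.ProvisionalStatus
	if err := send(sc.Files); err != nil {
		status.Programmed = false
		status.Message = err.Error()
	}
	return status
}
```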

sjberman (Collaborator, Author):

> Most of the status can be generated before we apply the configuration. So we could generate the status with the graph in the handler and then store it alongside the config. Once we apply the config, we can update the status to reflect the outcome of the config apply. Just an idea.

Yeah that's an interesting thought...

sjberman (Collaborator, Author):

I think if the Subscriber writes status, it's going to fall into the same trap as the leader election problem, where multiple entities want to write status.

I like having the Subscriber grab the current config when it first connects and applying it. But I can't nail down how to write that status... one idea is that the handler has a goroutine where it listens for Subscribers to send it a status to write. It would need to keep track of each Subscription's status and get the Gateway's current status to ensure that it doesn't overwrite any errors that were written by other Subscriptions.

Contributor:

> I think if the Subscriber writes status, it's going to fall into the same trap as the leader election problem, where multiple entities want to write status.

Agreed, you need one thread writing status that reads off a channel.

> It would need to keep track of each Subscription's status and get the Gateway's current status to ensure that it doesn't overwrite any errors that were written by other Subscriptions.

This is the tricky bit. Is the Gateway the only resource we need to worry about overwriting status on?

sjberman (Collaborator, Author):

> This is the tricky bit. Is the Gateway the only resource we need to worry about overwriting status on?

Yes, I believe so. Just the Programmed condition, for both the Gateway and Listener conditions.
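
Tying the thread together, here is a sketch (hypothetical names, not the PR's code) of the single-writer pattern agreed on above: every Subscription reports its apply result on one channel, and a lone goroutine merges the per-pod results so that one replica's success never clobbers another's error on the Programmed condition.

```go
package agent

// applyResult is what each Subscription reports after a config apply.
type applyResult struct {
	PodName string
	Err     error
}

// runStatusWriter is the single goroutine allowed to write Gateway status.
// It keeps the latest result per pod and recomputes the Programmed condition
// from the full set, so errors from other Subscriptions are preserved.
func runStatusWriter(results <-chan applyResult, write func(programmed bool, msg string)) {
	perPod := map[string]error{}
	for res := range results {
		perPod[res.PodName] = res.Err

		programmed, msg := true, "configuration applied to all replicas"
		for pod, err := range perPod {
			if err != nil {
				programmed, msg = false, "pod "+pod+": "+err.Error()
				break
			}
		}
		write(programmed, msg)
	}
}
```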
