- Integrate a cluster, located behind a firewall or NAT, with GitLab. To learn more, read issue #212810, Invert the model GitLab.com uses for Kubernetes integration by leveraging long lived reverse tunnels.
- Access API endpoints in a cluster in real time. For an example use case, read issue #218220, Allow Prometheus in K8s cluster to be installed manually.
- Enable real-time features by pushing information about events happening in a cluster. For example, you could build a cluster view dashboard to visualize changes in progress in a cluster. For more information about these efforts, read about the Real-Time Working Group.
- Enable a cache of Kubernetes objects through informers, kept up-to-date with very low latency. This cache helps you:
  - Reduce or eliminate information propagation latency by avoiding Kubernetes API calls and polling, and only fetching data from an up-to-date cache.
  - Lower the load placed on the Kubernetes API by removing polling.
  - Eliminate any rate-limiting errors by removing polling.
  - Simplify backend code by replacing polling code with cache access. While it's another API call, no polling is needed. This example describes fetching cached data synchronously from the front end instead of fetching data from the Kubernetes API.
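
As a rough illustration of why an informer-style cache removes the need for polling, the following Python sketch applies watch events to a local store and answers reads from memory. The `ObjectCache` class, the event tuple shape, and the sample Pod names are assumptions made for this example; they are not the agent's actual data structures or a Kubernetes client API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical event shape: (event_type, kind, name, object_body).
# ADDED, MODIFIED, and DELETED are the event types used by Kubernetes watches.
Event = Tuple[str, str, str, Optional[dict]]


@dataclass
class ObjectCache:
    """Local store kept current by applying watch events as they arrive."""

    store: Dict[Tuple[str, str], dict] = field(default_factory=dict)

    def apply(self, event: Event) -> None:
        event_type, kind, name, body = event
        key = (kind, name)
        if event_type in ("ADDED", "MODIFIED"):
            self.store[key] = body or {}
        elif event_type == "DELETED":
            self.store.pop(key, None)
        else:
            raise ValueError(f"unknown event type: {event_type}")

    def names(self, kind: str) -> List[str]:
        """Answer from memory: no Kubernetes API call and no polling loop."""
        return sorted(name for (k, name) in self.store if k == kind)

    def get(self, kind: str, name: str) -> Optional[dict]:
        return self.store.get((kind, name))


def build_demo_cache() -> ObjectCache:
    cache = ObjectCache()
    events: List[Event] = [
        ("ADDED", "Pod", "web-1", {"phase": "Pending"}),
        ("ADDED", "Pod", "web-2", {"phase": "Running"}),
        ("MODIFIED", "Pod", "web-1", {"phase": "Running"}),
        ("DELETED", "Pod", "web-2", None),
    ]
    for event in events:
        cache.apply(event)
    return cache
```

After the four sample events, `build_demo_cache().names("Pod")` contains only `web-1`, and its cached body reflects the latest `MODIFIED` event. Readers never wait on a poll interval because the store changes as soon as an event is applied.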
The GitLab Agent and the GitLab Agent Server use bidirectional streaming to allow the connection acceptor (the gRPC server, GitLab Agent Server) to act as a client. The connection acceptor sends requests as gRPC replies. The client-server relationship is inverted because the connection must be initiated from inside the Kubernetes cluster to bypass any firewall or NAT the cluster may be located behind. To learn more about this inversion, read issue #212810.
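
The inversion described above can be sketched without any gRPC dependency. In this simplified Python model, two `asyncio` queues stand in for one bidirectional stream. The `Stream` class, the function names, and the request strings are all hypothetical; a real deployment uses gRPC streams, not queues.

```python
import asyncio
from typing import List


class Stream:
    """Stand-in for one bidirectional stream (not a real gRPC stream)."""

    def __init__(self) -> None:
        self.to_agent: asyncio.Queue = asyncio.Queue()   # acceptor -> agent
        self.to_server: asyncio.Queue = asyncio.Queue()  # agent -> acceptor


async def agent(stream: Stream) -> None:
    """Runs inside the cluster. It opened the stream, yet it serves requests."""
    while True:
        request = await stream.to_agent.get()
        if request is None:  # the acceptor closed its side of the stream
            break
        await stream.to_server.put(f"handled:{request}")


async def server(stream: Stream, requests: List[str]) -> List[str]:
    """Connection acceptor. It sends requests down the stream the agent opened."""
    responses = []
    for request in requests:
        await stream.to_agent.put(request)
        responses.append(await stream.to_server.get())
    await stream.to_agent.put(None)
    return responses


async def main() -> List[str]:
    # In the real system the agent dials out from the cluster to create the stream.
    stream = Stream()
    agent_task = asyncio.create_task(agent(stream))
    responses = await server(stream, ["list-pods", "get-node-count"])
    await agent_task
    return responses


def run_demo() -> List[str]:
    return asyncio.run(main())
```

The point of the sketch is the direction of each arrow: the connection is opened from the cluster side, but the requests travel from the acceptor to the agent and the responses travel back.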
This diagram describes how GitLab (`GitLab RoR`), the GitLab Agent (`agentk`), and the GitLab Agent Server (`kas`) work together.
```mermaid
graph TB
  agentk -- gRPC bidirectional streaming --> kas
  subgraph "GitLab"
  kas[kas]
  GitLabRoR[GitLab RoR] -- gRPC --> kas
  kas -- gRPC --> Gitaly[Gitaly]
  kas -- REST API --> GitLabRoR
  end
  subgraph "Kubernetes cluster"
  agentk[agentk]
  end
```
- `GitLab RoR` is the main GitLab application. It uses gRPC to talk to `kas`.
- `agentk` is the GitLab Agent. It keeps a connection established to a `kas` instance, waiting for requests to process. It may also actively send information about things happening in the cluster.
- `kas` is the GitLab Agent Server, and is responsible for:
  - Accepting requests from `agentk`.
  - Authenticating requests from `agentk` by querying `GitLab RoR`.
  - Fetching the agent's configuration from a corresponding Git repository by querying Gitaly.
  - Matching incoming requests from `GitLab RoR` with existing connections from the right `agentk`, forwarding requests to it and forwarding responses back.
  - (Optional) Sending notifications through ActionCable for events received from `agentk`.
  - Polling manifest repositories for GitOps support by communicating with Gitaly.
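
The request-matching responsibility listed above can be pictured as a lookup table from agent identity to an open connection. The following Python sketch is only a mental model. `ConnectionRegistry`, `AgentNotConnected`, the integer agent IDs, and the callable used to represent a connection are hypothetical and do not reflect how `kas` is implemented.

```python
from typing import Callable, Dict

# Hypothetical stand-in for an open agent connection:
# a callable that takes a request payload and returns the agent's response.
AgentHandler = Callable[[str], str]


class AgentNotConnected(Exception):
    """Raised when no connection is registered for the requested agent."""


class ConnectionRegistry:
    """Maps agent IDs to the connections those agents opened."""

    def __init__(self) -> None:
        self._connections: Dict[int, AgentHandler] = {}

    def register(self, agent_id: int, handler: AgentHandler) -> None:
        # Called when an agent connects.
        self._connections[agent_id] = handler

    def unregister(self, agent_id: int) -> None:
        # Called when the agent's connection goes away.
        self._connections.pop(agent_id, None)

    def route(self, agent_id: int, request: str) -> str:
        # Forward the request to the right agent and hand its response back.
        handler = self._connections.get(agent_id)
        if handler is None:
            raise AgentNotConnected(f"agent {agent_id} has no open connection")
        return handler(request)


def routing_demo() -> Dict[str, str]:
    registry = ConnectionRegistry()
    registry.register(1, lambda req: f"agent-1 handled {req}")
    registry.register(2, lambda req: f"agent-2 handled {req}")
    results = {
        "first": registry.route(1, "get pods"),
        "second": registry.route(2, "get nodes"),
    }
    registry.unregister(2)
    try:
        registry.route(2, "get nodes")
    except AgentNotConnected as err:
        results["after_disconnect"] = str(err)
    return results
```

A request for an agent that has disconnected fails fast in this model, which is one reason the agent keeps its connection established while idle.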
To learn more about how the repository is structured, see GitLab Agent repository overview.
GitLab prefers to add logic into `kas` rather than `agentk`. `agentk` should be kept streamlined and small to minimize the need for upgrades. On GitLab.com, `kas` is managed by GitLab, so upgrades and features can be added without requiring you to upgrade `agentk` in your clusters.
`agentk` can't be viewed as a dumb reverse proxy because features are planned to be built on top of the cache with informers.
- Q: Why do we need long-running connections? Can't we just ask users to punch a hole in their firewall?
  A: Even if that were always possible, having an agent running in the cluster enables us to build more features, not just connect such clusters to GitLab.
- Q: Why do we need long-running connections? Can we use polling instead?
  A: Polling does not allow real-time access to the in-cluster APIs. For example, our metrics integration queries the Prometheus API to get the data for dashboards. That is, the request comes from GitLab, not from the Kubernetes side.
- Q: Can we push data about things happening in the cluster using REST API instead of using streaming?
  A: Yes, we could use REST API and push the data. But since we already need long-running connections (see above), why not utilize them? `agentk` uses gRPC, which multiplexes multiple logical channels onto a usually smaller number of TCP connections. There will be a smaller number of TCP connections if all communications are consolidated and happen over gRPC. See the link for technical details.
- Q: Can we put the cache into `kas` rather than into `agentk` to make it "dumber" / simpler per the principle above?
  A: Technically yes. However, that would mean `kas` would get an update for each event for the object kinds `kas` runs informers for. An event contains the whole changed object. The number of events in active clusters may be significant. So, multiplied by the number of clusters, that means we'd have a lot of traffic between `kas` and `agentk`. There is no practical benefit to building it this way, only the downside of having a lot of useless traffic.
  Instead of the above, we could have an up-to-date precomputed view on top of the cache in `kas`. `agentk` could make the calculations locally and push an update immediately to `kas` (which could push an event via ActionCable). For example, `agentk` could maintain a cache with `Node` objects and push the current number of nodes to `kas` each time there is a change. The UI can then fetch the number of nodes from `kas` via an API (via the main application or bypassing it).
- Q: Why poll Git repositories from `kas` and not from `agentk`?
  A: There are several reasons:
  - Allows the GitLab instance operator to configure polling frequency and any other relevant parameters. Makes it harder for the user to misuse/abuse the system.
  - Fewer knobs for the user to tweak means a more straightforward configuration experience.
  - Makes it easier to implement pub/sub / push-based notifications about changes to the repo to reduce polling frequency.
  - It follows the "smart `kas`, dumb `agentk`" principle described above.
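
The node-count example from the cache question above can be sketched as follows. This is a minimal Python illustration under stated assumptions: `NodeCountView`, its `push` callback, and the node names are hypothetical, and the callback merely records values where a real agent would send them to `kas`.

```python
from typing import Callable, List, Optional, Set


class NodeCountView:
    """Tracks Node names locally and reports the count only when it changes."""

    def __init__(self, push: Callable[[int], None]) -> None:
        self._nodes: Set[str] = set()
        self._push = push  # hypothetical callback that would send the count to kas
        self._last_pushed: Optional[int] = None

    def on_event(self, event_type: str, node_name: str) -> None:
        if event_type in ("ADDED", "MODIFIED"):
            self._nodes.add(node_name)
        elif event_type == "DELETED":
            self._nodes.discard(node_name)
        else:
            raise ValueError(f"unknown event type: {event_type}")
        count = len(self._nodes)
        # Only a small integer leaves the cluster, and only when it differs
        # from the last value sent. Whole Node objects stay in the local cache.
        if count != self._last_pushed:
            self._last_pushed = count
            self._push(count)


def node_count_demo() -> List[int]:
    pushed: List[int] = []
    view = NodeCountView(pushed.append)
    view.on_event("ADDED", "node-a")
    view.on_event("ADDED", "node-b")
    view.on_event("MODIFIED", "node-a")  # count unchanged, so nothing is pushed
    view.on_event("DELETED", "node-b")
    return pushed
```

Four events arrive in the demo but only three updates are pushed, because the `MODIFIED` event does not change the count. Keeping the computation next to the cache is what avoids forwarding every full object to `kas`.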