Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add antreaProxy.disableServiceHealthCheckServer config #6939

Conversation

antoninbas
Copy link
Contributor

Running both kube-proxy and Antrea Proxy with proxyAll can trigger some error logs for Services of type LoadBalancer with externalTrafficPolicy set to Local. For such Services, the proxy is in charge of running a health check server on each Node, in order to report the number of local Endpoints which implement the Service. Because both kube-proxy and Antrea Proxy (with proxyAll) will try to run the health check servers on the same ports, one of them will fail to bind to the desired address. In practice, we typically observe that Antrea Proxy tries to bind to the address first (and succeeds), while kube-proxy fails and logs an error message, but there is no guarantee that it will be the case.

To avoid kube-proxy error logs, users can now set antreaProxy.disableServiceHealthCheckServer to true in the Antrea Agent's config, which will instruct Antrea Proxy to stop running health check servers for LoadBalancer Services. This is not a perfect solution as ideally the component responsible for the proxy implementation (Antrea Proxy in this case) should also be responsible for providing health check information.

@@ -411,6 +411,11 @@ antreaProxy:
# can reply to clients directly, bypassing the ingress Node.
# A Service's load balancer mode can be overridden by annotating it with `service.antrea.io/load-balancer-mode`.
defaultLoadBalancerMode: {{ .defaultLoadBalancerMode | quote }}
# Disables the health check server run by Antrea Proxy, which provides health information about
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Disables the health check server run by Antrea Proxy, which provides health information about
# Disable the health check server run by Antrea Proxy, which provides health information about

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hongliangl we don't seem to have adopted a unified style for this. As it is, it is consistent with the configuration parameter that comes just before (defaultLoadBalancerMode):

Determines how external traffic is processed when it's load balanced across Nodes by default.

The use of the third-person singular makes sense to me, because it is supposed to mean: disableServiceHealthCheckServer disables the health check server run by Antrea Proxy

@@ -411,6 +411,11 @@ antreaProxy:
# can reply to clients directly, bypassing the ingress Node.
# A Service's load balancer mode can be overridden by annotating it with `service.antrea.io/load-balancer-mode`.
defaultLoadBalancerMode: {{ .defaultLoadBalancerMode | quote }}
# Disables the health check server run by Antrea Proxy, which provides health information about
# Services of type LoadBalancer with externalTrafficPolicy set to Local, when proxyAll is
# enabled. This avoids race conditions between kube-proxy and Antrea proxy, with both trying to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# enabled. This avoids race conditions between kube-proxy and Antrea proxy, with both trying to
# enabled. This avoids race conditions between kube-proxy and Antrea Proxy, with both trying to

@@ -153,6 +153,12 @@ antreaProxy:
# -- Determines how external traffic is processed when it's load balanced
# across Nodes by default. It must be one of "nat" or "dsr".
defaultLoadBalancerMode: "nat"
# -- Disables the health check server run by Antrea Proxy, which provides health
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# -- Disables the health check server run by Antrea Proxy, which provides health
# -- Disable the health check server run by Antrea Proxy, which provides health

@@ -238,6 +238,11 @@ type AntreaProxyConfig struct {
// can reply to clients directly, bypassing the ingress Node.
// A Service's load balancer mode can be overridden by annotating it with `service.antrea.io/load-balancer-mode`.
DefaultLoadBalancerMode string `yaml:"defaultLoadBalancerMode,omitempty"`
// Disables the health check server run by Antrea Proxy, which provides health information about Services of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Disables the health check server run by Antrea Proxy, which provides health information about Services of
// Disable the health check server run by Antrea Proxy, which provides health information about Services of

@antoninbas antoninbas force-pushed the add-antreaProxy.disableServiceHealthCheckServer-config branch from 863a108 to a20d029 Compare January 21, 2025 18:29

These log messages will keep repeating periodically, as kube-proxy handles
Service updates. While the messages are harmless, they can create a lot of
necessary noise. You may want to set `antreaProxy.disableServiceHealthCheckServer: true`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
necessary noise. You may want to set `antreaProxy.disableServiceHealthCheckServer: true`
unnecessary noise. You may want to set `antreaProxy.disableServiceHealthCheckServer: true`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what a typo :)

@@ -1385,7 +1387,11 @@ func newProxier(
for i, address := range nodePortAddresses {
nodePortAddressesString[i] = address.String()
}
serviceHealthServer = healthcheck.NewServiceHealthServer(hostname, nil, nodePortAddressesString)
if serviceHealthServerDisabled {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

serviceHealthServer is the only consumer of nodePortAddressesString, should the check of serviceHealthServerDisabled be moved to L1385?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, thanks

Running both kube-proxy and Antrea Proxy with `proxyAll` can trigger
some error logs for Services of type LoadBalancer with
`externalTrafficPolicy` set to `Local`. For such Services, the proxy is
in charge of running a health check server on each Node, in order to
report the number of local Endpoints which implement the Service.
Because both kube-proxy and Antrea Proxy (with `proxyAll`) will try to
run the health check servers on the same ports, one of them will fail to
bind to the desired address. In practice, we typically observe that
Antrea Proxy tries to bind to the address first (and succeeds), while
kube-proxy fails and logs an error message, but there is no guarantee
that it will be the case.

To avoid kube-proxy error logs, users can now set
`antreaProxy.disableServiceHealthCheckServer` to true in the Antrea
Agent's config, which will instruct Antrea Proxy to stop running health
check servers for LoadBalancer Services. This is not a perfect solution
as ideally the component responsible for the proxy implementation
(Antrea Proxy in this case) should also be responsible for providing
health check information.

Signed-off-by: Antonin Bas <[email protected]>
Signed-off-by: Antonin Bas <[email protected]>
@antoninbas antoninbas force-pushed the add-antreaProxy.disableServiceHealthCheckServer-config branch from a20d029 to 0702c7e Compare January 22, 2025 19:55
@antoninbas
Copy link
Contributor Author

/test-all

@antoninbas antoninbas requested a review from tnqn January 22, 2025 19:55
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@antoninbas antoninbas merged commit 3a46081 into antrea-io:main Jan 23, 2025
57 of 62 checks passed
@antoninbas antoninbas deleted the add-antreaProxy.disableServiceHealthCheckServer-config branch January 23, 2025 18:18
@antoninbas antoninbas added the action/release-note Indicates a PR that should be included in release notes. label Jan 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/release-note Indicates a PR that should be included in release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants