-
Notifications
You must be signed in to change notification settings - Fork 376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable disabling TX checksum offload for Antrea host gateway #6843
base: main
Are you sure you want to change the base?
Enable disabling TX checksum offload for Antrea host gateway #6843
Conversation
e92e5ab
to
1fab9f9
Compare
# If this option is later set to false, for the host gateway interface, Antrea will not restore its | ||
# original TX checksum state, as Antrea does not retain the original configuration. Users are responsible | ||
# for manually reconfiguring the setting if needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it the same for container network interfaces? I feel it could say the option affects newly created interfaces only when it's set to true, and does nothing when it's set to false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is the same for container network interfaces. I'll optimize the sentences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more thing, it's a little different from container network interface. Currently, we will disable TX checksum on existing antrea-gw0 as well as newly created antrea-gw0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this:
# If this option is later set to false, Antrea does nothing to the affected container network interfaces
# and the host gateway interface. To restore the default TX checksum state of the affected interfaces,
# it is recommended to delete them and recreate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, we should probably update docs/antrea-l7-network-policy.md
as well
Do you think there is any way to check for the checksum issue described in #6806 in our e2e L7NetworkPolicy tests?
# and the host gateway interface. To restore the default TX checksum state of the affected interfaces, | ||
# it is recommended to delete them and recreate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To restore the default TX checksum state of the affected interfaces, it is recommended to delete them and recreate.
For Pods, yes. For the gateway interface, I don't know if this is the best advice as it is a bit disruptive and unpractical for users. We could consider an antctl command (maybe in the future) to change the offload configuration on all Nodes easily?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted. I will add the sub command in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in the mean time, I would recommend not including that sentence in the description (To restore the default TX checksum state of the affected interfaces, it is recommended to delete them and recreate.
) as we do not provide useful guidance on how to do it practically.
3268120
to
09655e1
Compare
Updated
I got an idea like this: Assuming that there are two Nodes,
We could check the log to verify the issue described in #6806. If we disable antrea-gw0 TX checksumming, we should receive such log. One more thing, we cannot verify the issue with the result of command When I tried to verify the idea, I got some troubles in step 4:
The rejected packet logged above, which is encoded in base64, is part of the HTTP request. It seems that there is something wrong with Suricata TCP reassembly. I have inspected the related options, which should be qualified to handle the HTTP request with 2000 bytes data. These are the options in stream:
memcap: 64mb
#memcap-policy: ignore
checksum-validation: yes # reject incorrect csums
#midstream: false
#midstream-policy: ignore
inline: auto # auto will use inline mode in IPS mode, yes or no set it statically
reassembly:
memcap: 256mb
#memcap-policy: ignore
depth: 1mb # reassemble 1mb into a stream
toserver-chunk-size: 2560
toclient-chunk-size: 2560
randomize-chunk-size: yes
#randomize-chunk-range: 10
#raw: yes
#segment-prealloc: 2048
#check-overlap-different-data: true I'm trying to find some options in Suricata to resolve the issue above. I would greatly appreciate any ideas or inspiration you could offer regarding the issue. @antoninbas @tnqn |
Found the root cause of the above issue and added a PR #6857 to fix it. |
@hongliangl for the test, can we generate a query with a large reply (larger than MTU) but a small request payload (smaller than MTU), so that we don't hit the new issue (#6857)? Maybe the
Probably better if we don't assume a specific MTU value (1500 -> 2000) and we can determine the MTU dynamically. |
09655e1
to
11fab3f
Compare
This commit introduces the ability to disable TX checksum offload for the host gateway interface (default: `antrea-gw0`) by setting the `disableTXChecksumOffload` option to `true`. If this option is later set to false, Antrea does nothing to the affected container network interfaces and the host gateway interface. To restore the default TX checksum state of the affected interfaces, it is recommended to delete them and recreate. To delete the affected interfaces, for container network interfaces, delete the corresponding Pods; for the host gateway interface, uninstall Antrea, reboot all K8s Nodes and install Antrea. Signed-off-by: Hongliang Liu <[email protected]>
11fab3f
to
90b5065
Compare
@antoninbas Sorry for not replying this late. The command you provided is a feasible way to mock the expected HTTP request! I have added an e2e new test using it. |
@@ -59,6 +60,9 @@ func TestL7NetworkPolicy(t *testing.T) { | |||
} | |||
}() | |||
|
|||
t.Run("HTTP with large body", func(t *testing.T) { | |||
testL7NetworkPolicyHTTPWithLargeBody(t, data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/testL7NetworkPolicyHTTPWithLargeBody/testL7NetworkPolicyHTTPLargeResponse
# and the host gateway interface. To restore the default TX checksum state of the affected interfaces, | ||
# it is recommended to delete them and recreate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in the mean time, I would recommend not including that sentence in the description (To restore the default TX checksum state of the affected interfaces, it is recommended to delete them and recreate.
) as we do not provide useful guidance on how to do it practically.
@@ -262,3 +263,13 @@ func (i *Initializer) prepareL7EngineInterfaces() error { | |||
} | |||
return nil | |||
} | |||
|
|||
func (i *Initializer) setTXChecksumOffload() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
setTXChecksumOffloadOnGateway
// Run the command that makes the test server send an HTTP response with a body larger than the MTU on the | ||
// test client Pod. | ||
testBodySize := mtu * 2 | ||
cmd = []string{"sh", "-c", fmt.Sprintf(`curl --data-urlencode "cmd=head -c %d </dev/zero | tr '\0' 'A'" http://%s/shell?cmd`, testBodySize, baseURL)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need to wrap this command using a shell invocation. Does the following work?
cmd := []string{"curl", "--data-urlencode", fmt.Sprintf("cmd=head -c %d </dev/zero | tr '\0' 'A'", testBodySize), "http://%s/shell?cmd"}
Note that while the cmd
parameter is a shell command, it will be executed on the server, not on the client Pod.
Fixes: #6806
This commit introduces the ability to disable TX checksum offload for the host gateway interface (default:
antrea-gw0
) by setting thedisableTXChecksumOffload
option totrue
.Note: If this option is later set to false, Antrea will not restore the original TX checksum state, as it does not retain the original configuration. Users are responsible for manually reconfiguring the setting if needed.