Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable disabling TX checksum offload for Antrea host gateway #6843

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

hongliangl
Copy link
Contributor

@hongliangl hongliangl commented Dec 4, 2024

Fixes: #6806

This commit introduces the ability to disable TX checksum offload for the host gateway interface (default: antrea-gw0) by setting the disableTXChecksumOffload option to true.

Note: If this option is later set to false, Antrea will not restore the original TX checksum state, as it does not retain the original configuration. Users are responsible for manually reconfiguring the setting if needed.

@hongliangl hongliangl requested review from antoninbas and tnqn December 4, 2024 10:13
@hongliangl hongliangl added this to the Antrea v2.3 release milestone Dec 4, 2024
@hongliangl hongliangl force-pushed the 20241202-disable-tx-antrea-gw0 branch from e92e5ab to 1fab9f9 Compare December 5, 2024 10:46
Comment on lines 185 to 187
# If this option is later set to false, for the host gateway interface, Antrea will not restore its
# original TX checksum state, as Antrea does not retain the original configuration. Users are responsible
# for manually reconfiguring the setting if needed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it the same for container network interfaces? I feel it could say the option affects newly created interfaces only when it's set to true, and does nothing when it's set to false.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is the same for container network interfaces. I'll optimize the sentences.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing, it's a little different from container network interface. Currently, we will disable TX checksum on existing antrea-gw0 as well as newly created antrea-gw0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this:

# If this option is later set to false, Antrea does nothing to the affected container network interfaces
# and the host gateway interface. To restore the default TX checksum state of the affected interfaces,
# it is recommended to delete them and recreate.

pkg/agent/agent_linux.go Outdated Show resolved Hide resolved
@antoninbas antoninbas added the action/release-note Indicates a PR that should be included in release notes. label Dec 9, 2024
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, we should probably update docs/antrea-l7-network-policy.md as well

Do you think there is any way to check for the checksum issue described in #6806 in our e2e L7NetworkPolicy tests?

Comment on lines +186 to +187
# and the host gateway interface. To restore the default TX checksum state of the affected interfaces,
# it is recommended to delete them and recreate.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To restore the default TX checksum state of the affected interfaces, it is recommended to delete them and recreate.

For Pods, yes. For the gateway interface, I don't know if this is the best advice as it is a bit disruptive and unpractical for users. We could consider an antctl command (maybe in the future) to change the offload configuration on all Nodes easily?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted. I will add the sub command in another PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in the mean time, I would recommend not including that sentence in the description (To restore the default TX checksum state of the affected interfaces, it is recommended to delete them and recreate.) as we do not provide useful guidance on how to do it practically.

@hongliangl hongliangl force-pushed the 20241202-disable-tx-antrea-gw0 branch 2 times, most recently from 3268120 to 09655e1 Compare December 12, 2024 08:27
@hongliangl
Copy link
Contributor Author

hongliangl commented Dec 12, 2024

LGTM, we should probably update docs/antrea-l7-network-policy.md as well

Updated

Do you think there is any way to check for the checksum issue described in #6806 in our e2e L7NetworkPolicy tests?

I got an idea like this:

Assuming that there are two Nodes, controlplane and worker.

  1. Create a hostNetwork agnhost Pod on Node worker.
  2. Create a non-hostNetwork busybox Pod installed curl on Node controlplane.
  3. Create a L7 NetworkPolicy applied to the above busybox Pod. This policy is used to enforce the connections between the busybox Pod and the agnhost Pod to be processed by Suricata on Node controlplane.
  4. Run command curl http://192.168.77.101/echo?msg=$(head -c 2000 </dev/zero | tr '\0' 'A') on busybox Pod. This command is used to get a reply oversized packets larger than MTU.
  5. If antrea-gw0 TX checksumming is on:
    • The packet larger than MTU will be sent to Suricata via antrea-l7-tap0.
    • The packet larger than MTU will be sent back to OVS via antrea-l7-tap1. Since the size is larger than MTU and TX offloading of antrea-l7-tap1 is off, log like 2024-12-12 08:13:59 Warning: af-packet: antrea-l7-tap1: sending packet failed on socket 22: Message too long will be generated in /var/log/antrea/networkpolicy/l7engine/suricata.log in Node controlplane.

We could check the log to verify the issue described in #6806. If we disable antrea-gw0 TX checksumming, we should receive such log.

One more thing, we cannot verify the issue with the result of command curl http://192.168.77.101/echo?msg=$(head -c 2000 </dev/zero | tr '\0' 'A') because it will not fail due to the retransmission of TCP (the agnhost server will send small packets when retransmitting).

When I tried to verify the idea, I got some troubles in step 4:

  • The data of HTTP GET is larger (2000 bytes) than MTU, and the data will be encapsulated into two packets.
  • The packets can be received by Suricata successfully via antrea-l7-tap`.
  • The connection will get a refuse, and we will get the log like the following in Suricata log eve-2024-12-12.json:
    {"timestamp":"2024-12-12T08:59:10.650212+0000","flow_id":1943321052111352,"in_iface":"antrea-l7-tap0","event_type":"alert","vlan":[2],"src_ip":"10.10.0.8","src_port":47414,"dest_ip":"192.168.77.101","dest_port":80,"proto":"TCP","pkt_src":"wire/pcap","tenant_id":2,"alert":{"action":"blocked","gid":1,"signature_id":1,"rev":0,"signature":"Reject by AntreaNetworkPolicy:default/egress-allow-http","category":"","severity":3,"tenant_id":2},"app_proto":"http","direction":"to_server","flow":{"pkts_toserver":3,"pkts_toclient":1,"bytes_toserver":1616,"bytes_toclient":78,"start":"2024-12-12T08:59:10.649072+0000","src_ip":"10.10.0.8","dest_ip":"192.168.77.101","src_port":47414,"dest_port":80},"packet":"ugwcu18C8njWJ8a6gQAAAggARQAFqq/HQABABm1nCgoACMCoTWW5NgBQewpZO6SagdOAEAH7wSoAAAEBCArkxft3d1ezqkdFVCAvZWNobz9tc2c9QUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQQ==","packet_info":{"linktype":1}}
    

The rejected packet logged above, which is encoded in base64, is part of the HTTP request.

It seems that there is something wrong with Suricata TCP reassembly. I have inspected the related options, which should be qualified to handle the HTTP request with 2000 bytes data. These are the options in /etc/suricata/suricata.yaml:

stream:
  memcap: 64mb
  #memcap-policy: ignore
  checksum-validation: yes      # reject incorrect csums
  #midstream: false
  #midstream-policy: ignore
  inline: auto                  # auto will use inline mode in IPS mode, yes or no set it statically
  reassembly:
    memcap: 256mb
    #memcap-policy: ignore
    depth: 1mb                  # reassemble 1mb into a stream
    toserver-chunk-size: 2560
    toclient-chunk-size: 2560
    randomize-chunk-size: yes
    #randomize-chunk-range: 10
    #raw: yes
    #segment-prealloc: 2048
    #check-overlap-different-data: true

I'm trying to find some options in Suricata to resolve the issue above. I would greatly appreciate any ideas or inspiration you could offer regarding the issue. @antoninbas @tnqn

@hongliangl
Copy link
Contributor Author

Found the root cause of the above issue and added a PR #6857 to fix it.

@antoninbas
Copy link
Contributor

@hongliangl for the test, can we generate a query with a large reply (larger than MTU) but a small request payload (smaller than MTU), so that we don't hit the new issue (#6857)? Maybe the /shell agnhost endpoint will work instead of /echo?
It's also cleaner since you are using a shell command to generate the message anyway:

curl --data-urlencode "cmd=head -c 2000 </dev/zero | tr '\0' 'A'" http://192.168.77.101//shell?cmd

Probably better if we don't assume a specific MTU value (1500 -> 2000) and we can determine the MTU dynamically.

@hongliangl hongliangl force-pushed the 20241202-disable-tx-antrea-gw0 branch from 09655e1 to 11fab3f Compare January 27, 2025 08:27
@hongliangl hongliangl requested a review from antoninbas January 27, 2025 08:29
This commit introduces the ability to disable TX checksum offload
for the host gateway interface (default: `antrea-gw0`) by setting the
`disableTXChecksumOffload` option to `true`.

If this option is later set to false, Antrea does nothing to the affected
container network interfaces and the host gateway interface. To restore
the default TX checksum state of the affected interfaces, it is recommended
to delete them and recreate.

To delete the affected interfaces, for container network interfaces, delete
the corresponding Pods; for the host gateway interface, uninstall Antrea,
reboot all K8s Nodes and install Antrea.

Signed-off-by: Hongliang Liu <[email protected]>
@hongliangl hongliangl force-pushed the 20241202-disable-tx-antrea-gw0 branch from 11fab3f to 90b5065 Compare January 27, 2025 08:49
@hongliangl
Copy link
Contributor Author

@hongliangl for the test, can we generate a query with a large reply (larger than MTU) but a small request payload (smaller than MTU), so that we don't hit the new issue (#6857)? Maybe the /shell agnhost endpoint will work instead of /echo? It's also cleaner since you are using a shell command to generate the message anyway:

curl --data-urlencode "cmd=head -c 2000 </dev/zero | tr '\0' 'A'" http://192.168.77.101//shell?cmd

Probably better if we don't assume a specific MTU value (1500 -> 2000) and we can determine the MTU dynamically.

@antoninbas Sorry for not replying this late. The command you provided is a feasible way to mock the expected HTTP request! I have added an e2e new test using it.

@@ -59,6 +60,9 @@ func TestL7NetworkPolicy(t *testing.T) {
}
}()

t.Run("HTTP with large body", func(t *testing.T) {
testL7NetworkPolicyHTTPWithLargeBody(t, data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/testL7NetworkPolicyHTTPWithLargeBody/testL7NetworkPolicyHTTPLargeResponse

Comment on lines +186 to +187
# and the host gateway interface. To restore the default TX checksum state of the affected interfaces,
# it is recommended to delete them and recreate.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in the mean time, I would recommend not including that sentence in the description (To restore the default TX checksum state of the affected interfaces, it is recommended to delete them and recreate.) as we do not provide useful guidance on how to do it practically.

@@ -262,3 +263,13 @@ func (i *Initializer) prepareL7EngineInterfaces() error {
}
return nil
}

func (i *Initializer) setTXChecksumOffload() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setTXChecksumOffloadOnGateway

// Run the command that makes the test server send an HTTP response with a body larger than the MTU on the
// test client Pod.
testBodySize := mtu * 2
cmd = []string{"sh", "-c", fmt.Sprintf(`curl --data-urlencode "cmd=head -c %d </dev/zero | tr '\0' 'A'" http://%s/shell?cmd`, testBodySize, baseURL)}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need to wrap this command using a shell invocation. Does the following work?

cmd := []string{"curl", "--data-urlencode", fmt.Sprintf("cmd=head -c %d </dev/zero | tr '\0' 'A'", testBodySize), "http://%s/shell?cmd"}

Note that while the cmd parameter is a shell command, it will be executed on the server, not on the client Pod.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/release-note Indicates a PR that should be included in release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Very low throughput when using L7NetworkPolicies with an external host
3 participants