Antrea L7NetworkPolicies do not handle Service traffic correctly #6854

Open
antoninbas opened this issue Dec 11, 2024 · 6 comments
Labels
area/network-policy Issues or PRs related to network policies. kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@antoninbas
Contributor

Describe the bug
Antrea L3/L4 policy rules handle Service traffic correctly: they are applied to traffic "post-DNAT", when the destination IP address has been rewritten to the endpoint IP.
I have observed that Service traffic is not handled correctly for policies with L7 rules: all the traffic is dropped by Suricata, independently of the rule contents.

To Reproduce
Install Antrea with the necessary configuration:

helm install -n kube-system antrea antrea/antrea --set featureGates.L7NetworkPolicy=true --set disableTXChecksumOffload=true
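
If the Antrea chart repository has not been added to Helm yet, something along these lines should work first (chart repository URL as documented for the Antrea Helm chart; adjust if needed):

helm repo add antrea https://charts.antrea.io
helm repo update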

Use the following policy:

apiVersion: crd.antrea.io/v1beta1
kind: NetworkPolicy
metadata:
  name: egress-allow-http
spec:
  priority: 5
  tier: application
  appliedTo:
    - podSelector:
        matchLabels:
          app: http-client
  egress:
    - name: allow-http
      action: Allow      # All other traffic to these Pods will be automatically dropped, and subsequent rules will not be considered.
      to:
        - podSelector:
            matchLabels:
              app: http-server
      l7Protocols:
        - http: {}
    - name: drop-other   # Drop all other egress traffic
      action: Drop

For the http-server application, you can use a Deployment running an nginx Pod, exposed by a Service.

http-server Deployment + Service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: http-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: http-server
  template:
    metadata:
      labels:
        app: http-server
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        imagePullPolicy: IfNotPresent
---
apiVersion: v1
kind: Service
metadata:
  name: http-server
spec:
  selector:
    app: http-server
  ports:
    - port: 80
      targetPort: 80

For the http-client application, you can use a Deployment running an antrea/toolbox Pod.

http-client Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: http-client
spec:
  replicas: 1
  selector:
    matchLabels:
      app: http-client
  template:
    metadata:
      labels:
        app: http-client
    spec:
      containers:
      - name: toolbox
        image: antrea/toolbox:latest
        imagePullPolicy: IfNotPresent

After creating everything, try to curl the http-server Service from the http-client Pod. It should hang.
However, if you curl the http-server Pod IP address directly, it will work as expected.
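
For example, assuming both Deployments run in the default namespace (as in the manifests above), the two checks can be run as follows; the names and timeout value are illustrative:

# Access through the Service: the request hangs once the L7 policy is applied
kubectl exec deploy/http-client -- curl --connect-timeout 5 http://http-server

# Access the server Pod IP directly: the request succeeds
SERVER_POD_IP=$(kubectl get pod -l app=http-server -o jsonpath='{.items[0].status.podIP}')
kubectl exec deploy/http-client -- curl --connect-timeout 5 "http://$SERVER_POD_IP"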

Expected
The policy should work correctly when the http-server application is accessed through the Service.

Actual behavior
The policy only works correctly when the http-server application is accessed directly using the Pod IP.

Versions:
Antrea v2.2.0, and top-of-tree

Additional context
This is the traffic captured on antrea-l7-tap0 (ingress interface for Suricata engine), when accessing the http-server Service.
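
For reference, a capture like this can be obtained by running tcpdump on the antrea-l7-tap0 interface from the antrea-agent Pod on the Node hosting the Pods; the Pod name below is a placeholder, and this assumes tcpdump is available in the antrea-agent container (otherwise run it directly on the Node):

kubectl exec -n kube-system <antrea-agent-pod> -c antrea-agent -- tcpdump -i antrea-l7-tap0 -n -v tcp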

19:10:44.846442 IP (tos 0x0, ttl 63, id 19449, offset 0, flags [DF], proto TCP (6), length 60)
    10.10.2.16.54062 > 10.10.2.15.80: Flags [S], cksum 0xdb98 (correct), seq 3270148714, win 64860, options [mss 1410,sackOK,TS val 1651039181 ecr 0,nop,wscale 7], length 0
19:10:44.846633 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.96.226.29.80 > 10.10.2.16.54062: Flags [S.], cksum 0xfc4f (correct), seq 3998964513, ack 3270148715, win 64308, options [mss 1410,sackOK,TS val 201309037 ecr 1651035102,nop,wscale 7], length 0
19:10:45.870228 IP (tos 0x0, ttl 63, id 19450, offset 0, flags [DF], proto TCP (6), length 60)
    10.10.2.16.54062 > 10.10.2.15.80: Flags [S], cksum 0xd799 (correct), seq 3270148714, win 64860, options [mss 1410,sackOK,TS val 1651040204 ecr 0,nop,wscale 7], length 0
19:10:45.870501 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.96.226.29.80 > 10.10.2.16.54062: Flags [S.], cksum 0xf84f (correct), seq 3998964513, ack 3270148715, win 64308, options [mss 1410,sackOK,TS val 201310061 ecr 1651035102,nop,wscale 7], length 0
19:10:47.886667 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)

10.10.2.16 is the IP address of the http-client Pod.
10.96.226.29 is the ClusterIP address of the http-server Service.
10.10.2.15 is the IP address of the http-server Pod.

We can see that the client -> server traffic is forwarded to Suricata "post-DNAT" (the destination IP is the http-server Pod IP). However, the server -> client (reply) traffic appears to be forwarded to Suricata after the source IP has been rewritten back to the original destination IP (i.e., the ClusterIP). Suricata has no way to identify this reply traffic as part of the same connection. The reply traffic (the SYN-ACK in this case) is dropped (I assume by Suricata) and does not show up on antrea-l7-tap1 (egress interface for Suricata engine).
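
To confirm the DNAT state for the connection, the conntrack entry in zone 65520 can be dumped from OVS, for example as below; the agent Pod name is a placeholder and the client IP comes from the capture above:

kubectl exec -n kube-system <antrea-agent-pod> -c antrea-ovs -- ovs-appctl dpctl/dump-conntrack zone=65520 | grep 10.10.2.16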

The Antrea datapath should be fixed so that the reply traffic is sent to Suricata prior to rewriting the source IP ("un-DNAT").

@antoninbas added the kind/bug, priority/important-soon, and area/network-policy labels on Dec 11, 2024
@antoninbas
Contributor Author

cc @tnqn @hongliangl @luolanzone for visibility
Do you think this could be fixed in the v2.3 timeframe?

@hongliangl
Contributor

Will take a look and evaluate.

@hongliangl
Contributor

hongliangl commented Dec 12, 2024

Currently, we use a CT mark L7NPRedirectCTMark in zone 65520 to identify request and reply packets. Packets with the CT mark will be redirected to Suricata via antrea-l7-tap0.

To redirect reply packets to Suricata, all packets go through table ConntrackZone to restore the L7NPRedirectCTMark from ct zone 65520. However, Service DNAT is also performed in zone 65520, so the reply packets are "un-DNATed" by the same ct action. As a result, the reply packets are sent to Suricata after being "un-DNATed".
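
For context, the flows that perform this redirect on the agent's OVS bridge (br-int by default) can be inspected with something like the following; the agent Pod name is a placeholder:

kubectl exec -n kube-system <antrea-agent-pod> -c antrea-ovs -- ovs-ofctl dump-flows br-int | grep 'ct_mark=0x80/0x80'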

@antoninbas
Contributor Author

Do we need to use an additional ct zone for such packets (that need to be sent to Suricata), so we can identify reply packets earlier, or is there another solution?

@luolanzone added this to the Antrea v2.3 release milestone on Dec 13, 2024
@hongliangl
Contributor

hongliangl commented Jan 3, 2025

I tried to find a simple and elegant solution to the issue but couldn't. After discussing it with @wenying, we identified a feasible fix. However, in my opinion, this solution is quite complex and may be difficult to explain.

For the packets of a Service connection enforced by an L7 NetworkPolicy:

| Packet Type | Phase | Inport | Source IP | Destination IP | Note |
| --- | --- | --- | --- | --- | --- |
| First request packet | 1 | pod4 | 10.10.0.4 | 10.96.0.1 | Initial request packet |
| | 2 | pod4 | 10.10.0.4 | 10.10.0.5 | DNAT performed in EndpointDNAT |
| | 3 | antrea-l7-tap1 | 10.10.0.4 | 10.10.0.5 | Returned from Suricata and forwarded to the Pod |
| Reply packets | 1 | pod5 | 10.10.0.5 | 10.10.0.4 | Sent to Suricata via antrea-l7-tap0 without unDNAT |
| | 2 | antrea-l7-tap1 | 10.10.0.5 | 10.10.0.4 | Returned from Suricata and forwarded to the Pod |
| | 3 | antrea-l7-tap1 | 10.96.0.1 | 10.10.0.4 | unDNAT applied and sent back to the Pod |
| Subsequent request packets | 1 | pod4 | 10.10.0.4 | 10.96.0.1 | Subsequent request packets |
| | 2 | pod4 | 10.10.0.4 | 10.10.0.5 | DNAT with connection tracking, sent to Suricata via antrea-l7-tap0 |
| | 3 | antrea-l7-tap1 | 10.10.0.4 | 10.10.0.5 | Forwarded to the Pod |

In the flows listed below:

  • The flows in bold match the packets currently being discussed, when multiple flows are listed.
  • The flows marked with * are newly introduced flows.

First request packet

Phase 1

This is the enhanced flow matching the first request packet in phase 1. There are two changes:

  • Add the match condition tcp.
  • Add a learn action to generate a learned flow in a new table 100 that matches the reply packets in phase 1.

* table=AntreaPolicyEgressRule, priority=65000,conj_id=3, tcp, actions=
learn(table=100,idle_timeout=5, priority=200,delete_learned, cookie=0x203000000000a, eth_type=0x800, nw_proto=6,
NXM_OF_IP_DST[]=NXM_OF_IP_SRC[],
NXM_OF_IP_SRC[]=NXM_OF_IP_DST[],
NXM_OF_TCP_SRC[]=NXM_OF_TCP_DST[],
NXM_OF_TCP_DST[]=NXM_OF_TCP_SRC[],
load:0x1->NXM_NX_REG0[23..24],
load:0x1->NXM_NX_REG8[0..11],
load:0x1->NXM_NX_REG0[21..22]),
load:0x3->NXM_NX_REG5[],
ct(commit,table=EgressMetric,zone=65520,exec(load:0x3->NXM_NX_CT_LABEL[32..63],load:0x1->NXM_NX_CT_MARK[7],load:0x1->NXM_NX_CT_LABEL[64..75]))

Phase 2

The first request packet in phase 2 is still sent to the Suricata port by the following flow.

* table=Output, priority=400,reg0=0x6/0xf actions=output:NXM_NX_REG1[]

* table=Output, priority=400,reg0=0x800000/0x1800000 actions=push_vlan:0x8100,move:NXM_NX_REG8[0..11]->OXM_OF_VLAN_VID[],output:1
table=Output, priority=212,ct_mark=0x80/0x80,reg0=0x200000/0x600000 actions=push_vlan:0x8100,move:NXM_NX_CT_LABEL[64..75]->OXM_OF_VLAN_VID[],output:1
table=Output, priority=210,ct_mark=0x40/0x40 actions=IN_PORT
table=Output, priority=200,reg0=0x200000/0x600000 actions=output:NXM_NX_REG1[]
table=Output, priority=200,reg0=0x2400000/0xfe600000 actions=meter:256,controller(reason=no_match,id=58487,userdata=01.01)

Phase 3

These flows match the first request packet in phase 3, returned from Suricata, and forward it to the destination Pod with IP 10.10.0.5.

* table=Classifier, priority=300,in_port="antrea-l7-tap1", vlan_tci=0x1000/0x1000 actions=strip_vlan,load:0x6->NXM_NX_REG0[0..3],load:0x2->NXM_NX_REG0[23..24],goto_table:ConntrackZone

* table=ConntrackZone, priority=400,ip,reg0=0x6/0xf actions=set_field:0x200000/0x600000->reg0,ct(table=L3Forwarding,zone=65520,nat)
* table=ConntrackZone, priority=300,reg0=0/0x1800000 actions=resubmit(,100),resubmit(,ConntrackZone)
* table=ConntrackZone, priority=300,ip,reg0=0x800000/0x1800000 actions=goto_table:Output
* table=ConntrackZone, priority=300,ip,reg0=0x1000000/0x1800000 actions=ct(table=ConntrackState,zone=65520,nat)
table=ConntrackZone, priority=200,ip actions=ct(table=ConntrackState,zone=65520,nat)
table=ConntrackZone, priority=0 actions=goto_table:ConntrackState

* table=Output, priority=400,reg0=0x6/0xf actions=output:NXM_NX_REG1[]
* table=Output, priority=400,reg0=0x800000/0x1800000 actions=push_vlan:0x8100,move:NXM_NX_REG8[0..11]->OXM_OF_VLAN_VID[],output:1
table=Output, priority=212,ct_mark=0x80/0x80,reg0=0x200000/0x600000 actions=push_vlan:0x8100,move:NXM_NX_CT_LABEL[64..75]->OXM_OF_VLAN_VID[],output:1
table=Output, priority=210,ct_mark=0x40/0x40 actions=IN_PORT
table=Output, priority=200,reg0=0x200000/0x600000 actions=output:NXM_NX_REG1[]
table=Output, priority=200,reg0=0x2400000/0xfe600000 actions=meter:256,controller(reason=no_match,id=58487,userdata=01.01)

Reply packets

Phase 1

These flows are used to distinguish the reply packets in phase 1 from all traffic. Some new register marks are introduced:

  • reg0=0x0/0x1800000, default
  • reg0=0x800000/0x1800000, reply packets of an L7 NetworkPolicy connection
  • reg0=0x1000000/0x1800000, other packets.

At first, all packets are resubmitted to table 100 to load the register marks. The reply packets in phase 1 will then be marked with reg0=0x800000/0x1800000. As a result, these packets are forwarded to table Output directly and redirected to Suricata without a ct action, avoiding unDNAT.

* table=ConntrackZone, priority=400,ip,reg0=0x6/0xf actions=set_field:0x200000/0x600000->reg0,ct(table=L3Forwarding,zone=65520,nat)
* table=ConntrackZone, priority=300,reg0=0/0x1800000 actions=resubmit(,100),resubmit(,ConntrackZone)
* table=ConntrackZone, priority=300,ip,reg0=0x800000/0x1800000 actions=goto_table:Output
* table=ConntrackZone, priority=300,ip,reg0=0x1000000/0x1800000 actions=ct(table=ConntrackState,zone=65520,nat)
table=ConntrackZone, priority=200,ip actions=ct(table=ConntrackState,zone=65520,nat)
table=ConntrackZone, priority=0 actions=goto_table:ConntrackState

* table=Output, priority=400,reg0=0x6/0xf actions=output:NXM_NX_REG1[]
* table=Output, priority=400,reg0=0x800000/0x1800000 actions=push_vlan:0x8100,move:NXM_NX_REG8[0..11]->OXM_OF_VLAN_VID[],output:1
table=Output, priority=212,ct_mark=0x80/0x80,reg0=0x200000/0x600000 actions=push_vlan:0x8100,move:NXM_NX_CT_LABEL[64..75]->OXM_OF_VLAN_VID[],output:1
table=Output, priority=210,ct_mark=0x40/0x40 actions=IN_PORT
table=Output, priority=200,reg0=0x200000/0x600000 actions=output:NXM_NX_REG1[]
table=Output, priority=200,reg0=0x2400000/0xfe600000 actions=meter:256,controller(reason=no_match,id=58487,userdata=01.01)

* table=100, priority=200,tcp,nw_src=10.10.0.5,nw_dst=10.10.0.4,tp_src=80,tp_dst=52994 actions=set_field:0x800000/0x1800000->reg0,set_field:0x1/0xfff->reg8,set_field:0x200000/0x600000->reg0
* table=100, priority=0 actions=set_field:0x1000000/0x1800000->reg0

Phase 2

Similar to the first request packet in phase 3, these flows are also used to match the reply packets in phase 2. With the ct action, the packets are unDNATed, transitioning into the reply packets in phase 3.

* table=Classifier, priority=300,in_port="antrea-l7-tap1", vlan_tci=0x1000/0x1000 actions=strip_vlan,load:0x6->NXM_NX_REG0[0..3],load:0x2->NXM_NX_REG0[23..24],goto_table:ConntrackZone

* table=ConntrackZone, priority=400,ip,reg0=0x6/0xf actions=set_field:0x200000/0x600000->reg0,ct(table=L3Forwarding,zone=65520,nat)
* table=ConntrackZone, priority=300,reg0=0/0x1800000 actions=resubmit(,100),resubmit(,ConntrackZone)
* table=ConntrackZone, priority=300,ip,reg0=0x800000/0x1800000 actions=goto_table:Output
* table=ConntrackZone, priority=300,ip,reg0=0x1000000/0x1800000 actions=ct(table=ConntrackState,zone=65520,nat)
table=ConntrackZone, priority=200,ip actions=ct(table=ConntrackState,zone=65520,nat)
table=ConntrackZone, priority=0 actions=goto_table:ConntrackState

Phase 3

The reply packets in phase 3 are forwarded to the Pod with IP 10.10.0.4.

* table=Output, priority=400,reg0=0x6/0xf actions=output:NXM_NX_REG1[]
* table=Output, priority=400,reg0=0x800000/0x1800000 actions=push_vlan:0x8100,move:NXM_NX_REG8[0..11]->OXM_OF_VLAN_VID[],output:1
table=Output, priority=212,ct_mark=0x80/0x80,reg0=0x200000/0x600000 actions=push_vlan:0x8100,move:NXM_NX_CT_LABEL[64..75]->OXM_OF_VLAN_VID[],output:1
table=Output, priority=210,ct_mark=0x40/0x40 actions=IN_PORT
table=Output, priority=200,reg0=0x200000/0x600000 actions=output:NXM_NX_REG1[]
table=Output, priority=200,reg0=0x2400000/0xfe600000 actions=meter:256,controller(reason=no_match,id=58487,userdata=01.01)

Subsequent request packets

Phase 1

These flows are used to match the subsequent request packets in phase 1, to make sure the packets are DNATed correctly, transitioning into the subsequent request packets in phase 2, as well as restoring the ct state.

* table=ConntrackZone, priority=300,reg0=0x0/0x1800000 actions=resubmit(,100),resubmit(,ConntrackZone)
* table=ConntrackZone, priority=300,reg0=0x800000/0x1800000,ip, actions=goto_table:Output
* **table=ConntrackZone, priority=300,reg0=0x1000000/0x1800000,ip, actions=ct(table=ConntrackState,zone=65520,nat)**
table=ConntrackZone, priority=200,ip actions=ct(table=ConntrackState,zone=65520,nat)
table=ConntrackZone, priority=0 actions=goto_table:ConntrackState

* table=100, priority=200,tcp,nw_src=10.10.0.5,nw_dst=10.10.0.4,tp_src=80,tp_dst=52994 actions=set_field:0x800000/0x1800000->reg0,set_field:0x1/0xfff->reg8,set_field:0x200000/0x600000->reg0
* table=100, priority=0 actions=set_field:0x1000000/0x1800000->reg0

Phase 2

The subsequent request packets in phase 2 will be redirected to Suricata by the flow below, since ct_mark=0x80/0x80 and the ct_label have been restored.

* table=Output, priority=400,reg0=0x6/0xf actions=output:NXM_NX_REG1[]
* table=Output, priority=400,reg0=0x800000/0x1800000 actions=push_vlan:0x8100,move:NXM_NX_REG8[0..11]->OXM_OF_VLAN_VID[],output:1
table=Output, priority=212,ct_mark=0x80/0x80,reg0=0x200000/0x600000 actions=push_vlan:0x8100,move:NXM_NX_CT_LABEL[64..75]->OXM_OF_VLAN_VID[],output:1
table=Output, priority=210,ct_mark=0x40/0x40 actions=IN_PORT
table=Output, priority=200,reg0=0x200000/0x600000 actions=output:NXM_NX_REG1[]
table=Output, priority=200,reg0=0x2400000/0xfe600000 actions=meter:256,controller(reason=no_match,id=58487,userdata=01.01)

Phase 3

The processing of the subsequent request packets in phase 3 is the same as for the first request packet in phase 3.

@antoninbas @tnqn @luolanzone

@hongliangl
Contributor

Do we need to use an additional ct zone for such packets (that need to be sent to Suricata), so we can identify reply packets earlier, or is there another solution?

That should be a simple way to fix the issue. I don't have a concrete idea about how to implement this yet, but it may cause side effects for other connections that are not enforced by L7 NetworkPolicies, degrading performance due to the newly introduced ct zone.
