Skip to content
This repository has been archived by the owner on Aug 26, 2024. It is now read-only.

examples/sendudp stuck at Poll() after a while #9

Open
hujun-open opened this issue May 19, 2021 · 3 comments
Open

examples/sendudp stuck at Poll() after a while #9

hujun-open opened this issue May 19, 2021 · 3 comments
Assignees

Comments

@hujun-open
Copy link
Contributor

Hi,
I tried running examples/sendudp in a virtualbox VM running Ubuntu 20.04 (on a windows 10 host), using command "sendudp -interface vA", vA is a veth interface; the program always stop sending pkt after a while like following:

260096 packets/s (3000 Mb/s)
317312 packets/s (3660 Mb/s)
339456 packets/s (3915 Mb/s)
251456 packets/s (2900 Mb/s)
281280 packets/s (3244 Mb/s)
251200 packets/s (2897 Mb/s)
252480 packets/s (2912 Mb/s)
262976 packets/s (3033 Mb/s)
256000 packets/s (2953 Mb/s)
255232 packets/s (2944 Mb/s)
251072 packets/s (2896 Mb/s)
243456 packets/s (2808 Mb/s)
276800 packets/s (3193 Mb/s)
240832 packets/s (2778 Mb/s)
246976 packets/s (2849 Mb/s)
259136 packets/s (2989 Mb/s)
246848 packets/s (2847 Mb/s)
246912 packets/s (2848 Mb/s)
249408 packets/s (2877 Mb/s)
244672 packets/s (2822 Mb/s)
246400 packets/s (2842 Mb/s)
240448 packets/s (2773 Mb/s)
243968 packets/s (2814 Mb/s)
248960 packets/s (2872 Mb/s)
248768 packets/s (2869 Mb/s)
235392 packets/s (2715 Mb/s)
248320 packets/s (2864 Mb/s)
244544 packets/s (2821 Mb/s)
237888 packets/s (2744 Mb/s)
253760 packets/s (2927 Mb/s)
60336 packets/s (696 Mb/s)
0 packets/s (0 Mb/s)
0 packets/s (0 Mb/s)
0 packets/s (0 Mb/s)
0 packets/s (0 Mb/s)

Then I added some codes to run pprof, and found the program stuck at line "_, , err = xsk.Poll(-1)", then I changed the timeout to a positive value like ", _, err = xsk.Poll(1000)", but it still stuck at the same place ....

@slavc slavc self-assigned this May 20, 2021
@hujun-open
Copy link
Contributor Author

it seems setting "Paravirtualization Interface" to "Minimal" in virutalbox VM setting solves the problem, however the VM performance is greatly impacted with such setting;
also running the example in a qemu/kvm VM on a linux host seems fine, so I guess this issue is somehow caused by virtualbox;

@slavc
Copy link
Collaborator

slavc commented Jun 25, 2021

I can confirm I see something very similar as well on Ubuntu 20.04, Linux kernel 5.8.0-55-generic, running natively.

  1. Set up a pair of veths:
ip link add veth0 type veth peer veth1
ip addr add 10.0.0.1/24 dev veth0
ip addr add 10.0.0.2/24 dev veth1
ip link set up dev veth0
ip link set up dev veth1
  1. Run sendudp example on veth0:
~/go/bin/sendudp -interface veth0 -dstip 10.0.0.2
  1. It seems to get stuck immediately and isn't sending any packets.

Running tcpdump on veth1, I see exactly 16 packets come through:

tcpdump -n -l -i veth1 
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on veth1, link-type EN10MB (Ethernet), capture size 262144 bytes
14:23:47.512093 IP 192.168.111.10.1234 > 10.0.0.2.1234: UDP, length 1400
14:23:47.512118 IP 192.168.111.10.1234 > 10.0.0.2.1234: UDP, length 1400
14:23:47.512143 IP 192.168.111.10.1234 > 10.0.0.2.1234: UDP, length 1400
14:23:47.512146 IP 192.168.111.10.1234 > 10.0.0.2.1234: UDP, length 1400
14:23:47.512149 IP 192.168.111.10.1234 > 10.0.0.2.1234: UDP, length 1400
14:23:47.512152 IP 192.168.111.10.1234 > 10.0.0.2.1234: UDP, length 1400
14:23:47.512154 IP 192.168.111.10.1234 > 10.0.0.2.1234: UDP, length 1400
14:23:47.512157 IP 192.168.111.10.1234 > 10.0.0.2.1234: UDP, length 1400
14:23:47.512159 IP 192.168.111.10.1234 > 10.0.0.2.1234: UDP, length 1400
14:23:47.512162 IP 192.168.111.10.1234 > 10.0.0.2.1234: UDP, length 1400
14:23:47.512164 IP 192.168.111.10.1234 > 10.0.0.2.1234: UDP, length 1400
14:23:47.512167 IP 192.168.111.10.1234 > 10.0.0.2.1234: UDP, length 1400
14:23:47.512185 IP 192.168.111.10.1234 > 10.0.0.2.1234: UDP, length 1400
14:23:47.512187 IP 192.168.111.10.1234 > 10.0.0.2.1234: UDP, length 1400
14:23:47.512190 IP 192.168.111.10.1234 > 10.0.0.2.1234: UDP, length 1400
14:23:47.512192 IP 192.168.111.10.1234 > 10.0.0.2.1234: UDP, length 1400

16 happens to be the maximum number of packets that will be attempted to be sent per "kernel kick", i.e. by the sendmsg(2) call on XDP socket, by the kernel: https://github.com/gregkh/linux/blob/v5.8/net/xdp/xsk.c#L332

By modifying sendudp to print full xsk.Stats():

diff --git a/examples/sendudp/sendudp.go b/examples/sendudp/sendudp.go
index c193c68..45938aa 100644
--- a/examples/sendudp/sendudp.go
+++ b/examples/sendudp/sendudp.go
@@ -7,6 +7,7 @@ package main
 
 import (
        "encoding/hex"
+       "encoding/json"
        "flag"
        "fmt"
        "math"
@@ -110,19 +111,14 @@ func main() {
        fmt.Printf("sending UDP packets from %v (%v) to %v (%v)...\n", ip.SrcIP, eth.SrcMAC, ip.DstIP, eth.DstMAC)
 
        go func() {
-               var err error
-               var prev xdp.Stats
-               var cur xdp.Stats
-               var numPkts uint64
                for i := uint64(0); ; i++ {
                        time.Sleep(time.Duration(1) * time.Second)
-                       cur, err = xsk.Stats()
+                       cur, err := xsk.Stats()
                        if err != nil {
                                panic(err)
                        }
-                       numPkts = cur.Completed - prev.Completed
-                       fmt.Printf("%d packets/s (%d Mb/s)\n", numPkts, (numPkts*uint64(frameLen)*8)/(1000*1000))
-                       prev = cur
+                       s, _ := json.Marshal(cur)
+                       fmt.Printf("%s\n", string(s))
                }
        }()

It doesn't seem like kernel ever reports those transmitted descriptors as completed:

~/go/bin/sendudp -interface veth0 -dstip 10.0.0.2
sending UDP packets from 192.168.111.10 (b2:96:81:75:b2:11) to 10.0.0.2 (ff:ff:ff:ff:ff:ff)...
{"Filled":0,"Received":0,"Transmitted":0,"Completed":0,"KernelStats":{"Rx_dropped":0,"Rx_invalid_descs":0,"Tx_invalid_descs":0,"Rx_ring_full":0,"Rx_fill_ring_empty_descs":0,"Tx_ring_empty_descs":0}}
{"Filled":0,"Received":0,"Transmitted":0,"Completed":0,"KernelStats":{"Rx_dropped":0,"Rx_invalid_descs":0,"Tx_invalid_descs":0,"Rx_ring_full":0,"Rx_fill_ring_empty_descs":0,"Tx_ring_empty_descs":0}}
{"Filled":0,"Received":0,"Transmitted":0,"Completed":0,"KernelStats":{"Rx_dropped":0,"Rx_invalid_descs":0,"Tx_invalid_descs":0,"Rx_ring_full":0,"Rx_fill_ring_empty_descs":0,"Tx_ring_empty_descs":0}}
{"Filled":0,"Received":0,"Transmitted":0,"Completed":0,"KernelStats":{"Rx_dropped":0,"Rx_invalid_descs":0,"Tx_invalid_descs":0,"Rx_ring_full":0,"Rx_fill_ring_empty_descs":0,"Tx_ring_empty_descs":0}}
{"Filled":0,"Received":0,"Transmitted":0,"Completed":0,"KernelStats":{"Rx_dropped":0,"Rx_invalid_descs":0,"Tx_invalid_descs":0,"Rx_ring_full":0,"Rx_fill_ring_empty_descs":0,"Tx_ring_empty_descs":0}}
^C

The way they should be reported as completed is by skb->destructor getting called when the skb has been "sent": https://github.com/gregkh/linux/blob/v5.8/net/xdp/xsk.c#L376

BUT, this issue only seems to happen on virtual interfaces such as veth, I've never seen this issue happen on any "real" interfaces.

So it could be some bug in the way we interact with the kernel, or, less likely, could be some problem in the kernel; we should probably try reproducing on latest kernel.

@hujun-open
Copy link
Contributor Author

So far, I only experienced this issue in a virtualbox VM; there is no issuing running sendudp on veth interface natively on my linux server (ubuntu 20.04 kernel 5.4.0-74-generic, CPU Intel E3-1230 V2)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants