The example application DpdkTrafficFilter blocks forever #1275
Comments
To add: after "blocking", in the following code dev->receivePackets() always returns 0:
even though it's a busy communication line and packets are always arriving. Just curious why sendPacketsTo->sendPacket() would cause dev->receivePackets() to behave incorrectly, since they are on different ports (-d 0 -s 1). |
@siphonelee it's hard to say why sending packets blocks all traffic (incoming and outgoing), you probably need to debug it further and provide more information... maybe the type of packets being sent out causes the network to stop sending packets to this machine? Maybe you can try loading a packet from a pcap file that is not related to the incoming packets, then send this one packet any time a packet is received and see if traffic still gets blocked at some point? If you don't send any packets, does incoming traffic still get blocked? |
@seladb Thank you for the reply. If no packet is sent out, incoming traffic is not blocked. Even in the packet-sending scenario, the first few packets are sent out successfully: sendPacket() returns 1 for each packet sent, and the sent packet can be captured on the wire. But after that, dev->receivePackets() starts to return 0. |
@seladb As you suggested, I tried sending the same packet each time an incoming packet matched a certain IP address. The result was the same: RX got blocked after several TX packets. Another observation: I added code to call rte_eth_stats_get() and print the stats when the program ends. When blocking happened, I could see a large rx_nombuf value, while if I quit the program before blocking happened, rx_nombuf was always 0. That seems to imply a real "blocking". |
Thanks for debugging it @siphonelee ! The Here are a few things you can check that might help us debug the issue:
|
@seladb Here are the debugging results per your request:
--- The return value is always true before blocking.
--- The object type is always 1 before blocking.
--- Just as you figured out, there seems to be a hidden mbuf leak. I used:
for the two scenarios:
|
I think you uncovered a bug that has been there for a long time 😄 I think it was introduced in this commit (almost 6 years ago): 29a4db4 For some reason I assumed that calling Here is the line where it happens: PcapPlusPlus/Pcap++/src/DpdkDevice.cpp Line 1201 in 96987d2
I didn't test this theory, and I also don't remember why I added this logic to avoid freeing the mbuf in this case, maybe there was a reason I don't remember 🤔 A simple fix can be removing this logic and seeing if the mbuf leak is gone. However I need to run more tests to try and figure out why I added it in the first place... |
@seladb Thanks for your quick reply, that's exactly the change I made and tested for a while. It works fine now. I'm sure you had a valid reason for the code, since I found similar logic in several other places: PcapPlusPlus/Pcap++/src/DpdkDevice.cpp Lines 1093 to 1097 in 96987d2
PcapPlusPlus/Pcap++/src/DpdkDevice.cpp Lines 1134 to 1136 in 96987d2
PcapPlusPlus/Pcap++/src/DpdkDevice.cpp Lines 1175 to 1178 in 96987d2
PcapPlusPlus/Pcap++/src/DpdkDevice.cpp Lines 1187 to 1190 in 96987d2
PcapPlusPlus/Pcap++/src/DpdkDevice.cpp Lines 1210 to 1212 in 96987d2
PcapPlusPlus/Pcap++/src/DpdkDevice.cpp Lines 1220 to 1221 in 96987d2
According to the DPDK documentation you referenced: In my previous use case I sent packets only when certain conditions were met, and in the interval between sends a lot of packets were received, since it's a busy line. From the observations above, my guess is that the (few) non-freed TX mbufs somehow block allocation of RX mbufs. But I can't figure out why. Please share if you have an idea, thanks. |
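One way to picture the guess above is a toy model of a fixed-size buffer pool shared by RX and TX. This is a sketch under stated assumptions, not DPDK or PcapPlusPlus code: `ToyMbufPool`, `ToyStats`, and `runToyForwarder` are hypothetical names, and `rxNoMbuf` only mimics the role of the `rx_nombuf` counter mentioned earlier. It shows that if each forwarded packet's buffer is never returned to the pool, even occasional sends eventually exhaust the pool, and every later receive fails to get a buffer.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a fixed-size mbuf pool (not a DPDK type).
class ToyMbufPool {
public:
    explicit ToyMbufPool(std::size_t capacity) : m_Free(capacity) {}

    // Take one buffer; returns false when the pool is exhausted.
    bool alloc() {
        if (m_Free == 0) return false;
        --m_Free;
        return true;
    }

    // Return one buffer to the pool.
    void free() { ++m_Free; }

    std::size_t available() const { return m_Free; }

private:
    std::size_t m_Free;
};

// Counters for the toy run; rxNoMbuf plays the role of rx_nombuf.
struct ToyStats {
    std::uint64_t received = 0;
    std::uint64_t sent = 0;
    std::uint64_t rxNoMbuf = 0;
};

// Simulate `packets` arrivals and forward every `sendEvery`-th one.
// When freeAfterSend is false, a forwarded buffer is never returned
// to the pool, which models the suspected leak.
ToyStats runToyForwarder(std::size_t poolSize, std::size_t packets,
                         std::size_t sendEvery, bool freeAfterSend) {
    ToyMbufPool pool(poolSize);
    ToyStats stats;
    for (std::size_t i = 1; i <= packets; ++i) {
        if (!pool.alloc()) {   // RX cannot obtain a buffer
            ++stats.rxNoMbuf;
            continue;
        }
        ++stats.received;
        const bool forwarded = (i % sendEvery == 0);
        if (forwarded) ++stats.sent;
        // Dropped packets are always freed; forwarded ones only if asked.
        if (!forwarded || freeAfterSend) pool.free();
    }
    return stats;
}
```

With a pool of 8 buffers and one send per 10 received packets, calling `runToyForwarder(8, 1000, 10, false)` stops receiving after the 8th send, while the same call with `freeAfterSend` set to `true` receives all 1000 packets. A real pool is much larger, which would be consistent with the first sends succeeding before RX stalls.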
Thanks for looking into it @siphonelee ! If I understand DPDK documentation correctly, I wonder if we always free the mbuf manually after successful TX (unlike what the logic currently does), will everything work as expected? Meaning - in both cases where (1) packets are immediately sent after receiving, or (2) packets are sent only when certain conditions are met - can DPDK handle freeing an already free mbuf - or will it fail? Can you please check this? If it doesn't cause issues, maybe this is the change we need to make. Please let me know what you think |
--- I'm not a DPDK expert, but here are my five cents: TX descriptors are used by the NIC as a ring buffer, and they refer to mbufs, which contain the actual packet data and metadata. Taking i40e as an example, the related driver code I found is:
--- Always freeing mbufs manually seems to work as expected, and I'll test further. I guess DPDK's automatic mbuf freeing is more performant since it's done in bulk, but it may cause the blocking issue I hit.
-- I agree. |
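The "bulk" freeing described above can be sketched with a toy TX ring. This is an illustrative model only, not the i40e driver code: `ToyTxRing` and its parameters are hypothetical, and it assumes the driver reclaims transmitted buffers only when the number of free descriptors drops below a threshold, and that every queued packet has already left the wire by then. Under those assumptions, an application that sends rarely keeps its transmitted buffers held for a long time.

```cpp
#include <cstddef>

// Hypothetical model of a TX descriptor ring with deferred, bulk cleanup.
class ToyTxRing {
public:
    ToyTxRing(std::size_t ringSize, std::size_t freeThresh)
        : m_RingSize(ringSize), m_FreeThresh(freeThresh), m_Held(0) {}

    // Queue one packet for transmission. Returns how many previously
    // transmitted buffers were reclaimed during this call.
    std::size_t send() {
        std::size_t reclaimed = 0;
        // Cleanup runs only when free descriptors fall below the threshold.
        if (m_RingSize - m_Held < m_FreeThresh) {
            reclaimed = m_Held;   // assumption: all queued packets are done
            m_Held = 0;
        }
        ++m_Held;                 // the new packet's buffer is now held
        return reclaimed;
    }

    // Buffers still referenced by descriptors and not yet returned.
    std::size_t held() const { return m_Held; }

private:
    std::size_t m_RingSize;
    std::size_t m_FreeThresh;
    std::size_t m_Held;
};
```

For a ring of 8 descriptors with a threshold of 2, the first seven `send()` calls reclaim nothing and leave seven buffers held; the eighth call reclaims all seven at once. Freeing manually after each send trades that batching for a predictable buffer count, which matches the trade-off discussed in the comment.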
That sounds good @siphonelee , will you consider opening a PR with the fix? You already have a setup where you can test it and I currently don't... |
@seladb Sure. I just need more time for more testing. |
@siphonelee do you think you'll have some time to work on it soon? It'd be great if we could fix it, and you have an environment to test it (which I currently don't...) |
@seladb Apologies for getting back to you late. I've been testing my fix recently in a production environment, and there is still a segmentation fault that occurs occasionally and is hard to reproduce: PcapPlusPlus/Pcap++/src/MBufRawPacket.cpp Lines 318 to 319 in 6d156bf
What GDB reports is: Thread 6 "lcore-worker-2" received signal SIGSEGV, Segmentation fault. My fix is simply changing: Your advice is appreciated. |
First of all, thanks for the great work of DpdkTrafficFilter! It saves me a lot of effort in developing a high performance application using dpdk.
I'm running the example DpdkTrafficFilter application to filter DNS requests like this:
./DpdkTrafficFilter -d 0 -s 1 -P 53 -r 1 -t 1 -c 3 -m 65535 -i (a certain IP address)
it outputs info like this:
EAL: Detected CPU lcores: 32
EAL: Detected NUMA nodes: 2
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available 2048 kB hugepages reported
EAL: VFIO support initialized
EAL: Probe PCI driver: net_e1000_igb (8086:1521) device: 0000:01:00.0 (socket 0)
EAL: Probe PCI driver: net_e1000_igb (8086:1521) device: 0000:01:00.3 (socket 0)
TELEMETRY: No legacy callbacks, legacy socket not created
Opened device #0 with 1 RX queues and 1 TX queues. RSS hash functions:
RSS_IPV4
RSS_IPV6
Using core 1
Core configuration:
DPDK device#0: RX-Queue#0;
and a lot of error info which seems to be related to invalid DNS packet format (yet harmless I assume):
[ERROR: ...Packet++/src/DnsLayer.cpp: parseResources:156] DNS layer contains more than 300 resources, probably a bad packet. Skipping parsing DNS resources
Packet capture confirmed that some packets were sent out from port 1. But after a short while, the application seems to block: no more packets are sent out from port 1, and no more error messages like the one above are printed.
If I comment out the packet-sending code in DpdkTrafficFilter, rebuild and run the application again, everything seems to work fine:
The issue happens on both Pcap++ v22.11 and v23.09.
Could anyone please provide some insights? Thanks!
Environment:
Ubuntu 22.04
DPDK stable 21.11
igb_uio kernel module
NIC intel i350
1GB x 16 huge pages
BTW, I can run the DPDK example application l2fwd to send packets without any issue, so I assume DPDK itself is OK.