4/11/2023

Netmap pci

In two previous posts we've discussed how to receive 1M UDP packets per second and how to reduce the round trip time. We did the experiments on Linux and the performance was very good considering it's a general purpose operating system.

Unfortunately the speed of vanilla Linux kernel networking is not sufficient for more specialized workloads. For example, here at CloudFlare, we are constantly dealing with large packet floods. The vanilla kernel stack is not enough in our environment, especially since the network cards are capable of handling a much higher throughput. Modern 10Gbps NICs can usually process at least 10M pps.

It's apparent that the only way to squeeze more packets from our hardware is by working around the Linux kernel networking stack. This is called a "kernel bypass" and in this article we'll dig into various ways of achieving it.

Let's prepare a small experiment to convince you that working around Linux is indeed necessary. Let's see how many packets can be handled by the kernel under perfect conditions. Passing packets to userspace is costly, so instead let's try to drop them as soon as they leave the network driver code. To my knowledge the fastest way to drop packets in Linux, without hacking the kernel sources, is by placing a DROP rule in the PREROUTING iptables chain:

$ sudo iptables -t raw -I PREROUTING -p udp --dport 4321 --dst 192.168.254.1 -j DROP

Ethtool statistics show that the network card receives a line rate of 12M packets per second. By manipulating an indirection table on the NIC with ethtool -X, we direct all the packets to RX queue #0. As we can see, the kernel is able to process 1.4M pps on that queue with a single CPU.

Processing 1.4M pps on a single core is certainly a very good result, but unfortunately the stack doesn't scale. When the packets hit many cores the numbers drop sharply. Let's see the numbers when we direct packets to four RX queues:

$ sudo ethtool -X eth2 weight 1 1 1 1

Even optimistically assuming the performance won't drop further when adding more cores, we would still need more than 20 CPUs to handle packets at line rate.

The performance limitations of the Linux kernel networking stack are nothing new. Over the years there have been many attempts to address them. The most common techniques involve creating specialized APIs to help receive packets from the hardware at very high speed. Unfortunately these techniques are in constant flux and a single widely adopted approach hasn't emerged yet. Here is a list of the most widely known kernel bypass techniques.

Packet_mmap is a Linux API for fast packet sniffing. While it's not strictly a kernel bypass technique, it deserves a special place on the list: it's already available in vanilla kernels.

PF_RING is another well-known technique that intends to speed up packet capture. Unlike packet_mmap, PF_RING is not in the mainline kernel and requires special modules. With ZC drivers and transparent_mode=2, packets are delivered only to the PF_RING client and not to the kernel network stack.

Snabbswitch is a networking framework in Lua mostly geared towards writing L2 applications. It works by completely taking over a network card and implementing a hardware driver in userspace. This is done at the PCI device level with a form of userspace IO (UIO), by mmap'ing the device registers via sysfs. Since the kernel is the slow part this allows for very fast operation, but it means the packets completely skip the kernel network stack.

DPDK is a networking framework written in C, created especially for Intel chips. It's similar to snabbswitch in spirit, since it's a full framework and relies on UIO.

Netmap is also a rich framework, but as opposed to the UIO techniques it is implemented as a couple of kernel modules. To integrate with the networking hardware it requires users to patch the kernel network drivers. The main benefit of the added complexity is a nicely documented, vendor-agnostic and clean API.

Since the goal of kernel bypass is to spare the kernel from processing packets, we can rule out packet_mmap. It doesn't take over the packets - it's just a fast interface for packet sniffing. Similarly, plain PF_RING without the ZC modules is unattractive, since its main goal is to speed up libpcap. We've already ruled out two techniques, but unfortunately for our workloads none of the remaining solutions is acceptable either! In order to achieve a kernel bypass, all of the remaining techniques - Snabbswitch, DPDK and netmap - take over the whole network card, not allowing any traffic on that NIC to reach the kernel. At CloudFlare, we simply can't afford to dedicate a whole NIC to a single offloaded application. Having said that, many people use the techniques above; in other circumstances dedicating a NIC to bypass would be acceptable. While the techniques listed above require taking over a whole NIC, there are alternatives.
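The packet-rate figures in the experiment above are easy to sanity-check with back-of-the-envelope arithmetic. The sketch below assumes minimum-size Ethernet frames (64 bytes on the wire plus 7 bytes preamble, 1 byte start-of-frame delimiter and 12 bytes inter-frame gap, 84 bytes total per packet), which is the standard worst case for a 10Gbps link:

```python
import math

# Theoretical maximum packet rate of a 10Gbps link, assuming
# minimum-size frames: 64B frame + 7B preamble + 1B SFD + 12B gap = 84B.
link_bps = 10e9
bytes_per_packet = 64 + 7 + 1 + 12          # 84 bytes consumed per packet
max_pps = link_bps / (bytes_per_packet * 8) # ~14.9M pps

print(f"theoretical 10GbE max: {max_pps:,.0f} pps")

# Even at the best single-queue result quoted above (1.4M pps per core),
# the observed 12M pps flood would need this many cores:
cores_best_case = math.ceil(12e6 / 1.4e6)
print(f"cores needed at 1.4M pps/core: {cores_best_case}")
```

This best case already needs around 9 cores; since per-core throughput drops sharply once packets are spread across many RX queues, the real requirement climbs past the 20 CPUs mentioned above.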