XDP is the future of high-speed networking in the Linux kernel. What's amazing about XDP is how accessible it is compared to user-space kernel bypass. Literally anyone can write an eBPF program, and the kernel even sanity-checks it for you! Very excited to see all the work FB puts into eBPF. I've used BCC extensively, and the granularity you get over resource consumption with eBPF is amazing.
Also, it's really cool how much FB has put into butterFS and cgroups. They're doing very foundational work for the container space.
XDP = eXpress Data Path: a hook that runs an eBPF program in the network driver, before the kernel network stack sees the packet, so you can process raw packets as fast as your network card allows.
eBPF = extended Berkeley Packet Filter, the successor to classic BPF: programs are compiled to bytecode and run on a virtual machine inside the kernel (so you can load code from user space and have it run in the kernel). eBPF programs can talk to user space through maps and hook into various parts of the kernel. An important point is that once compiled, an eBPF program is guaranteed to halt and has other verification performed on it, to make it safe to run in the kernel.
BCC = BPF Compiler Collection, a set of tools for working with eBPF. It uses LLVM and Clang to compile the kernel-side eBPF programs (written in C) and provides front-ends, such as Python and Lua, for loading and interacting with them.
butterFS = btrfs, the filesystem. People often call it butterFS in conversation, even though the btr stands for b-tree.
cgroups = Technically the work is on cgroup2 (cgroup v2), which replaces the original cgroups' separate per-controller hierarchies with a single unified hierarchy that processes are placed in. This is how resource constraints are placed on containers, although many people use it just to monitor processes (without constraining resources).
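For a concrete feel of the unified hierarchy, here is a small C sketch, assuming cgroup2 is mounted at /sys/fs/cgroup (common, but not guaranteed): on cgroup v2, /proc/<pid>/cgroup has a single "0::<path>" line, and that path locates the group's interface files such as memory.current. The helper names cgroup2_path and cgroup2_file are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Finds the cgroup v2 entry in the contents of /proc/<pid>/cgroup.
   On cgroup v2 a process has one line of the form "0::<path>"
   (hierarchy ID 0, empty controller list); hybrid systems may also list
   cgroup v1 hierarchies with non-zero IDs, which are skipped.
   Copies <path> into out. Returns 0 on success, -1 otherwise. */
int cgroup2_path(const char *proc_cgroup, char *out, size_t outlen)
{
    const char *line = proc_cgroup;
    while (line != NULL && *line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        if (len >= 3 && strncmp(line, "0::", 3) == 0) {
            size_t plen = len - 3;
            if (plen + 1 > outlen)
                return -1; /* output buffer too small */
            memcpy(out, line + 3, plen);
            out[plen] = '\0';
            return 0;
        }
        line = eol ? eol + 1 : NULL;
    }
    return -1; /* no cgroup v2 entry */
}

/* Builds the path of an interface file such as "memory.current" (usage)
   or "memory.max" (limit). ASSUMES cgroup2 is mounted at /sys/fs/cgroup. */
int cgroup2_file(const char *cg_path, const char *file, char *out, size_t outlen)
{
    /* the root cgroup's path is "/", so avoid producing "//" */
    const char *p = strcmp(cg_path, "/") == 0 ? "" : cg_path;
    int n = snprintf(out, outlen, "/sys/fs/cgroup%s/%s", p, file);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Monitoring without constraining is then just reading a file like memory.current (usage in bytes); nothing has to be written to memory.max.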
> butterFS = btrfs, the filesystem. People often call it butterFS in conversation, even though the btr stands for b-tree.
This isn't a common spelling for Btrfs, and it just confused me, since I didn't immediately follow the reference in your original comment (I assumed it was something new). It may be a common spoken pronunciation, but writing 'butterFS' in a forum comment takes more characters/keystrokes than 'btrfs', so I'd consider it a typo/error to be corrected rather than a technical abbreviation to be defined.
> An important point is that once compiled, an eBPF program is guaranteed to halt …
That's made me curious. Got any links or search terms I can use to learn more about that?
I'm guessing nobody has solved the halting problem, so I wonder what the constraints are. Is the eBPF programming language not Turing complete? Are the inputs bounded in a way that means they don't need a general solution to the halting problem? Does the eBPF program get compiled with a kill switch to guarantee halting?
> There are inherent security and stability risks with allowing user-space code to run inside the kernel. So, a number of checks are performed on every eBPF program before it is loaded. The first test ensures that the eBPF program terminates and does not contain any loops that could cause the kernel to lock up. This is checked by doing a depth-first search of the program's control flow graph (CFG). Unreachable instructions are strictly prohibited; any program that contains unreachable instructions will fail to load.
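A minimal C sketch of the check the quote describes, using a hypothetical simplified instruction model (each instruction has an optional fall-through and an optional jump target) rather than the kernel's struct bpf_insn: a depth-first search marks the nodes on the current path, any edge back to such a node is a loop, and anything the search never reaches is unreachable. The kernel's own check_cfg() does this without recursion, and kernels since 5.3 also accept bounded loops that the verifier can prove terminate, so treat this as the older, stricter rule.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_INSNS 64
#define NO_EDGE (-1)

/* Hypothetical simplified instruction: at most two successors.
   An exit instruction has neither. */
struct cfg_insn {
    int next;   /* fall-through successor, or NO_EDGE */
    int target; /* jump target, or NO_EDGE */
};

enum color { WHITE, GRAY, BLACK }; /* unvisited, on DFS path, finished */

/* Returns false on a back edge (a loop) or a jump outside the program. */
static bool dfs(const struct cfg_insn *prog, int n, int i, enum color *c)
{
    c[i] = GRAY;
    int succ[2] = { prog[i].next, prog[i].target };
    for (int k = 0; k < 2; k++) {
        int s = succ[k];
        if (s == NO_EDGE)
            continue;
        if (s < 0 || s >= n)
            return false; /* jump out of bounds */
        if (c[s] == GRAY)
            return false; /* back edge: the program contains a loop */
        if (c[s] == WHITE && !dfs(prog, n, s, c))
            return false;
    }
    c[i] = BLACK;
    return true;
}

/* Accepts the program only if it is loop free and every instruction
   is reachable from instruction 0. */
bool cfg_check(const struct cfg_insn *prog, int n)
{
    enum color c[MAX_INSNS];
    if (n <= 0 || n > MAX_INSNS)
        return false;
    for (int i = 0; i < n; i++)
        c[i] = WHITE;
    if (!dfs(prog, n, 0, c))
        return false;
    for (int i = 0; i < n; i++)
        if (c[i] == WHITE)
            return false; /* unreachable instruction */
    return true;
}
```

For example, a conditional jump over one instruction passes, a jump back to instruction 0 is rejected as a loop, and a jump that skips an instruction no other path reaches is rejected as unreachable.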
eBGP = external Border Gateway Protocol. BGP is a routing protocol used to share reachability information between independent routing domains; iBGP is when you use BGP inside your own network (a single AS). eBGP is what is spoken between different ASes (Autonomous Systems) across the Internet core (the routers that carry full routing tables and have no default route). You'll often read "Country X lost 50% of traffic for N hours due to eBGP issues".
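Two consequences of the eBGP/iBGP split, sketched in C with a simplified AS_PATH (a flat list of AS numbers; real BGP also has AS_SET segments): when a route is advertised to an eBGP peer, the sender prepends its own AS number, while iBGP advertisements leave the path unchanged; and a router discards received routes whose AS_PATH already contains its own AS, which is how BGP avoids loops between ASes. The struct and function names are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PATH_LEN 32

/* Simplified AS_PATH: one flat sequence of AS numbers, most recent first. */
struct as_path {
    uint32_t asn[MAX_PATH_LEN];
    size_t len;
};

/* A BGP session is external when the peer belongs to a different AS. */
bool is_ebgp(uint32_t local_as, uint32_t peer_as)
{
    return local_as != peer_as;
}

/* Loop prevention: a received route whose AS_PATH already contains our
   own AS number has passed through us before and is discarded. */
bool path_has_loop(const struct as_path *p, uint32_t local_as)
{
    for (size_t i = 0; i < p->len; i++)
        if (p->asn[i] == local_as)
            return true;
    return false;
}

/* Before advertising: prepend our AS for an eBGP peer, leave the path
   unchanged for an iBGP peer. Returns false if the path is full. */
bool prepare_advert(struct as_path *p, uint32_t local_as, uint32_t peer_as)
{
    if (!is_ebgp(local_as, peer_as))
        return true;
    if (p->len >= MAX_PATH_LEN)
        return false;
    for (size_t i = p->len; i > 0; i--)
        p->asn[i] = p->asn[i - 1];
    p->asn[0] = local_as;
    p->len++;
    return true;
}
```

Using ASNs from the documentation range: with local AS 64496, advertising the path [64497] to a peer in AS 64500 yields [64496, 64497], while advertising it to a peer in AS 64496 leaves it as [64497].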
cgroups aren't for manipulating kernel parameters. They're for setting resource limits on processes (PIDs) and for retrieving resource-usage information about a process or a group of processes.
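To illustrate the limits-plus-usage side, a small C sketch for interpreting cgroup v2 files such as memory.max or pids.max, where the kernel writes the literal string "max" when no limit is set and a plain number otherwise. The function names, and the choice to map "max" to UINT64_MAX, are assumptions for this example; current and limit would come from e.g. memory.current and memory.max of the same group.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parses the contents of a cgroup v2 limit file such as memory.max or
   pids.max. "max" means "no limit" and is mapped to UINT64_MAX here.
   Returns false on malformed input. */
bool parse_cg_limit(const char *s, uint64_t *out)
{
    if (strncmp(s, "max", 3) == 0 && (s[3] == '\0' || s[3] == '\n')) {
        *out = UINT64_MAX;
        return true;
    }
    if (s[0] < '0' || s[0] > '9')
        return false; /* rejects empty input, signs and leading spaces */
    char *end;
    errno = 0;
    unsigned long long v = strtoull(s, &end, 10);
    if (errno != 0 || (*end != '\0' && *end != '\n'))
        return false;
    *out = (uint64_t)v;
    return true;
}

/* Usage as a percentage of the limit, or -1.0 when there is no limit
   ("max") or the limit is 0. */
double cg_usage_pct(uint64_t current, uint64_t limit)
{
    if (limit == UINT64_MAX || limit == 0)
        return -1.0;
    return 100.0 * (double)current / (double)limit;
}
```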
Disclaimer: I am not an expert in this, so any corrections are welcome. But here's my intuition.
XDP = eXpress Data Path. It is a new packet-processing mechanism in the Linux kernel, which is in some ways an answer to DPDK and other userspace networking frameworks that bypass the kernel in pursuit of high performance. It was originally proposed by Cloudflare, after they saw poor scalability (in packets per second) for something as simple as a packet-drop rule in the kernel. The principle behind XDP is to apply packet-processing rules as early as possible in the packet-processing pipeline (no wasted work). However, only certain types of rules are simple enough to be done in a high performance way -- complex rules would still be left to netfilter / ebtables.
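Roughly what a simple drop rule has to do, as a user-space C sketch rather than a loadable XDP program (no libbpf, no SEC() annotations): walk the Ethernet, IPv4 and UDP headers with a length check before every access, which the eBPF verifier insists on in a real program, and return a verdict. The verdict values mirror XDP_DROP and XDP_PASS in <linux/bpf.h>; the function name and the "drop UDP to one destination port" rule are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Same numeric values as XDP_DROP and XDP_PASS in enum xdp_action. */
enum { SIM_XDP_DROP = 1, SIM_XDP_PASS = 2 };

#define ETH_HLEN_BYTES 14
#define ETHERTYPE_IPV4 0x0800
#define IP_PROTO_UDP 17

/* Reads a big-endian (network order) 16-bit field. */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Hypothetical rule: drop IPv4/UDP packets sent to dst_port, pass the
   rest. Each header is length-checked before it is read. */
int drop_udp_port(const uint8_t *data, size_t len, uint16_t dst_port)
{
    if (len < ETH_HLEN_BYTES || be16(data + 12) != ETHERTYPE_IPV4)
        return SIM_XDP_PASS; /* not (untagged) IPv4 */
    const uint8_t *ip = data + ETH_HLEN_BYTES;
    size_t ip_len = len - ETH_HLEN_BYTES;
    if (ip_len < 20)
        return SIM_XDP_PASS; /* truncated IPv4 header */
    size_t ihl = (size_t)(ip[0] & 0x0f) * 4; /* header length incl. options */
    if (ihl < 20 || ip[9] != IP_PROTO_UDP)
        return SIM_XDP_PASS;
    if ((be16(ip + 6) & 0x1fff) != 0)
        return SIM_XDP_PASS; /* non-first fragment: no UDP header */
    if (ip_len < ihl + 8)
        return SIM_XDP_PASS; /* truncated UDP header */
    return be16(ip + ihl + 2) == dst_port ? SIM_XDP_DROP : SIM_XDP_PASS;
}
```

Unknown, truncated or fragmented packets are passed on to the regular stack rather than dropped, which is a design choice for this sketch; VLAN tags and IPv6 are not handled.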
The rules which XDP leverages, called extended Berkeley Packet Filters (eBPF), are a new take on an old technology. eBPF is a mechanism that allows BPF rules to be loaded from userspace into the kernel on the fly. Essentially, matching rules which meet certain simplicity requirements (e.g. loop free) can be compiled into a bytecode that the kernel executes very efficiently. This is an extremely flexible technology, and one domain it is well suited for is packet processing. BCC is just the set of compiler tools for creating your own eBPF bytecode.
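To make "compiled into a bytecode" concrete, a toy C interpreter for a handful of real eBPF opcode bytes (0xb7 mov64 imm, 0xbf mov64 reg, 0x07 add64 imm, 0x15 jeq imm, 0x95 exit). The instruction struct is simplified (the kernel's struct bpf_insn packs the registers into 4-bit fields), and the step limit and -1 error returns stand in for checks the verifier performs at load time. In practice the kernel usually JIT-compiles eBPF to native code rather than interpreting it.

```c
#include <stddef.h>
#include <stdint.h>

/* Real eBPF opcode bytes (class | operation | source) for a tiny subset. */
#define OP_MOV64_IMM 0xb7 /* BPF_ALU64 | BPF_MOV | BPF_K */
#define OP_MOV64_REG 0xbf /* BPF_ALU64 | BPF_MOV | BPF_X */
#define OP_ADD64_IMM 0x07 /* BPF_ALU64 | BPF_ADD | BPF_K */
#define OP_JEQ_IMM   0x15 /* BPF_JMP | BPF_JEQ | BPF_K */
#define OP_EXIT      0x95 /* BPF_JMP | BPF_EXIT */

#define NUM_REGS 11    /* r0..r10 */
#define MAX_STEPS 4096 /* stand-in for the verifier's termination guarantee */

/* Simplified instruction layout (hypothetical, not struct bpf_insn). */
struct toy_insn {
    uint8_t code;
    uint8_t dst;
    uint8_t src;
    int16_t off; /* jump offset, relative to the next instruction */
    int32_t imm; /* immediate operand, sign-extended to 64 bits */
};

/* Interprets prog with r1 set to arg and returns r0 at exit. Returns -1
   for things a verifier would have rejected up front: unknown opcodes,
   bad registers or jumps, running off the end, not terminating. */
int64_t toy_run(const struct toy_insn *prog, size_t n, uint64_t arg)
{
    uint64_t r[NUM_REGS] = {0};
    r[1] = arg; /* real programs get their context pointer in r1 */
    int64_t pc = 0;
    for (int steps = 0; steps < MAX_STEPS; steps++) {
        if (pc < 0 || (size_t)pc >= n)
            return -1;
        const struct toy_insn *in = &prog[pc++];
        if (in->dst >= NUM_REGS || in->src >= NUM_REGS)
            return -1;
        switch (in->code) {
        case OP_MOV64_IMM: r[in->dst] = (uint64_t)(int64_t)in->imm; break;
        case OP_MOV64_REG: r[in->dst] = r[in->src]; break;
        case OP_ADD64_IMM: r[in->dst] += (uint64_t)(int64_t)in->imm; break;
        case OP_JEQ_IMM:
            if (r[in->dst] == (uint64_t)(int64_t)in->imm)
                pc += in->off;
            break;
        case OP_EXIT:
            return (int64_t)r[0];
        default:
            return -1;
        }
    }
    return -1; /* step budget exhausted */
}
```

For example, the five-instruction program {r0 = 2; if r1 == 7 goto +1; exit; r0 = 1; exit} returns 1 (the value of XDP_DROP) when r1 is 7 and 2 (the value of XDP_PASS) otherwise.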
AFAIK, the original idea of XDP was discussed among a few kernel networking hackers at a netdev conference, and a very early prototype was done by PLUMgrid back then. Cloudflare is also deploying it in production and has blogged about it as well, though that happened a bit later: https://blog.cloudflare.com/how-to-drop-10-million-packets/
This sentence is not quite correct: "However, only certain types of rules are simple enough to be done in a high performance way -- complex rules would still be left to netfilter / ebtables." Under high packet load, netfilter will simply not be able to keep up. The rules that can be written in eBPF with the help of LLVM's eBPF backend can be quite complex; for example, Facebook has written their Katran load balancer in eBPF: https://code.fb.com/open-source/open-sourcing-katran-a-scala... . Google folks harden the network stack's receive path with XDP as a "big red button" to stop malicious packets: http://vger.kernel.org/netconf2017_files/rx_hardening_and_ud...
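For a flavour of the logic in something like Katran, which is described as using an extended version of Maglev hashing to pick backends, here is plain Maglev table population in C (not Katran's code): each backend walks its own permutation of table slots, derived from two hashes of its name, and the backends take turns claiming free slots until the table is full. The FNV-1a hash with two seeds and the tiny prime table size are arbitrary choices for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 13 /* must be prime; real tables are far larger */
#define MAX_BACKENDS 8

/* FNV-1a with the seed mixed into the offset basis, giving two different
   hashes per name. An arbitrary choice for this sketch. */
static uint64_t fnv1a(const char *s, uint64_t seed)
{
    uint64_t h = 14695981039346656037ULL ^ seed;
    for (; *s != '\0'; s++) {
        h ^= (uint8_t)*s;
        h *= 1099511628211ULL;
    }
    return h;
}

/* Fills table[] with backend indices. Returns 0 on success. */
int maglev_populate(const char *const *backends, size_t n, int table[TABLE_SIZE])
{
    uint64_t offset[MAX_BACKENDS], skip[MAX_BACKENDS], next[MAX_BACKENDS];
    if (n == 0 || n > MAX_BACKENDS)
        return -1;
    for (size_t i = 0; i < n; i++) {
        offset[i] = fnv1a(backends[i], 0x1234) % TABLE_SIZE;
        skip[i] = fnv1a(backends[i], 0x5678) % (TABLE_SIZE - 1) + 1;
        next[i] = 0; /* position in backend i's permutation */
    }
    for (size_t s = 0; s < TABLE_SIZE; s++)
        table[s] = -1;
    size_t filled = 0;
    for (;;) {
        for (size_t i = 0; i < n; i++) {
            uint64_t slot;
            /* skip is in [1, TABLE_SIZE-1] and TABLE_SIZE is prime, so the
               permutation visits every slot and a free one is found */
            do {
                slot = (offset[i] + next[i] * skip[i]) % TABLE_SIZE;
                next[i]++;
            } while (table[slot] >= 0);
            table[slot] = (int)i;
            if (++filled == TABLE_SIZE)
                return 0;
        }
    }
}

/* A given flow hash always maps to the same slot, hence the same backend. */
int maglev_lookup(const int table[TABLE_SIZE], uint64_t flow_hash)
{
    return table[flow_hash % TABLE_SIZE];
}
```

With three backends and 13 slots, the turn-taking fill gives a 5/4/4 split regardless of the hash values.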
Recently, Intel developers have added AF_XDP with a zero-copy mode, which gets pretty close to DPDK: https://www.dpdk.org/wp-content/uploads/sites/35/2018/10/pm-... The goal is that DPDK would only need to rely on AF_XDP and would no longer carry the burden of maintaining its own user-space drivers, so that drivers can be consolidated in the kernel while retaining DPDK's performance.
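AF_XDP moves packets between the kernel and user space through shared rings (RX, TX, fill and completion), each with exactly one producer and one consumer. A simplified C11 sketch of such a single-producer/single-consumer ring, carrying u64 buffer offsets the way the fill and completion rings do; the struct and function names are hypothetical, and this is not the kernel's or libbpf's actual layout.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8 /* must be a power of two */

struct spsc_ring {
    _Atomic uint32_t producer; /* free-running count of entries produced */
    _Atomic uint32_t consumer; /* free-running count of entries consumed */
    uint64_t desc[RING_SIZE];  /* here: offsets into a shared buffer area */
};

void ring_init(struct spsc_ring *r)
{
    atomic_init(&r->producer, 0);
    atomic_init(&r->consumer, 0);
}

/* Producer side only. Returns false if the ring is full. */
bool ring_push(struct spsc_ring *r, uint64_t addr)
{
    uint32_t prod = atomic_load_explicit(&r->producer, memory_order_relaxed);
    uint32_t cons = atomic_load_explicit(&r->consumer, memory_order_acquire);
    if (prod - cons == RING_SIZE)
        return false;
    r->desc[prod & (RING_SIZE - 1)] = addr;
    /* release: the descriptor write is visible before the new index */
    atomic_store_explicit(&r->producer, prod + 1, memory_order_release);
    return true;
}

/* Consumer side only. Returns false if the ring is empty. */
bool ring_pop(struct spsc_ring *r, uint64_t *addr)
{
    uint32_t cons = atomic_load_explicit(&r->consumer, memory_order_relaxed);
    uint32_t prod = atomic_load_explicit(&r->producer, memory_order_acquire);
    if (prod == cons)
        return false;
    *addr = r->desc[cons & (RING_SIZE - 1)];
    /* release: the slot may be reused only after we have read it */
    atomic_store_explicit(&r->consumer, cons + 1, memory_order_release);
    return true;
}
```

Because the size is a power of two, the free-running 32-bit indices can be masked instead of reduced modulo the size, and prod - cons stays correct across wraparound.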
Writing eBPF programs is still quite involved, especially if you want to distribute them to other systems (currently the available features depend on the kernel version, so programs are not very portable yet).
Tools like bcc make the process easier, but they require additional tooling on the system (e.g. LLVM).
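Since feature availability tracks the kernel version, a naive portability guard compares the running kernel's release string against a minimum; for example, XDP first appeared in 4.8 and AF_XDP in 4.18. A small C sketch with hypothetical function names; note that distributions backport BPF features, so probing for a feature (as bpftool feature probe does) is generally more reliable than trusting version numbers.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Extracts major.minor from a release string such as "5.15.0-91-generic"
   (the format of uname -r and struct utsname.release). */
bool parse_kver(const char *release, int *major, int *minor)
{
    return sscanf(release, "%d.%d", major, minor) == 2;
}

/* True if release is at least want_major.want_minor. */
bool kver_at_least(const char *release, int want_major, int want_minor)
{
    int maj, min;
    if (!parse_kver(release, &maj, &min))
        return false;
    return maj > want_major || (maj == want_major && min >= want_minor);
}

/* Checks the kernel this process is running on, via uname(2). */
bool running_kernel_at_least(int want_major, int want_minor)
{
    struct utsname u;
    if (uname(&u) != 0)
        return false;
    return kver_at_least(u.release, want_major, want_minor);
}
```

A loader might call running_kernel_at_least(4, 18) before trying to set up an AF_XDP socket and fall back to another path otherwise.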
So from your perspective, XDP is better than DPDK and will be the leading userspace network stack in years to come, is that correct? Any chance you can tell me why?
I took that sentence idiomatically, to mean something like "the obstacles have been reduced by at least an order of magnitude", since it's never literally true that literally anyone can write some software.