XDP is the future of high speed networking in the Linux Kernel. What's amazing a...

_qfi9 · on Oct 30, 2018

As a (semi-casual) Linux user but "kernel outsider": anyone want to break down all the acronyms used here?

ilovecaching · on Oct 30, 2018

FB = Facebook :)

XDP = eXpress Data Path. It's a hook where an eBPF program runs before the kernel network stack, letting you process raw packets as fast as your network card will allow.

eBPF = extended Berkeley Packet Filter. An eBPF program is compiled to bytecode and run on a virtual machine in the kernel (it lets you load code into the kernel from userspace). It can talk to user space through maps and hook into various parts of the kernel. An important point is that once compiled, an eBPF program is guaranteed to halt and has other verification performed on it, to make it safe to run in the kernel.

BCC = BPF Compiler Collection, a set of tools for working with eBPF. It uses LLVM and Clang to make it easy to write eBPF programs in Python, Rust, etc.

butterFS = btrfs, the filesystem. People often call it butterFS in conversation, even though the btr stands for b-tree.

cgroups = Technically the work is on cgroup2, which changes the way processes are laid out in a resource hierarchy from the original cgroups. This is how resource constraints are placed on containers, although many people use it just to monitor processes (without constraining resources).

wgjordan · on Oct 30, 2018

> butterFS = btrfs, the filesystem. People often call it butterFS in conversation, even though the btr stands for b-tree.

This isn't a common spelling for Btrfs, and just added confusion for me since I didn't immediately follow the reference in your original comment (I assumed it was something new). It may be a common spoken pronunciation but writing 'butterFS' in a forum comment is more characters/keystrokes than 'btrfs', so I'd consider it a typo/error to be corrected rather than a technical abbreviation to be defined.

ilovecaching · on Oct 30, 2018

I did it because I think it's funny.

Zhyl · on Oct 30, 2018

I can't believe it's not butterFS

type0 · on Oct 31, 2018

I usually pronounce it as betterFS, which perfectly describes it :-)

bigiain · on Oct 30, 2018

> An important point is that once compiled, an eBPF program is guaranteed to halt …

That's made me curious. Got any links or search terms I can use to learn more about that?

I'm guessing nobody has solved the halting problem, so I wonder what the constraints are. Is the eBPF programming language not Turing complete? Are the inputs bound in a way that means they don't need a general solution to the halting problem? Does the eBPF program get compiled with a killswitch to guarantee halting?

teej · on Oct 30, 2018

From elsewhere in the thread I found this link https://lwn.net/Articles/740157/

> There are inherent security and stability risks with allowing user-space code to run inside the kernel. So, a number of checks are performed on every eBPF program before it is loaded. The first test ensures that the eBPF program terminates and does not contain any loops that could cause the kernel to lock up. This is checked by doing a depth-first search of the program's control flow graph (CFG). Unreachable instructions are strictly prohibited; any program that contains unreachable instructions will fail to load.
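
To make the quoted check concrete, here is a minimal sketch of a depth-first search over a control flow graph that rejects loops and unreachable instructions. It is a toy model only: the graph representation (instruction index mapped to successor indices) and the name `check_cfg` are assumptions made for illustration, and the real in-kernel verifier works on actual BPF instructions and performs many more checks.

```python
from typing import Dict, List

# DFS colours: not visited yet, on the current DFS path, fully explored.
WHITE, GRAY, BLACK = 0, 1, 2


def check_cfg(successors: Dict[int, List[int]], entry: int = 0) -> str:
    """Classify a toy control flow graph as "ok", "loop", or "unreachable".

    `successors` maps each instruction index to the indices it can jump or
    fall through to. This representation is hypothetical, not the kernel's.
    """
    color = {node: WHITE for node in successors}
    color[entry] = GRAY
    # Explicit stack of (node, iterator over its successors) to avoid recursion.
    stack = [(entry, iter(successors[entry]))]
    while stack:
        node, pending = stack[-1]
        for nxt in pending:
            if color[nxt] == GRAY:
                # Edge back to an instruction on the current path: a loop.
                return "loop"
            if color[nxt] == WHITE:
                color[nxt] = GRAY
                stack.append((nxt, iter(successors[nxt])))
                break
        else:
            # All successors explored: this instruction is done.
            color[node] = BLACK
            stack.pop()
    # Anything the search never reached is an unreachable instruction.
    if any(c == WHITE for c in color.values()):
        return "unreachable"
    return "ok"
```

For example, `check_cfg({0: [1], 1: [2], 2: [1]})` reports a loop because instruction 2 jumps back to instruction 1, which is still on the current path. Because every accepted graph is acyclic and finite, any execution of it must terminate, which is how the check sidesteps the general halting problem.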

bigiain · on Oct 31, 2018

Thanks!

jtbayly · on Oct 30, 2018

Ok, so what does eBGP stand for? (By the way, thanks for the other definitions!)

lanstin · on Oct 30, 2018

external Border Gateway Protocol. BGP is a routing protocol used to share reachability information through independent routing domains; iBGP is when you use BGP to manage your own networks. eBGP is what is spoken between different ASes (Autonomous Systems) across the core (routers without a default route) internet. You'll often read "Country X lost 50% of traffic for N hours due to eBGP issues".

coolspot · on Oct 30, 2018

In this context it was a typo (now corrected).

Shalle135 · on Oct 30, 2018

What’s the difference between using kernel parameters within sysctl.conf and cgroup2?

ilovecaching · on Oct 30, 2018

cgroups aren't for manipulating kernel parameters. They're for setting resource limits on pids, and retrieving information about resource usage of a pid or group of pids.
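
As a rough illustration of that split, here is a minimal sketch that reads usage and sets a limit through the cgroup2 filesystem. It assumes a cgroup2 hierarchy with the memory controller enabled; `cgroup.procs`, `memory.current`, and `memory.max` are standard cgroup2 interface files, while the helper names and the `/sys/fs/cgroup/demo` path in the usage note are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Optional, Union


def parse_limit(text: str) -> Optional[int]:
    """cgroup2 limit files hold a byte count or the literal "max" (no limit)."""
    value = text.strip()
    return None if value == "max" else int(value)


def read_cgroup(cgroup_dir: Path) -> Dict[str, Union[List[int], int, None]]:
    """Retrieve membership and memory usage for one cgroup directory."""
    return {
        # One pid per line for every process in the group.
        "pids": [int(p) for p in (cgroup_dir / "cgroup.procs").read_text().split()],
        # Current memory usage of the group, in bytes.
        "memory_current": int((cgroup_dir / "memory.current").read_text()),
        # Hard memory limit in bytes, or None when unlimited.
        "memory_max": parse_limit((cgroup_dir / "memory.max").read_text()),
    }


def set_memory_max(cgroup_dir: Path, limit_bytes: Optional[int]) -> None:
    """Set (or clear, with None) the hard memory limit for the group."""
    text = "max" if limit_bytes is None else str(limit_bytes)
    (cgroup_dir / "memory.max").write_text(text)
```

On a real system this would look something like `read_cgroup(Path("/sys/fs/cgroup/demo"))`, and writing the limit typically needs root or a delegated cgroup. Nothing here touches sysctl: the limits apply to the processes in the group, not to kernel-wide parameters.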

doctorsher · on Oct 30, 2018

Disclaimer: I am not an expert in this, so any corrections are welcome. But here's my intuition.

XDP = eXpress Data Path. It is a new packet processing mechanism in the Linux kernel, which is in some ways an answer to DPDK and other userspace networking frameworks that skip the kernel in pursuit of high performance. It was originally proposed by Cloudflare, when they achieved poor scalability (in terms of packets per second) for something as simple as a packet drop rule in the kernel. The principle behind XDP is to leverage packet processing rules as early as possible in the packet processing pipeline (no wasted work). However, only certain types of rules are simple enough to be done in a high performance way -- complex rules would still be left to netfilter / ebtables.

The rules which XDP leverages, called extended Berkeley Packet Filters (eBPF), are a new take on an old technology. eBPF is a mechanism that allows userspace BPF rules to be inserted on-the-fly into the kernel. Essentially, matching rules which meet certain simplicity requirements (e.g. loop free) can be compiled into a bytecode that is executed by the kernel in a very efficient way. This is an extremely flexible technology, and one domain which it is well suited for is packet processing. BCC is just the set of compiler tools for creating your own eBPF bytecode.
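
To give a feel for the kind of early packet drop rule described above, here is a userspace model in Python. It is only a sketch of the logic: real XDP programs are usually written in restricted C, compiled to eBPF bytecode, and run in the driver's receive path. The verdict values match the kernel's XDP action codes; the function names, the UDP port rule, and the decision to ignore IP fragmentation are assumptions for illustration.

```python
import struct

# XDP verdict codes as defined by the kernel.
XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT = 0, 1, 2, 3, 4

ETH_HLEN = 14        # Ethernet header length in bytes
ETH_P_IP = 0x0800    # EtherType for IPv4
IPPROTO_UDP = 17


def drop_udp_port(frame: bytes, blocked_port: int) -> int:
    """Return XDP_DROP for IPv4/UDP frames sent to blocked_port, else XDP_PASS.

    Every read is preceded by a length check, mirroring the bounds checks an
    eBPF program needs before the verifier will accept it.
    """
    if len(frame) < ETH_HLEN + 20:
        return XDP_PASS
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_IP:
        return XDP_PASS
    ihl = (frame[ETH_HLEN] & 0x0F) * 4  # IPv4 header length in bytes
    if ihl < 20 or frame[ETH_HLEN + 9] != IPPROTO_UDP:
        return XDP_PASS
    udp_off = ETH_HLEN + ihl
    if len(frame) < udp_off + 8:
        return XDP_PASS
    (dst_port,) = struct.unpack_from("!H", frame, udp_off + 2)
    return XDP_DROP if dst_port == blocked_port else XDP_PASS


def make_udp_frame(dst_port: int) -> bytes:
    """Build a minimal Ethernet/IPv4/UDP frame for trying the rule out."""
    eth = bytes(12) + struct.pack("!H", ETH_P_IP)
    ip = bytearray(20)
    ip[0] = 0x45           # version 4, header length 5 words
    ip[9] = IPPROTO_UDP
    udp = struct.pack("!HHHH", 12345, dst_port, 8, 0)
    return eth + bytes(ip) + udp
```

Calling `drop_udp_port(make_udp_frame(1234), 1234)` yields `XDP_DROP`, while a frame for any other port, or one too short to parse, falls through to `XDP_PASS` and would continue into the normal network stack.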

gggggggggre2 · on Oct 30, 2018

Here's some more info in the BPF and XDP reference guide on concepts, use cases and getting started examples to catch up: https://cilium.readthedocs.io/en/latest/bpf/

Afaik, the original idea of XDP was discussed among a few kernel networking hackers at a netdev conference, and a very early prototype was done by Plumgrid back then. Cloudflare is also deploying it in production and has blogged about it as well, though that happened a bit later: https://blog.cloudflare.com/how-to-drop-10-million-packets/

This sentence is not quite correct: "However, only certain types of rules are simple enough to be done in a high performance way -- complex rules would still be left to netfilter / ebtables." Under high packet load, netfilter will simply not be able to keep up. The rules that can be written in eBPF with the help of LLVM's eBPF backend can be quite complex. For example, Facebook has written their Katran load balancer in eBPF: https://code.fb.com/open-source/open-sourcing-katran-a-scala... . Google folks harden the network stack's receive path with XDP as a "big red button" to stop malicious packets: http://vger.kernel.org/netconf2017_files/rx_hardening_and_ud...

Recently Intel developers have added AF_XDP with zero-copy mode, which gets pretty close to DPDK: https://www.dpdk.org/wp-content/uploads/sites/35/2018/10/pm-... The goal is that DPDK would only need to rely on AF_XDP and would no longer carry the burden of maintaining its own user space drivers, so that the drivers can be consolidated in the kernel while retaining the performance of DPDK.

Definitely exciting times ahead! :-)

doctorsher · on Oct 30, 2018

Thank you for the insight! Your post adds helpful context / corrections. Very exciting times, indeed! :)

lbotos · on Oct 30, 2018

I can get you a couple:

eBPF is a new kernel "tool": https://qmonnet.github.io/whirl-offload/2016/09/01/dive-into...

BCC is built on eBPF: https://github.com/iovisor/bcc

It's really cool new debugging and analysis stuff for the Linux Kernel. I'm asking my team to learn it ASAP.

bscphil · on Oct 30, 2018

By butterFS do you mean btrfs? I know the former only as a pronunciation of the latter.

copperx · on Oct 30, 2018

b-tree-FS has the same number of syllables, so butterFS is an unnecessary and silly pronunciation.

rurban · on Oct 30, 2018

Nope, that is exactly the name under which it is commonly known, and that's how its developers pronounce it officially.

ThePhysicist · on Oct 30, 2018

Writing eBPF programs is still quite involved, especially if you want to distribute them to other systems (currently the available features depend on the kernel version, so the programs are not very portable yet).

Tools like bcc make the process easier, but require additional tooling on the system (e.g. LLVM).

chronid · on Oct 30, 2018

Maybe I'm wrong (or things changed since I looked at this), but if you use bcc you essentially depend on LLVM at runtime.

I guess you would probably want the (precompiled) eBPF program as a binary that you can load in some way instead, for "production".

xmichael999 · on Oct 30, 2018

So from your perspective XDP is better than DPDK and will be the leading userspace network stack in years to come, is that correct? Any chance you can tell me why?

ilovecaching · on Oct 30, 2018

From a kernel developer's perspective :) Check out the talk: Fast Programmable Networks & Encapsulated Protocols by David S. Miller.

xmichael999 · on Oct 31, 2018

Thank you very much! DPDK is not linux! :)

gonzo · on Oct 30, 2018

> Literally anyone can write an eBPF program,

Until you’ve written a couple, you might believe this. After, you’ll understand the issues with debugging, etc.

jshevek · on Oct 30, 2018

I took that sentence idiomatically, to mean something like "the obstacles have been reduced by at least an order of magnitude", since it's never literally true that literally anyone can write some software.