eBPF is a powerful technology that allows you to observe, secure, and filter your networks. In today’s blog post and related YouTube video, I'll introduce eBPF and its value. Then we will look at a BPF map and build an eBPF program before we look at a few solutions that use this technology. As usual, we will finish with a tutorial so you can learn more about eBPF hands-on.
What is eBPF?
eBPF stands for extended Berkeley Packet Filter and is the newer version of the BPF technology.
BPF was initially created in 1992 to analyze the network traffic and provide filtering capabilities. At that time, network traffic was much simpler than today, but many engineers were already thinking about analyzing and filtering traffic.
Let’s say we want to analyze and report all the TCP communication managed by our server.
To do this, we could modify the code of every application running on our servers to log everything about incoming and outgoing connections. This approach could work, but it would require a lot of effort, and we would need to roll this code out to all the organization's servers. The other solution is to fork the kernel version running on our server and add instrumentation instructions to the kernel itself. This approach is even more complex: building kernel code is difficult, and we would need to run lots of tests to fully measure the impact of this code. Moreover, how would we manage all the security patches in the future?
The kernel provides many features, and in particular it manages all the instructions that interact with our hardware. When we build our code, we never define how to interact with a hardware device like a disk. Our code uses predefined syscalls to reach the hardware: network card, disk, etc. When we deploy our code on our system, it sits in the user space. Our code interacts with the kernel, which acts as a bridge to our hardware.
For example, in my car, I have a screen to adjust the air conditioning. This screen is managed by software that interacts with the hardware components in the car. But the coder didn't instruct the software to send signals directly to the air conditioner. Instead, it sends a syscall to the car's kernel, which sends the right signal to the air conditioner.
BPF was created to let you inject simple programs into your kernel that run in response to kernel events, originally the arrival of network packets.
Once the BPF code is deployed, it will register to a network packet kernel event. For example, we can program it to look at incoming and outgoing packets. Every time a packet comes into the server, the BPF program is launched and analyzes the TCP packet.
Several networking tools have been built with the help of BPF, like tcpdump.
BPF was also used for filtering. Let's say we want to accept only packets that carry a signed hash in the packet's header. As in the analysis scenario, the BPF program is launched for each packet and looks at the header; if the expected value or field is missing, it drops the packet.
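To make the accept-or-drop decision concrete, here is a minimal sketch in plain Python. It is not BPF and does not run in the kernel; it only models the kind of check a packet filter performs on a header. The helper names (`build_ipv4_header`, `accept_packet`) are hypothetical, and the rule used here (keep IPv4 packets that carry TCP) is an assumption chosen for illustration rather than the signed-hash rule above.

```python
import struct

IPPROTO_TCP = 6  # protocol number used for TCP in the IPv4 header


def build_ipv4_header(protocol: int) -> bytes:
    """Build a minimal 20-byte IPv4 header for the demo (checksum left at 0)."""
    version_ihl = (4 << 4) | 5  # IPv4, header length of 5 x 32-bit words
    return struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl,          # version + header length
        0,                    # type of service
        20,                   # total length (header only)
        0,                    # identification
        0,                    # flags + fragment offset
        64,                   # time to live
        protocol,             # payload protocol (byte offset 9)
        0,                    # checksum (not computed in this sketch)
        bytes([10, 0, 0, 1]),  # source address
        bytes([10, 0, 0, 2]),  # destination address
    )


def accept_packet(packet: bytes) -> bool:
    """Return True to keep the packet, False to drop it."""
    if len(packet) < 20:
        return False  # too short to hold an IPv4 header
    version = packet[0] >> 4
    protocol = packet[9]
    return version == 4 and protocol == IPPROTO_TCP


print(accept_packet(build_ipv4_header(IPPROTO_TCP)))  # TCP packet
print(accept_packet(build_ipv4_header(17)))           # UDP packet
```

A real filter runs the same kind of logic inside the kernel for every packet, which is why it has to stay small and fast.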
As you can see, BPF was mainly focused on networking use cases.
Since kernel 3.18, an extended version of BPF (eBPF) has been available, providing more features:
It is no longer limited to packet filtering events and can react to almost all kernel events
It supports 64-bit registers, so programs can collect and store more information
It adds a new data structure, the BPF map, to store data and exchange it with our user space programs
What’s the value of eBPF?
With the extended version of BPF, you can extend the current features of the kernel by adding a program that collects KPIs while your kernel is managing a socket connection or other processes.
The eBPF program can register to any kernel event, and once the kernel event happens, the eBPF program is triggered.
What type of kernel events exist?
There are multiple types of events, related to almost all the syscalls. Events can be related to the network (opening or closing a socket), to the disk (write, read), to the creation of a process or a thread, to a CPU instruction, etc.
With eBPF we can now build programs for various use cases:
Networking to analyze, route, and more
Security to filter traffic based on specific rules and report any valid or blocked traffic
Tracing/Profiling to collect detailed execution flows from the user space programs to kernel instructions
Observability to report disk activity, processes, CPU instructions, etc. It is more efficient because we're not polling for information; our program is launched exactly when we need to measure something
In short, eBPF lets you run your own code in reaction to events, but inside your kernel.
So let’s say that we want to write a program that collects the number of bytes, the name of the file, and the folder when we write into a file.
First, you need to identify the right kernel event. In this case, it would be vfs_write, the kernel function that handles file writes. Afterward, you can start writing the eBPF code that will register to your vfs_write event.
Once the code is deployed, every time vfs_write is called, our eBPF program will be executed, and we will collect the right data.
The big question is, how can we retrieve the data in the traditional user space?
For this, eBPF comes with the BPF map, which lets you store data in structures such as a hash map, an array, a stack trace store, and more.
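The vfs_write scenario can be sketched with BCC as follows. This is a simplified sketch, not a complete solution: it only sums the requested bytes per process ID and leaves out the file name and folder. The names `bytes_by_pid`, `trace_vfs_write`, and `run` are illustrative, and running it assumes a Linux host with BCC installed and root privileges.

```python
# The eBPF part: restricted C that BCC compiles when the program is loaded.
BPF_PROGRAM = r"""
#include <uapi/linux/ptrace.h>
#include <linux/fs.h>

// BPF hash map: key = process ID, value = total bytes passed to vfs_write()
BPF_HASH(bytes_by_pid, u32, u64);

int trace_vfs_write(struct pt_regs *ctx, struct file *file,
                    const char __user *buf, size_t count)
{
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    u64 new_total = count;

    // Simple read-modify-write; not atomic, which is fine for a sketch.
    u64 *total = bytes_by_pid.lookup(&pid);
    if (total) {
        new_total += *total;
    }
    bytes_by_pid.update(&pid, &new_total);
    return 0;
}
"""


def run(interval_seconds: int = 5) -> None:
    """Load the program, attach it to vfs_write, then read the map."""
    import time
    from bcc import BPF  # imported here so the module loads without BCC

    bpf = BPF(text=BPF_PROGRAM)
    bpf.attach_kprobe(event="vfs_write", fn_name="trace_vfs_write")

    time.sleep(interval_seconds)  # let the kernel side collect data

    # User space side: iterate over the BPF map filled by the kernel side.
    for pid, total in bpf["bytes_by_pid"].items():
        print(f"pid={pid.value} bytes={total.value}")
```

Calling `run()` as root prints one line per process that wrote data during the interval. The kernel side only fills the map; everything about presentation stays in user space.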
Introduction to BPF maps
The BPF map is one of the major improvements provided by eBPF. A BPF map is storage located in the kernel, and user space programs retrieve the data stored in it through the bpf() syscall.
eBPF provides various BPF maps, which you can check out in the Linux Kernel docs. All of them have been designed for specific use cases. Here are some examples:
BPF_MAP_TYPE_HASH and BPF_MAP_TYPE_PERCPU_HASH are hash maps, so key-value pair storage; the difference is that BPF_MAP_TYPE_PERCPU_HASH provides a separate value slot per CPU
BPF_MAP_TYPE_ARRAY provides storage similar to an array, for which we define the maximum size
BPF_MAP_TYPE_PERCPU_ARRAY provides an array with separate storage per CPU, useful for example for histograms of latency
BPF_MAP_TYPE_STACK_TRACE can be used to store stack traces of your programs
Therefore, when building your eBPF program, you'll need to figure out which type of BPF map to use (the hash and array types are heavily used).
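The "separate value slot per CPU" idea can be illustrated with a toy model in plain Python. This is not a real BPF map and uses no kernel API; `ToyPerCpuHash` and its methods are hypothetical names. It only shows that each CPU updates its own slot and that the reader aggregates the slots afterward.

```python
from collections import defaultdict


class ToyPerCpuHash:
    """Toy stand-in for a per-CPU hash map: one value slot per CPU and per key."""

    def __init__(self, num_cpus: int) -> None:
        self.num_cpus = num_cpus
        self._slots = defaultdict(lambda: [0] * num_cpus)

    def add(self, key: str, cpu: int, amount: int) -> None:
        """Simulate an update made on one CPU: only that CPU's slot changes."""
        self._slots[key][cpu] += amount

    def lookup(self, key: str) -> list:
        """A read returns every CPU's slot for the key."""
        return list(self._slots[key])

    def total(self, key: str) -> int:
        """Aggregate the per-CPU slots, as a user space reader would."""
        return sum(self._slots[key])


counters = ToyPerCpuHash(num_cpus=4)
counters.add("vfs_write", cpu=0, amount=512)
counters.add("vfs_write", cpu=2, amount=128)
counters.add("vfs_write", cpu=0, amount=64)

print(counters.lookup("vfs_write"))  # → [576, 0, 128, 0]
print(counters.total("vfs_write"))   # → 704
```

Because CPUs never share a slot, updates do not contend with each other; the cost is that the reader has to sum the slots.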
To use a BPF map, you first need to declare it by defining:
The BPF map type
The max number of elements
The key size in bytes
The value size in bytes
The kernel comes with predefined operations, exposed through the bpf() syscall, that allow us from the user space to create a BPF map, look up a given key, create or update a key-value pair, and find and delete an entry by key. The map itself is released once nothing references it anymore.
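The four declaration parameters and the basic operations can be modeled with a small Python class. This is a toy model under stated assumptions, not the kernel API: `ToyBpfHashMap` and its method names are hypothetical, and the real operations go through the bpf() syscall or a library such as BCC. It shows why the key size, value size, and maximum number of entries are fixed when the map is declared.

```python
import struct
from typing import Dict, Optional


class ToyBpfHashMap:
    """Toy model of a hash-type map with fixed key/value sizes and capacity."""

    map_type = "BPF_MAP_TYPE_HASH"

    def __init__(self, max_entries: int, key_size: int, value_size: int) -> None:
        self.max_entries = max_entries
        self.key_size = key_size      # in bytes
        self.value_size = value_size  # in bytes
        self._data: Dict[bytes, bytes] = {}

    def update(self, key: bytes, value: bytes) -> None:
        """Create or overwrite an entry, enforcing the declared sizes."""
        if len(key) != self.key_size or len(value) != self.value_size:
            raise ValueError("key or value does not match the declared size")
        if key not in self._data and len(self._data) >= self.max_entries:
            raise OverflowError("map is full")
        self._data[key] = value

    def lookup(self, key: bytes) -> Optional[bytes]:
        """Return the stored value, or None when the key is absent."""
        return self._data.get(key)

    def delete(self, key: bytes) -> bool:
        """Remove an entry; return False when the key was not present."""
        return self._data.pop(key, None) is not None


# Key: 4-byte process ID. Value: 8-byte counter.
writes = ToyBpfHashMap(max_entries=2, key_size=4, value_size=8)
pid_key = struct.pack("<I", 1234)
writes.update(pid_key, struct.pack("<Q", 4096))

(stored,) = struct.unpack("<Q", writes.lookup(pid_key))
print(stored)                  # → 4096
print(writes.delete(pid_key))  # → True
print(writes.lookup(pid_key))  # → None
```

Keys and values are raw bytes of a fixed width, which is why the sketch packs them with `struct` before storing them.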
How to create an eBPF program
Writing eBPF sounds very complicated, and it is. However, one way to simplify the process is to use the BCC framework. BCC stands for BPF Compiler Collection and lets you write a Python program that embeds and loads your eBPF code. It was mainly built for profiling and tracing.
BCC provides a set of tools built by the community and initially used by Brendan Gregg.
There is a list of predefined Python tools that help us trace various components of the kernel, as explained in this diagram.
If you want to learn about those tools, I recommend watching this video by Brendan Gregg on Linux Performance Tools.
With BCC, you write a Python/Go program that declares a string containing the eBPF program in C, explains what to do with the metrics collected in a BPF map, and registers the eBPF program to a specific kernel event.
When you deploy your BCC program, it will go through several steps to be validated:
First, BCC compiles the C code into eBPF bytecode
The bytecode then goes through the eBPF verifier, which checks that the program is safe to run in the kernel (for example, that it always terminates)
Then it goes into the JIT compiler, which transforms the eBPF bytecode into native machine code
Alternatively, bpftrace provides a high-level tracing language for the kernel (version 4.x and later).
You write a bpftrace program that is compiled into bytecode, and then it follows a similar journey: first the eBPF verifier and then the JIT compiler.
There is also an eBPF Go library that follows an approach similar to the one described for the BCC Python library.
The eBPF program should be lightweight and reliable, and it needs to use a set of helper functions built for BPF to access various kernel elements. The set of available helper calls is constantly evolving.
Finally, an eBPF program needs to have a type. The most common types are kprobe (collect kernel level details), uprobe (collect details from the user space), and tracepoints.
Kepler eBPF tutorial
In today's tutorial, we are going to use Kepler to illustrate an eBPF solution. Kepler is deployed as a DaemonSet.
The main Kepler pod injects eBPF programs to collect performance counters and tracepoints.
With the help of the data collected and a data model, it calculates the power consumption.
It also requires a Cluster Role to let Kepler convert container ID information to Kubernetes pods and also collect group stats.
In this tutorial, we will deploy Kepler and its default Grafana dashboard, but we will also ingest the Kepler metrics in Dynatrace and build a Dynatrace dashboard.
For this tutorial we will need:
A Kubernetes cluster
A Dynatrace tenant
Nginx ingress controller
The OpenTelemetry operator
The Prometheus Operator without the default exporters (node-exporter and kube-state-metrics)
The OpenTelemetry demo application
Watch the full tutorial on my YouTube Channel: What is eBPF?
Or follow the instructions directly on GitHub: isItObservable/ebpf-kepler (github.com)