Extending the Kernel with eBPF

Extended Berkeley Packet Filter (eBPF) is an in-kernel virtual machine that runs user-supplied eBPF programs to extend kernel functionality. These programs can be hooked to probes or events in the kernel and used to collect useful kernel statistics, monitor, and debug. A program is loaded into the kernel using the bpf(2) syscall and is provided by the user as a binary blob of eBPF machine instructions. The Android build system has support for compiling C programs to eBPF using simple build file syntax described in this document.

More information about eBPF internals and architecture can be found at Brendan Gregg's eBPF page.

Android includes an eBPF loader and library that loads eBPF programs at boot time.

Android BPF loader

During Android boot, all eBPF programs located at /system/etc/bpf/ are loaded. These programs are binary objects built by the Android build system from C programs and are accompanied by Android.bp files in the Android source tree. The build system stores the generated objects at /system/etc/bpf, and those objects become part of the system image.

Format of an Android eBPF C program

An eBPF C program must have the following format:

#include <bpf_helpers.h>

/* Define one or more maps in the maps section, for example
 * define a map of type array int -> uint32_t, with 10 entries
 */
DEFINE_BPF_MAP(name_of_my_map, ARRAY, int, uint32_t, 10);

/* this will also define type-safe accessors:
 *   value * bpf_name_of_my_map_lookup_elem(&key);
 *   int bpf_name_of_my_map_update_elem(&key, &value, flags);
 *   int bpf_name_of_my_map_delete_elem(&key);
 * as such it is heavily suggested to use lowercase *_map names.
 * Also note that due to compiler deficiencies you cannot use a type
 * of 'struct foo' but must instead use just 'foo'.  As such structs
 * must not be defined as 'struct foo {}' and must instead be
 * 'typedef struct {} foo'.
 */

DEFINE_BPF_PROG("PROGTYPE/PROGNAME", AID_*, AID_*, PROGFUNC)(..args..) {
   <body-of-code
    ... read or write to name_of_my_map
    ... do other things
   >
}

LICENSE("GPL"); // or other license

Where:

  • name_of_my_map is the name of your map variable. This name informs the BPF loader of the type of map to create and with what parameters. This struct definition is provided by the included bpf_helpers.h header.
  • PROGTYPE/PROGNAME represents the type of the program and program name. The type of the program can be any of those listed in the following table. When a type of program isn't listed, there is no strict naming convention for the program; the name just needs to be known to the process that attaches the program.

  • PROGFUNC is a function that, when compiled, is placed in a section of the resulting file.

kprobe Hooks PROGFUNC onto a kernel instruction using the kprobe infrastructure. PROGNAME must be the name of the kernel function being kprobed. Refer to the kprobe kernel documentation for more information about kprobes.
tracepoint Hooks PROGFUNC onto a tracepoint. PROGNAME must be of the format SUBSYSTEM/EVENT. For example, a tracepoint section for attaching functions to scheduler context switch events would be SEC("tracepoint/sched/sched_switch"), where sched is the name of the trace subsystem, and sched_switch is the name of the trace event. Check the trace events kernel documentation for more information about tracepoints.
skfilter Program functions as a networking socket filter.
schedcls Program functions as a networking traffic classifier.
cgroupskb, cgroupsock Program runs whenever processes in a CGroup create an AF_INET or AF_INET6 socket.

Additional types can be found in the Loader source code.

For example, the following myschedtp.c program records the PID of the latest task that has run on each CPU. It does so by creating a map and defining a tp_sched_switch function that can be attached to the sched:sched_switch trace event. For more information, see Attaching programs to tracepoints.

#include <linux/bpf.h>
#include <stdbool.h>
#include <stdint.h>
#include <bpf_helpers.h>

DEFINE_BPF_MAP(cpu_pid_map, ARRAY, int, uint32_t, 1024);

struct switch_args {
    unsigned long long ignore;
    char prev_comm[16];
    int prev_pid;
    int prev_prio;
    long long prev_state;
    char next_comm[16];
    int next_pid;
    int next_prio;
};

DEFINE_BPF_PROG("tracepoint/sched/sched_switch", AID_ROOT, AID_SYSTEM, tp_sched_switch)
(struct switch_args *args) {
    int key;
    uint32_t val;

    key = bpf_get_smp_processor_id();
    val = args->next_pid;

    bpf_cpu_pid_map_update_elem(&key, &val, BPF_ANY);
    return 1; // return 1 to avoid blocking simpleperf from receiving events
}

LICENSE("GPL");

The LICENSE macro is used to verify whether the program is compatible with the kernel's license when the program uses BPF helper functions provided by the kernel. Specify the name of your program's license in string form, such as LICENSE("GPL") or LICENSE("Apache 2.0").
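The PROGTYPE/PROGNAME convention described above can be illustrated with a small userspace sketch. The split_section() helper below is hypothetical and is not part of the Android loader or bpf_helpers.h; it only shows how a section string such as tracepoint/sched/sched_switch breaks down into a program type and a program name, and how a tracepoint name breaks down further into SUBSYSTEM and EVENT.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an Android API): splits "HEAD/TAIL" at the first
 * '/'. Copies HEAD into head_out and returns a pointer to TAIL inside the
 * input string. Returns NULL if there is no '/' or HEAD does not fit. */
static const char *split_section(const char *section, char *head_out,
                                 size_t head_len) {
    assert(section != NULL && head_out != NULL);
    const char *slash = strchr(section, '/');
    if (slash == NULL) {
        return NULL;
    }
    size_t n = (size_t)(slash - section);
    if (n + 1 > head_len) {
        return NULL; /* no room for HEAD plus the terminating NUL */
    }
    memcpy(head_out, section, n);
    head_out[n] = '\0';
    return slash + 1;
}
```

Calling split_section() once on "tracepoint/sched/sched_switch" yields the type tracepoint and the name sched/sched_switch. Calling it again on that name yields the subsystem sched and the event sched_switch.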

Format of the Android.bp file

For the Android build system to build an eBPF .c program, you must create an entry in the Android.bp file of the project. For example, to build an eBPF C program named bpf_test.c, make the following entry in your project's Android.bp file:

bpf {
    name: "bpf_test.o",
    srcs: ["bpf_test.c"],
    cflags: [
        "-Wall",
        "-Werror",
    ],
}

This entry compiles the C program into the object /system/etc/bpf/bpf_test.o. On boot, the Android system automatically loads the bpf_test.o program into the kernel.

Files available in sysfs

During boot, the Android system automatically loads all the eBPF objects from /system/etc/bpf/, creates the maps that the program needs, and pins the loaded program with its maps to the BPF file system. These files can then be used for further interaction with the eBPF program or reading maps. This section describes the conventions used for naming these files and their locations in sysfs.

The following files are created and pinned:

  • For any programs loaded, assuming PROGNAME is the name of the program and FILENAME is the name of the eBPF C file, the Android loader creates and pins each program at /sys/fs/bpf/prog_FILENAME_PROGTYPE_PROGNAME.

    For example, for the previous sched_switch tracepoint example in myschedtp.c, a program file is created and pinned to /sys/fs/bpf/prog_myschedtp_tracepoint_sched_sched_switch.

  • For any maps created, assuming MAPNAME is the name of the map and FILENAME is the name of the eBPF C file, the Android loader creates and pins each map to /sys/fs/bpf/map_FILENAME_MAPNAME.

    For example, for the previous sched_switch tracepoint example in myschedtp.c, a map file is created and pinned to /sys/fs/bpf/map_myschedtp_cpu_pid_map.

  • bpf_obj_get() in the Android BPF library returns a file descriptor from the pinned /sys/fs/bpf file. This file descriptor can be used for further operations, such as reading maps or attaching a program to a tracepoint.
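The naming conventions above can be sketched as two small helpers that build the pinned paths. Both function names are hypothetical and are not part of the Android BPF library. The sketch assumes, based on the myschedtp example, that each '/' in PROGTYPE/PROGNAME becomes '_' in the pinned file name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds /sys/fs/bpf/prog_FILENAME_PROGTYPE_PROGNAME.
 * filename is the C file name without its extension; section is the
 * "PROGTYPE/PROGNAME" string. Returns 0 on success, -1 if out is too small. */
static int prog_pin_path(char *out, size_t out_len, const char *filename,
                         const char *section) {
    static const char prefix[] = "/sys/fs/bpf/prog_";
    assert(out != NULL && filename != NULL && section != NULL);
    int n = snprintf(out, out_len, "%s%s_%s", prefix, filename, section);
    if (n < 0 || (size_t)n >= out_len) {
        return -1;
    }
    /* Assumption: slashes in the section string map to underscores. */
    for (char *p = out + strlen(prefix); *p != '\0'; ++p) {
        if (*p == '/') {
            *p = '_';
        }
    }
    return 0;
}

/* Hypothetical helper: builds /sys/fs/bpf/map_FILENAME_MAPNAME. */
static int map_pin_path(char *out, size_t out_len, const char *filename,
                        const char *mapname) {
    assert(out != NULL && filename != NULL && mapname != NULL);
    int n = snprintf(out, out_len, "/sys/fs/bpf/map_%s_%s", filename, mapname);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

For myschedtp.c, prog_pin_path() with the section tracepoint/sched/sched_switch and map_pin_path() with the map cpu_pid_map reproduce the two example paths given above.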

Android BPF library

The Android BPF library is named libbpf_android.so and is part of the system image. This library provides the user with low-level eBPF functionality needed for creating and reading maps, creating probes, tracepoints, and perf buffers.

Attaching programs to tracepoints

Tracepoint programs are loaded automatically at boot. After loading, the tracepoint program must be activated using these steps:

  1. Call bpf_obj_get() to obtain the program fd from the pinned file's location. For more information, refer to Files available in sysfs.
  2. Call bpf_attach_tracepoint() in the BPF library, passing it the program fd and the tracepoint name.

The following code sample shows how to attach the sched_switch tracepoint defined in the previous myschedtp.c source file (error checking isn't shown):

  const char *tp_prog_path = "/sys/fs/bpf/prog_myschedtp_tracepoint_sched_sched_switch";
  const char *tp_map_path = "/sys/fs/bpf/map_myschedtp_cpu_pid_map";

  // Attach tracepoint and wait for 4 seconds
  int mProgFd = bpf_obj_get(tp_prog_path);
  int mMapFd = bpf_obj_get(tp_map_path);
  int ret = bpf_attach_tracepoint(mProgFd, "sched", "sched_switch");
  sleep(4);

  // Read the map to find the last PID that ran on CPU 0
  android::bpf::BpfMap<int, int> myMap(mMapFd);
  printf("last PID running on CPU %d is %d\n", 0, myMap.readValue(0));

Reading from the maps

BPF maps support arbitrarily complex key and value structures or types. The Android BPF library includes an android::bpf::BpfMap class that uses C++ templates to instantiate BpfMap based on the key and value types of the map in question. The previous code sample demonstrates using a BpfMap with integers as the key and value. The key and value types can also be arbitrary structures.

Thus the templatized BpfMap class makes it easy to define a custom BpfMap object suitable for the particular map. The map can then be accessed using the custom-generated functions, which are type aware, resulting in cleaner code.

For more information about BpfMap, refer to the Android sources.
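For comparison, a pinned map can also be read from plain C through the bpf(2) syscall directly. The following is a minimal sketch, not the Android library's implementation: the helper names pinned_obj_get() and map_lookup_u32() are hypothetical, and the sketch assumes a Linux system whose headers provide linux/bpf.h and __NR_bpf. It uses the kernel's BPF_OBJ_GET and BPF_MAP_LOOKUP_ELEM commands and assumes an int -> uint32_t map such as cpu_pid_map.

```c
#include <assert.h>
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: opens a pinned BPF object (program or map) by path.
 * Returns a file descriptor, or -1 with errno set on failure. */
static int pinned_obj_get(const char *path) {
    assert(path != NULL);
    union bpf_attr attr;
    memset(&attr, 0, sizeof(attr)); /* unused fields must be zero */
    attr.pathname = (uint64_t)(uintptr_t)path;
    return (int)syscall(__NR_bpf, BPF_OBJ_GET, &attr, sizeof(attr));
}

/* Hypothetical helper: reads one value from an int -> uint32_t map.
 * Returns 0 on success, or -1 with errno set on failure. */
static int map_lookup_u32(int map_fd, int key, uint32_t *value) {
    assert(value != NULL);
    union bpf_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.map_fd = (uint32_t)map_fd;
    attr.key = (uint64_t)(uintptr_t)&key;     /* pointer to the key */
    attr.value = (uint64_t)(uintptr_t)value;  /* kernel copies the value here */
    return (int)syscall(__NR_bpf, BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
}
```

A caller would pass /sys/fs/bpf/map_myschedtp_cpu_pid_map to pinned_obj_get(), check that the returned fd is not negative, and then call map_lookup_u32() with a CPU number as the key. The raw syscall offers no type checking of the key and value sizes, which is the gap the templatized BpfMap class fills.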

Debugging issues

During boot time, several messages related to BPF loading are logged. If the loading process fails for any reason, a detailed log message is provided in logcat. Filtering the logcat logs by "bpf" prints all the messages and any detailed errors during load time, such as eBPF verifier errors.

Examples of eBPF in Android

The following programs in AOSP provide additional examples of using eBPF:

  • The netd eBPF C program is used by the networking daemon (netd) in Android for various purposes such as socket filtering and statistics gathering. To see how this program is used, check the eBPF traffic monitor sources.

  • The time_in_state eBPF C program calculates the amount of time an Android app spends at different CPU frequencies, which is used to calculate power.

  • In Android 12, the gpu_mem eBPF C program tracks total GPU memory usage for each process and for the entire system. This program is used for GPU memory profiling.