Extending the kernel with eBPF

Android includes an eBPF loader and library that loads eBPF programs at boot time to extend kernel functionality. This can be used for collecting statistics from the kernel, monitoring, or debugging.

About eBPF

Extended Berkeley Packet Filter (eBPF) is an in-kernel virtual machine that runs user-supplied eBPF programs that can be hooked to probes or events in the kernel, collect useful statistics, and store the results in rich data structures. A program is loaded into the kernel using the bpf(2) syscall and is provided by the user as a binary blob of eBPF machine instructions. The Android build system has support for compiling C programs to eBPF using simple build file syntax described later.

More information about eBPF internals and architecture can be found at Brendan Gregg's eBPF page.

Android BPF loader

During Android boot, all eBPF programs located at /system/etc/bpf/ are loaded. These programs are binary objects built by the Android build system from C programs accompanied with Android.bp files in the Android source tree. The build system stores the generated objects at /system/etc/bpf, and they become part of the system image.

Format of an Android eBPF C program

An eBPF C program loaded on an Android device must have the following format:

#include <bpf_helpers.h>

/* Define one or more maps in the maps section, for example
 * define a map of type array int -> uint32_t, with 10 entries
 */
DEFINE_BPF_MAP(name_of_my_map, ARRAY, int, uint32_t, 10);

/* this will also define type-safe accessors:
 *   value * bpf_name_of_my_map_lookup_elem(&key);
 *   int bpf_name_of_my_map_update_elem(&key, &value, flags);
 *   int bpf_name_of_my_map_delete_elem(&key);
 * as such it is heavily suggested to use lowercase *_map names.
 * Also note that due to compiler deficiencies you cannot use a type
 * of 'struct foo' but must instead use just 'foo'.  As such structs
 * must not be defined as 'struct foo {}' and must instead be
 * 'typedef struct {} foo'.
 */

SEC("PROGTYPE/PROGNAME")
int PROGFUNC(..args..) {
   <body-of-code
    ... read or write to MY_MAPNAME
    ... do other things
   >
}

char _license[] SEC("license") = "GPL"; // or other license

Here, name_of_my_map is the name of your map variable. It tells the BPF loader what kind of map to create with what parameters. This struct definition is provided by the bpf_helpers.h header that the C program includes. The above code results in a creation of an array map of 10 entries.

Next, the program defines a function PROGFUNC. When compiled, this function is placed in a section. The section must have a name of the format PROGTYPE/PROGNAME. PROGTYPE can be any one of the following. More types can be found in the Loader source code.

kprobe Hooks PROGFUNC onto at a kernel instruction using the kprobe infrastructure. PROGNAME must be the name of the kernel function being kprobed. Refer to the kprobe kernel documentation for more information about kprobes.
tracepoint Hooks PROGFUNC onto a tracepoint. PROGNAME must be of the format SUBSYSTEM/EVENT. For example, a tracepoint section for attaching functions to scheduler context switch events would be SEC("tracepoint/sched/sched_switch"), where sched is the name of the trace subsystem, and sched_switch is the name of the trace event. Check the trace events kernel documentation for more information about tracepoints.
skfilter Program will function as a networking socket filter.
schedcls Program functions as a networking traffic classifier.
cgroupskb, cgroupsock Program runs whenever processes in a CGroup create an AF_INET or AF_INET6 socket.

As an example of a complete C program, the following program creates a map and defines a function tp_sched_switch, which can be attached to the sched:sched_switch trace event (see this section for how to attach). The program adds information about the latest task PID that ran on a particular CPU. Name this myschedtp.c. We'll refer to this file later in this document.

#include <linux/bpf.h>
#include <stdbool.h>
#include <stdint.h>
#include <bpf_helpers.h>

DEFINE_BPF_MAP(cpu_pid_map, ARRAY, int, uint32_t, 1024);

struct switch_args {
    unsigned long long ignore;
    char prev_comm[16];
    int prev_pid;
    int prev_prio;
    long long prev_state;
    char next_comm[16];
    int next_pid;
    int next_prio;
};

SEC("tracepoint/sched/sched_switch")
int tp_sched_switch(struct switch_args* args) {
    int key;
    uint32_t val;

    key = bpf_get_smp_processor_id();
    val = args->next_pid;

    bpf_cpu_pid_map_update_elem(&key, &val, BPF_ANY);
    return 0;
}

char _license[] SEC("license") = "GPL";

The license section is used by the kernel to verify if the program is compatible with the kernel's license when the program makes use of BPF helper functions provided by the kernel. Set _license to your project's license.

Format of the Android.bp file

In order for the Android build system to build an eBPF .c program, an entry has to be made in the Android.bp file of the project.

For example, to build an eBPF C program of name bpf_test.c, make the following entry in your project's Android.bp file:

bpf {
    name: "bpf_test.o",
    srcs: ["bpf_test.c"],
    cflags: [
        "-Wall",
        "-Werror",
    ],
}

This compiles the C program resulting in the object /system/etc/bpf/bpf_test.o. On boot, the Android system automatically loads the bpf_test.o program into the kernel.

Files available in sysfs

During boot up, the Android system automatically loads all the eBPF objects from /system/etc/bpf/, creates the maps that the program needs, and pins the loaded program with its maps to the bpf file system. These files can then be used for further interaction with the eBPF program or reading maps. This section describes the conventions used for naming these files and their locations in sysfs.

The following files are created and pinned:

  • For any programs loaded, assuming PROGNAME is the name of the program and FILENAME is the name of the eBPF C file, the Android loader creates and pins each program at /sys/fs/bpf/prog_FILENAME_PROGTYPE_PROGNAME.

    For example, for the above sched_switch tracepoint example in myschedtp.c, a program file will be created and pinned to /sys/fs/bpf/prog_myschedtp_tracepoint_sched_sched_switch.

  • For any maps created, assuming MAPNAME is the name of the map and PROGNAME is the name of the eBPF C file, the Android loader creates and pins each map to /sys/fs/bpf/map_FILENAME_MAPNAME.

    For example, for the above sched_switch tracepoint example in myschedtp.c, a map file is created and pinned to /sys/fs/bpf/map_myschedtp_cpu_pid_map.

  • The bpf_obj_get() in the Android BPF library can be used to obtained a file descriptor from these pinned /sys/fs/bpf file. This function returns a file descriptor, which can be used for further operations, such as reading maps or attaching a program to a tracepoint.

Android BPF library

The Android BPF library is named libbpf_android.so and is part of the system image. This library provides the user with low-level eBPF functionality needed for creating and reading maps, creating probes, tracepoints, perf buffers etc.

Attaching programs to tracepoints and kprobes

Once tracepoint and kprobe programs have been loaded (which is done automatically at boot up as previously described), they need to be activated. To activate them, first, use the bpf_obj_get() API to obtain the program fd from the pinned file's location (see the Files available in sysfs section). Next, call the bpf_attach_tracepoint() API in the BPF library, passing it the program fd and the tracepoint name.

For example, to attach the sched_switch tracepoint defined in the myschedtp.c source file in the example above, do the following (error checking is not shown):

  char *tp_prog_path = "/sys/fs/bpf/prog_myschedtp_tracepoint_sched_sched_switch";
  char *tp_map_path = "/sys/fs/bpf/map_myschedtp_cpu_pid";

  // Attach tracepoint and wait for 4 seconds
  int mProgFd = bpf_obj_get(tp_prog_path);
  int mMapFd = bpf_obj_get(tp_map_path);
  int ret = bpf_attach_tracepoint(mProgFd, "sched", "sched_switch");
  sleep(4);

  // Read the map to find the last PID that ran on CPU 0
  android::bpf::BpfMap myMap(mMapFd);
  printf("last PID running on CPU %d is %d\n", 0, myMap.readValue(0));

Reading from the maps

BPF maps support arbitrary complex key and value structures or types. The Android BPF library includes an android::BpfMap class that makes use of C++ templates to instantiate BpfMap based on the key and value's type for the map in question. The above code shows an example of using a BpfMap with key and value as integers. The integers can also be arbitrary structures.

Thus the templatized BpfMap class makes it easy to define a custom BpfMap object suitable for the particular map. The map can then be accessed using the custom-generated functions which are type aware, resulting in cleaner code.

For more information about BpfMap, refer to the [Android sources].(https://android.googlesource.com/platform/system/bpf/+/75b410bdf186263fa4e05e079bfba44578622c33/libbpf/include/bpf/BpfMap.h)

Debugging issues

During boot time, several messages related to BPF loading are logged. If the loading process fails for any reason, a detailed log message is provided in logcat. Filtering the logcat logs by "bpf" prints all the messages and any detailed errors during load time, such as eBPF verifier errors.

Users of eBPF in Android

Currently there are two eBPF C programs in Android that you can refer to for examples.

The netd eBPF C program is used by the networking daemon (netd) in Android for various purposes such as socket filtering and statistics gathering. To see how this programs is used, check the eBPF traffic monitor sources.

The time_in_state eBPF C program calculates the amount of time an Android app spends at different CPU frequencies which is used to calculate power. This program is currently under development.

Licensing considerations

If you want to contribute an eBPF C program, it should be contributed to the right project depending on its license. A GPL licensed eBPF C program should be contributed to the system/bpfprogs AOSP project. On the other hand, if the program is Apache licensed, it should be contributed to system/bpf AOSP project.