Android Kernel ABI Monitoring

You can use Application Binary Interface (ABI) monitoring tooling, available in Android 11 and higher, to stabilize the in-kernel ABI of Android kernels. The tooling collects and compares ABI representations from existing kernel binaries (vmlinux + modules). These ABI representations are the .xml files and the symbol lists. The interface that a representation describes is called the Kernel Module Interface (KMI). You can use the tooling to track and mitigate changes to the KMI.

The ABI monitoring tooling is developed in AOSP and uses libabigail to generate and compare representations.

This page describes the tooling, the process of collecting and analyzing ABI representations, and the usage of such representations to provide stability to the in-kernel ABI. This page also provides information for contributing changes to the Android kernels.

This directory contains the specific tools for the ABI analysis. Use it with the build scripts provided by build_abi.sh.

Process

Analyzing the kernel's ABI takes multiple steps, most of which can be automated:

  1. Acquire the toolchain, build scripts, and kernel sources through repo.
  2. Provide any prerequisites (such as the libabigail library and collection of tools).
  3. Build the kernel and its ABI representation.
  4. Analyze ABI differences between the build and a reference.
  5. Update the ABI representation (if required).
  6. Work with symbol lists.

The following instructions work for any kernel that you can build using a supported toolchain (such as the prebuilt Clang toolchain). repo manifests are available for all Android common kernel branches and for several device-specific kernels. They ensure that the correct toolchain is used when you build a kernel distribution for analysis.

Using the ABI Monitoring tooling

1. Acquire the toolchain, build scripts, and kernel sources through repo

You can acquire the toolchain, build scripts (these scripts), and kernel sources with repo. For detailed documentation, refer to the corresponding information for building Android kernels.

To illustrate the process, the following steps use common-android-mainline, an Android kernel branch that's kept current with the upstream Linux releases. To obtain this branch through repo, execute:

repo init -u https://android.googlesource.com/kernel/manifest -b common-android-mainline
repo sync

2. Provide prerequisites

The ABI tooling uses libabigail, a library and collection of tools, to analyze binaries. A suitable set of prebuilt binaries comes with the kernel-build-tools and is automatically used with build_abi.sh.

To use the lower-level tooling (such as dump_abi), add the kernel-build-tools to the PATH.

3. Build the kernel and its ABI representation

At this point you're ready to build a kernel with the correct toolchain and to extract an ABI representation from its binaries (vmlinux + modules).

Similar to the usual Android kernel build process (using build.sh), this step requires running build_abi.sh.

BUILD_CONFIG=common/build.config.gki.aarch64 build/build_abi.sh

That builds the kernel and extracts the ABI representation into the out_abi subdirectory. In this case out/android-mainline/dist/abi.xml would be a symbolic link to out_abi/android-mainline/dist/abi-<id>.xml. <id> is computed by executing git describe against the kernel source tree.

4. Analyze the ABI differences between the build and a reference representation

build_abi.sh analyzes and reports any ABI differences when a reference is provided through the environment variable ABI_DEFINITION. ABI_DEFINITION must point to a reference file relative to the kernel source tree, and can be specified on the command line or, more commonly, as a value in build.config. The following provides an example:

BUILD_CONFIG=common/build.config.gki.aarch64 build/build_abi.sh

In the command above, build.config.gki.aarch64 defines the reference file (as ABI_DEFINITION=android/abi_gki_aarch64.xml), and diff_abi calls abidiff to compare the freshly generated ABI representation against the reference file. build_abi.sh prints the location of the report and emits a short report for any ABI breakage. If breakages are detected, build_abi.sh terminates and returns a nonzero exit code.
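
Because build_abi.sh signals breakage through its exit code, a wrapper script can act on the result. The following is a minimal sketch, not part of the tooling: check_abi is a hypothetical helper that runs whatever command it is given and reports the outcome.

```shell
# check_abi is a hypothetical helper (not part of the Android build scripts).
# It runs the command passed as arguments and reports whether it succeeded,
# preserving the original exit code on failure.
check_abi() {
  if "$@"; then
    echo "ABI check passed"
  else
    status=$?
    echo "ABI check failed (exit code ${status})"
    return "${status}"
  fi
}

# Intended use from the root of a kernel checkout (commented out so the
# sketch can be tried anywhere):
#   check_abi env BUILD_CONFIG=common/build.config.gki.aarch64 build/build_abi.sh
```

Passing the command as arguments keeps the helper independent of any particular build configuration.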

5. Update the ABI representation (if required)

To update the ABI representation, invoke build_abi.sh with the --update flag. It updates the corresponding abi.xml file that's defined by build.config. To print the ABI differences due to the update, invoke the script with --print-report. Be sure to include the report in the commit message when updating the abi.xml file.

6. Work with symbol lists

Parameterize build_abi.sh with KMI symbol lists to filter symbols during ABI extraction. These are plain text files that list relevant ABI kernel symbols. For example, a symbol list file with the following content limits ABI analysis to the ELF symbols with the names symbol1 and symbol2:

[abi_symbol_list]
   symbol1
   symbol2

Changes to other ELF symbols aren't considered. A symbol list file can be specified in the corresponding build.config configuration file with KMI_SYMBOL_LIST= as a file relative to the kernel source directory ($KERNEL_DIR). To provide a level of organization, you can specify additional symbol list files by using ADDITIONAL_KMI_SYMBOL_LISTS= in the build.config file. This specifies further symbol list files, relative to $KERNEL_DIR; separate multiple filenames by whitespace.
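
As an illustration of how these two variables combine, the sketch below sets them the way a build.config file might and prints the resulting paths. It assumes that build.config is read as shell variable assignments; the two vendor file names and the list_symbol_list_files helper are made up for this example.

```shell
# Illustrative values only; the two vendor file names are hypothetical.
KERNEL_DIR=common
KMI_SYMBOL_LIST=android/abi_gki_aarch64
ADDITIONAL_KMI_SYMBOL_LISTS="
android/abi_gki_aarch64_vendor_a
android/abi_gki_aarch64_vendor_b
"

# Print every configured symbol list file, prefixed with $KERNEL_DIR.
# The variables are left unquoted on purpose so that the shell splits the
# whitespace-separated list into individual file names.
list_symbol_list_files() {
  for list in $KMI_SYMBOL_LIST $ADDITIONAL_KMI_SYMBOL_LISTS; do
    echo "$KERNEL_DIR/$list"
  done
}

list_symbol_list_files
```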

To create an initial symbol list or to update an existing one, you must use the build_abi.sh script with the --update-symbol-list parameter.

When the script is run with an appropriate configuration, it builds the kernel and extracts the symbols that are exported from vmlinux and GKI modules and that are required by any other module in the tree.

Consider vmlinux exporting the following symbols (usually done via the EXPORT_SYMBOL* macros):

  func1
  func2
  func3

Also, imagine there were two vendor modules, modA.ko and modB.ko, which require the following symbols (in other words, they list undefined symbol entries in their symbol table):

 modA.ko:    func1 func2
 modB.ko:    func2

From an ABI stability point of view, func1 and func2 must be kept stable, as they're used by an external module. In contrast, while func3 is exported, it isn't actively used (in other words, it's not required) by any module. Thus, the symbol list contains func1 and func2 only.
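
The selection rule in this example is a set intersection: a symbol belongs in the list only if it is both exported and required. The sketch below reproduces that rule on plain text files with one symbol per line. It is not how build_abi.sh is implemented; make_symbol_list and the file names are hypothetical, and in a real tree the inputs would come from the built binaries (for instance, nm --undefined-only lists a module's undefined symbols).

```shell
# make_symbol_list EXPORTED REQUIRED...
# EXPORTED lists the exported symbols, one per line (must not be empty);
# each REQUIRED file lists the symbols that one module needs.
make_symbol_list() {
  echo "[abi_symbol_list]"
  awk '
    NR == FNR { exported[$1] = 1; next }   # first file: remember exports
    ($1 in exported) { print "  " $1 }     # later files: keep required exports
  ' "$@" | sort -u
}

# Recreate the example above in a scratch directory.
workdir=$(mktemp -d)
printf 'func1\nfunc2\nfunc3\n' > "$workdir/vmlinux.exported"
printf 'func1\nfunc2\n'        > "$workdir/modA.required"
printf 'func2\n'               > "$workdir/modB.required"

make_symbol_list "$workdir/vmlinux.exported" \
                 "$workdir/modA.required" "$workdir/modB.required"
rm -rf "$workdir"
```

The output lists func1 and func2 under the [abi_symbol_list] header; func3 is dropped because no module requires it.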

To create or update an existing symbol list, build_abi.sh must be run as follows:

BUILD_CONFIG=path/to/build.config.device build/build_abi.sh --update-symbol-list

In this example, build.config.device must include several configuration options:

  • vmlinux must be in the FILES list.
  • KMI_SYMBOL_LIST must be set and point to the KMI symbol list to update.
  • GKI_MODULES_LIST must be set and point to the list of GKI modules. This path is usually android/gki_aarch64_modules.
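
These requirements can be checked before starting a long build. The following sketch assumes that the build.config file consists of shell variable assignments and can be sourced on its own (real files often source other fragments, which this does not handle). check_symbol_list_config is a hypothetical helper, not part of the build scripts.

```shell
# check_symbol_list_config FILE
# Sources FILE in a subshell (so no variables leak into the caller) and
# reports which of the three requirements listed above are not met.
# Pass a path that contains a slash so that "." does not search PATH.
check_symbol_list_config() {
  (
    FILES="" KMI_SYMBOL_LIST="" GKI_MODULES_LIST=""
    . "$1" || exit 2
    status=0
    # FILES is a whitespace-separated list; normalize it to single spaces.
    case " $(echo $FILES) " in
      *" vmlinux "*) ;;
      *) echo "vmlinux is missing from FILES"; status=1 ;;
    esac
    [ -n "$KMI_SYMBOL_LIST" ]  || { echo "KMI_SYMBOL_LIST is not set";  status=1; }
    [ -n "$GKI_MODULES_LIST" ] || { echo "GKI_MODULES_LIST is not set"; status=1; }
    exit "$status"
  )
}

# Intended use:
#   check_symbol_list_config path/to/build.config.device
```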

Working with the lower-level ABI tooling

Most users will only need to use build_abi.sh. In some cases, working directly with the lower-level ABI tooling might be necessary. The two commands used by build_abi.sh, dump_abi and diff_abi, are available to extract and compare ABI files. See the following sections for their usage.

Creating ABI representations from kernel trees

Given a Linux kernel tree with built vmlinux and kernel modules, the tool dump_abi creates an ABI representation using the selected ABI tool. A sample invocation looks like this:

dump_abi --linux-tree path/to/out --out-file /path/to/abi.xml

The file abi.xml contains a textual ABI representation of the combined, observable ABI of vmlinux and the kernel modules in the given directory. This file might be used for manual inspection, further analysis, or as a reference file to enforce ABI stability.

Comparing ABI representations

ABI representations created by dump_abi can be compared with diff_abi. Use the same abi-tool for both dump_abi and diff_abi. A sample invocation looks like this:

diff_abi --baseline abi1.xml --new abi2.xml --report report.out

The generated report lists detected ABI changes that affect the KMI. The files specified as baseline and new are ABI representations that were collected with dump_abi. diff_abi propagates the exit code of the underlying tool and therefore returns a non-zero value when the ABIs compared are incompatible.

Using KMI symbol lists

To filter representations created with dump_abi or to filter symbols compared with diff_abi, use the parameter --kmi-symbol-list, which takes a path to a KMI symbol list file:

dump_abi --linux-tree path/to/out --out-file /path/to/abi.xml --kmi-symbol-list /path/to/symbol_list_file

Comparing kernel binaries against the GKI reference KMI

While you're working on GKI kernel compliance, it's useful to regularly compare a local kernel build to a reference GKI KMI representation without having to use build_abi.sh. The tool gki_check is a lightweight tool to do exactly that. Given a local Linux kernel build tree, a sample invocation that compares the local binaries' representation to, for example, the 5.4 representation looks like this:

build/abi/gki_check --linux-tree path/to/out/ --kernel-version 5.4

gki_check uses parameter names consistent with dump_abi and diff_abi. Hence, you can pass --kmi-symbol-list path/to/kmi_symbol_list_file to limit the comparison to the symbols in a KMI symbol list.

Dealing with ABI breakages

As an example, the following patch introduces a very obvious ABI breakage:

  diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
  index 5ed8f6292a53..f2ecb34c7645 100644
  --- a/include/linux/mm_types.h
  +++ b/include/linux/mm_types.h
  @@ -339,6 +339,7 @@ struct core_state {
   struct kioctx_table;
   struct mm_struct {
      struct {
  +       int dummy;
          struct vm_area_struct *mmap;            /* list of VMAs */
          struct rb_root mm_rb;
          u64 vmacache_seqnum;                   /* per-thread vmacache */

When you run build_abi.sh again with this patch applied, the tooling exits with a non-zero error code and reports an ABI difference similar to this:

  Leaf changes summary: 1 artifact changed
  Changed leaf types summary: 1 leaf type changed
  Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 0 Added function
  Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 0 Added variable

  'struct mm_struct at mm_types.h:372:1' changed:
    type size changed from 6848 to 6912 (in bits)
    there are data member changes:
  [...]
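
The size change in this report can be checked by hand. The difference of 64 bits is 8 bytes, although an int is usually only 4 bytes; the remaining 4 bytes are plausibly padding so that the following pointer stays 8-byte aligned. The sketch below works through that arithmetic, assuming a 64-bit target with a 4-byte int and 8-byte pointer alignment (an assumption about the build, not something the report states).

```shell
# Numbers taken from the report above (sizes are in bits).
old_bits=6848
new_bits=6912
growth_bytes=$(( (new_bits - old_bits) / 8 ))

# align_up OFFSET ALIGNMENT: round OFFSET up to a multiple of ALIGNMENT.
align_up() {
  echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

# Assumed type properties for a 64-bit target.
int_size=4
pointer_align=8

# The new int is the first member, so the pointer that follows it has to
# start at the next multiple of its alignment.
pointer_offset=$(align_up "$int_size" "$pointer_align")

echo "struct grew by ${growth_bytes} bytes"
echo "pointer moved from offset 0 to offset ${pointer_offset}"
```

Under these assumptions both numbers come out as 8, which is consistent with the reported size change.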

Fixing a broken ABI on Android Gerrit

If you didn't intentionally break the kernel ABI, then you need to investigate, using the guidance provided by the ABI monitoring tooling. The most common causes of breakages are added or deleted functions, changed data structures, or changes to the ABI caused by adding config options that lead to any of the aforementioned. Begin by addressing the issues found by the tool.

You can reproduce the ABI test locally by running build/build_abi.sh with the same arguments that you would have used for running build/build.sh. This is an example command for the GKI kernels:

BUILD_CONFIG=common/build.config.gki.aarch64 build/build_abi.sh

Updating the Kernel ABI

If you need to update the kernel ABI representation, then you must update the corresponding abi.xml file in the kernel source tree. The most convenient way to do this is by using build/build_abi.sh like so:

build/build_abi.sh --update --print-report

Use the same arguments that you would have used to run build/build.sh. This updates the correct abi.xml in the source tree and prints the detected differences. As a matter of practice, include the printed (short) report in the commit message (at least partially).

Android Kernel Branches with predefined ABI

Some kernel branches come with predefined ABI representations for Android as part of their source distribution. These ABI representations are intended to be accurate and to reflect the result of build_abi.sh as if you had run it yourself. As the ABI is heavily influenced by various kernel configuration options, these .xml files usually belong to a certain configuration. For example, the common-android-mainline branch contains an abi_gki_aarch64.xml that corresponds to the build result when using build.config.gki.aarch64. In particular, build.config.gki.aarch64 also refers to this file through ABI_DEFINITION.

Such predefined ABI representations are used as a baseline definition when comparing with diff_abi. For example, to validate a kernel patch regarding any changes to the ABI, create the ABI representation with the patch applied and use diff_abi to compare it to the expected ABI for that particular source tree or configuration. If ABI_DEFINITION is set, running build_abi.sh is sufficient.

Enforcing the KMI using module versioning

The GKI kernels use module versioning (CONFIG_MODVERSIONS) to enforce KMI compliance at runtime. Module versioning will cause CRC mismatch failures at module load-time if the expected KMI of a module doesn't match the vmlinux KMI. For example, here is a typical failure that occurs at module load-time due to a CRC mismatch for the symbol module_layout():

init: Loading module /lib/modules/kernel/.../XXX.ko with args ""
XXX: disagrees about version of symbol module_layout
init: Failed to insmod '/lib/modules/kernel/.../XXX.ko' with args ''
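
When many modules fail, it helps to extract the affected module and symbol names from the log. The sketch below does that for lines in the format shown above. version_mismatches is a hypothetical helper, and the assumption that the module name is the word directly before "disagrees" (with a trailing colon) is based only on this sample.

```shell
# version_mismatches: read a kernel log on stdin and print
# "<module> <symbol>" for every CRC mismatch line.
version_mismatches() {
  awk '/disagrees about version of symbol/ {
    for (i = 2; i <= NF; i++) {
      if ($i == "disagrees") {
        module = $(i - 1)
        sub(/:$/, "", module)   # drop the colon after the module name
        print module, $NF       # the symbol is the last word on the line
        break
      }
    }
  }'
}

# Example with two lines of the log excerpt above:
printf '%s\n' \
  'init: Loading module /lib/modules/kernel/.../XXX.ko with args ""' \
  'XXX: disagrees about version of symbol module_layout' |
  version_mismatches
```

Searching for the word before "disagrees" rather than taking the first field keeps the helper usable when log lines carry a timestamp prefix.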

Module versioning uses

Module versioning is useful for many reasons:

  1. It catches changes in data structure visibility. If modules can change opaque data structures, that is, data structures that aren't part of the KMI, the modules will break after future changes to the structure.
  2. It adds a run-time check to avoid accidentally loading a module that isn't KMI-compatible with the kernel. (Such as when a current module is loaded at a later date by a new kernel that’s incompatible.) This is preferable to having hard-to-debug subsequent runtime issues or kernel crashes.
  3. abidiff has limitations in identifying ABI differences in certain convoluted cases that CONFIG_MODVERSIONS can catch.

As an example for (1), consider the fwnode field in struct device. That field must be opaque to modules so that they cannot make changes to fields of device->fwnode or make assumptions about its size.

However, if a module includes <linux/fwnode.h> (directly or indirectly), then the fwnode field in the struct device stops being opaque to it. The module can then make changes to device->fwnode->dev or device->fwnode->ops. That's problematic for several reasons:

  1. It can break assumptions the core kernel code is making about its internal data structures.

  2. If a future kernel update changes the struct fwnode_handle (the data type of fwnode), then the module won't work with the new kernel. Moreover, abidiff won't show any differences because the module is breaking the KMI by directly manipulating internal data structures in ways that can't be captured by only inspecting the binary representation.

Enabling module versioning prevents all these issues.

Checking for CRC mismatches without booting the device

Any kernel build with CONFIG_MODVERSIONS enabled generates a Module.symvers file as part of the build process. The file has one line for every symbol exported by vmlinux and the modules. Each line consists of the CRC value, symbol name, symbol namespace, vmlinux or module name that's exporting the symbol, and the export type (for example, EXPORT_SYMBOL vs. EXPORT_SYMBOL_GPL).

You can compare the Module.symvers files between the GKI build and your build to check for any CRC differences in the symbols exported by vmlinux. If the CRC value differs for any symbol that is exported by vmlinux and used by one of the modules you load on your device, that module won't load.
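
A comparison like this is easy to script. The sketch below relies only on the first two fields described above (CRC value and symbol name) and assumes that the fields are tab-separated. compare_symvers is a hypothetical helper, and the sample files contain invented CRC values and only those two fields.

```shell
# compare_symvers REFERENCE CANDIDATE
# Print "<symbol> <reference CRC> <candidate CRC>" for every symbol that is
# present in both files with different CRC values.
# REFERENCE must not be empty, or the NR == FNR test misfires.
compare_symvers() {
  awk -F '\t' '
    NR == FNR { crc[$2] = $1; next }               # first file: remember CRCs
    ($2 in crc) && crc[$2] != $1 { print $2, crc[$2], $1 }
  ' "$1" "$2"
}

# Two small stand-ins for Module.symvers files (values are made up).
workdir=$(mktemp -d)
printf '0x11111111\tfunc1\n0x22222222\tfunc2\n' > "$workdir/gki.symvers"
printf '0x11111111\tfunc1\n0x99999999\tfunc2\n' > "$workdir/device.symvers"

compare_symvers "$workdir/gki.symvers" "$workdir/device.symvers"
rm -rf "$workdir"
```

For the sample data, only func2 is reported, together with both CRC values.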

If you don't have all the build artifacts, but just have the vmlinux files of the GKI kernel and your kernel, you can compare the CRC values for a specific symbol by running the following command on both the kernels, then comparing the output:

nm <path to vmlinux>/vmlinux | grep __crc_<symbol name>

For example, the following command checks the CRC value for the module_layout symbol:

nm vmlinux | grep __crc_module_layout
0000000008663742 A __crc_module_layout
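
Note that grep matches substrings, so the pattern above would also match any other symbol whose name starts with the same text. The sketch below does an exact match on the symbol column instead and prints only the CRC, which makes the two outputs easy to compare. crc_from_nm is a hypothetical helper; it assumes the three-column nm output shown above.

```shell
# crc_from_nm SYMBOL: read nm output on stdin and print the value of
# __crc_SYMBOL, matching the name exactly.
crc_from_nm() {
  awk -v want="__crc_$1" '$3 == want { print $1 }'
}

# Example with the sample output above; in practice, pipe nm into it:
#   nm <path to vmlinux>/vmlinux | crc_from_nm module_layout
printf '%s\n' '0000000008663742 A __crc_module_layout' |
  crc_from_nm module_layout
```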

Fixing CRC mismatch

If you get a CRC mismatch when loading the module, here is how to fix it:

  1. Build the GKI kernel and your device kernel, and add KBUILD_SYMTYPES=1 in front of the command you use to build the kernel. This generates a .symtypes file for each .o file. Note that when you use build_abi.sh, this variable is already set implicitly. For example:

    KBUILD_SYMTYPES=1 BUILD_CONFIG=common/build.config.gki.aarch64 build/build.sh
    
  2. Find the .c file in which the symbol with CRC mismatch is exported. For example:

    cd common && git grep EXPORT_SYMBOL.*module_layout
    kernel/module.c:EXPORT_SYMBOL(module_layout);
    
  3. That .c file has a corresponding .symtypes file in both the GKI and your device kernel build artifacts.

    cd out/$BRANCH/common && ls -1 kernel/module.*
    kernel/module.o
    kernel/module.o.symversions
    kernel/module.symtypes
    

    a. The format of this file is one (potentially very long) line per symbol.

    b. [s|u|e|etc]# at the start of the line means the symbol is of data type [struct|union|enum|etc]. For example, t#bool typedef _Bool bool.

    c. A missing # prefix at the start of the line indicates the symbol is a function. For example,
    find_module s#module * find_module ( const char * ).

  4. Compare those two files and fix all the differences.
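
Because each entry is a single long line, diff output for these files can be hard to read. As a first pass, the sketch below lists only the names of entries whose definitions differ, labeled according to the prefix rules above. diff_symtypes is a hypothetical helper and the sample entries are invented; it assumes that each entry's name is the first whitespace-separated word on its line.

```shell
# diff_symtypes FIRST SECOND
# Print "<kind> <entry>" for every entry present in both files whose
# definition differs. FIRST must not be empty.
diff_symtypes() {
  awk '
    function kind(name,    prefix) {
      if (index(name, "#") != 2) return "function"   # no X# prefix
      prefix = substr(name, 1, 1)
      if (prefix == "s") return "struct"
      if (prefix == "u") return "union"
      if (prefix == "e") return "enum"
      if (prefix == "t") return "typedef"
      return "other"
    }
    NR == FNR { def[$1] = $0; next }                 # first file: remember lines
    ($1 in def) && def[$1] != $0 { print kind($1), $1 }
  ' "$1" "$2"
}

# Invented sample entries: only the struct differs between the two files.
workdir=$(mktemp -d)
printf '%s\n' 't#bool typedef _Bool bool' \
              's#example struct example { int a ; }' > "$workdir/gki.symtypes"
printf '%s\n' 't#bool typedef _Bool bool' \
              's#example struct example { int a ; int b ; }' > "$workdir/device.symtypes"

diff_symtypes "$workdir/gki.symtypes" "$workdir/device.symtypes"
rm -rf "$workdir"
```

For the sample data this prints a single line, struct s#example, which tells you which entry to inspect in the full diff.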

Case 1: Differences due to data type visibility

If one kernel keeps a symbol or data type opaque to the modules and the other kernel doesn't, then it shows up as a difference between the .symtypes files of the two kernels. The .symtypes file from one of the kernels has UNKNOWN for a symbol and the .symtypes file from the other kernel has an expanded view of the symbol or data type.

For example, assume you add this line to include/linux/device.h in your kernel:

 #include <linux/fwnode.h>

That causes CRC mismatches, with one of them for module_layout(). If you compare the module.symtypes for that symbol, it looks like this:

  $ diff -u <GKI>/kernel/module.symtypes <your kernel>/kernel/module.symtypes
  --- <GKI>/kernel/module.symtypes
  +++ <your kernel>/kernel/module.symtypes
  @@ -334,12 +334,15 @@
  ...
  -s#fwnode_handle struct fwnode_handle { UNKNOWN }
  +s#fwnode_reference_args struct fwnode_reference_args { s#fwnode_handle * fwnode ; unsigned int nargs ; t#u64 args [ 8 ] ; }
  ...

If your kernel has it as UNKNOWN and the GKI kernel has the expanded view of the symbol (very unlikely), then merge the latest Android Common Kernel into your kernel so that you are using the latest GKI kernel base.

Almost always, the GKI kernel has it as UNKNOWN, but your kernel has the internal details of the symbol because of changes made to your kernel. This is because one of the files in your kernel added a #include that isn't present in the GKI kernel.
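
To separate visibility differences from real type changes, you can look specifically for entries that are UNKNOWN in one file and expanded in the other. The sketch below does that under the same assumptions as the diff shown above (one entry per line, name first, opaque types rendered as { UNKNOWN }). visibility_diff is a hypothetical helper, and the expanded struct body in the sample is invented.

```shell
# visibility_diff FIRST SECOND
# Print the entries that are opaque ({ UNKNOWN }) in exactly one of the
# two .symtypes files. FIRST must not be empty.
visibility_diff() {
  awk '
    function opaque(line) { return index(line, "{ UNKNOWN }") > 0 }
    NR == FNR { def[$1] = $0; next }
    ($1 in def) && opaque(def[$1]) != opaque($0) {
      print $1, (opaque($0) ? "opaque only in second file" : "opaque only in first file")
    }
  ' "$1" "$2"
}

# Sample data: the first file keeps the struct opaque, the second expands it.
workdir=$(mktemp -d)
printf '%s\n' 's#fwnode_handle struct fwnode_handle { UNKNOWN }' \
  > "$workdir/gki.symtypes"
printf '%s\n' 's#fwnode_handle struct fwnode_handle { int example_field ; }' \
  > "$workdir/device.symtypes"

visibility_diff "$workdir/gki.symtypes" "$workdir/device.symtypes"
rm -rf "$workdir"
```

With the GKI file passed first, "opaque only in first file" corresponds to the common situation described above, where your kernel exposes the internals of a type that GKI keeps opaque.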

To identify the #include that causes the difference, follow these steps:

  1. Open the header file that defines the symbol or data type having this difference. For example, edit include/linux/fwnode.h for the struct fwnode_handle.
  2. Add the following code at the top of the header file:

    #ifdef CRC_CATCH
    #error "Included from here"
    #endif
    
  3. Then in the module's .c file (the one that has a CRC mismatch), add the following as the first line before any of the #include lines.

    #define CRC_CATCH 1
    
  4. Now compile your module. You'll get a build-time error that shows the chain of header file #include that led to this CRC mismatch. For example:

    In file included from .../drivers/clk/XXX.c:16:
    In file included from .../include/linux/of_device.h:5:
    In file included from .../include/linux/cpu.h:17:
    In file included from .../include/linux/node.h:18:
    .../include/linux/device.h:16:2: error: "Included from here"
    #error "Included from here"
    
  5. One of the links in this chain of #include is due to a change made in your kernel that's missing in the GKI kernel.

  6. Once you identify the change, revert it in your kernel or upload it to ACK and get it merged.

Case 2: Differences due to data type changes

If the CRC mismatch for a symbol or data type isn't due to a difference in visibility, then it's due to actual changes (additions, removals, or changes) in the data type itself. Typically, abidiff catches this, but if it misses any due to known detection gaps, the MODVERSIONS mechanism can catch them.

For example, assume you make the following change in your kernel:

  diff --git a/include/linux/iommu.h b/include/linux/iommu.h
  --- a/include/linux/iommu.h
  +++ b/include/linux/iommu.h
  @@ -259,7 +259,7 @@ struct iommu_ops {
     void (*iotlb_sync)(struct iommu_domain *domain);
     phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
     phys_addr_t (*iova_to_phys_hard)(struct iommu_domain *domain,
  -        dma_addr_t iova);
  +        dma_addr_t iova, unsigned long trans_flag);
     int (*add_device)(struct device *dev);
     void (*remove_device)(struct device *dev);
     struct iommu_group *(*device_group)(struct device *dev);

That would cause a lot of CRC mismatches (as many symbols are indirectly affected by this type of change) and one of them would be for devm_of_platform_populate().

If you compare the .symtypes files for that symbol, it might look like this:

  $ diff -u <GKI>/drivers/of/platform.symtypes <your kernel>/drivers/of/platform.symtypes
  --- <GKI>/drivers/of/platform.symtypes
  +++ <your kernel>/drivers/of/platform.symtypes
  @@ -399,7 +399,7 @@
  ...
  -s#iommu_ops struct iommu_ops { ... ; t#phys_addr_t ( * iova_to_phys_hard ) ( s#iommu_domain * , t#dma_addr_t ) ; int ( * add_device ) ( s#device * ) ; ...
  +s#iommu_ops struct iommu_ops { ... ; t#phys_addr_t ( * iova_to_phys_hard ) ( s#iommu_domain * , t#dma_addr_t , unsigned long ) ; int ( * add_device ) ( s#device * ) ; ...

To identify the changed type, follow these steps:

  1. Find the definition of the symbol in the source code (usually in .h files).
  2. If there's an obvious symbol difference between your kernel and the GKI kernel, do a git blame to find the commit.
  3. Sometimes a symbol is deleted in one tree, and you want to delete it in the other tree. To find the change that deleted the line, run this command on the tree where the line was deleted:

    a. git log -S "copy paste of deleted line/word" -- <file where it was deleted>

    b. You'll get a shortened list of commits. The first one is probably the one you are searching for. If it isn't, go through the list until you find the commit.

  4. Once you identify the change, either revert it in your kernel or upload it to ACK and get it merged.