You can use Application Binary Interface (ABI) monitoring tooling, available in Android 11 and higher, to stabilize the in-kernel ABI of Android kernels. The tooling collects and compares ABI representations from existing kernel binaries (vmlinux + modules). These ABI representations are the .xml files and the symbol lists. The interface that a representation describes is called the Kernel Module Interface (KMI). You can use the tooling to track and mitigate changes to the KMI.
The ABI monitoring tooling is developed in AOSP and uses libabigail to generate and compare representations.
This page describes the tooling, the process of collecting and analyzing ABI representations, and the usage of such representations to provide stability to the in-kernel ABI. This page also provides information for contributing changes to the Android kernels.
This directory contains the specific tools for the ABI analysis. Use it with the build scripts provided by build_abi.sh.
Process
Analyzing the kernel's ABI takes multiple steps, most of which can be automated:
- Acquire the toolchain, build scripts, and kernel sources through repo.
- Provide any prerequisites (such as the libabigail library and collection of tools).
- Build the kernel and its ABI representation.
- Analyze ABI differences between the build and a reference.
- Update the ABI representation (if required).
- Work with symbol lists.
The following instructions work for any kernel that you can build using a supported toolchain (such as the prebuilt Clang toolchain). repo manifests are available for all Android common kernel branches and for several device-specific kernels; they ensure that the correct toolchain is used when you build a kernel distribution for analysis.
Using the ABI Monitoring tooling
1. Acquire the toolchain, build scripts, and kernel sources through repo
You can acquire the toolchain, build scripts (these scripts), and kernel sources with repo. For detailed documentation, refer to the corresponding information for building Android kernels.
To illustrate the process, the following steps use common-android12-5.10, an Android kernel branch, which is the latest released GKI kernel at the time of this writing. To obtain this branch through repo, execute the following:
repo init -u https://android.googlesource.com/kernel/manifest -b common-android12-5.10
repo sync
2. Provide prerequisites
The ABI tooling uses libabigail, a library and collection of tools, to analyze binaries. A suitable set of prebuilt binaries comes with the kernel-build-tools and is automatically used with build_abi.sh.
To use the lower-level tooling (such as dump_abi), add the kernel-build-tools to the PATH.
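As a minimal sketch of the PATH step, the snippet below prepends a tools directory once. The directory prebuilts/kernel-build-tools/linux-x86/bin and the variable name KERNEL_BUILD_TOOLS_BIN are assumptions made for illustration; use the location of kernel-build-tools in your own checkout.

```shell
# Hypothetical location of the prebuilt tools inside a repo checkout;
# override KERNEL_BUILD_TOOLS_BIN if your tree places them elsewhere.
KERNEL_BUILD_TOOLS_BIN="${KERNEL_BUILD_TOOLS_BIN:-$PWD/prebuilts/kernel-build-tools/linux-x86/bin}"

# Prepend the directory to PATH only if it isn't already listed.
case ":$PATH:" in
  *":$KERNEL_BUILD_TOOLS_BIN:"*) ;;
  *) PATH="$KERNEL_BUILD_TOOLS_BIN:$PATH" ;;
esac
export PATH
```

Run this from the root of the checkout before calling the lower-level tools in the same shell session.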
3. Build the kernel and its ABI representation
At this point you're ready to build a kernel with the correct toolchain and to extract an ABI representation from its binaries (vmlinux + modules).
Similar to the usual Android kernel build process (using build.sh), this step requires running build_abi.sh.
BUILD_CONFIG=common/build.config.gki.aarch64 build/build_abi.sh
This builds the kernel and extracts the ABI representation into the out_abi subdirectory. In this case, out/android12-5.10/dist/abi.xml is a symbolic link to out_abi/android12-5.10/dist/abi-<id>.xml. <id> is computed by executing git describe against the kernel source tree.
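Because abi.xml is a symbolic link, it can be useful to see which versioned file a build actually produced. The helper below is a small sketch; the function name resolve_abi_xml is hypothetical, and it relies on readlink -f as provided by GNU coreutils.

```shell
# Print the absolute path of the file that an abi.xml symlink points to.
#   $1: path to the symlink, for example out/android12-5.10/dist/abi.xml
resolve_abi_xml() {
  readlink -f "$1"
}
```

For example, resolve_abi_xml out/android12-5.10/dist/abi.xml prints the path of the abi-<id>.xml file behind the link.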
4. Analyze the ABI differences between the build and a reference representation
build_abi.sh analyzes and reports any ABI differences when a reference is provided through the environment variable ABI_DEFINITION. ABI_DEFINITION must point to a reference file relative to the kernel source tree, and can be specified on the command line or, more commonly, as a value in build.config. The following provides an example:
BUILD_CONFIG=common/build.config.gki.aarch64 build/build_abi.sh
In the command above, build.config.gki.aarch64 defines the reference file (as ABI_DEFINITION=android/abi_gki_aarch64.xml), and diff_abi calls abidiff to compare the freshly generated ABI representation against the reference file. build_abi.sh prints the location of the report and emits a short report for any ABI breakage. If breakages are detected, build_abi.sh terminates and returns a nonzero exit code.
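Since a breakage is signaled through the exit code, a wrapper script can act on it. The following is a sketch only: run_abi_check is a hypothetical helper, and the command to run is passed in as arguments so that nothing about the build_abi.sh invocation is assumed.

```shell
# Run an ABI check command and translate its exit status into a message.
# The command is supplied by the caller as "$@".
run_abi_check() {
  if "$@"; then
    echo "ABI check passed"
  else
    status=$?  # exit status of the command that was just run
    echo "ABI check failed (exit code $status)" >&2
    return "$status"
  fi
}
```

A possible call is run_abi_check env BUILD_CONFIG=common/build.config.gki.aarch64 build/build_abi.sh, which keeps the original exit code available to the caller.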
5. Update the ABI representation (if required)
To update the ABI representation, invoke build_abi.sh with the --update flag. It updates the corresponding abi.xml file that's defined by build.config. To print the ABI differences due to the update, invoke the script with --print-report. Be sure to include the report in the commit message when updating the abi.xml file.
6. Working with symbol lists
Parameterize build_abi.sh with KMI symbol lists to filter symbols during ABI extraction. These are plain text files that list relevant ABI kernel symbols. For example, a symbol list file with the following content limits ABI analysis to the ELF symbols with the names symbol1 and symbol2:
[abi_symbol_list]
symbol1
symbol2
Changes to other ELF symbols aren't considered. A symbol list file can be specified in the corresponding build.config configuration file with KMI_SYMBOL_LIST= as a file relative to the kernel source directory ($KERNEL_DIR). To provide a level of organization, you can specify additional symbol list files by using ADDITIONAL_KMI_SYMBOL_LISTS= in the build.config file. This specifies further symbol list files, relative to $KERNEL_DIR; separate multiple filenames with whitespace.
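As an illustration, a build.config fragment that sets both variables might look like the following. The three file names are hypothetical placeholders; only the variable names come from the description above.

```shell
# Fragment of a build.config file (plain shell variable assignments).
# All paths are hypothetical and relative to $KERNEL_DIR.
KMI_SYMBOL_LIST=android/abi_symbol_list_main
ADDITIONAL_KMI_SYMBOL_LISTS="
android/abi_symbol_list_vendor_a
android/abi_symbol_list_vendor_b
"
```

Newlines count as whitespace here, so listing one file per line keeps the fragment easy to review.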
To create an initial symbol list or to update an existing one, you must use the build_abi.sh script with the --update-symbol-list parameter.
When the script is run with an appropriate configuration, it builds the kernel and extracts the symbols that are exported from vmlinux and GKI modules and that are required by any other module in the tree.
Consider vmlinux exporting the following symbols (usually done via the EXPORT_SYMBOL* macros):
func1
func2
func3
Also, imagine there were two vendor modules, modA.ko and modB.ko, which require the following symbols (in other words, they list undefined symbol entries in their symbol table):
modA.ko: func1 func2
modB.ko: func2
From an ABI stability point of view, func1 and func2 must be kept stable, as they're used by an external module. In contrast, while func3 is exported, it isn't actively used (in other words, it's not required) by any module. Thus, the symbol list contains func1 and func2 only.
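The selection rule in this example is a set intersection: keep a symbol only if it is both exported and required. The sketch below shows that rule on plain text files. It is not how build_abi.sh is implemented; the function name and the one-symbol-per-line input format are assumptions for illustration.

```shell
# Print the symbols that are both exported and required, one per line.
#   $1: file listing exported symbols, one per line
#   $2: file listing symbols required by modules, one per line
#       (duplicates are allowed, as when several modules need one symbol)
required_exported_symbols() {
  exported_sorted=$(mktemp)
  required_sorted=$(mktemp)
  LC_ALL=C sort -u "$1" > "$exported_sorted"
  LC_ALL=C sort -u "$2" > "$required_sorted"
  # comm -12 keeps only the lines that appear in both sorted inputs.
  LC_ALL=C comm -12 "$exported_sorted" "$required_sorted"
  rm -f "$exported_sorted" "$required_sorted"
}
```

With an exported file containing func1, func2, and func3 and a required file containing func1, func2, and func2 again, the function prints func1 and func2.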
To create a symbol list or update an existing one, run build_abi.sh as follows:
BUILD_CONFIG=path/to/build.config.device build/build_abi.sh --update-symbol-list
In this example, build.config.device must include several configuration options:
- vmlinux must be in the FILES list.
- KMI_SYMBOL_LIST must be set and pointing at the KMI symbol list to update.
- GKI_MODULES_LIST must be set and pointing at the list of GKI modules. This path is usually android/gki_aarch64_modules.
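Put together, the relevant part of such a build.config.device could look like the sketch below. The symbol list path is a hypothetical placeholder, and a real device configuration contains many more settings than the three shown.

```shell
# Fragment of a hypothetical build.config.device (shell variable assignments).
# FILES normally lists further build outputs; only vmlinux is shown here.
FILES="
vmlinux
"
# Hypothetical symbol list path, relative to $KERNEL_DIR.
KMI_SYMBOL_LIST=android/abi_symbol_list_device
# Usual location of the GKI modules list, as noted above.
GKI_MODULES_LIST=android/gki_aarch64_modules
```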
Working with the lower-level ABI tooling
Most users will only need to use build_abi.sh. In some cases, working directly with the lower-level ABI tooling might be necessary. The two commands used by build_abi.sh, dump_abi and diff_abi, are available to extract and compare ABI files. See the following sections for their usage.
Creating ABI representations from kernel trees
Provided a Linux kernel tree with built vmlinux and kernel modules, the tool dump_abi creates an ABI representation using the selected ABI tool. A sample invocation looks like this:
dump_abi --linux-tree path/to/out --out-file /path/to/abi.xml
The file abi.xml contains a textual ABI representation of the combined, observable ABI of vmlinux and the kernel modules in the given directory. This file might be used for manual inspection, further analysis, or as a reference file to enforce ABI stability.
Comparing ABI representations
ABI representations created by dump_abi can be compared with diff_abi. Use the same abi-tool for both dump_abi and diff_abi. A sample invocation looks like this:
diff_abi --baseline abi1.xml --new abi2.xml --report report.out
The generated report lists detected ABI changes that affect the KMI. The files specified as baseline and new are ABI representations that were collected with dump_abi. diff_abi propagates the exit code of the underlying tool and therefore returns a non-zero value when the ABIs compared are incompatible.
Using KMI symbol lists
To filter representations created with dump_abi or to filter symbols compared with diff_abi, use the parameter --kmi-symbol-list, which takes a path to a KMI symbol list file:
dump_abi --linux-tree path/to/out --out-file /path/to/abi.xml --kmi-symbol-list /path/to/symbol_list_file
Dealing with ABI breakages
As an example, the following patch introduces a very obvious ABI breakage:
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5ed8f6292a53..f2ecb34c7645 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -339,6 +339,7 @@ struct core_state {
struct kioctx_table;
struct mm_struct {
struct {
+ int dummy;
struct vm_area_struct *mmap; /* list of VMAs */
struct rb_root mm_rb;
u64 vmacache_seqnum; /* per-thread vmacache */
When you run build_abi.sh again with this patch applied, the tooling exits with a non-zero error code and reports an ABI difference similar to this:
Leaf changes summary: 1 artifact changed
Changed leaf types summary: 1 leaf type changed
Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 0 Added function
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 0 Added variable
'struct mm_struct at mm_types.h:372:1' changed:
type size changed from 6848 to 6912 (in bits)
there are data member changes:
[...]
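The size change in the report can be checked by hand. The sketch below only redoes the arithmetic on the two sizes printed in the report; the explanation in the comments assumes a 64-bit target such as aarch64, where pointers are 8-byte aligned.

```shell
# Sizes of struct mm_struct taken from the report, in bits.
old_bits=6848
new_bits=6912

growth_bits=$((new_bits - old_bits))
growth_bytes=$((growth_bits / 8))

# The added int occupies 4 bytes. Assuming 8-byte pointer alignment, the
# compiler inserts 4 bytes of padding before the following pointer member,
# which accounts for 8 bytes in total.
echo "struct mm_struct grew by $growth_bits bits ($growth_bytes bytes)"
```

This prints a growth of 64 bits (8 bytes), which shifts the offset of every member that follows the new field.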
Fixing a broken ABI on Android Gerrit
If you didn't intentionally break the kernel ABI, then you need to investigate, using the guidance provided by the ABI monitoring tooling. The most common causes of breakages are added or deleted functions, changed data structures, or changes to the ABI caused by adding config options that lead to any of the aforementioned. Begin by addressing the issues found by the tool.
You can reproduce the ABI test locally by running build/build_abi.sh with the same arguments that you would have used for running build/build.sh. The following is an example command for the GKI kernels:
BUILD_CONFIG=common/build.config.gki.aarch64 build/build_abi.sh
Updating the Kernel ABI
If you need to update the kernel ABI representation, then you must update the corresponding abi.xml file in the kernel source tree. The most convenient way to do this is by using build/build_abi.sh like so:
build/build_abi.sh --update --print-report
Use the same arguments that you would have used to run build/build.sh. This updates the correct abi.xml in the source tree and prints the detected differences. As a matter of practice, include the printed (short) report in the commit message (at least partially).
Android Kernel Branches with predefined ABI
Some kernel branches come with predefined ABI representations for Android as part of their source distribution. These ABI representations are intended to be accurate, and to reflect the result of build_abi.sh as if you had executed it on your own. As the ABI is heavily influenced by various kernel configuration options, these .xml files usually belong to a certain configuration. For example, the common-android12-5.10 branch contains an abi_gki_aarch64.xml that corresponds to the build result when using build.config.gki.aarch64. In particular, build.config.gki.aarch64 also refers to this file through ABI_DEFINITION.
Such predefined ABI representations are used as a baseline definition when comparing with diff_abi. For example, to validate a kernel patch regarding any changes to the ABI, create the ABI representation with the patch applied and use diff_abi to compare it to the expected ABI for that particular source tree or configuration. If ABI_DEFINITION is set, running build_abi.sh accordingly will do.
Enforcing the KMI using module versioning
The Generic Kernel Image (GKI) kernels use module versioning (CONFIG_MODVERSIONS) as an additional measure to enforce KMI compliance at runtime. Module versioning can cause cyclic redundancy check (CRC) mismatch failures at module load time if the expected KMI of a module doesn't match the vmlinux KMI. For example, the following is a typical failure that occurs at module load time due to a CRC mismatch for the symbol module_layout():
init: Loading module /lib/modules/kernel/.../XXX.ko with args ""
XXX: disagrees about version of symbol module_layout
init: Failed to insmod '/lib/modules/kernel/.../XXX.ko' with args ''
Uses of module versioning
Module versioning is useful for the following reasons:
- Module versioning catches changes in data structure visibility. If modules change opaque data structures, that is, data structures that aren't part of the KMI, they break after future changes to the structure.
  As an example, consider the fwnode field in struct device. This field MUST be opaque to modules so that they can't make changes to fields of device->fwnode or make assumptions about its size.
  However, if a module includes <linux/fwnode.h> (directly or indirectly), then the fwnode field in the struct device is no longer opaque to it. The module can then make changes to device->fwnode->dev or device->fwnode->ops. This scenario is problematic for several reasons, stated as follows:
  - It can break assumptions the core kernel code is making about its internal data structures.
  - If a future kernel update changes the struct fwnode_handle (the data type of fwnode), then the module no longer works with the new kernel. Moreover, abidiff won't show any differences because the module is breaking the KMI by directly manipulating internal data structures in ways that can't be captured by only inspecting the binary representation.
- A current module is deemed KMI-incompatible when it is loaded at a later date by a new kernel that's incompatible. Module versioning adds a run-time check to avoid accidentally loading a module that isn't KMI-compatible with the kernel. This check prevents hard-to-debug runtime issues and kernel crashes that might result from an undetected incompatibility in the KMI.
- abidiff has limitations in identifying ABI differences in certain convoluted cases that CONFIG_MODVERSIONS can catch.
Enabling module versioning prevents all these issues.
Checking for CRC mismatches without booting the device
abidiff compares and reports CRC mismatches between kernels. This tool enables you to catch CRC mismatches at the same time as other ABI differences.
In addition, a full kernel build with CONFIG_MODVERSIONS enabled generates a Module.symvers file as part of the normal build process. This file has one line for every symbol exported by the kernel (vmlinux) and the modules. Each line consists of the CRC value, symbol name, symbol namespace, the vmlinux or module name that's exporting the symbol, and the export type (for example, EXPORT_SYMBOL versus EXPORT_SYMBOL_GPL).
You can compare the Module.symvers files between the GKI build and your build to check for any CRC differences in the symbols exported by vmlinux. If there is a CRC value difference in any symbol exported by vmlinux and that symbol is used by one of the modules you load in your device, the module doesn't load.
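One way to carry out this comparison is sketched below. The helper name symvers_crc_diff is hypothetical. The sketch assumes only that each line starts with the CRC value followed by the symbol name, and that lines for kernel exports carry the word vmlinux in one of the later fields, so it does not depend on the exact order of the remaining columns.

```shell
# Print "symbol gki_crc your_crc" for every vmlinux-exported symbol whose CRC
# differs between two Module.symvers files.
#   $1: Module.symvers from the GKI build
#   $2: Module.symvers from your build
symvers_crc_diff() {
  awk '
    # Return 1 when any field after the symbol name is exactly "vmlinux".
    function from_vmlinux(   i) {
      for (i = 3; i <= NF; i++) if ($i == "vmlinux") return 1
      return 0
    }
    # First file: remember the CRC of each symbol exported by vmlinux.
    NR == FNR { if (from_vmlinux()) gki[$2] = $1; next }
    # Second file: report symbols present in both files with a different CRC.
    from_vmlinux() && ($2 in gki) && gki[$2] != $1 { print $2, gki[$2], $1 }
  ' "$1" "$2"
}
```

An empty result means that no symbol exported by vmlinux in both builds has a differing CRC. Symbols that exist in only one of the two files are not reported by this sketch.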
If you don't have all the build artifacts, but do have the vmlinux files of the GKI kernel and your kernel, you can compare the CRC values for a specific symbol by running the following command on both the kernels and comparing the output:
nm <path to vmlinux>/vmlinux | grep __crc_<symbol name>
For example, the following command checks the CRC value for the module_layout symbol:
nm vmlinux | grep __crc_module_layout
0000000008663742 A __crc_module_layout
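To compare the value across two kernels in a script, it helps to extract only the CRC column. The helper below is a sketch with a hypothetical name; it reads nm output on standard input and matches the symbol name exactly, so a symbol whose name merely starts with the same text isn't picked up.

```shell
# Read nm output on stdin and print the CRC value recorded for one symbol.
#   $1: symbol name without the __crc_ prefix, for example module_layout
crc_from_nm() {
  awk -v want="__crc_$1" '$3 == want { print $1 }'
}
```

For instance, nm vmlinux | crc_from_nm module_layout can be run against both vmlinux files and the two results compared as strings.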
Resolving CRC mismatches
Use the following steps to resolve a CRC mismatch when loading a module:
- Build the GKI kernel and your device kernel by prepending KBUILD_SYMTYPES=1 to the command you use to build the kernel, as shown in the following command:
  KBUILD_SYMTYPES=1 BUILD_CONFIG=common/build.config.gki.aarch64 build/build.sh
  This command generates a .symtypes file for each .o file. When using build_abi.sh, the KBUILD_SYMTYPES=1 flag is implicitly set already.
- Find the .c file in which the symbol with CRC mismatch is exported, using the following command:
  cd common && git grep EXPORT_SYMBOL.*module_layout
  kernel/module.c:EXPORT_SYMBOL(module_layout);
- The .c file has a corresponding .symtypes file in the GKI, and your device kernel build artifacts. Locate the .symtypes file using the following commands:
  cd out/$BRANCH/common && ls -1 kernel/module.*
  kernel/module.o
  kernel/module.o.symversions
  kernel/module.symtypes
  The following are the characteristics of the .symtypes file:
  - The format of the .symtypes file is one (potentially very long) line per symbol.
  - [s|u|e|etc]# at the start of the line means the symbol is of data type [struct|union|enum|etc]. For example:
    t#bool typedef _Bool bool
  - A missing # prefix at the start of the line indicates that the symbol is a function. For example:
    find_module s#module * find_module ( const char * )
- Compare the two files and fix all the differences.
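Because each entry occupies a single line, the line for one symbol or type can be pulled out of both files and compared directly. The helper below is a sketch with a hypothetical name; it assumes the entry name is the first whitespace-separated token on the line, as in the examples above.

```shell
# Print the single .symtypes line that describes one entry.
#   $1: entry name as it appears at the start of the line, for example
#       "s#fwnode_handle" for a struct or "find_module" for a function
#   $2: path to a .symtypes file
symtypes_entry() {
  awk -v want="$1" '$1 == want' "$2"
}
```

Saving the output for the GKI file and for your own file and then running diff on the two results narrows a long comparison down to one entry.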
Case 1: Differences due to data type visibility
If one kernel keeps a symbol or data type opaque to the modules and the other kernel doesn't, that difference appears between the .symtypes files of the two kernels. The .symtypes file from one of the kernels has UNKNOWN for a symbol and the .symtypes file from the other kernel has an expanded view of the symbol or data type.
For example, adding the following line to the include/linux/device.h file in your kernel causes CRC mismatches, one of which is for module_layout():
#include <linux/fwnode.h>
Comparing the module.symtypes for that symbol exposes the following differences:
$ diff -u <GKI>/kernel/module.symtypes <your kernel>/kernel/module.symtypes
--- <GKI>/kernel/module.symtypes
+++ <your kernel>/kernel/module.symtypes
@@ -334,12 +334,15 @@
...
-s#fwnode_handle struct fwnode_handle { UNKNOWN }
+s#fwnode_reference_args struct fwnode_reference_args { s#fwnode_handle * fwnode ; unsigned int nargs ; t#u64 args [ 8 ] ; }
...
If your kernel has a value of UNKNOWN and the GKI kernel has the expanded view of the symbol (very unlikely), then merge the latest Android Common Kernel into your kernel so that you are using the latest GKI kernel base.
In most cases, the GKI kernel has a value of UNKNOWN, but your kernel has the internal details of the symbol because of changes made to your kernel. This is because one of the files in your kernel added a #include that isn't present in the GKI kernel.
To identify the #include that causes the difference, follow these steps:
- Open the header file that defines the symbol or data type having this difference. For example, edit include/linux/fwnode.h for the struct fwnode_handle.
- Add the following code at the top of the header file:
  #ifdef CRC_CATCH
  #error "Included from here"
  #endif
- In the module's .c file that has a CRC mismatch, add the following as the first line before any of the #include lines:
  #define CRC_CATCH 1
- Compile your module. The resulting build-time error shows the chain of header file #include that led to this CRC mismatch. For example:
  In file included from .../drivers/clk/XXX.c:16:
  In file included from .../include/linux/of_device.h:5:
  In file included from .../include/linux/cpu.h:17:
  In file included from .../include/linux/node.h:18:
  .../include/linux/device.h:16:2: error: "Included from here"
  #error "Included from here"
- One of the links in this chain of #include is due to a change made in your kernel that's missing in the GKI kernel. Identify the change, then revert it in your kernel or upload it to ACK and get it merged.
Case 2: Differences due to data type changes
If the CRC mismatch for a symbol or data type isn't due to a difference in visibility, then it's due to actual changes (additions, removals, or changes) in the data type itself. Typically, abidiff catches this, but if it misses any due to known detection gaps, the MODVERSIONS mechanism can catch them.
For example, making the following change in your kernel causes several CRC mismatches as many symbols are indirectly affected by this type of change:
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -259,7 +259,7 @@ struct iommu_ops {
void (*iotlb_sync)(struct iommu_domain *domain);
phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
phys_addr_t (*iova_to_phys_hard)(struct iommu_domain *domain,
- dma_addr_t iova);
+ dma_addr_t iova, unsigned long trans_flag);
int (*add_device)(struct device *dev);
void (*remove_device)(struct device *dev);
struct iommu_group *(*device_group)(struct device *dev);
One CRC mismatch is for devm_of_platform_populate().
If you compare the .symtypes files for that symbol, it might look like this:
$ diff -u <GKI>/drivers/of/platform.symtypes <your kernel>/drivers/of/platform.symtypes
--- <GKI>/drivers/of/platform.symtypes
+++ <your kernel>/drivers/of/platform.symtypes
@@ -399,7 +399,7 @@
...
-s#iommu_ops struct iommu_ops { ... ; t#phys_addr_t ( * iova_to_phys_hard ) ( s#iommu_domain * , t#dma_addr_t ) ; int ( * add_device ) ( s#device * ) ; ...
+s#iommu_ops struct iommu_ops { ... ; t#phys_addr_t ( * iova_to_phys_hard ) ( s#iommu_domain * , t#dma_addr_t , unsigned long ) ; int ( * add_device ) ( s#device * ) ; ...
To identify the changed type, follow these steps:
- Find the definition of the symbol in the source code (usually in .h files).
  - For simple symbol differences between your kernel and the GKI kernel, find the commit by running the following command:
    git blame
  - For deleted symbols (where a symbol is deleted in a tree and you also want to delete it in the other tree), you need to find the change that deleted the line. Use the following command on the tree where the line was deleted:
    git log -S "copy paste of deleted line/word" -- <file where it was deleted>
- Review the returned list of commits to locate the change or deletion. The first commit is probably the one you are searching for. If it isn't, go through the list until you find the commit.
- After you identify the change, either revert it in your kernel or upload it to ACK and get it merged.