You can use Application Binary Interface (ABI) monitoring tooling, available in
Android 11 and higher, to stabilize the in-kernel
ABI of Android kernels. The tooling collects and compares ABI representations
from existing kernel binaries (vmlinux + modules). These ABI representations
are the .xml files and the symbol lists. The interface that a
representation describes is called the Kernel Module Interface (KMI). You
can use the tooling to track and mitigate changes to the KMI.
The ABI monitoring tooling is developed in AOSP and uses libabigail to generate and compare representations.
This page describes the tooling, the process of collecting and analyzing ABI representations, and the usage of such representations to provide stability to the in-kernel ABI. This page also provides information for contributing changes to the Android kernels.
This directory contains the specific tools for the ABI analysis. Use it with the build
scripts provided by build_abi.sh.
Process
Analyzing the kernel's ABI takes multiple steps, most of which can be automated:
- Acquire the toolchain, build scripts, and kernel sources through repo.
- Provide any prerequisites (such as the libabigail library and collection of tools).
- Build the kernel and its ABI representation.
- Analyze ABI differences between the build and a reference.
- Update the ABI representation (if required).
- Work with symbol lists.
The following instructions work for any kernel that you can build using a
supported toolchain (such as the prebuilt Clang toolchain). repo manifests
are available for all Android common kernel branches and for several
device-specific kernels; they ensure that the correct toolchain is used when you
build a kernel distribution for analysis.
Using the ABI Monitoring tooling
1. Acquire the toolchain, build scripts, and kernel sources through repo
You can acquire the toolchain, build scripts (these scripts), and kernel sources
with repo. For detailed documentation, refer to the corresponding information
for building Android kernels.
To illustrate the process, the following steps use common-android12-5.10, an
Android kernel branch, which is the latest released GKI kernel at the time of this
writing. To obtain this branch through repo, execute the following:
repo init -u https://android.googlesource.com/kernel/manifest -b common-android12-5.10
repo sync
2. Provide prerequisites
The ABI tooling uses libabigail, a library and collection of tools, to analyze
binaries. A suitable set of prebuilt binaries comes with the
kernel-build-tools and is automatically used with build_abi.sh.
To use the lower-level tooling (such as dump_abi), add the kernel-build-tools
to the PATH.
3. Build the kernel and its ABI representation
At this point you're ready to build a kernel with the correct toolchain and to
extract an ABI representation from its binaries (vmlinux + modules).
Similar to the usual Android kernel build process (using build.sh), this step
requires running build_abi.sh.
BUILD_CONFIG=common/build.config.gki.aarch64 build/build_abi.sh
That builds the kernel and extracts the ABI representation into the out_abi
subdirectory. In this case, out/android12-5.10/dist/abi.xml is a symbolic
link to out_abi/android12-5.10/dist/abi-<id>.xml. <id> is computed by
executing git describe against the kernel source tree.
4. Analyze the ABI differences between the build and a reference representation
build_abi.sh analyzes and reports any ABI differences when a reference is
provided through the environment variable ABI_DEFINITION. ABI_DEFINITION
must point to a reference file relative to the kernel source tree, and can be
specified on the command line or, more commonly, as a value in build.config.
The following provides an example:
BUILD_CONFIG=common/build.config.gki.aarch64 build/build_abi.sh
In the command above, build.config.gki.aarch64 defines the reference file (as
ABI_DEFINITION=android/abi_gki_aarch64.xml), and diff_abi calls abidiff to
compare the freshly generated ABI representation against the reference file.
build_abi.sh prints the location of the report and emits a short report for
any ABI breakage. If breakages are detected, build_abi.sh terminates and
returns a nonzero exit code.
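A wrapper script can act on that exit code, for example in a local presubmit check. The following is a minimal sketch, not part of the ABI tooling: the function name run_abi_check is hypothetical, and it simply runs whatever command it is given and reports the outcome.

```shell
# Hypothetical helper (not part of the ABI tooling): run a command and
# report whether it signalled an ABI breakage through its exit code.
# Usage: run_abi_check <command> [args...]
run_abi_check() {
    if "$@"; then
        echo "ABI check passed"
        return 0
    else
        rc=$?   # exit code of the command that was just run
        echo "ABI check failed (exit code $rc)"
        return "$rc"
    fi
}

# Example, assuming the checkout and build config used in the steps above:
#   run_abi_check env BUILD_CONFIG=common/build.config.gki.aarch64 build/build_abi.sh
```

Because the helper returns the original exit code, callers can still distinguish a failing run from a passing one.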
5. Update the ABI representation (if required)
To update the ABI representation, invoke build_abi.sh with the --update
flag. It updates the corresponding abi.xml file that's defined by
build.config. To print the ABI differences due to the update, invoke the
script with --print-report. Be sure to include the report in the commit
message when updating the abi.xml file.
6. Working with symbol lists
Parameterize build_abi.sh with KMI symbol lists to
filter symbols during ABI extraction. These are plain text files that list
relevant ABI kernel symbols. For example, a symbol list file with the following
content limits ABI analysis to the ELF symbols with the names symbol1 and
symbol2:
[abi_symbol_list]
symbol1
symbol2
Changes to other ELF symbols aren't considered. A symbol list file can be
specified in the corresponding build.config configuration file with
KMI_SYMBOL_LIST= as a file relative to the kernel source directory
($KERNEL_DIR). To provide a level of organization, you can specify additional
symbol list files by using ADDITIONAL_KMI_SYMBOL_LISTS= in the build.config
file. This specifies further symbol list files, relative to $KERNEL_DIR;
separate multiple filenames by whitespace.
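As a sketch, the corresponding lines in a build.config file might look like the following. All three file names are hypothetical placeholders chosen for illustration; substitute the symbol list files that exist in your kernel tree.

```shell
# Fragment of a build.config file.
# All paths are hypothetical and relative to $KERNEL_DIR.
KMI_SYMBOL_LIST=android/abi_symbol_list_main

# Further symbol lists, separated by whitespace (newlines work too).
ADDITIONAL_KMI_SYMBOL_LISTS="
android/abi_symbol_list_vendor_a
android/abi_symbol_list_vendor_b
"
```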
To create an initial symbol list or to update an existing one, you must use
the build_abi.sh script with the --update-symbol-list parameter.
When the script is run with an appropriate configuration, it builds the kernel
and extracts the symbols that are exported from vmlinux and GKI modules and
that are required by any other module in the tree.
Consider vmlinux exporting the following symbols (usually done via the
EXPORT_SYMBOL* macros):
func1
func2
func3
Also, imagine there were two vendor modules, modA.ko and modB.ko, which
require the following symbols (in other words, they list undefined symbol
entries in their symbol table):
modA.ko: func1 func2
modB.ko: func2
From an ABI stability point of view, func1 and func2 must be kept stable, as
they're used by an external module. In contrast, while func3 is exported,
it isn't actively used (in other words, it's not required) by any module. Thus,
the symbol list contains func1 and func2 only.
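The selection rule in this example (keep a symbol only if it is exported and at least one module requires it) can be sketched on plain text inputs. This is an illustration of the rule, not how build_abi.sh is implemented; the function name required_exports and both input file formats are assumptions made for the sketch.

```shell
# Hypothetical helper: print the exported symbols that at least one
# module requires, one per line, sorted.
#   $1: file with one exported symbol per line (must not be empty)
#   $2: file with lines of the form "<module>: <symbol> <symbol> ..."
required_exports() {
    awk '
        # First file: remember every exported symbol.
        NR == FNR { exported[$1] = 1; next }
        # Second file: remember every symbol a module requires.
        { for (i = 2; i <= NF; i++) needed[$i] = 1 }
        # Keep only symbols that are both exported and required.
        END { for (s in needed) if (s in exported) print s }
    ' "$1" "$2" | sort
}
```

With the three exported symbols and the two modules from the example above as input, the helper prints func1 and func2 and omits func3.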
To create a symbol list or update an existing one, run build_abi.sh as
follows:
BUILD_CONFIG=path/to/build.config.device build/build_abi.sh --update-symbol-list
In this example, build.config.device must include several configuration
options:
- vmlinux must be in the FILES list.
- KMI_SYMBOL_LIST must be set and pointing at the KMI symbol list to update.
- GKI_MODULES_LIST must be set and pointing at the list of GKI modules. This path is usually android/gki_aarch64_modules.
Working with the lower-level ABI tooling
Most users will only need to use build_abi.sh. In some cases, working directly
with the lower-level ABI tooling might be necessary. The two commands used by
build_abi.sh, dump_abi and diff_abi, are available to extract and compare
ABI files. See the following sections for their usage.
Creating ABI representations from kernel trees
Given a Linux kernel tree with built vmlinux and kernel modules, the tool
dump_abi creates an ABI representation using the selected ABI tool. A sample
invocation looks like this:
dump_abi --linux-tree path/to/out --out-file /path/to/abi.xml
The file abi.xml contains a textual ABI representation of the combined,
observable ABI of vmlinux and the kernel modules in the given directory.
This file might be used for manual inspection, further analysis, or as a
reference file to enforce ABI stability.
Comparing ABI representations
ABI representations created by dump_abi can be compared with diff_abi. Use
the same abi-tool for both dump_abi and diff_abi. A sample invocation looks
like this:
diff_abi --baseline abi1.xml --new abi2.xml --report report.out
The generated report lists detected ABI changes that affect the KMI. The files
specified as baseline and new are ABI representations
that were collected with dump_abi. diff_abi propagates the exit code of the
underlying tool and therefore returns a non-zero value when the ABIs compared
are incompatible.
Using KMI symbol lists
To filter representations created with dump_abi or to filter symbols compared
with diff_abi, use the --kmi-symbol-list parameter, which takes a path to a
KMI symbol list file:
dump_abi --linux-tree path/to/out --out-file /path/to/abi.xml --kmi-symbol-list /path/to/symbol_list_file
Dealing with ABI breakages
As an example, the following patch introduces a very obvious ABI breakage:
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5ed8f6292a53..f2ecb34c7645 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -339,6 +339,7 @@ struct core_state {
struct kioctx_table;
struct mm_struct {
struct {
+ int dummy;
struct vm_area_struct *mmap; /* list of VMAs */
struct rb_root mm_rb;
u64 vmacache_seqnum; /* per-thread vmacache */
When you run build_abi.sh again with this patch applied, the tooling exits with
a non-zero error code and reports an ABI difference similar to this:
Leaf changes summary: 1 artifact changed
Changed leaf types summary: 1 leaf type changed
Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 0 Added function
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 0 Added variable
'struct mm_struct at mm_types.h:372:1' changed:
type size changed from 6848 to 6912 (in bits)
there are data member changes:
[...]
Fixing a broken ABI on Android Gerrit
If you didn't intentionally break the kernel ABI, then you need to investigate, using the guidance provided by the ABI monitoring tooling. The most common causes of breakages are added or deleted functions, changed data structures, or changes to the ABI caused by adding config options that lead to any of the aforementioned. Begin by addressing the issues found by the tool.
You can reproduce the ABI test locally by running build/build_abi.sh
with the same arguments that you would have used for running build/build.sh.
This is an example command for the GKI kernels:
BUILD_CONFIG=common/build.config.gki.aarch64 build/build_abi.sh
Updating the Kernel ABI
If you need to update the kernel ABI representation, then you must update the
corresponding abi.xml file in the kernel source tree. The most convenient way
to do this is by using build/build_abi.sh like so:
build/build_abi.sh --update --print-report
Use the same arguments that you would have used to run build/build.sh. This
updates the correct abi.xml in the source tree and prints the detected
differences. As a matter of practice, include the printed (short) report in the
commit message (at least partially).
Android Kernel Branches with predefined ABI
Some kernel branches come with predefined ABI representations for Android as part of
their source distribution. These ABI representations are intended to be
accurate, and to reflect the result of build_abi.sh as if you had executed it
on your own. As the ABI is heavily influenced by various kernel configuration
options, these .xml files usually belong to a certain configuration. For
example, the common-android12-5.10 branch contains an abi_gki_aarch64.xml
that corresponds to the build result when using build.config.gki.aarch64.
In particular, build.config.gki.aarch64 also refers to this file through
ABI_DEFINITION.
Such predefined ABI representations are used as a baseline definition when
comparing with diff_abi. For example, to validate
a kernel patch regarding any changes to the ABI, create the ABI representation
with the patch applied and use diff_abi to compare it to the expected ABI for
that particular source tree or configuration. If ABI_DEFINITION is set, running
build_abi.sh accordingly will do.
Enforcing the KMI using module versioning
The GKI kernels use module versioning (CONFIG_MODVERSIONS) to enforce KMI
compliance at runtime. Module versioning causes CRC mismatch failures at module
load time if the expected KMI of a module doesn't match the vmlinux KMI. For
example, here is a typical failure that occurs at module load time due to a CRC
mismatch for the symbol module_layout():
init: Loading module /lib/modules/kernel/.../XXX.ko with args ""
XXX: disagrees about version of symbol module_layout
init: Failed to insmod '/lib/modules/kernel/.../XXX.ko' with args ''
Module versioning uses
Module versioning is useful for many reasons:
- It catches changes in data structure visibility. If modules can change opaque data structures, such as data structures that aren't part of the KMI, modules will break after future changes to the structure.
- It adds a run-time check to avoid accidentally loading a module that isn't KMI-compatible with the kernel. (Such as when a current module is loaded at a later date by a new kernel that’s incompatible.) This is preferable to having hard-to-debug subsequent runtime issues or kernel crashes.
- abidiff has limitations in identifying ABI differences in certain convoluted cases that CONFIG_MODVERSIONS can catch.
As an example of the first point, consider the fwnode field in struct device.
That field MUST be opaque to modules so that they cannot make changes to fields
of device->fwnode or make assumptions about its size.
However, if a module includes <linux/fwnode.h> (directly or indirectly), then
the fwnode field in the struct device stops being opaque to it. The module
can then make changes to device->fwnode->dev or device->fwnode->ops. That's
problematic for several reasons:
- It can break assumptions the core kernel code is making about its internal data structures.
- If a future kernel update changes the struct fwnode_handle (the data type of fwnode), then the module won't work with the new kernel. Moreover, abidiff won't show any differences because the module is breaking the KMI by directly manipulating internal data structures in ways that can't be captured by only inspecting the binary representation.
Enabling module versioning prevents all these issues.
Checking for CRC mismatches without booting the device
Any kernel build with CONFIG_MODVERSIONS enabled generates a
Module.symvers file as part of the build process. The file has one line
for every symbol exported by vmlinux and the modules. Each line
consists of the CRC value, symbol name, symbol namespace, vmlinux or module name
that's exporting the symbol, and the export type (for example, EXPORT_SYMBOL
vs. EXPORT_SYMBOL_GPL).
You can compare the Module.symvers files between the GKI build and your build
to check for any CRC differences in the symbols exported by vmlinux. If there
is a CRC value difference in any symbol exported by vmlinux AND it's used
by one of the modules you load in your device, the module won't load.
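The comparison of two Module.symvers files can be scripted. The following is a minimal sketch under stated assumptions: the function name crc_mismatches is hypothetical, lines are assumed to be tab-separated with the CRC in the first field and the symbol name in the second, and because the position of the remaining fields may differ between kernel versions, any later field equal to vmlinux is taken to mark a vmlinux export.

```shell
# Hypothetical helper: list symbols exported by vmlinux in the reference
# build whose CRC differs in a second build.
#   $1: Module.symvers of the reference (GKI) build
#   $2: Module.symvers of your build
# Output: "<symbol> <reference CRC> <your CRC>", one line per mismatch.
crc_mismatches() {
    awk -F '\t' '
        # True if any field after the symbol name is exactly "vmlinux".
        function from_vmlinux(   i) {
            for (i = 3; i <= NF; i++) if ($i == "vmlinux") return 1
            return 0
        }
        # First file: record the CRC of every vmlinux export.
        NR == FNR { if (from_vmlinux()) ref[$2] = $1; next }
        # Second file: compare CRCs as strings and report differences.
        ($2 in ref) && (ref[$2] "") != ($1 "") { print $2, ref[$2], $1 }
    ' "$1" "$2"
}
```

An empty output means that no vmlinux export recorded in the reference file has a different CRC in the second file. Whether a reported mismatch matters still depends on whether one of your modules uses that symbol.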
If you don't have all the build artifacts, but just have the vmlinux files of the
GKI kernel and your kernel, you can compare the CRC values for a specific symbol
by running the following command on both the kernels, then comparing the output:
nm <path to vmlinux>/vmlinux | grep __crc_<symbol name>
For example, to check the CRC value for the module_layout symbol:
nm vmlinux | grep __crc_module_layout
0000000008663742 A __crc_module_layout
Fixing CRC mismatch
If you get a CRC mismatch when loading the module, here is how to fix it:
- Build the GKI kernel and your device kernel, and add KBUILD_SYMTYPES=1 in front of the command you use to build the kernel. Note that when using build_abi.sh, this is implicitly set already. This generates a .symtypes file for each .o file. For example:
KBUILD_SYMTYPES=1 BUILD_CONFIG=common/build.config.gki.aarch64 build/build.sh
- Find the .c file in which the symbol with the CRC mismatch is exported. For example:
cd common && git grep EXPORT_SYMBOL.*module_layout
kernel/module.c:EXPORT_SYMBOL(module_layout);
- That .c file has a corresponding .symtypes file in the GKI build artifacts and in your device kernel build artifacts:
cd out/$BRANCH/common && ls -1 kernel/module.*
kernel/module.o
kernel/module.o.symversions
kernel/module.symtypes
  a. The format of this file is one (potentially very long) line per symbol.
  b. [s|u|e|etc]# at the start of the line means the symbol is of data type [struct|union|enum|etc]. For example, t#bool typedef _Bool bool.
  c. A missing # prefix at the start of the line indicates the symbol is a function. For example, find_module s#module * find_module ( const char * ).
- Compare those two files and fix all the differences.
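To make the line format described above concrete, the following sketch labels each line of a .symtypes file by the prefix of its first token. The function name symtypes_kind is hypothetical, and the mapping covers only the prefixes named in this section (s, u, e, and the t seen in the typedef example); any other prefix is reported as other.

```shell
# Hypothetical helper: read .symtypes lines on stdin and print
# "<name>: <kind>" for each, based on the prefix before "#".
symtypes_kind() {
    awk 'NF > 0 {
        n = index($1, "#")
        if (n == 0) {
            # No "#" prefix: treated as a function, as described above.
            kind = "function"
            name = $1
        } else {
            prefix = substr($1, 1, n - 1)
            name = substr($1, n + 1)
            if      (prefix == "s") kind = "struct"
            else if (prefix == "u") kind = "union"
            else if (prefix == "e") kind = "enum"
            else if (prefix == "t") kind = "typedef"
            else                    kind = "other"
        }
        print name ": " kind
    }'
}

# Example, using the file name from the steps above:
#   symtypes_kind < kernel/module.symtypes
```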
Case 1: Differences due to data type visibility
If one kernel keeps a symbol or data type opaque to the modules and the other
kernel doesn't, then it shows up as a difference between the .symtypes files
of the two kernels. The .symtypes file from one of the kernels has UNKNOWN
for a symbol and the .symtypes file from the other kernel has an expanded view
of the symbol or data type.
For example, assume you add this line to include/linux/device.h in your kernel:
#include <linux/fwnode.h>
That causes CRC mismatches, with one of them for module_layout(). If you
compare the module.symtypes for that symbol, it looks like this:
$ diff -u <GKI>/kernel/module.symtypes <your kernel>/kernel/module.symtypes
--- <GKI>/kernel/module.symtypes
+++ <your kernel>/kernel/module.symtypes
@@ -334,12 +334,15 @@
...
-s#fwnode_handle struct fwnode_handle { UNKNOWN }
+s#fwnode_reference_args struct fwnode_reference_args { s#fwnode_handle * fwnode ; unsigned int nargs ; t#u64 args [ 8 ] ; }
...
If your kernel has it as UNKNOWN and the GKI kernel has the expanded view of
the symbol (very unlikely), then merge the latest Android Common Kernel into
your kernel so that you are using the latest GKI kernel base.
Almost always, the GKI kernel has it as UNKNOWN, but your kernel has the
internal details of the symbol because of changes made to your kernel. This is
because one of the files in your kernel added a #include that isn't present in
the GKI kernel.
To identify the #include that causes the difference, follow these steps:
- Open the header file that defines the symbol or data type having this difference. For example, edit include/linux/fwnode.h for the struct fwnode_handle.
- Add the following code at the top of the header file:
#ifdef CRC_CATCH
#error "Included from here"
#endif
- Then in the module's .c file (the one that has a CRC mismatch), add the following as the first line before any of the #include lines:
#define CRC_CATCH 1
- Now compile your module. You'll get a build-time error that shows the chain of header file #include that led to this CRC mismatch. For example:
In file included from .../drivers/clk/XXX.c:16:
In file included from .../include/linux/of_device.h:5:
In file included from .../include/linux/cpu.h:17:
In file included from .../include/linux/node.h:18:
.../include/linux/device.h:16:2: error: "Included from here"
#error "Included from here"
- One of the links in this chain of #include is due to a change done in your kernel that's missing in the GKI kernel.
- Once you identify the change, revert it in your kernel or upload it to ACK and get it merged.
Case 2: Differences due to data type changes
If the CRC mismatch for a symbol or data type isn't due to a difference in
visibility, then it's due to actual changes (additions, removals, or changes) in
the data type itself. Typically, abidiff catches this, but if it misses any
due to known detection gaps, the MODVERSIONS mechanism can catch them.
For example, assume you make the following change in your kernel:
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -259,7 +259,7 @@ struct iommu_ops {
void (*iotlb_sync)(struct iommu_domain *domain);
phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
phys_addr_t (*iova_to_phys_hard)(struct iommu_domain *domain,
- dma_addr_t iova);
+ dma_addr_t iova, unsigned long trans_flag);
int (*add_device)(struct device *dev);
void (*remove_device)(struct device *dev);
struct iommu_group *(*device_group)(struct device *dev);
That would cause a lot of CRC mismatches (as many symbols are indirectly
affected by this type of change) and one of them would be for
devm_of_platform_populate().
If you compare the .symtypes files for that symbol, it might look like this:
$ diff -u <GKI>/drivers/of/platform.symtypes <your kernel>/drivers/of/platform.symtypes
--- <GKI>/drivers/of/platform.symtypes
+++ <your kernel>/drivers/of/platform.symtypes
@@ -399,7 +399,7 @@
...
-s#iommu_ops struct iommu_ops { ... ; t#phys_addr_t ( * iova_to_phys_hard ) ( s#iommu_domain * , t#dma_addr_t ) ; int ( * add_device ) ( s#device * ) ; ...
+s#iommu_ops struct iommu_ops { ... ; t#phys_addr_t ( * iova_to_phys_hard ) ( s#iommu_domain * , t#dma_addr_t , unsigned long ) ; int ( * add_device ) ( s#device * ) ; ...
To identify the changed type, follow these steps:
- Find the definition of the symbol in the source code (usually in .h files).
- If there's an obvious symbol difference between your kernel and the GKI kernel, do a git blame to find the commit.
- Sometimes a symbol is deleted in one tree, and you want to delete it in the other tree. To find the change that deleted the line, run this command on the tree where the line was deleted:
  a. git log -S "copy paste of deleted line/word" -- <file where it was deleted>
  b. You'll get a shortened list of commits. The first one is probably the one you are searching for. If it isn't, go through the list until you find the commit.
- Once you identify the change, either revert it in your kernel or upload it to ACK and get it merged.