Use the following guidelines to increase the robustness and reliability of your vendor modules. Many guidelines, when followed, can help make it easier to determine the correct module load order and the order in which drivers must probe for devices.
A module can be a library or a driver.
Library modules are libraries that provide APIs for other modules to use. Such modules typically aren't hardware-specific. Examples of library modules include an AES encryption module, the remoteproc framework that's compiled as a module, and a logbuffer module. The module code in module_init() runs to set up data structures, but no other code runs unless triggered by an external module.
Driver modules are drivers that probe for or bind to a specific type of device. Such modules are hardware-specific. Examples of driver modules include drivers for UART, PCIe, and video encoder hardware. Driver modules activate only when their associated device is present on the system.
If the device isn't present, the only module code that runs is the module_init() code that registers the driver with the driver core framework.
If the device is present and the driver successfully probes for or binds to that device, other module code might run.
Use module init and exit correctly
Driver modules must register a driver in module_init() and unregister a driver in module_exit(). One way to enforce these restrictions is to use wrapper macros, which avoid the direct use of the module_init(), *_initcall(), or module_exit() macros.
For modules that can be unloaded, use module_subsystem_driver(). Examples: module_platform_driver(), module_i2c_driver(), and module_pci_driver().
For modules that can't be unloaded, use builtin_subsystem_driver(). Examples: builtin_platform_driver(), builtin_i2c_driver(), and builtin_pci_driver().
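As a rough illustration of the wrapper macros, the following sketch registers a single platform driver with module_platform_driver(). The acme names and the trivial probe body are hypothetical placeholders and aren't taken from a real driver.

```c
// SPDX-License-Identifier: GPL-2.0
#include <linux/module.h>
#include <linux/platform_device.h>

/* Hypothetical probe: a real driver would acquire its resources here. */
static int acme_probe(struct platform_device *pdev)
{
	dev_info(&pdev->dev, "acme device bound\n");
	return 0;
}

static struct platform_driver acme_driver = {
	.probe = acme_probe,
	.driver = {
		.name = "acme",	/* hypothetical driver name */
	},
};

/*
 * Generates the module_init() and module_exit() functions, which do nothing
 * except register and unregister acme_driver.
 */
module_platform_driver(acme_driver);

MODULE_DESCRIPTION("Hypothetical ACME platform driver sketch");
MODULE_LICENSE("GPL");
```

For a module that can't be unloaded, the same sketch would end with builtin_platform_driver(acme_driver) instead, which generates only the init side.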
Some driver modules use module_init()
and module_exit()
because they
register more than one driver. For a driver module that uses module_init()
and
module_exit()
to register multiple drivers, try to combine the drivers into a
single driver. For example, you could differentiate using the compatible
string or the aux data of the device instead of registering separate drivers.
Alternatively, you could split the driver module into two modules.
Init and exit function exceptions
Library modules don't register drivers and are exempt from restrictions on
module_init()
and module_exit()
as they might need these functions to set up
data structures, work queues, or kernel threads.
Use the MODULE_DEVICE_TABLE macro
Driver modules must include the MODULE_DEVICE_TABLE macro, which allows user space to determine the devices supported by a driver module before loading the module. Android can use this data to optimize module loading, such as to avoid loading modules for devices that aren't present in the system. For examples of using the macro, refer to the upstream code.
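A minimal sketch of the macro for a devicetree-matched platform driver might look like the following. The compatible string reuses the "acme,super-pmic" example from this page; the driver and function names are hypothetical.

```c
// SPDX-License-Identifier: GPL-2.0
#include <linux/mod_devicetable.h>
#include <linux/module.h>
#include <linux/of.h>
#include <linux/platform_device.h>

/* Devices this driver can bind to, matched by DT compatible string. */
static const struct of_device_id acme_pmic_of_match[] = {
	{ .compatible = "acme,super-pmic" },
	{ /* sentinel */ }
};
/* Exports the table as module aliases that user space can read. */
MODULE_DEVICE_TABLE(of, acme_pmic_of_match);

/* Hypothetical probe with no real hardware setup. */
static int acme_pmic_probe(struct platform_device *pdev)
{
	return 0;
}

static struct platform_driver acme_pmic_driver = {
	.probe = acme_pmic_probe,
	.driver = {
		.name = "acme-pmic",
		.of_match_table = acme_pmic_of_match,
	},
};
module_platform_driver(acme_pmic_driver);

MODULE_DESCRIPTION("Hypothetical ACME PMIC driver sketch");
MODULE_LICENSE("GPL");
```

After building such a module, running modinfo on it should list the resulting alias entries, which is the data that module loaders consult before loading the module.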
Avoid CRC mismatches due to forward-declared data types
Don't include header files to get visibility into forward-declared data types.
Some structs, unions, and other data types defined in a header file (header-A.h) can be forward declared in a different header file (header-B.h) that typically uses pointers to those data types. This code pattern means that the kernel is intentionally trying to keep the data structure private from the users of header-B.h.
Users of header-B.h shouldn't include header-A.h to directly access the internals of these forward-declared data structures. Doing so causes CONFIG_MODVERSIONS CRC mismatch issues (which generate ABI compliance issues) when a different kernel (such as the GKI kernel) attempts to load the module.
For example, struct fwnode_handle is defined in include/linux/fwnode.h, but is forward declared as struct fwnode_handle; in include/linux/device.h because the kernel is trying to keep the details of struct fwnode_handle private from the users of include/linux/device.h. In this scenario, don't add #include <linux/fwnode.h> in a module to gain access to members of struct fwnode_handle. Any design in which you have to include such header files indicates a bad design pattern.
Don't directly access core kernel structures
Directly accessing or modifying core kernel data structures can lead to undesirable behavior, including memory leaks, crashes, and broken compatibility with future kernel releases. A data structure is a core kernel data structure when it meets any of the following conditions:
The data structure is defined under KERNEL-DIR/include/. For example, struct device and struct dev_links_info. Data structures defined in include/linux/soc are exempted.
The data structure is allocated or initialized by the module but is made visible to the kernel by being passed, indirectly (through a pointer in a struct) or directly, as input in a function exported by the kernel. For example, a cpufreq driver module initializes the struct cpufreq_driver and then passes it as input to cpufreq_register_driver(). After this point, the cpufreq driver module shouldn't modify struct cpufreq_driver directly because calling cpufreq_register_driver() makes struct cpufreq_driver visible to the kernel.
The data structure isn't initialized by your module. For example, struct regulator_dev returned by regulator_register().
Access core kernel data structures only through functions exported by the kernel or through parameters explicitly passed as input to vendor hooks. If you don't have an API or vendor hook to modify parts of a core kernel data structure, it's probably intentional and you shouldn't modify the data structure from modules. For example, don't modify any fields inside struct device or struct device.links.
To modify device.devres_head, use a devm_*() function such as devm_clk_get(), devm_regulator_get(), or devm_kzalloc().
To modify fields inside struct device.links, use a device link API such as device_link_add() or device_link_del().
Don't parse devicetree nodes with compatible property
If a device tree (DT) node has a compatible
property, a struct device
is
allocated for it automatically or when of_platform_populate()
is called on
the parent DT node (typically by the device driver of the parent device). The
default expectation (except for some devices initialized early for the
scheduler) is that a DT node with a compatible
property has a struct device
and a matching device driver. All other exceptions are already handled by the
upstream code.
In addition, fw_devlink (previously called of_devlink) considers DT nodes with the compatible property to be devices with an allocated struct device that's probed by a driver. If a DT node has a compatible property but the allocated struct device isn't probed, fw_devlink could block its consumer devices from probing or could block sync_state() calls from being called for its supplier devices.
If your driver uses an of_find_*() function (such as of_find_node_by_name() or of_find_compatible_node()) to directly find a DT node that has a compatible property and then parse that DT node, fix the module either by writing a device driver that can probe the device or by removing the compatible property (possible only if it hasn't been upstreamed). To discuss alternatives, reach out to the Android Kernel Team at kernel-team@android.com and be prepared to justify your use cases.
Use DT phandles to look up suppliers
Refer to a supplier using a phandle (a reference or pointer to a DT node) in DT whenever possible. Using standard DT bindings and phandles to refer to suppliers enables fw_devlink (previously of_devlink) to automatically determine inter-device dependencies by parsing the DT at runtime. The kernel can then automatically probe devices in the correct order, removing the need for module load ordering or MODULE_SOFTDEP().
Legacy scenario (no DT support in ARM kernel)
Before DT support was added to ARM kernels, consumers such as touch devices looked up suppliers such as regulators using globally unique strings. For example, the ACME PMIC driver could register or advertise multiple regulators (such as acme-pmic-ldo1 to acme-pmic-ldo10) and a touch driver could look up a regulator using regulator_get(dev, "acme-pmic-ldo10"). However, on a different board, the LDO8 might supply the touch device, creating a cumbersome system where the same touch driver needs to determine the correct look-up string for the regulator for each board that the touch device is used in.
Current scenario (DT support in ARM kernel)
After DT support was added to ARM kernels, consumers can identify suppliers in
the DT by referring to the supplier's device tree node using a phandle.
Consumers can also name the resource based on what it's used for instead of who
supplies it. For example, the touch driver from the previous example could use
regulator_get(dev, "core")
and regulator_get(dev, "sensor")
to get the
suppliers that power the touch device's core and sensor. The associated DT for
such a device is similar to the following code sample:
touch-device {
	compatible = "fizz,touch";
	...
	core-supply = <&acme_pmic_ldo4>;
	sensor-supply = <&acme_pmic_ldo10>;
};
acme-pmic {
	compatible = "acme,super-pmic";
	...
	acme_pmic_ldo4: ldo4 {
		...
	};
	...
	acme_pmic_ldo10: ldo10 {
		...
	};
};
Worst-of-both-worlds scenario
Some drivers ported from older kernels include legacy behavior in the DT that takes the worst part of the legacy scheme and forces it on the newer scheme that's meant to make things easier. In such drivers, the consumer driver reads the string to use for lookup using a device-specific DT property, the supplier uses another supplier-specific property to define the name to be used for registering the supplier resource, then the consumer and supplier continue using the same old scheme of using strings to look up the supplier. In this worst-of-both-worlds scenario:
The touch driver uses code similar to the following code:
str = of_property_read(np, "fizz,core-regulator");
core_reg = regulator_get(dev, str);
str = of_property_read(np, "fizz,sensor-regulator");
sensor_reg = regulator_get(dev, str);
The DT uses code similar to the following:
touch-device {
	compatible = "fizz,touch";
	...
	fizz,core-regulator = "acme-pmic-ldo4";
	fizz,sensor-regulator = "acme-pmic-ldo10";
};
acme-pmic {
	compatible = "acme,super-pmic";
	...
	ldo4 {
		regulator-name = "acme-pmic-ldo4";
		...
	};
	...
	acme_pmic_ldo10: ldo10 {
		...
		regulator-name = "acme-pmic-ldo10";
	};
};
Don't modify framework API errors
Framework APIs, such as regulator, clocks, irq, gpio, phys, and extcon, return -EPROBE_DEFER as an error return value to indicate that a device is attempting to probe but can't at this time, and the kernel should reattempt the probe later. To ensure that your device's .probe() function fails as expected in such cases, don't replace or remap the error value. Replacing or remapping the error value might cause -EPROBE_DEFER to be dropped and result in your device never getting probed.
Use devm_*() API variants
When the device acquires a resource using a devm_*()
API, the resource is
automatically released by the kernel if the device fails to probe, or probes
successfully and is later unbound. This capability makes the error handling
code in the probe()
function cleaner because it doesn't require goto
jumps
to release the resources acquired by devm_*()
and simplifies driver unbinding
operations.
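A probe function built only on devm_*() calls could look roughly like the following sketch. The acme_touch names are hypothetical, the "core" supply name follows the earlier touch device example, and the sketch assumes the hardware needs no shutdown sequence.

```c
// SPDX-License-Identifier: GPL-2.0
#include <linux/clk.h>
#include <linux/device.h>
#include <linux/err.h>
#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/regulator/consumer.h>
#include <linux/slab.h>

/* Hypothetical per-device state. */
struct acme_touch {
	struct regulator *core;
	struct clk *clk;
};

static int acme_touch_probe(struct platform_device *pdev)
{
	struct device *dev = &pdev->dev;
	struct acme_touch *touch;

	touch = devm_kzalloc(dev, sizeof(*touch), GFP_KERNEL);
	if (!touch)
		return -ENOMEM;

	/* Errors are returned unchanged, so -EPROBE_DEFER is preserved. */
	touch->core = devm_regulator_get(dev, "core");
	if (IS_ERR(touch->core))
		return PTR_ERR(touch->core);

	touch->clk = devm_clk_get(dev, NULL);
	if (IS_ERR(touch->clk))
		return PTR_ERR(touch->clk);

	/* No goto labels: earlier devm allocations are released for us. */
	platform_set_drvdata(pdev, touch);
	return 0;
}

static struct platform_driver acme_touch_driver = {
	.probe = acme_touch_probe,
	.driver = {
		.name = "acme-touch",
	},
};
module_platform_driver(acme_touch_driver);

MODULE_DESCRIPTION("Hypothetical devm-based probe sketch");
MODULE_LICENSE("GPL");
```

Each early return leaves nothing to undo by hand, which is why the sketch needs neither error labels nor a remove() function.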
Handle device-driver unbinding
Be intentional about unbinding device drivers and don't leave the unbinding undefined because undefined doesn't imply disallowed. You must either fully implement device-driver unbinding or explicitly disable device-driver unbinding.
Implement device-driver unbinding
When choosing to fully implement device-driver unbinding, unbind device drivers
cleanly to avoid memory or resource leaks and security issues. You can bind a
device to a driver by calling a driver's probe()
function and unbind a device
by calling the driver's remove()
function. If no remove()
function exists,
the kernel can still unbind the device; the driver core assumes that no clean up
work is needed by the driver when it unbinds from the device. A driver that's
unbound from a device doesn't need to do any explicit clean up work when both of
the following are true:
All resources acquired by a driver's probe() function are acquired through devm_*() APIs.
The hardware device doesn't need a shutdown or quiescing sequence.
In this situation, the driver core handles releasing all resources acquired
through devm_*()
APIs. If either of the preceding statements is untrue, the
driver needs to perform cleanup (release resources and shut down or
quiesce the hardware) when it unbinds from a device. To ensure that a device can
unbind a driver module cleanly, use one of the following options:
If the hardware doesn't need a shutdown or quiescing sequence, change the device module to acquire resources using devm_*() APIs.
Implement the remove() driver operation in the same struct as the probe() function, then do the clean up steps using the remove() function.
Explicitly disable device-driver unbinding (not recommended)
When choosing to explicitly disable device-driver unbinding, you need to disallow unbinding and disallow module unloading.
To disallow unbinding, set the suppress_bind_attrs flag to true in the driver's struct device_driver; this setting prevents the bind and unbind files from showing in the driver's sysfs directory. The unbind file is what allows user space to trigger the unbinding of a driver from its device.
To disallow module unloading, ensure the module has [permanent] in lsmod. A module that doesn't use module_exit() or module_XXX_driver() is marked as [permanent].
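Both settings can be combined in one driver definition, as in the following sketch. The acme_clk names are hypothetical; the sketch relies on builtin_platform_driver() generating only an init function, so the module has no exit path.

```c
// SPDX-License-Identifier: GPL-2.0
#include <linux/module.h>
#include <linux/platform_device.h>

/* Hypothetical probe for a device that must never be unbound. */
static int acme_clk_probe(struct platform_device *pdev)
{
	return 0;
}

static struct platform_driver acme_clk_driver = {
	.probe = acme_clk_probe,
	.driver = {
		.name = "acme-clk",
		/* Hides the bind and unbind files in the driver's sysfs directory. */
		.suppress_bind_attrs = true,
	},
};

/* Registers the driver at init time and defines no module exit function. */
builtin_platform_driver(acme_clk_driver);

MODULE_DESCRIPTION("Hypothetical non-unbindable driver sketch");
MODULE_LICENSE("GPL");
```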
Don't load firmware from within the probe function
Drivers shouldn't load firmware from within the .probe() function because they might not have access to the firmware if the driver probes before the flash or permanent storage based file system is mounted. In such cases, the request_firmware*() API might block for a long time and then fail, which can slow the boot process unnecessarily. Instead, defer the loading of the firmware to when a client starts using the device. For example, a display driver could load the firmware when the display device is opened.
Using .probe()
to load firmware might be OK in some cases, such as in a clock
driver that needs firmware to function but the device isn't exposed to user
space. Other appropriate use cases are possible.
Implement asynchronous probing
Support and use asynchronous probing to take advantage of future enhancements, such as parallel module loading or device probing to speed up boot time, that might be added to Android in future releases. Driver modules that don't use asynchronous probing could reduce the effectiveness of such optimizations.
To mark a driver as supporting and preferring asynchronous probing, set the
probe_type
field in the driver's struct device_driver
member. The following
example shows such support enabled for a platform driver:
static struct platform_driver acme_driver = {
	.probe = acme_probe,
	...
	.driver = {
		.name = "acme",
		...
		.probe_type = PROBE_PREFER_ASYNCHRONOUS,
	},
};
Making a driver work with asynchronous probing doesn't require special code. However, keep the following in mind when adding asynchronous probing support.
Don't make assumptions about previously probed dependencies. Check directly or indirectly (most framework calls do this check for you) and return -EPROBE_DEFER if one or more suppliers aren't ready yet.
If you add child devices in a parent device's probe function, don't assume the child devices are probed immediately.
If a probe fails, perform proper error handling and clean up (see Use devm_*() API variants).
Don't use MODULE_SOFTDEP to order device probes
The MODULE_SOFTDEP() macro isn't a reliable solution for guaranteeing the order of device probes and mustn't be used for the following reasons.
Deferred probe. When a module loads, the device probe might be deferred because one of its suppliers isn't ready. This can lead to a mismatch between the module load order and the device probe order.
One driver, many devices. A driver module can manage a specific device type. If the system includes more than one instance of a device type and those devices each have a different probe order requirement, you can't respect those requirements using module load ordering.
Asynchronous probing. Driver modules that perform asynchronous probing don't immediately probe a device when the module is loaded. Instead, a parallel thread handles device probing, which can lead to a mismatch between the module load order and the device probe order. For example, when an I2C main driver module performs asynchronous probing and a touch driver module depends on the PMIC that's on the I2C bus, even if the touch driver and the PMIC driver load in the correct order, the touch driver's probe might be attempted before the PMIC driver probe.
If you have driver modules using the MODULE_SOFTDEP() macro, fix them so they don't use that macro. To help you, the Android team has upstreamed changes that enable the kernel to handle ordering problems without using MODULE_SOFTDEP(). Specifically, you can use fw_devlink to ensure probe ordering and (after all consumers of a device have probed) use the sync_state() callback to perform any necessary tasks.
Use #if IS_ENABLED() instead of #ifdef for configurations
Use #if IS_ENABLED(CONFIG_XXX)
instead of #ifdef CONFIG_XXX
to ensure that
the code inside the #if
block continues to compile if the config changes to a
tristate config in the future. The differences are as follows:
#if IS_ENABLED(CONFIG_XXX) evaluates to true when CONFIG_XXX is set to module (=m) or built-in (=y).
#ifdef CONFIG_XXX evaluates to true when CONFIG_XXX is set to built-in (=y), but doesn't when CONFIG_XXX is set to module (=m). Use this only when you're certain you want to do the same thing when the config is set to module or is disabled.
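The difference can be reproduced outside the kernel with a small userspace sketch. The macros below re-create the placeholder trick that the kernel's include/linux/kconfig.h uses, with helper names renamed to a KC_ prefix; the CONFIG_ACME_* options are hypothetical and are defined by hand the way the build system would define them for =m and =y.

```c
/* Only the token 1 maps to "0,", which shifts the argument list by one. */
#define KC_ARG_PLACEHOLDER_1 0,
#define KC_TAKE_SECOND_ARG(ignored, val, ...) val

#define KC_IS_DEFINED(x) KC_IS_DEFINED_(x)
#define KC_IS_DEFINED_(val) KC_IS_DEFINED__(KC_ARG_PLACEHOLDER_##val)
#define KC_IS_DEFINED__(arg1_or_junk) KC_TAKE_SECOND_ARG(arg1_or_junk 1, 0)

#define KC_OR(x, y) KC_OR_(x, y)
#define KC_OR_(x, y) KC_OR__(KC_ARG_PLACEHOLDER_##x, y)
#define KC_OR__(arg1_or_junk, y) KC_TAKE_SECOND_ARG(arg1_or_junk 1, y)

#define IS_BUILTIN(option) KC_IS_DEFINED(option)
#define IS_MODULE(option) KC_IS_DEFINED(option##_MODULE)
#define IS_ENABLED(option) KC_OR(IS_BUILTIN(option), IS_MODULE(option))

/* Hypothetical configuration, written as the build system would emit it. */
#define CONFIG_ACME_TOUCH_MODULE 1	/* CONFIG_ACME_TOUCH=m */
#define CONFIG_ACME_PMIC 1		/* CONFIG_ACME_PMIC=y */
/* CONFIG_ACME_AUDIO is left undefined, meaning disabled. */

/* #ifdef misses the =m case because only the _MODULE symbol is defined. */
int touch_seen_by_ifdef(void)
{
#ifdef CONFIG_ACME_TOUCH
	return 1;
#else
	return 0;
#endif
}

/* IS_ENABLED() covers both =y and =m. */
int touch_seen_by_is_enabled(void)
{
#if IS_ENABLED(CONFIG_ACME_TOUCH)
	return 1;
#else
	return 0;
#endif
}

int pmic_enabled(void)
{
	return IS_ENABLED(CONFIG_ACME_PMIC);
}

int audio_enabled(void)
{
	return IS_ENABLED(CONFIG_ACME_AUDIO);
}
```

With this configuration, only the IS_ENABLED() check sees the touch option that's set to module, the built-in PMIC option is seen as enabled, and the undefined audio option is seen as disabled.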
Use the correct macro for conditional compiles
If a CONFIG_XXX
is set to module (=m
), the build system automatically
defines CONFIG_XXX_MODULE
. If your driver is controlled by CONFIG_XXX
and
you want to check if your driver is being compiled as a module, use the
following guidelines:
In the C file (or any source file that isn't a header file) for your driver, don't use #ifdef CONFIG_XXX_MODULE as it's unnecessarily restrictive and breaks if the config is renamed to CONFIG_XYZ. For any non-header source file that's compiled into a module, the build system automatically defines MODULE for the scope of that file. Therefore, to check if a C file (or any non-header source file) is being compiled as part of a module, use #ifdef MODULE (without the CONFIG_ prefix).
In header files, the same check is trickier because header files aren't compiled directly into a binary but rather compiled as part of a C file (or other source files). Use the following rules for header files:
For a header file that uses #ifdef MODULE, the result changes based on which source file is using it. This means the same header file in the same build can have different parts of its code compiled for different source files (module versus built-in or disabled). This can be useful when you want to define a macro that needs to expand one way for built-in code and expand in a different way for a module.
For a header file that needs to compile in a piece of code when a specific CONFIG_XXX is set to module (regardless of whether the source file including it is a module), the header file must use #ifdef CONFIG_XXX_MODULE.