Boot Time Optimization

This page provides a set of tips that you can select from to improve boot time.

Strip debug symbols from modules

Similar to how debug symbols are stripped from the kernel on a production device, make sure you also strip the debug symbols from modules. Stripping debug symbols from modules helps boot time by reducing the following:

  • The time it takes to read the binaries from flash.
  • The time it takes to decompress the ramdisk.
  • The time it takes to load the modules.

Stripping debug symbols from modules may save several seconds during boot.

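Symbol stripping is enabled by default in the Android platform build. It is controlled by BOARD_DO_NOT_STRIP_VENDOR_RAMDISK_MODULES in your device-specific config under device/vendor/device. As the name indicates, setting this variable to true turns stripping off, so make sure it isn't set to true for production builds.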
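If you need to strip modules outside the platform build (for example, when checking prebuilt modules by hand), a minimal sketch could look like the following. The function name strip_modules and the STRIP override variable are hypothetical. The sketch assumes an LLVM toolchain with llvm-strip on the PATH. It uses --strip-debug, which removes debug sections but keeps the symbol table that the kernel needs to load the module.

```shell
#!/bin/sh
# Sketch: strip debug info in place from every .ko file in a directory and
# print how many modules were processed.
# STRIP is a hypothetical override; it defaults to llvm-strip.
# --strip-debug keeps the symbol table; --strip-all would remove it and the
# kernel would refuse to load the module.
strip_modules() {
  dir=$1
  strip_tool=${STRIP:-llvm-strip}
  count=0
  for ko in "$dir"/*.ko; do
    [ -e "$ko" ] || continue   # no .ko files: the glob stays unexpanded
    "$strip_tool" --strip-debug "$ko" || return 1
    count=$((count + 1))
  done
  echo "$count"
}
```

For example, `strip_modules out/modules` strips every module in a hypothetical out/modules directory.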

Use LZ4 compression for kernel and ramdisk

Gzip generates a smaller compressed output compared to LZ4, but LZ4 decompresses faster than Gzip. For the kernel and modules, the absolute storage size reduction from using Gzip isn't that significant compared to the decompression time benefit of LZ4.

Support for LZ4 ramdisk compression has been added to the Android platform build through BOARD_RAMDISK_USE_LZ4. You can set this option in your device-specific config. Kernel compression can be set through kernel defconfig.
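On the kernel side, the kernel can only unpack an LZ4 ramdisk if LZ4 initramfs support (CONFIG_RD_LZ4) is enabled. The kernel's LZ4 decompressor expects the legacy LZ4 frame format, which is what `lz4 -l` produces. The following is a minimal sketch that checks a kernel config for this option. The function name is hypothetical. Run it against a full .config rather than a minimized defconfig, because a minimized defconfig can omit options that are enabled by default. The sketch doesn't check the kernel-image compression option, which is architecture-specific.

```shell
#!/bin/sh
# Sketch: succeed only if the given kernel .config enables LZ4-compressed
# initramfs support.
has_rd_lz4() {
  config_file=$1
  grep -qx 'CONFIG_RD_LZ4=y' "$config_file"
}
```

For example, `has_rd_lz4 out/.config || echo "enable CONFIG_RD_LZ4"`, where the path is illustrative.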

Switching to LZ4 can make boot 500 ms to 1000 ms faster.

Avoid excessive logging in your drivers

In ARM64 and ARM32, function calls whose target is more than a specific distance from the call site need a jump table (called a procedure linkage table, or PLT) to be able to encode the full jump address. Because modules are loaded dynamically, these jump tables need to be fixed up during module load. The calls that need relocation are recorded in the ELF file as relocation entries with explicit addends (RELA entries, for short).

The Linux kernel does some memory size optimization (such as cache hit optimization) when allocating the PLT. With this upstream commit, the optimization scheme has an O(N^2) complexity, where N is the number of RELAs of type R_AARCH64_JUMP26 or R_AARCH64_CALL26. So having fewer RELAs of these types is helpful in reducing the module load time.

One common coding pattern that increases the number of R_AARCH64_CALL26 or R_AARCH64_JUMP26 RELAs is excessive logging in a driver. Each call to printk() or any other logging scheme typically adds a CALL26/JUMP26 RELA entry. The commit text of the upstream commit shows that even with the optimization, the six modules listed there take about 250 ms to load. This is because those six modules were the six with the most logging.

Reducing logging can save about 100 ms to 300 ms of boot time, depending on how excessive the existing logging is.
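To find the modules that are most affected, you can count their CALL26/JUMP26 relocation entries. The following is a minimal sketch. It assumes a readelf (GNU binutils or llvm-readelf) that accepts -r (relocations) and -W (wide output). The function names and the READELF override are hypothetical.

```shell
#!/bin/sh
# Sketch: rank arm64 modules by their number of R_AARCH64_CALL26 and
# R_AARCH64_JUMP26 relocation entries, highest first.

# Reads `readelf -rW` output on stdin and prints the number of matching entries.
count_call26_jump26() {
  grep -cE 'R_AARCH64_(CALL26|JUMP26)' || true   # grep -c prints 0 on no match
}

# Prints "<count> <module>" lines sorted by count, highest first.
# READELF is a hypothetical override; it defaults to readelf.
rank_modules() {
  readelf_tool=${READELF:-readelf}
  for ko in "$@"; do
    n=$("$readelf_tool" -rW "$ko" | count_call26_jump26)
    printf '%s %s\n' "$n" "$ko"
  done | sort -rn
}
```

For example, `rank_modules out/modules/*.ko | head` lists the modules that would benefit most from trimming log calls.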

Enable asynchronous probing, selectively

When a module is loaded, if the device that it supports has already been populated from the DT (devicetree) and added to the driver core, then the device probe is done in the context of the module_init() call. When a device probe is done in the context of module_init(), the module can't finish loading until the probe completes. Because module loading is mostly serialized, a device that takes a relatively long time to probe slows down boot.

To avoid slower boot times, enable asynchronous probing for modules that take a while to probe their devices. Enabling asynchronous probing for all modules might not be beneficial as the time it takes to fork a thread and kick off the probe might be as high as the time it takes to probe the device.

Devices that are connected through a slow bus such as I2C, devices that load firmware in their probe function, and devices that do a lot of hardware initialization can all take a long time to probe. The best way to identify these devices is to collect the probe time for every driver and sort the results.
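One way to collect per-device probe times is to boot with initcall_debug on the kernel command line. This makes the driver core log how long each probe took. The following is a minimal sketch that sorts those log lines. The function name is hypothetical. The sketch assumes lines of the form "probe of <device> returned <ret> after <N> usecs". The exact wording differs between kernel versions, so adjust the pattern to match your kernel's dmesg output.

```shell
#!/bin/sh
# Sketch: read kernel log text on stdin and print the slowest probes,
# slowest first, as "<usecs> <device>". Defaults to the top 20.
# Assumed log format (initcall_debug):
#   probe of 1234.i2c returned 0 after 48211 usecs
slowest_probes() {
  limit=${1:-20}
  sed -n 's/.*probe of \([^ ]*\) returned .* after \([0-9][0-9]*\) usecs.*/\2 \1/p' |
    sort -rn |
    head -n "$limit"
}
```

For example, `adb shell dmesg | slowest_probes 10` shows the ten slowest probes, which are candidates for asynchronous probing.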

To enable asynchronous probing for a module, it isn't sufficient to only set the PROBE_PREFER_ASYNCHRONOUS flag in the driver code. For modules, you also need to add module_name.async_probe=1 in the kernel command line or pass async_probe=1 as a module parameter when loading the module using modprobe or insmod.
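When several modules need asynchronous probing, the command-line form is easy to generate. The following is a minimal sketch. The function name and the module names in the example are hypothetical.

```shell
#!/bin/sh
# Sketch: build the kernel command-line fragment that requests asynchronous
# probing for each module name given as an argument.
async_probe_cmdline() {
  out=""
  for mod in "$@"; do
    out="${out:+$out }${mod}.async_probe=1"   # space-separate after the first
  done
  printf '%s\n' "$out"
}
```

For example, `async_probe_cmdline foo_i2c bar_fw` prints `foo_i2c.async_probe=1 bar_fw.async_probe=1`, which you can append to your kernel command line. The load-time equivalent for a single module is `modprobe foo_i2c async_probe=1`.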

Enabling asynchronous probing can save about 100 ms to 500 ms of boot time, depending on your hardware and drivers.

Probe your CPUfreq driver as early as possible

The earlier your CPUfreq driver probes, the sooner you can scale the CPU frequency to maximum (or some thermally limited maximum) during boot. The faster the CPU, the faster the boot. This guideline also applies to devfreq drivers that control the DRAM, memory, and interconnect frequency.

With modules, the load ordering can depend on the initcall level and on the compile or link order of the drivers. Use MODULE_SOFTDEP() to make sure the CPUfreq driver is among the first few modules to load.
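You can also adjust the load order list directly, in addition to MODULE_SOFTDEP(). The following is a minimal sketch that moves the CPUfreq driver and the modules it needs to the front of a modules.load-style list. The function and module names are hypothetical. The sketch assumes one module per line, with names passed exactly as they appear in the file. Pass suppliers (such as clock, regulator, and thermal modules) before the CPUfreq driver so that dependencies still load first.

```shell
#!/bin/sh
# Sketch: print the load list with the given modules moved to the front,
# in the order given. Names not present in the file are skipped.
prioritize_modules() {
  load_file=$1
  shift
  # First, the requested modules that the list actually contains.
  for mod in "$@"; do
    grep -Fx -- "$mod" "$load_file" || true
  done
  # Then every other module, in its original order.
  while IFS= read -r line || [ -n "$line" ]; do
    skip=0
    for mod in "$@"; do
      if [ "$line" = "$mod" ]; then skip=1; fi
    done
    if [ "$skip" -eq 0 ]; then printf '%s\n' "$line"; fi
  done < "$load_file"
}
```

For example, `prioritize_modules modules.load clk.ko cpufreq.ko > modules.load.new` produces the reordered list.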

Apart from loading the module early, you also need to make sure that every dependency the CPUfreq driver needs in order to probe has also probed. For example, if you need a clock or regulator handle to control the frequency of your CPU, make sure those drivers probe first. You might also need thermal drivers to be loaded before the CPUfreq driver if your CPUs can get too hot during boot. In short, do what you can to make sure the CPUfreq and relevant devfreq drivers probe as early as possible.

The savings from probing your CPUfreq driver early can range from very small to very large. They depend on how early you can get these drivers to probe and on the frequency at which the bootloader leaves the CPUs.

Move modules to second stage init, vendor or vendor_dlkm partition

Because the first stage init process is serialized, there aren't many opportunities to parallelize the boot process. If a module isn't needed for first stage init to finish, move the module to second stage init by placing it in the vendor or vendor_dlkm partition.

First stage init needs only a few devices to probe before it can move on to second stage init. For a normal boot flow, only console and flash storage functionality are needed.

Load the following essential drivers:

  • watchdog
  • reset
  • cpufreq

For recovery and user space fastbootd mode, first stage init requires more devices to probe (such as USB and display). Keep a copy of these modules in the first stage ramdisk and in the vendor or vendor_dlkm partition. This allows them to be loaded in first stage init for the recovery or fastbootd boot flow. However, don't load the recovery mode modules in first stage init during the normal boot flow. Recovery mode modules can be deferred to second stage init to decrease boot time. All other modules that aren't needed in first stage init should be moved to the vendor or vendor_dlkm partition.

Given a list of leaf devices (for example, UFS or serial), the dev-needs.sh script finds all the drivers, devices, and modules needed for dependencies or suppliers (for example, clocks, regulators, or GPIO) to probe.

Moving modules to second stage init decreases boot times in the following ways:

  • Ramdisk size reduction.
    • This yields faster flash reads when the bootloader loads the ramdisk (serialized boot step).
    • This yields faster decompression speeds when the kernel decompresses the ramdisk (serialized boot step).
  • Second stage init works in parallel, which hides the module's loading time with the work being done in second stage init.

Moving modules to second stage init can save 500 ms to 1000 ms of boot time, depending on how many modules you're able to move.

Module loading logistics

The latest Android build provides board configurations that control which modules are copied to each stage and which modules are loaded. This section focuses on the following subset:

  • BOARD_VENDOR_RAMDISK_KERNEL_MODULES. The list of modules to be copied into the ramdisk.
  • BOARD_VENDOR_RAMDISK_KERNEL_MODULES_LOAD. The list of modules to be loaded in first stage init.
  • BOARD_VENDOR_RAMDISK_RECOVERY_KERNEL_MODULES_LOAD. The list of modules to be loaded when recovery or fastbootd is selected from the ramdisk.
  • BOARD_VENDOR_KERNEL_MODULES. The list of modules to be copied into the vendor or vendor_dlkm partition, in the /vendor/lib/modules/ directory.
  • BOARD_VENDOR_KERNEL_MODULES_LOAD. The list of modules to be loaded in second stage init.

The boot and recovery modules in the ramdisk must also be copied to the vendor or vendor_dlkm partition at /vendor/lib/modules. Copying these modules to the vendor partition ensures that they remain visible during second stage init, which is useful for debugging and for collecting modinfo for bug reports.

The duplication should cost minimal space on the vendor or vendor_dlkm partition as long as the boot module set is minimized. Make sure that the vendor's modules.list file has a filtered list of modules in /vendor/lib/modules. The filtered list ensures boot times aren't affected by the modules loading again (which is an expensive process).
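A quick way to confirm that the filtering works is to compare the first stage load list with the vendor load list. The following is a minimal sketch that prints every module appearing in both. Such modules are candidates for an unwanted second load. The function name is hypothetical. Both arguments are assumed to be files with one module per line, such as the modules.load files from the vendor ramdisk and from /vendor/lib/modules.

```shell
#!/bin/sh
# Sketch: print the modules listed in both the first stage load list and
# the second stage (vendor) load list. Empty output means no overlap.
double_loaded_modules() {
  first_stage_list=$1
  vendor_list=$2
  grep -Fx -f "$first_stage_list" "$vendor_list" || true   # exact line matches
}
```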

Ensure that recovery mode modules load as a group. Loading recovery mode modules can be done either in recovery mode, or at the beginning of the second stage init in each boot flow.

You can use the device BoardConfig.mk files to perform these actions, as shown in the following example:

# All kernel modules
KERNEL_MODULES := $(wildcard $(KERNEL_MODULE_DIR)/*.ko)
KERNEL_MODULES_LOAD := $(strip $(shell cat $(KERNEL_MODULE_DIR)/modules.load))

# First stage ramdisk modules
BOOT_KERNEL_MODULES_FILTER := $(foreach m,$(BOOT_KERNEL_MODULES),%/$(m))

# Recovery ramdisk modules
RECOVERY_KERNEL_MODULES_FILTER := $(foreach m,$(RECOVERY_KERNEL_MODULES),%/$(m))
BOARD_VENDOR_RAMDISK_KERNEL_MODULES += \
     $(filter $(BOOT_KERNEL_MODULES_FILTER) \
                $(RECOVERY_KERNEL_MODULES_FILTER),$(KERNEL_MODULES))

# ALL modules land in /vendor/lib/modules so they could be rmmod/insmod'd,
# and modules.list actually limits us to the ones we intend to load.
BOARD_VENDOR_KERNEL_MODULES := $(KERNEL_MODULES)
# To limit /vendor/lib/modules to just the ones loaded, use:
# BOARD_VENDOR_KERNEL_MODULES := $(filter-out \
#     $(BOOT_KERNEL_MODULES_FILTER),$(KERNEL_MODULES))

# Group set of /vendor/lib/modules loading order to recovery modules first,
# then remainder, subtracting both recovery and boot modules which are loaded
# already.
BOARD_VENDOR_KERNEL_MODULES_LOAD := \
        $(filter-out $(BOOT_KERNEL_MODULES_FILTER), \
        $(filter $(RECOVERY_KERNEL_MODULES_FILTER),$(KERNEL_MODULES_LOAD)))
BOARD_VENDOR_KERNEL_MODULES_LOAD += \
        $(filter-out $(BOOT_KERNEL_MODULES_FILTER) \
            $(RECOVERY_KERNEL_MODULES_FILTER),$(KERNEL_MODULES_LOAD))

# NB: Load order governed by modules.load and not by $(BOOT_KERNEL_MODULES)
BOARD_VENDOR_RAMDISK_KERNEL_MODULES_LOAD := \
        $(filter $(BOOT_KERNEL_MODULES_FILTER),$(KERNEL_MODULES_LOAD))

# Group the recovery ramdisk loading order: boot modules first,
# then the remainder of recovery modules.
BOARD_VENDOR_RAMDISK_RECOVERY_KERNEL_MODULES_LOAD := \
    $(filter $(BOOT_KERNEL_MODULES_FILTER),$(KERNEL_MODULES_LOAD))
BOARD_VENDOR_RAMDISK_RECOVERY_KERNEL_MODULES_LOAD += \
    $(filter-out $(BOOT_KERNEL_MODULES_FILTER), \
    $(filter $(RECOVERY_KERNEL_MODULES_FILTER),$(KERNEL_MODULES_LOAD)))

In this example, the boot and recovery subsets (BOOT_KERNEL_MODULES and RECOVERY_KERNEL_MODULES) are specified locally in the board configuration files, which keeps them easy to manage. The preceding snippet selects each subset from the available kernel modules and leaves the remaining modules for second stage init.

For second stage init, we recommend running the module loading as a service so it doesn't block boot flow. Use a shell script to manage the module loading so that other logistics, such as error handling and mitigation, or module load completion, can be reported back (or ignored) if necessary.

You can ignore a load failure for a debug module that isn't present on user builds. To ignore this failure, set the vendor.device.modules.ready property. This triggers the later stages of the init rc boot flow so that boot continues to the launch screen. Use the following example script for reference, assuming it is installed as /vendor/etc/init.insmod.sh:

#!/vendor/bin/sh
# ...
if [ $# -eq 1 ]; then
  cfg_file=$1
else
  # Set property even if there is no insmod config
  # to unblock early-boot trigger
  setprop vendor.common.modules.ready 1
  setprop vendor.device.modules.ready 1
  exit 1
fi

if [ -f "$cfg_file" ]; then
  # Each config line is "<action>|<argument>".
  while IFS="|" read -r action arg
  do
    case $action in
      # $arg is intentionally unquoted so that module parameters
      # split into separate arguments.
      "insmod") insmod $arg ;;
      "setprop") setprop $arg 1 ;;
      "enable") echo 1 > $arg ;;
      "modprobe") modprobe -a -d /vendor/lib/modules $arg ;;
      # ...
    esac
  done < "$cfg_file"
fi

In the hardware rc file, the oneshot service could be specified as follows:

service insmod-sh /vendor/etc/init.insmod.sh /vendor/etc/init.insmod.<hw>.cfg
    class main
    user root
    group root system
    disabled
    oneshot

Additional optimizations can be made after modules move from the first to second stage. You can use the modprobe blocklist feature to split up the second stage boot flow to include deferred module loading of nonessential modules. Loading of modules used exclusively by a specific HAL can be deferred to load the modules only when the HAL is started.
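The blocklist itself is a plain-text file next to the modules. The following is a minimal sketch that generates blocklist entries for nonessential modules. It assumes the Android libmodprobe-style modules.blocklist format, with one "blocklist <module_name>" line per module; check the modprobe used on your platform to confirm. The function name and module names are hypothetical.

```shell
#!/bin/sh
# Sketch: emit one "blocklist <name>" entry per module argument so that a
# blocklist-aware modprobe skips those modules in the early load pass.
make_blocklist() {
  for mod in "$@"; do
    name=${mod##*/}     # drop any leading directory
    name=${name%.ko}    # drop the .ko suffix
    printf 'blocklist %s\n' "$name"
  done
}
```

For example, `make_blocklist wifi.ko video_dec.ko > modules.blocklist` generates the blocklist file.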

To improve apparent boot times, you can choose specific modules in the module loading service that are better suited to loading after the launch screen. For example, you can explicitly late-load the modules for the video decoder or Wi-Fi after the init boot flow has completed (for example, after the sys.boot_completed Android property is set). Make sure the HALs for the late-loading modules wait long enough when the kernel drivers aren't present yet.
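The following is a minimal sketch of this late-load step as a shell helper. It polls sys.boot_completed with getprop and then loads the deferred modules with the same modprobe invocation as the init.insmod.sh example. The function names, the 120-second timeout, and the module names in the usage line are hypothetical.

```shell
#!/bin/sh
# Sketch: wait until Android reports boot completion, then load deferred,
# nonessential modules named as arguments.

# Polls sys.boot_completed once per second; returns 1 after timeout_s seconds.
wait_for_boot_completed() {
  timeout_s=${1:-120}
  elapsed=0
  while [ "$(getprop sys.boot_completed)" != "1" ]; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      return 1    # gave up waiting
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

late_load_modules() {
  wait_for_boot_completed 120 || return 1
  modprobe -a -d /vendor/lib/modules "$@"
}
```

For example, `late_load_modules wifi video_dec` loads the two modules after boot completes. In practice, you might instead start the loading service from an init trigger such as `on property:sys.boot_completed=1`. The polling loop is the self-contained variant.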

Alternatively, you can use init's wait <file> [<timeout>] command in the boot flow rc scripts to wait for selected sysfs entries that show that driver modules have completed their probe operations. For example, recovery or fastbootd can wait for the display driver to finish loading in the background before presenting menu graphics.
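If the wait happens inside a loading script instead of an rc file, a shell counterpart of init's wait command is straightforward. The following is a minimal sketch. The function name is hypothetical, and the five-second default mirrors init's documented default. The sketch polls once per second, which is coarser than init's own polling.

```shell
#!/bin/sh
# Sketch: poll until a path exists or the timeout (in seconds) expires.
# Returns 0 if the path appeared, 1 on timeout.
wait_for_path() {
  path=$1
  timeout_s=${2:-5}
  elapsed=0
  while [ ! -e "$path" ]; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}
```

For example, `wait_for_path /sys/class/drm/card0 10` waits up to ten seconds. The sysfs path depends on your display driver.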

Initialize the CPU frequency to a reasonable value in the bootloader

Some SoCs or products might not be able to boot the CPU at the highest frequency because of thermal or power concerns during boot loop tests. However, make sure the bootloader sets the frequency of all the online CPUs as high as is safely possible for the SoC or product. This is very important because, with a fully modular kernel, the init ramdisk decompression takes place before the CPUfreq driver can be loaded. If the bootloader leaves the CPU at the low end of its frequency range, ramdisk decompression can take longer than with a statically compiled kernel (after adjusting for the difference in ramdisk size). This is because the CPU runs at a very low frequency while doing CPU-intensive work (decompression). The same applies to memory and interconnect frequency.

Initialize CPU frequency of big CPUs in the bootloader

Before the CPUfreq driver is loaded, the kernel is unaware of the little and big CPU frequencies and doesn't scale the CPUs’ sched capacity for their current frequency. The kernel might migrate threads to the big CPU if the load is sufficiently high on the little CPU.

Make sure the big CPUs are at least as performant as the little CPUs at the frequencies the bootloader leaves them at. For example, suppose the big CPU is 2x as performant as the little CPU at the same frequency, but the bootloader sets the little CPU’s frequency to 1.5 GHz and the big CPU’s frequency to 300 MHz. In that case, boot performance drops if the kernel moves a thread to the big CPU. In this example, if it's safe to boot the big CPU at 750 MHz, you should do so even if you don't plan to explicitly use it.
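The arithmetic behind this example is shown below. It uses the simplifying assumption that performance is proportional to per-clock performance (relative to the little CPU) multiplied by frequency, expressed in little-CPU-equivalent MHz:

$$
\begin{aligned}
\text{little at 1.5 GHz:}\quad & 1 \times 1500\,\text{MHz} = 1500 \\
\text{big at 300 MHz:}\quad & 2 \times 300\,\text{MHz} = 600 < 1500 \\
\text{minimum safe big frequency:}\quad & f_{\text{big}} \ge \frac{1500\,\text{MHz}}{2} = 750\,\text{MHz}
\end{aligned}
$$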

Drivers should not load firmware in first stage init

There might be some unavoidable cases where firmware needs to be loaded in first stage init. But in general, drivers should not load any firmware in first stage init, especially in device probe context. Loading firmware in first stage init causes the entire boot process to stall if the firmware is not available in the first stage ramdisk. And even if the firmware is present in the first stage ramdisk, it still causes an unnecessary delay.
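To find first stage drivers that are likely to request firmware, you can list the firmware files that each module declares through MODULE_FIRMWARE(). The following is a minimal sketch that does this with modinfo -F firmware. The function name and the MODINFO override are hypothetical. Drivers that request firmware without declaring it don't show up in the output, so treat the output as a starting point rather than a complete list.

```shell
#!/bin/sh
# Sketch: print "<module>: <firmware>" for every firmware file declared by
# the given first stage modules.
# MODINFO is a hypothetical override; it defaults to modinfo.
first_stage_firmware() {
  modinfo_tool=${MODINFO:-modinfo}
  for ko in "$@"; do
    "$modinfo_tool" -F firmware "$ko" | while IFS= read -r fw; do
      if [ -n "$fw" ]; then printf '%s: %s\n' "$ko" "$fw"; fi
    done
  done
}
```

For example, pass the .ko paths of your first stage ramdisk modules. Any module that appears in the output is worth reviewing before it stays in first stage init.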