A/B (Seamless) System Updates

A/B system updates, also known as seamless updates, ensure a workable booting system remains on the disk during an over-the-air (OTA) update. This approach reduces the likelihood of an inactive device after an update, which means fewer device replacements and device reflashes at repair and warranty centers. Other commercial-grade operating systems such as ChromeOS also use A/B updates successfully.

A/B system updates provide the following benefits:

  • OTA updates can occur while the system is running, without interrupting the user (including app optimizations that occur after a reboot). This means users can continue to use their devices during an OTA—the only downtime during an update is when the device reboots into the updated disk partition.
  • If an OTA fails, the device boots into the pre-OTA disk partition and remains usable. The download of the OTA can be attempted again.
  • Any errors (such as I/O errors) affect only the unused partition set and can be retried. Such errors also become less likely because the I/O load is deliberately low to avoid degrading the user experience.
  • Updates can be streamed to A/B devices, removing the need to download the package before installing it. Streaming means it's not necessary for the user to have enough free space to store the update package on /data or /cache.
  • The cache partition is no longer used to store OTA update packages, so there is no need for sizing the cache partition.
  • dm-verity guarantees a device will boot an uncorrupted image. If a device doesn't boot due to a bad OTA or dm-verity issue, the device can reboot into an old image. (Android Verified Boot does not require A/B updates.)

About A/B system updates

A/B system updates affect the following:

  • Partition selection (slots), the update_engine daemon, and bootloader interactions (described below)
  • Build process and OTA update package generation (described in Implementing A/B Updates)

Partition selection (slots)

A/B system updates use two sets of partitions referred to as slots (normally slot A and slot B). The system runs from the current slot while the partitions in the unused slot are not accessed by the running system during normal operation. This approach makes updates fault resistant by keeping the unused slot as a fallback: If an error occurs during or immediately after an update, the system can roll back to the old slot and continue to have a working system. To achieve this goal, no partition used by the current slot should be updated as part of the OTA update (including partitions for which there is only one copy).

Each slot has a bootable attribute that states whether the slot contains a correct system from which the device can boot. The current slot is bootable when the system is running, but the other slot may have an old (still correct) version of the system, a newer version, or invalid data. Regardless of what the current slot is, exactly one slot is the active slot, also called the preferred slot (the one the bootloader will boot from on the next boot).

Each slot also has a successful attribute set by the user space, which is relevant only if the slot is also bootable. A successful slot should be able to boot, run, and update itself. A bootable slot that was not marked as successful (after several attempts were made to boot from it) should be marked as unbootable by the bootloader, including changing the active slot to another bootable slot (normally to the slot running immediately before the attempt to boot into the new, active one). The specific details of the interface are defined in boot_control.h.
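The interplay of the bootable, successful, and active attributes can be sketched as a small model. This is an illustrative sketch only, not the boot_control.h interface: the Slot class, the choose_slot function, and the MAX_TRIES retry budget are hypothetical names, and real bootloaders choose their own retry count and storage format.

```python
from dataclasses import dataclass

MAX_TRIES = 3  # hypothetical retry budget; the real value is bootloader-specific


@dataclass
class Slot:
    name: str
    bootable: bool = True
    successful: bool = False
    tries_remaining: int = MAX_TRIES


def choose_slot(slots, active):
    """Return the name of the slot to boot, mimicking a bootloader.

    slots: dict mapping slot name to Slot; active: name of the preferred slot.
    """
    slot = slots[active]
    if slot.bootable and (slot.successful or slot.tries_remaining > 0):
        if not slot.successful:
            slot.tries_remaining -= 1  # one boot attempt is consumed
        return active
    # The active slot never became successful: mark it unbootable and
    # fall back to another bootable slot.
    slot.bootable = False
    for other in slots.values():
        if other.name != active and other.bootable:
            return other.name
    raise RuntimeError("no bootable slot left")
```

With slot A freshly updated (bootable, not yet successful) and slot B known good, repeated calls to choose_slot(slots, "a") return "a" until the retry budget is used up, then mark A unbootable and return "b".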

Update engine daemon

A/B system updates use a background daemon called update_engine to prepare the system to boot into a new, updated version. This daemon can perform the following actions:

  • Read from the current slot A/B partitions and write any data to the unused slot A/B partitions as instructed by the OTA package.
  • Call the boot_control interface in a pre-defined workflow.
  • Run a post-install program from the new partition after writing all the unused slot partitions, as instructed by the OTA package. (For details, see Post-installation).

As the update_engine daemon is not involved in the boot process itself, it is limited in what it can do during an update by the SELinux policies and features in the current slot (such policies and features can't be updated until the system boots into a new version). To maintain a robust system, the update process should not modify the partition table, the contents of partitions in the current slot, or the contents of non-A/B partitions that can't be wiped with a factory reset.

The update_engine source is located in system/update_engine. The A/B OTA dexopt files are split between installd and a package manager.

For a working example, refer to /device/google/marlin/device-common.mk.

Bootloader interactions

The boot_control HAL is used by update_engine (and possibly other daemons) to instruct the bootloader what to boot from. Common example scenarios and their associated states include the following:

  • Normal case: The system is running from its current slot, either slot A or B. No updates have been applied so far. The system's current slot is bootable, successful, and the active slot.
  • Update in progress: The system is running from slot B, so slot B is the bootable, successful, and active slot. Slot A was marked as unbootable since the contents of slot A are being updated but not yet completed. A reboot in this state should continue booting from slot B.
  • Update applied, reboot pending: The system is running from slot B, slot B is bootable and successful, but slot A was marked as active (and therefore is marked as bootable). Slot A is not yet marked as successful and some number of attempts to boot from slot A should be made by the bootloader.
  • System rebooted into new update: The system is running from slot A for the first time, slot B is still bootable and successful while slot A is only bootable, and still active but not successful. A user space daemon, update_verifier, should mark slot A as successful after some checks are made.

Streaming update support

User devices don't always have enough space on /data to download the update package. As neither OEMs nor users want to waste space on a /cache partition, some users go without updates because the device has nowhere to store the update package. To address this issue, Android 8.0 added support for streaming A/B updates that write blocks directly to the B partition as they are downloaded, without having to store the blocks on /data. Streaming A/B updates need almost no temporary storage and require just enough storage for roughly 100 KiB of metadata.
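The idea behind streaming can be sketched as follows: each downloaded chunk is written to the target as soon as it arrives, so only one chunk is held in memory and nothing is staged on /data. This is a simplified illustration under stated assumptions, not the update_engine implementation: stream_to_partition is a hypothetical name, the chunks iterable stands in for the network download, and a real payload carries operations (decompress, patch) rather than raw bytes to copy.

```python
import hashlib
import io


def stream_to_partition(chunks, target):
    """Write each downloaded chunk straight to `target`, one chunk in memory at a time.

    chunks: iterable of bytes objects (stands in for the network download).
    target: writable binary file object (stands in for the unused slot's partition).
    Returns (bytes_written, SHA-256 hex digest of everything written).
    """
    digest = hashlib.sha256()
    written = 0
    for chunk in chunks:
        target.write(chunk)    # goes directly to the target, not to /data
        digest.update(chunk)   # hash incrementally so nothing is re-buffered
        written += len(chunk)
    return written, digest.hexdigest()


# Usage: an in-memory buffer stands in for the block device.
partition = io.BytesIO()
size, sha = stream_to_partition([b"block-0", b"block-1"], partition)
```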

To enable streaming updates in Android 7.1, cherrypick the following patches:

These patches are required to support streaming A/B updates in Android 7.1 and later whether using Google Mobile Services (GMS) or any other update client.

Life of an A/B update

The update process starts when an OTA package (referred to in code as a payload) is available for downloading. Policies in the device may defer the payload download and application based on battery level, user activity, charging status, or other policies. In addition, because the update runs in the background, users might not know an update is in progress. All of this means the update process might be interrupted at any point due to policies, unexpected reboots, or user actions.

Optionally, metadata in the OTA package itself indicates the update can be streamed; the same package can also be used for non-streaming installation. The server may use the metadata to tell the client it's streaming so the client will hand off the OTA to update_engine correctly. Device manufacturers with their own server and client can enable streaming updates by ensuring the server identifies the update is streaming (or assumes all updates are streaming) and the client makes the correct call to update_engine for streaming. Manufacturers can use the fact that the package is of the streaming variant to send a flag to the client to trigger hand off to the framework side as streaming.

After a payload is available, the update process is as follows:

Step  Activities
1     The current slot (or "source slot") is marked as successful (if not already marked) with markBootSuccessful().
2     The unused slot (or "target slot") is marked as unbootable by calling the function setSlotAsUnbootable(). The current slot is always marked as successful at the beginning of the update to prevent the bootloader from falling back to the unused slot, which will soon have invalid data. If the system has reached the point where it can start applying an update, the current slot is marked as successful even if other major components are broken (such as the UI in a crash loop) as it is possible to push new software to fix these problems.

      The update payload is an opaque blob with the instructions to update to the new version. The update payload consists of the following:
        • Metadata. A relatively small portion of the update payload, the metadata contains a list of operations to produce and verify the new version on the target slot. For example, an operation could decompress a certain blob and write it to specific blocks in a target partition, or read from a source partition, apply a binary patch, and write to certain blocks in a target partition.
        • Extra data. As the bulk of the update payload, the extra data associated with the operations consists of the compressed blob or binary patch in these examples.
3     The payload metadata is downloaded.
4     For each operation defined in the metadata, in order, the associated data (if any) is downloaded to memory, the operation is applied, and the associated memory is discarded.
5     The whole partitions are re-read and verified against the expected hash.
6     The post-install step (if any) is run. In the case of an error during the execution of any step, the update fails and is re-attempted with possibly a different payload. If all the steps so far have succeeded, the update succeeds and the last step is executed.
7     The unused slot is marked as active by calling setActiveBootSlot(). Marking the unused slot as active doesn't mean it will finish booting. The bootloader (or system itself) can switch the active slot back if it doesn't read a successful state.
8     Post-installation (described below) involves running a program from the "new update" version while still running in the old version. If defined in the OTA package, this step is mandatory and the program must return with exit code 0; otherwise, the update fails.
9     After the system successfully boots far enough into the new slot and finishes the post-reboot checks, the now current slot (formerly the "target slot") is marked as successful by calling markBootSuccessful().
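Steps 1 through 7 in the table above can be walked through with an in-memory model. This is a minimal sketch under stated assumptions: FakeBootControl is a hypothetical stand-in whose methods only mirror the names markBootSuccessful(), setSlotAsUnbootable(), and setActiveBootSlot(); operations are reduced to plain (offset, data) writes instead of real decompress or patch operations; and the post-install step is omitted.

```python
import hashlib


class FakeBootControl:
    """In-memory stand-in for the boot_control HAL (hypothetical, for illustration)."""

    def __init__(self, current):
        self.current = current
        self.active = current
        self.bootable = {"a": True, "b": True}
        self.successful = {"a": False, "b": False}

    def mark_boot_successful(self):          # mirrors markBootSuccessful()
        self.successful[self.current] = True

    def set_slot_as_unbootable(self, slot):  # mirrors setSlotAsUnbootable()
        self.bootable[slot] = False
        self.successful[slot] = False

    def set_active_boot_slot(self, slot):    # mirrors setActiveBootSlot()
        self.bootable[slot] = True
        self.active = slot


def apply_update(boot, operations, expected_sha256, partition_size):
    """Run steps 1-7 against an in-memory target partition; return its contents."""
    target_slot = "a" if boot.current == "b" else "b"
    boot.mark_boot_successful()                  # step 1
    boot.set_slot_as_unbootable(target_slot)     # step 2
    target = bytearray(partition_size)
    for offset, data in operations:              # steps 3-4: apply in order
        target[offset:offset + len(data)] = data
    actual = hashlib.sha256(target).hexdigest()  # step 5: re-read and verify
    if actual != expected_sha256:
        raise ValueError("target partition hash mismatch; update fails")
    # Step 6 (post-install) is omitted in this sketch.
    boot.set_active_boot_slot(target_slot)       # step 7
    return bytes(target)
```

If the hash check fails, the target slot stays unbootable and the active slot is unchanged, so a reboot continues from the current slot and the update can be retried.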

Post-installation

For every partition where a post-install step is defined, update_engine mounts the new partition into a specific location and executes the program specified in the OTA relative to the mounted partition. For example, if the post-install program is defined as usr/bin/postinstall in the system partition, this partition from the unused slot will be mounted in a fixed location (such as /postinstall_mount) and the /postinstall_mount/usr/bin/postinstall command is executed.
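The path resolution in this example can be sketched as below. The postinstall_command function is a hypothetical helper, and the check that the program stays inside the mount point is an assumption of this sketch rather than documented update_engine behavior.

```python
import posixpath


def postinstall_command(mount_point, program):
    """Resolve the post-install program path relative to the mounted new partition."""
    if posixpath.isabs(program):
        raise ValueError("post-install program must be relative to the partition root")
    path = posixpath.normpath(posixpath.join(mount_point, program))
    # Refuse paths that escape the mount point (for example via "..").
    if not path.startswith(mount_point.rstrip("/") + "/"):
        raise ValueError("post-install program escapes the mount point")
    return path


print(postinstall_command("/postinstall_mount", "usr/bin/postinstall"))
# prints /postinstall_mount/usr/bin/postinstall
```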

For post-installation to succeed, the old kernel must be able to:

  • Mount the new filesystem format. The filesystem type cannot change unless there's support for it in the old kernel, including details such as the compression algorithm used if using a compressed filesystem (such as SquashFS).
  • Understand the new partition's post-install program format. If using an Executable and Linkable Format (ELF) binary, it should be compatible with the old kernel (for example, a new 64-bit program can't run on an old 32-bit kernel if the architecture switched from 32- to 64-bit builds). Unless the loader (ld) is instructed to use other paths or the program is built as a static binary, libraries will be loaded from the old system image and not the new one.

For example, you could use a shell script as a post-install program (interpreted by the old system's shell binary, with a #! marker at the top), then set up library paths from the new environment for executing a more complex binary post-install program. Alternatively, you could run the post-install step from a dedicated smaller partition to enable the filesystem format in the main system partition to be updated without incurring backward compatibility issues or stepping-stone updates; this would allow users to update directly to the latest version from a factory image.

The new post-install program is limited by the SELinux policies defined in the old system. As such, the post-install step is suitable for performing tasks required by design on a given device or other best-effort tasks (such as updating the A/B-capable firmware or bootloader, or preparing copies of databases for the new version). The post-install step is not suitable for one-off bug fixes before reboot that require unforeseen permissions.

The selected post-install program runs in the postinstall SELinux context. All the files in the new mounted partition will be tagged with postinstall_file, regardless of what their attributes are after rebooting into that new system. Changes to the SELinux attributes in the new system won't impact the post-install step. If the post-install program needs extra permissions, those must be added to the post-install context.

After reboot

After rebooting, update_verifier triggers the integrity check using dm-verity. This check starts before zygote to avoid Java services making any irreversible changes that would prevent a safe rollback. During this process, the bootloader and kernel may also trigger a reboot if verified boot or dm-verity detects any corruption. After the check completes, update_verifier marks the boot successful.

update_verifier will read only the blocks listed in /data/ota_package/care_map.txt, which is included in an A/B OTA package when using the AOSP code. The Java system update client, such as GmsCore, extracts care_map.txt, sets up the access permission before rebooting the device, and deletes the extracted file after the system successfully boots into the new version.

Frequently asked questions

Has Google used A/B OTAs on any devices?

Yes. The marketing name for A/B updates is seamless updates. Pixel and Pixel XL phones from October 2016 shipped with A/B, and all Chromebooks use the same update_engine implementation of A/B. The necessary platform code implementation is public in Android 7.1 and higher.

Why are A/B OTAs better?

A/B OTAs provide a better user experience when taking updates. Measurements from monthly security updates show this feature has already proven a success: As of May 2017, 95% of Pixel owners are running the latest security update after a month compared to 87% of Nexus users, and Pixel users update sooner than Nexus users. Failures to update blocks during an OTA no longer result in a device that won't boot; until the new system image has successfully booted, Android retains the ability to fall back to the previous working system image.

How did A/B affect the 2016 Pixel partition sizes?

The following table contains details on the shipping A/B configuration versus the internally-tested non-A/B configuration:

Pixel partition sizes (MiB)   A/B       Non-A/B
Bootloader                    50*2      50
Boot                          32*2      32
Recovery                      0         32
Cache                         0         100
Radio                         70*2      70
Vendor                        300*2     300
System                        2048*2    4096
Total                         5000      4680

A/B updates require an increase of only 320MiB in flash, with a savings of 32MiB from removing the recovery partition and another 100MiB saved by removing the cache partition. These savings roughly offset the cost of the B partitions for the bootloader, the boot partition, and the radio partition. The vendor partition doubled in size (the vast majority of the size increase). Pixel's A/B system image is half the size of the original non-A/B system image.

For the Pixel A/B and non-A/B variants tested internally (only A/B shipped), the space used differed by only 320MiB. On a 32GiB device, this is just under 1%. For a 16GiB device this would be less than 2%, and for an 8GiB device almost 4% (assuming all three devices had the same system image).
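The arithmetic behind these figures can be checked with a short script. The sizes are copied from the partition table above (in MiB); the variable and function names are only illustrative.

```python
# Partition sizes in MiB, copied from the table above: (A/B, non-A/B).
SIZES = {
    "bootloader": (50 * 2, 50),
    "boot": (32 * 2, 32),
    "recovery": (0, 32),
    "cache": (0, 100),
    "radio": (70 * 2, 70),
    "vendor": (300 * 2, 300),
    "system": (2048 * 2, 4096),
}

ab_total = sum(ab for ab, _ in SIZES.values())
non_ab_total = sum(non_ab for _, non_ab in SIZES.values())
overhead_mib = ab_total - non_ab_total


def overhead_percent(device_gib):
    """A/B overhead as a percentage of total flash on a device of the given size."""
    return 100 * overhead_mib / (device_gib * 1024)


for gib in (32, 16, 8):
    print(f"{gib} GiB: {overhead_percent(gib):.2f}%")
# prints:
# 32 GiB: 0.98%
# 16 GiB: 1.95%
# 8 GiB: 3.91%
```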

Why didn't you use SquashFS?

We experimented with SquashFS but weren't able to achieve the performance desired for a high-end device. We don't use or recommend SquashFS for handheld devices.

More specifically, SquashFS provided about 50% size savings on the system partition, but the overwhelming majority of the files that compressed well were the precompiled .odex files. Those files had very high compression ratios (approaching 80%), but the compression ratio for the rest of the system partition was much lower. In addition, SquashFS in Android 7.0 raised the following performance concerns:

  • Pixel has very fast flash compared to earlier devices but not a huge number of spare CPU cycles, so reading fewer bytes from flash but needing more CPU for I/O was a potential bottleneck.
  • I/O changes that perform well on an artificial benchmark run on an unloaded system sometimes don't work well on real-world use cases under real-world load (such as crypto on Nexus 6).
  • Benchmarking showed 85% regressions in some places.

As SquashFS matures and adds features to reduce CPU impact (such as a whitelist of commonly-accessed files that shouldn't be compressed), we will continue to evaluate it and offer recommendations to device manufacturers.

How did you halve the size of the system partition without SquashFS?

Applications are stored in .apk files, which are actually ZIP archives. Each .apk file has inside it one or more .dex files containing portable Dalvik bytecode. An .odex file (optimized .dex) lives separately from the .apk file and can contain machine code specific to the device. If an .odex file is available, Android can run applications at ahead-of-time compiled speeds without having to wait for the code to be compiled each time the application is launched. An .odex file isn't strictly necessary: Android can actually run the .dex code directly via interpretation or Just-In-Time (JIT) compilation, but an .odex file provides the best combination of launch speed and run-time speed if space is available.

Example: For the installed-files.txt from a Nexus 6P running Android 7.1 with a total system image size of 2628MiB (2755792836 bytes), the breakdown of the largest contributors to overall system image size by file type is as follows:

File type                  Size (bytes)    Share
.odex                      1391770312      50.5%
.apk                       846878259       30.7%
.so (native C/C++ code)    202162479       7.3%
.oat files/.art images     163892188       5.9%
Fonts                      38952361        1.4%
icu locale data            27468687        0.9%

These figures are similar for other devices too, so on Nexus/Pixel devices, .odex files take up approximately half the system partition. This meant we could continue to use ext4 but write the .odex files to the B partition at the factory and then copy them to /data on first boot. The actual storage used with ext4 A/B is identical to SquashFS A/B, because if we had used SquashFS we would have shipped the preopted .odex files on system_a instead of system_b.

Doesn't copying .odex files to /data mean the space saved on /system is lost on /data?

Not exactly. On Pixel, most of the space taken by .odex files is for apps, which typically exist on /data. These apps take Google Play updates, so the .apk and .odex files on the system image are unused for most of the life of the device. Such files can be excluded entirely and replaced by small, profile-driven .odex files when the user actually uses each app (thus requiring no space for apps the user doesn't use). For details, refer to the Google I/O 2016 talk The Evolution of ART.

The comparison is difficult for a few key reasons:

  • Apps updated by Google Play have always had their .odex files on /data as soon as they receive their first update.
  • Apps the user doesn't run don't need an .odex file at all.
  • Profile-driven compilation generates smaller .odex files than ahead-of-time compilation (because the former optimizes only performance-critical code).

For details on the tuning options available to OEMs, see Configuring ART.

Aren't there two copies of the .odex files on /data?

It's a little more complicated ... After the new system image has been written, the new version of dex2oat is run against the new .dex files to generate the new .odex files. This occurs while the old system is still running, so the old and new .odex files are both on /data at the same time.

The code in OtaDexoptService (frameworks/base/+/nougat-mr1-release/services/core/java/com/android/server/pm/OtaDexoptService.java#200) calls getAvailableSpace before optimizing each package to avoid over-filling /data. Note that available here is still conservative: it's the amount of space left before hitting the usual system low space threshold (measured as both a percentage and a byte count). So if /data is full, there won't be two copies of every .odex file. The same code also has a BULK_DELETE_THRESHOLD: If the device gets that close to filling the available space (as just described), the .odex files belonging to apps that aren't used are removed. That's another case without two copies of every .odex file.
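The space check described above can be illustrated with a simplified model. This is a sketch, not the OtaDexoptService code: run_ota_dexopt and its parameters are hypothetical, the per-package .odex size is assumed to be known up front, and the BULK_DELETE_THRESHOLD pruning is left out.

```python
def run_ota_dexopt(packages, free_bytes, low_space_threshold):
    """Optimize packages one at a time, stopping short of the low-space threshold.

    packages: list of (name, odex_bytes) pairs, where odex_bytes is the space
    the new .odex file takes once written (assumed known for this sketch).
    Returns (optimized, deferred) lists of package names.
    """
    optimized, deferred = [], []
    for name, odex_bytes in packages:
        # "Available" is conservative: space left before the low-space threshold.
        available = free_bytes - low_space_threshold
        if available <= 0 or odex_bytes > available:
            deferred.append(name)  # handled after the reboot instead
            continue
        free_bytes -= odex_bytes
        optimized.append(name)
    return optimized, deferred


# Usage with made-up numbers: 100 units free, threshold at 20.
done, later = run_ota_dexopt([("a", 40), ("b", 50), ("c", 10)], 100, 20)
# done == ["a", "c"], later == ["b"]
```

Packages that don't fit are simply deferred, which matches the point that a nearly full /data never ends up holding two copies of every .odex file.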

In the worst case where /data is completely full, the update waits until the device has rebooted into the new system and no longer needs the old system's .odex files. The PackageManager handles this: (frameworks/base/+/nougat-mr1-release/services/core/java/com/android/server/pm/PackageManagerService.java#7215). After the new system has successfully booted, installd (frameworks/native/+/nougat-mr1-release/cmds/installd/commands.cpp#2192) can remove the .odex files that were used by the old system, returning the device back to the steady state where there's only one copy.

So, while it is possible that /data contains two copies of all the .odex files, (a) this is temporary and (b) only occurs if you had plenty of free space on /data anyway. Except during an update, there's only one copy. And as part of ART's general robustness features, it will never fill /data with .odex files anyway (because that would be a problem on a non-A/B system too).

Doesn't all this writing/copying increase flash wear?

Only a small portion of flash is rewritten: a full Pixel system update writes about 2.3GiB. (Apps are also recompiled, but that's true of non-A/B too.) Traditionally, block-based full OTAs wrote a similar amount of data, so flash wear rates should be similar.

Does flashing two system partitions increase factory flashing time?

No. Pixel didn't increase in system image size (it merely divided the space across two partitions).

Doesn't keeping .odex files on B make rebooting after factory data reset slow?

Yes. If you've actually used a device, taken an OTA, and performed a factory data reset, the first reboot will be slower than it would otherwise be (1m40s vs 40s on a Pixel XL) because the .odex files will have been lost from B after the first OTA and so can't be copied to /data. That's the trade-off.

Factory data reset should be a rare operation when compared to regular boot so the time taken is less important. (This doesn't affect users or reviewers who get their device from the factory, because in that case the B partition is available.) Use of the JIT compiler means we don't need to recompile everything, so it's not as bad as you might think. It's also possible to mark apps as requiring ahead-of-time compilation using coreApp="true" in the manifest: (frameworks/base/+/nougat-mr1-release/packages/SystemUI/AndroidManifest.xml#23). This is currently used by system_server because it's not allowed to JIT for security reasons.

Doesn't keeping .odex files on /data rather than /system make rebooting after an OTA slow?

No. As explained above, the new dex2oat is run while the old system image is still running to generate the files that will be needed by the new system. The update isn't considered available until that work has been done.

Can (should) we ship a 32GiB A/B device? 16GiB? 8GiB?

32GiB works well as it was proven on Pixel, and 320MiB out of 16GiB means a reduction of 2%. Similarly, 320MiB out of 8GiB is a reduction of 4%. Obviously A/B would not be the recommended choice on devices with 4GiB, as the 320MiB overhead is almost 8% of the total available space.

Does AVB2.0 require A/B OTAs?

No. Android Verified Boot has always required block-based updates, but not necessarily A/B updates.

Do A/B OTAs require AVB2.0?

No.

Do A/B OTAs break AVB2.0's rollback protection?

No. There's some confusion here because if an A/B system fails to boot into the new system image it will (after some number of retries determined by your bootloader) automatically revert to the "previous" system image. The key point here though is that "previous" in the A/B sense is actually still the "current" system image. As soon as the device successfully boots a new image, rollback protection kicks in and ensures that you can't go back. But until you've actually successfully booted the new image, rollback protection doesn't consider it to be the current system image.
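The reasoning above can be modeled with a stored rollback index that only advances after a successful boot. This is a simplified sketch of the idea, not the AVB implementation; the RollbackState class and its method names are hypothetical.

```python
class RollbackState:
    """Simplified model of a stored rollback index (names are hypothetical)."""

    def __init__(self, stored_index):
        self.stored_index = stored_index

    def may_boot(self, image_index):
        # An image is acceptable if it is not older than the stored index.
        return image_index >= self.stored_index

    def on_boot_successful(self, image_index):
        # The stored index advances only once the new image has booted successfully.
        self.stored_index = max(self.stored_index, image_index)


state = RollbackState(stored_index=5)
# Before the new image (index 6) boots successfully, falling back to the
# "previous" image (index 5) is still allowed.
fallback_before = state.may_boot(5)
state.on_boot_successful(6)
# After a successful boot of the new image, the older image is rejected.
fallback_after = state.may_boot(5)
```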

If you're installing an update while the system is running, isn't that slow?

With non-A/B updates, the aim is to install the update as quickly as possible because the user is waiting and unable to use their device while the update is applied. With A/B updates, the opposite is true; because the user is still using their device, as little impact as possible is the goal, so the update is deliberately slow. Via logic in the Java system update client (which for Google is GmsCore, the core package provided by GMS), Android also attempts to choose a time when the users aren't using their devices at all. The platform supports pausing/resuming the update, and the client can use that to pause the update if the user starts to use the device and resume it when the device is idle again.

There are two phases while taking an OTA, shown clearly in the UI as Step 1 of 2 and Step 2 of 2 under the progress bar. Step 1 corresponds with writing the data blocks, while step 2 is pre-compiling the .dex files. These two phases are quite different in terms of performance impact. The first phase is simple I/O. This requires little in the way of resources (RAM, CPU, I/O) because it's just slowly copying blocks around.

The second phase runs dex2oat to precompile the new system image. This obviously has less clear bounds on its requirements because it compiles actual apps. And there's obviously much more work involved in compiling a large and complex app than a small and simple app; whereas in phase 1 there are no disk blocks that are larger or more complex than others.

The process is similar to when Google Play installs an app update in the background before showing the "5 apps updated" notification, as has been done for years.

What if a user is actually waiting for the update?

The current implementation in GmsCore doesn't distinguish between background updates and user-initiated updates but may do so in the future. In the case where the user explicitly asked for the update to be installed or is watching the update progress screen, we'll prioritize the update work on the assumption that they're actively waiting for it to finish.

What happens if there's a failure to apply an update?

With non-A/B updates, if an update failed to apply, the user was usually left with an unusable device. The only exception was if the failure occurred before applying the update had even started (because the package failed to verify, say). With A/B updates, a failure to apply an update does not affect the currently running system. The update can simply be retried later.

What does GmsCore do?

In Google's A/B implementation, the platform APIs and update_engine provide the mechanism while GmsCore provides the policy. That is, the platform knows how to apply an A/B update and all that code is in AOSP (as mentioned above); but it's GmsCore that decides what and when to apply.

If you’re not using GmsCore, you can write your own replacement using the same platform APIs. The platform Java API for controlling update_engine is android.os.UpdateEngine: frameworks/base/core/java/android/os/UpdateEngine.java. Callers can provide an UpdateEngineCallback to be notified of status updates: frameworks/base/+/master/core/java/android/os/UpdateEngineCallback.java. Refer to the reference files for the core classes to use the interface.

Which systems on a chip (SoCs) support A/B?

As of 2017-03-15, we have the following information:

SoC vendor   Android 7.x Release          Android 8.x Release
Qualcomm     Depending on OEM requests    All chipsets will get support
MediaTek     Depending on OEM requests    All chipsets will get support

For details on schedules, check with your SoC contacts. For SoCs not listed above, reach out to your SoC vendor directly.