A/B (Seamless) System Updates

A/B system updates, also known as seamless updates, ensure a workable booting system remains on the disk during an over-the-air (OTA) update. This reduces the likelihood of an inactive device afterward, which means fewer device replacements and device reflashes at repair and warranty centers. Other commercial-grade operating systems, such as ChromeOS, already use this approach successfully, and Android 8.0 comes with the necessary platform changes to conduct streaming updates.

Note: Android 7.1, in which A/B updates were introduced, requires the following patches to be cherrypicked before streaming updates can be enabled. This is true whether using Google Mobile Services (GMS) or any other update client.

Users don't always have enough space on /data to download the update package, and neither OEMs nor users want to waste space on a /cache partition, so some users go without updates because they have nowhere to store the update package. A/B updates can address this issue by streaming the update: blocks are written straight to the B partition as they are downloaded, without first being stored on /data. Streaming A/B updates therefore need almost no temporary storage, just enough for roughly 100 KiB of metadata.

Customers can continue to use their devices during an OTA. The only downtime during an update is when the device reboots into the updated disk partition. If the OTA fails, the device is still usable since it will boot into the pre-OTA disk partition. The download of the OTA can be attempted again. A/B system updates implemented through OTA are recommended for new devices only.

A/B system updates affect:

  • Interactions with the bootloader
  • Partition selection
  • The build process
  • OTA update package generation

The existing dm-verity feature guarantees the device will boot an uncorrupted image. If a device doesn't boot, because of a bad OTA or dm-verity issue, the device can reboot into an old image.

Note: Android Verified Boot does not require A/B updates.

Overview

The A/B system is robust because any errors (such as I/O errors) affect only the unused partition set and can be retried. Such errors also become less likely because the I/O load is deliberately low to avoid degrading the user experience.

OTA updates can occur while the system is running, without interrupting the user. This includes the app optimizations that occur after a reboot. Additionally, the cache partition is no longer used to store OTA update packages; there is no need for sizing the cache partition.

A/B system updates use a background daemon called update_engine and two sets of partitions. The two sets of partitions are referred to as slots, normally as slot A and slot B. The system runs from one slot, the current slot, while the partitions in the unused slot are not accessed by the running system (for normal operation).

The goal of this feature is to make updates fault resistant by keeping the unused slot as a fallback. If there is an error during an update or immediately after an update, the system can roll back to the old slot and continue to have a working system. To achieve this goal, none of the partitions used by the current slot should be updated as part of the OTA update (including partitions for which there is only one copy).

Each slot has a bootable attribute, which states whether the slot contains a correct system from which the device can boot. The current slot is clearly bootable when the system is running, but the other slot may have an old (still correct) version of the system, a newer version, or invalid data. Regardless of what the current slot is, there is one slot which is the active or preferred slot. The active slot is the one the bootloader will boot from on the next boot. Finally, each slot has a successful attribute set by the user space, which is only relevant if the slot is also bootable.

A successful slot should be able to boot, run, and update itself. A bootable slot that was not marked as successful (after several attempts were made to boot from it) should be marked as unbootable by the bootloader, including changing the active slot to another bootable slot (normally to the slot running right before the attempt to boot into the new, active one). The specific details of the interface are defined in boot_control.h.
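
The fallback behavior described above can be illustrated with a small model. This is only a sketch of the idea, not the boot_control.h interface: the Slot record, the choose_slot function, and the MAX_TRIES retry budget are all hypothetical names and values chosen for illustration, and a real bootloader decides its own retry count and storage format.

```python
from dataclasses import dataclass

MAX_TRIES = 3  # hypothetical retry budget; each bootloader picks its own


@dataclass
class Slot:
    bootable: bool = True
    successful: bool = False
    tries_remaining: int = MAX_TRIES


def choose_slot(slots, active):
    """Return the slot name to boot, or None if no slot is bootable.

    `slots` maps slot names to Slot records and is updated in place.
    The active slot is tried first, then the remaining slots in order.
    """
    order = [active] + [name for name in slots if name != active]
    for name in order:
        slot = slots[name]
        if not slot.bootable:
            continue
        if slot.successful:
            return name
        if slot.tries_remaining > 0:
            slot.tries_remaining -= 1  # spend one boot attempt
            return name
        slot.bootable = False  # never marked successful: give up on it
    return None


# Slot "a" was just updated and made active; slot "b" is the old system.
# User space never marks "a" as successful, so the model falls back to "b".
slots = {"a": Slot(), "b": Slot(successful=True)}
boots = [choose_slot(slots, "a") for _ in range(MAX_TRIES + 1)]
print(boots)  # ['a', 'a', 'a', 'b']
```

In this model the new slot gets a fixed number of attempts; once they are spent without the slot being marked successful, it is marked unbootable and the previous slot is used again.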

Bootloader state examples

The boot_control HAL is used by update_engine (and possibly other daemons) to instruct the bootloader what to boot from. These are common example scenarios and their associated states:

  • Normal case: The system is running from its current slot, either slot A or B. No updates have been applied so far. The system's current slot is bootable, successful, and the active slot.
  • Update in progress: The system is running from slot B, so slot B is the bootable, successful, and active slot. Slot A was marked as unbootable since the contents of slot A are being updated but not yet completed. A reboot in this state should continue booting from slot B.
  • Update applied, reboot pending: The system is running from slot B, slot B is bootable and successful, but slot A was marked as active (and therefore is marked as bootable). Slot A is not yet marked as successful and some number of attempts to boot from slot A should be made by the bootloader.
  • System rebooted into new update: The system is running from slot A for the first time, slot B is still bootable and successful while slot A is only bootable, and still active but not successful. A user space daemon should mark slot A as successful after some checks are made.

Update Engine features

The update_engine daemon runs in the background and prepares the system to boot into a new, updated version. The update_engine daemon is not involved in the boot process itself and is limited in what it can do during an update. The update_engine daemon can do the following:

  • Read from the current slot A/B partitions and write any data to the unused slot A/B partitions as instructed by the OTA package
  • Call the boot_control interface in a pre-defined workflow
  • Run a post-install program from the new partition after writing all the unused slot partitions, as instructed by the OTA package

The post-install step is described in detail below. Note that the update_engine daemon is limited by the SELinux policies and features in the current slot; those policies and features can't be updated until the system boots into a new version. To achieve a robustness goal, the update process should not:

  • Modify the partition table
  • Modify the contents of partitions in the current slot
  • Modify the contents of non-A/B partitions that can't be wiped with a factory reset

Update Engine source

The source to update_engine is in system/update_engine. The A/B OTA dexopt files are split between installd and package manager:

A working example can be found in /device/google/marlin/device-common.mk.

Life of an A/B update

The update process starts when an OTA package, referred to in code as a payload, is available for downloading. Policies in the device may defer the payload download and application based on battery level, user activity, whether it is connected to a charger, or other policies. But since the update runs in the background, the user might not know that an update is in progress and the process can be interrupted at any point due to policies or unexpected reboots.

Optionally, metadata in the OTA package itself indicates the update can be streamed. The same package can be used for non-streaming installation, as well. The server may use the metadata to tell the client it's streaming so the client will hand off the OTA to update_engine correctly. To enable streaming updates, manufacturers with their own server and client would need to:

  1. On the server, identify the update as streaming (or just assume all are).
  2. On the client, make the correct call to update_engine for streaming.

When a package is of the streaming variant, device manufacturers should have the server send a flag to the client so that the client hands the update off to the framework side as a streaming update.

The steps in the update process after a payload is available are as follows:

Step 1: The current slot (or "source slot") is marked as successful (if not already marked) with markBootSuccessful().

Step 2: The unused slot (or "target slot") is marked as unbootable by calling the function setSlotAsUnbootable().

The current slot is always marked as successful at the beginning of the update to prevent the bootloader from falling back to the unused slot, which will soon have invalid data. If the system has reached the point where it can start applying an update, the current slot is marked as successful even if other major components are broken (such as the UI in a crash loop) since it's possible to push new software to fix these major problems.

The update payload is an opaque blob containing the instructions to update to the new version. It consists of two parts: the metadata and the extra data associated with the instructions. The metadata is relatively small and contains a list of operations to produce and verify the new version on the target slot. For example, an operation could decompress a certain blob and write it to certain blocks in a target partition, or read from a source partition, apply a binary patch, and write to certain blocks in a target partition. The extra data associated with the operations, which is not included in the metadata, makes up the bulk of the update payload; in these examples it would be the compressed blob or the binary patch.

Step 3: The payload metadata is downloaded.

Step 4: For each operation defined in the metadata, in order, the associated data (if any) is downloaded to memory, the operation is applied, and the associated memory is discarded.

These two steps take most of the update time, as they involve writing and downloading large amounts of data, and are likely to be interrupted for reasons of policy or reboot.

Step 5: The whole partitions are re-read and verified against the expected hash.

Step 6: The post-install step (if any) is run.

In the case of an error during the execution of any step, the update fails and is re-attempted with possibly a different payload. If all the steps so far have succeeded, the update succeeds and the last step is executed.

Step 7: The unused slot is marked as active by calling setActiveBootSlot().

Marking the unused slot as active doesn't mean it will finish booting. The bootloader—or system itself—can switch the active slot back if it doesn't read a successful state.
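
The seven steps can be traced with an in-memory model. This is a rough sketch under several assumptions, not update_engine's implementation: FakeBootControl is a stand-in whose method names merely mirror markBootSuccessful(), setSlotAsUnbootable(), and setActiveBootSlot(); the metadata dictionary layout, the fetch_blob callback, and the use of SHA-256 are hypothetical; and every operation is simplified to "write this blob at this offset".

```python
import hashlib


class FakeBootControl:
    """In-memory stand-in for the boot_control HAL (not the real interface)."""

    def __init__(self, current):
        self.current = current
        self.active = current
        self.successful = {"a": False, "b": False}
        self.bootable = {"a": True, "b": True}

    def mark_boot_successful(self):
        self.successful[self.current] = True

    def set_slot_as_unbootable(self, slot):
        self.bootable[slot] = False
        self.successful[slot] = False

    def set_active_boot_slot(self, slot):
        self.bootable[slot] = True
        self.active = slot


def apply_update(bc, metadata, fetch_blob, partitions):
    """Run the steps against in-memory partitions; return True on success."""
    target = "b" if bc.current == "a" else "a"
    bc.mark_boot_successful()             # Step 1
    bc.set_slot_as_unbootable(target)     # Step 2
    image = bytearray(metadata["size"])   # Step 3: metadata is already in hand
    for op in metadata["operations"]:     # Step 4: one operation at a time
        blob = fetch_blob(op["blob_id"])  # download only this operation's data
        image[op["offset"]:op["offset"] + len(blob)] = blob
        del blob                          # discard before the next operation
    partitions[target] = bytes(image)
    digest = hashlib.sha256(partitions[target]).hexdigest()
    if digest != metadata["sha256"]:      # Step 5: verify the whole partition
        return False
    # Step 6 (post-install) is omitted from this sketch.
    bc.set_active_boot_slot(target)       # Step 7
    return True


new_image = b"hello world"
blobs = {0: b"hello ", 1: b"world"}
metadata = {
    "size": len(new_image),
    "operations": [{"blob_id": 0, "offset": 0}, {"blob_id": 1, "offset": 6}],
    "sha256": hashlib.sha256(new_image).hexdigest(),
}
bc = FakeBootControl(current="a")
partitions = {"a": b"old system", "b": b""}
print(apply_update(bc, metadata, blobs.__getitem__, partitions), bc.active)
```

Note that a failed verification leaves the target slot unbootable and the active slot unchanged, so a reboot at that point continues from the current slot.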

Post-install step

The post-install step consists of running a program from the "new update" version while still running in the old version. If defined in the OTA package, this step is mandatory and the program must return with exit code 0; otherwise, the update fails.

For every partition where a post-install step is defined, update_engine mounts the new partition into a specific location and executes the program specified in the OTA relative to the mounted partition. For example, if the post-install program is defined as usr/bin/postinstall in the system partition, this partition from the unused slot will be mounted in a fixed location (for example, in /postinstall_mount) and the /postinstall_mount/usr/bin/postinstall command will be executed. Note that for this step to work, the following are required:

  • The old kernel needs to be able to mount the new filesystem format. The filesystem type cannot change unless there's support for it in the old kernel (which includes details such as the compression algorithm used if using a compressed filesystem like SquashFS).
  • The old kernel needs to understand the new partition's post-install program format. If using an ELF binary, it should be compatible with the old kernel (for example, a new 64-bit program can't run on an old 32-bit kernel if the architecture switched from 32- to 64-bit builds). Also, the libraries will be loaded from the old system image, not the new one, unless the loader (ld) is instructed to use other paths or the binary is built statically.
  • The new post-install program will be limited by the SELinux policies defined in the old system.
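
The way the post-install command is derived from the mount location and the relative path in the OTA can be sketched as follows. The function name postinstall_command is hypothetical, and the two sanity checks (rejecting absolute paths and paths that escape the mount point) are this sketch's own precautions, not documented update_engine behavior.

```python
import posixpath


def postinstall_command(mount_point, program):
    """Join the mount point and the OTA-relative post-install program path."""
    if posixpath.isabs(program):
        raise ValueError("post-install path must be relative to the partition root")
    full = posixpath.normpath(posixpath.join(mount_point, program))
    if not full.startswith(mount_point.rstrip("/") + "/"):
        raise ValueError("post-install path escapes the mounted partition")
    return full


print(postinstall_command("/postinstall_mount", "usr/bin/postinstall"))
# /postinstall_mount/usr/bin/postinstall
```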

An example case is to use a shell script as a post-install program (interpreted by the old system's shell binary with a #! marker at the top) and then set up library paths from the new environment for executing a more complex binary post-install program.

Another example case is to run the post-install step from a dedicated smaller partition, so the filesystem format in the main system partition can be updated without incurring backward compatibility issues or stepping-stone updates, allowing users to update straight to the latest version from a factory image.

Due to the SELinux policies, the post-install step is suitable for performing tasks required by design on a given device or other best-effort tasks: update the A/B-capable firmware or bootloader, prepare copies of some databases for the new version, etc. This step is not suitable for one-off bug fixes before reboot that require unforeseen permissions.

The selected post-install program runs in the postinstall SELinux context. All the files in the new mounted partition will be tagged with postinstall_file, regardless of what their attributes are after rebooting into that new system. Changes to the SELinux attributes in the new system won't impact the post-install step. If the post-install program needs extra permissions, those must be added to the post-install context.

Implementation

OEMs and SoC vendors who wish to implement the feature must add the following support to their bootloaders:

Figure 1. Bootloader state machine

The boot control HAL can be tested using the system/extras/bootctl utility.

Some tests have been implemented for Brillo:

Kernel patches

Kernel command line arguments

The kernel command line arguments must contain the following extra arguments:

skip_initramfs rootwait ro init=/init root="/dev/dm-0 dm=system none ro,0 1 \
  android-verity <public-key-id> <path-to-system-partition>"

The <public-key-id> value is the ID of the public key used to verify the verity table signature (see dm-verity).

To add the .X509 certificate containing the public key to the system keyring:

  1. Copy the .X509 certificate formatted in the .der format to the root of the kernel directory. Use the following openssl command to convert from .pem to .der format (if the .X509 certificate is formatted in .pem format):
    openssl x509 -in <x509-pem-certificate> -outform der -out <x509-der-certificate>
    
  2. Once copied to the kernel build root, build the zImage to include the certificate as part of the system keyring. This can be verified from the following procfs entry (requires the CONFIG_KEYS_DEBUG_PROC_KEYS kernel option to be enabled):
    angler:/# cat /proc/keys
    
    1c8a217e I------     1 perm 1f010000     0     0 asymmetri
    Android: 7e4333f9bba00adfe0ede979e28ed1920492b40f: X509.RSA 0492b40f []
    2d454e3e I------     1 perm 1f030000     0     0 keyring
    .system_keyring: 1/4
    

Successful inclusion of the .X509 certificate indicates the presence of the public key in the system keyring. The public key ID is the description of the asymmetric key entry, which in the output above is Android: 7e4333f9bba00adfe0ede979e28ed1920492b40f.

As the next step, replace the space with ‘#’ and pass it as <public-key-id> in the kernel command line. For example, in the above case, the following is passed in the place of <public-key-id>: Android:#7e4333f9bba00adfe0ede979e28ed1920492b40f
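
The substitution is simple enough to script. The helper below is a minimal sketch (the name public_key_id is hypothetical) that takes the key description as it appears in the keyring listing and returns the value to pass on the kernel command line.

```python
def public_key_id(description):
    """Turn a keyring key description into the <public-key-id> argument.

    The kernel command line is split on spaces, so the single space in the
    description is replaced with '#'.
    """
    return description.strip().replace(" ", "#")


print(public_key_id("Android: 7e4333f9bba00adfe0ede979e28ed1920492b40f"))
# Android:#7e4333f9bba00adfe0ede979e28ed1920492b40f
```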

Recovery

The recovery RAM disk is now contained in the boot.img file. When going into recovery, the bootloader must not put the skip_initramfs option on the kernel command line.

For non-A/B updates, the recovery partition contains the code used to apply updates. A/B updates are applied by update_engine running in the regular booted system image. There is still a recovery mode used to implement factory data reset and sideloading of update packages, which is where the name "recovery" came from. The code and data for recovery mode is stored in the regular boot partition now, in a ramdisk. So to boot into the system image, the bootloader tells the kernel to skip the ramdisk; otherwise we'll boot into recovery mode. Recovery mode is small (and much of it was already on the boot partition), so the boot partition doesn't increase in size.

Build variables

To implement A/B updates, you need a new A/B-capable bootloader and have `AB_OTA_UPDATER := true` in your board configuration and list the partitions to which A/B applies.

Must define for the A/B target:
  • AB_OTA_UPDATER := true
  • AB_OTA_PARTITIONS := \
      boot \
      system \
      vendor
    and other partitions updated through update_engine (radio, bootloader, etc.)
  • BOARD_BUILD_SYSTEM_ROOT_IMAGE := true
  • TARGET_NO_RECOVERY := true
  • BOARD_USES_RECOVERY_AS_BOOT := true
  • PRODUCT_PACKAGES += \
      update_engine \
      update_verifier

For an example, see:
/device/google/marlin/+/android-7.1.0_r1/device-common.mk

Optionally, conduct the post-install (but pre-reboot) dex2oat step described within the Compilation section.

Optionally define for debug builds:
  • PRODUCT_PACKAGES_DEBUG += update_engine_client

Cannot define for the A/B target:
  • BOARD_RECOVERYIMAGE_PARTITION_SIZE
  • BOARD_CACHEIMAGE_PARTITION_SIZE
  • BOARD_CACHEIMAGE_FILE_SYSTEM_TYPE

Partitions

A/B devices do not need a recovery partition or cache partition because Android no longer uses these partitions. The data partition is now used for the downloaded OTA package, and the recovery image code is on the boot partition. All partitions that are A/B-ed should be named as follows (slots are always named a, b, etc.): boot_a, boot_b, system_a, system_b, vendor_a, vendor_b.

For non-A/B updates, the cache partition was used to store downloaded OTA packages and to stash blocks temporarily while applying updates. There was never a good way to size the cache partition: how large it needed to be depended on what updates you wanted to apply. The worst case would be a cache partition as large as the system image. With A/B updates there's no need to stash blocks (because you're always writing to a partition that isn't currently used) and with streaming A/B there's no need to download the whole OTA package before applying it.

Fstab

The slotselect argument must be on the line for the A/B-ed partitions. For example:

<path-to-block-device>/vendor  /vendor  ext4  ro  wait,verify=<path-to-block-device>/metadata,slotselect

Please note that there should be no partition named vendor but instead the partition vendor_a or vendor_b will be selected and mounted on the /vendor mount point.
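
The effect of slotselect can be pictured as a suffix appended to the block device path at mount time. The sketch below is an illustration only: resolve_device is a hypothetical helper, the /dev/block/by-name/vendor path is a made-up example, and the "_a" suffix form is assumed to match the partition naming used above (vendor_a, vendor_b).

```python
def resolve_device(block_device, fs_mgr_flags, slot_suffix):
    """Return the block device to mount for an fstab entry.

    `fs_mgr_flags` is the comma-separated last field of the fstab line.
    Entries flagged with slotselect get the current slot suffix appended.
    """
    flags = fs_mgr_flags.split(",")
    if "slotselect" in flags:
        return block_device + slot_suffix
    return block_device


# Hypothetical device path; the suffix would come from the bootloader.
print(resolve_device("/dev/block/by-name/vendor", "wait,slotselect", "_a"))
# /dev/block/by-name/vendor_a
```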

Kernel slot arguments

The current slot suffix should be passed either through a specific DT node (/firmware/android/slot_suffix) or through the androidboot.slot_suffix command line argument.

By default, fastboot will flash just slot 'a' on an A/B device, and set the current slot to 'a'. An update package can contain images for slot 'b' too, in which case they will also be flashed. A new '--slot' option lets you ask fastboot to use slot 'b' instead of slot 'a', and the '--set-active' option lets you set that slot as active too. There's also a new 'fastboot set_active' command. See 'fastboot --help' for more details.

Optionally, if the bootloader implements fastboot, the following commands and variables should be supported:

Commands

  • set_active <slot> —Sets the current active slot to the given slot. This must also clear the unbootable flag for that slot, and reset the retry count to default values.

Variables

  • has-slot:<partition-base-name-without-suffix> —Returns "yes" if the given partition supports slots, "no" otherwise.
  • current-slot —Returns the slot suffix that will be booted from next.
  • slot-count —Returns an integer representing the number of available slots. Currently, two slots are supported so this value is 2.
  • slot-successful:<slot-suffix> —Returns "yes" if the given slot has been marked as successfully booting, "no" otherwise.
  • slot-unbootable:<slot-suffix> —Returns "yes" if the given slot is marked as unbootable, "no" otherwise.
  • slot-retry-count:<slot-suffix> —Number of retries remaining to attempt to boot the given slot.

These variables should all appear under the following: fastboot getvar all

OTA package generation

The OTA package tools follow the same commands as the commands for non-A/B devices. The target_files.zip file must be generated by defining the build variables for the A/B target. The OTA package tools automatically identify and generate packages in the format for the A/B updater.

For example, use the following to generate a full OTA:

./build/tools/releasetools/ota_from_target_files \
  dist_output/tardis-target_files.zip ota_update.zip

Or, generate an incremental OTA:

./build/tools/releasetools/ota_from_target_files \
  -i PREVIOUS-tardis-target_files.zip \
  dist_output/tardis-target_files.zip incremental_ota_update.zip

Configuration

Partitions

The Update Engine can update any pair of A/B partitions defined in the same disk.

A pair of partitions has a common prefix (such as system or boot) and per-slot suffix (such as _a). The list of partitions for which the payload generator defines an update is configured by the AB_OTA_PARTITIONS make variable. For example, if a pair of partitions bootloader_a and bootloader_b are included (_a and _b are the slot suffixes), these partitions can be updated by specifying the following on the product or board configuration:

AB_OTA_PARTITIONS := \
  boot \
  system \
  bootloader

All the partitions updated by the Update Engine must not be modified by the rest of the system. During incremental or delta updates, the binary data from the current slot is used to generate the data in the new slot. Any modification may cause the new slot data to fail verification during the update process, and therefore fail the update.
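
A toy example shows why a modified source partition breaks an incremental update. This sketch uses a byte-wise XOR as a stand-in for a real binary patch format and SHA-256 as the verification hash; both are assumptions made for illustration, as is the apply_delta name.

```python
import hashlib


def apply_delta(source, patch, expected_sha256):
    """XOR `patch` into `source` and verify the result's SHA-256 digest."""
    if len(source) != len(patch):
        raise ValueError("toy patch must be the same length as the source")
    target = bytes(s ^ p for s, p in zip(source, patch))
    if hashlib.sha256(target).hexdigest() != expected_sha256:
        raise ValueError("target verification failed")
    return target


old = b"system v1"
new = b"system v2"
patch = bytes(a ^ b for a, b in zip(old, new))  # built from the pristine source
digest = hashlib.sha256(new).hexdigest()

print(apply_delta(old, patch, digest) == new)   # pristine source: update works

try:
    apply_delta(b"System v1", patch, digest)    # one byte of the source changed
except ValueError as err:
    print("update fails:", err)
```

Because the patch only encodes differences, any change to the source data propagates into the generated target and is caught by verification.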

Post-install

The post-install step can be configured differently for each updated partition using a set of key-value pairs.

To run a program located at /system/usr/bin/postinst in a new image, specify the path relative to the root of the filesystem in the system partition. For example, usr/bin/postinst is system/usr/bin/postinst (if not using a RAM disk). Additionally, specify the filesystem type to pass to the mount(2) system call. Add the following to the product or device .mk files (if applicable):

AB_OTA_POSTINSTALL_CONFIG += \
  RUN_POSTINSTALL_system=true \
  POSTINSTALL_PATH_system=usr/bin/postinst \
  FILESYSTEM_TYPE_system=ext4
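
To make the key-value structure concrete, the sketch below groups such entries by partition. It is not how the build system or update_engine parses them; parse_postinstall_config is a hypothetical helper, and the list of known key prefixes is limited to the ones that appear in this document.

```python
KNOWN_KEYS = (
    "RUN_POSTINSTALL",
    "POSTINSTALL_PATH",
    "FILESYSTEM_TYPE",
    "POSTINSTALL_OPTIONAL",
)


def parse_postinstall_config(entries):
    """Group KEY_<partition>=value entries into {partition: {KEY: value}}."""
    config = {}
    for entry in entries:
        key, _, value = entry.partition("=")
        for prefix in KNOWN_KEYS:
            if key.startswith(prefix + "_"):
                partition = key[len(prefix) + 1:]
                config.setdefault(partition, {})[prefix] = value
                break
        else:
            raise ValueError(f"unrecognized key: {key}")
    return config


entries = [
    "RUN_POSTINSTALL_system=true",
    "POSTINSTALL_PATH_system=usr/bin/postinst",
    "FILESYSTEM_TYPE_system=ext4",
]
print(parse_postinstall_config(entries))
```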

Compilation

Minimally, you must compile ahead of time odex files for system_server and its dependencies (because system_server isn't allowed to JIT for security reasons); but anything else is optional.

Compiling apps in the background for A/B updates requires the following two additions to the product's device configuration (in the product's device.mk):

  1. Include the native components in the build. This ensures the compilation script and binaries are compiled and included in the system image.
      # A/B OTA dexopt package
      PRODUCT_PACKAGES += otapreopt_script
    
  2. Connect the compilation script to update_engine such that it is run as a post-install step.
      # A/B OTA dexopt update_engine hookup
      AB_OTA_POSTINSTALL_CONFIG += \
        RUN_POSTINSTALL_system=true \
        POSTINSTALL_PATH_system=system/bin/otapreopt_script \
        FILESYSTEM_TYPE_system=ext4 \
        POSTINSTALL_OPTIONAL_system=true
      

See First boot installation of DEX_PREOPT files to install the preopted files in the unused second system partition.

Frequently asked questions

Has Google used A/B OTAs on any devices?

Yes. The marketing name for this feature is seamless updates. The Pixel and Pixel XL phones from October 2016 shipped with A/B. Additionally, all Chromebooks use the same update_engine implementation of A/B. The necessary platform code implementation is public in Android 7.1 and later.

Why are A/B OTAs better?

As the name of seamless updates implies, A/B OTAs provide a better user experience when taking updates. Measurements from monthly security updates show this feature has already proven a success: As of May 2017, 95% of Pixel owners are running the latest security update after a month compared to 87% of Nexus users, and the Pixel users update sooner than Nexus users would. Failures to update blocks during an OTA no longer result in a device that won't boot; until the new system image has successfully booted, Android retains the ability to fall back to the previous working system image.

How did A/B affect the 2016 Pixel partition sizes?

See the following table for the shipping A/B configuration versus the internally-tested non-A/B configuration:

Pixel partition sizes (MiB)   A/B       Non-A/B
Bootloader                    50*2      50
Boot                          32*2      32
Recovery                      0         32
Cache                         0         100
Radio                         70*2      70
Vendor                        300*2     300
System                        2048*2    4096
Total                         5000      4680

Therefore, A/B updates require an increase of only 320 MiB in flash. Savings of 32MiB come from removing the recovery partition, while another 100MiB is preserved by removing the cache partition.

This roughly balanced out the cost of the B partitions for the bootloader, the boot partition, and the radio partition. The vendor partition doubled in size. (This was the vast majority of the size increase.) Pixel's A/B system image is half the size of the original non-A/B system image.

So for Pixel, the A/B and non-A/B variants tested internally (only A/B shipped), the space used differed by only 320MiB. On a 32GiB device, this is just under 1%. For a 16GiB device this would be less than 2%, and for an 8GiB device almost 4% (assuming all three devices had the same system image).
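
The totals and percentages above can be rechecked with a few lines of arithmetic. The numbers are copied from the table; the variable and function names are only for illustration.

```python
# (A/B size, non-A/B size) in MiB for each partition, copied from the table.
SIZES_MIB = {
    "bootloader": (50 * 2, 50),
    "boot": (32 * 2, 32),
    "recovery": (0, 32),
    "cache": (0, 100),
    "radio": (70 * 2, 70),
    "vendor": (300 * 2, 300),
    "system": (2048 * 2, 4096),
}

ab_total = sum(ab for ab, _ in SIZES_MIB.values())
non_ab_total = sum(non_ab for _, non_ab in SIZES_MIB.values())
overhead_mib = ab_total - non_ab_total


def overhead_percent(device_gib):
    """A/B overhead as a percentage of a device with `device_gib` GiB of flash."""
    return 100 * overhead_mib / (device_gib * 1024)


print(ab_total, non_ab_total, overhead_mib)  # 5000 4680 320
for gib in (32, 16, 8):
    print(f"{gib} GiB: {overhead_percent(gib):.2f}%")
```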

Why didn't you use SquashFS?

Android did experiment with SquashFS but wasn't able to achieve the performance desired for a high-end device. Android doesn't use or recommend SquashFS for handheld devices.

SquashFS provided about 50% size savings on the system partition, but the overwhelming majority of the files that compressed well were the precompiled .odex files. Those files had very high compression ratios (approaching 80%), but the compression ratio for the rest of the system partition was much lower.

And there were serious concerns about performance with SquashFS in N:

  • Pixel has very fast flash compared to earlier devices but not a huge number of spare CPU cycles, so reading fewer bytes from flash but needing more CPU for I/O was a potential bottleneck.
  • I/O changes that perform well on an artificial benchmark run on an unloaded system sometimes don't work well on real-world use cases under real-world load (such as crypto on Nexus 6).
  • Benchmarking showed 85% regressions in some places.

As SquashFS matures and adds features to reduce CPU impact (such as a whitelist of commonly-accessed files that shouldn't be compressed), the Android team will continue to evaluate SquashFS and then offer recommendations to device manufacturers.

How did you halve the size of the system partition without SquashFS?

Applications are stored in .apk files, which are actually ZIP archives. Each .apk file has inside it one or more .dex files containing portable Dalvik bytecode. An .odex file (optimized .dex) lives separately from the apk file and can contain machine code specific to the device. If an odex file is available, Android can run applications at ahead-of-time compiled speeds without having to wait for the code to be compiled each time the application is launched. An odex file isn't strictly necessary: Android can actually run the .dex code directly via interpretation or Just-In-Time (JIT) compilation, but an odex file provides the best combination of launch speed and run-time speed if space is available.

If you look at the installed-files.txt from a Nexus 6P N MR1 build, where the total system image size is 2628MiB (2755792836 bytes), the breakdown of the largest contributors to overall system image size by file type looks like this:

.odex 1391770312 bytes 50.5%
.apk 846878259 bytes 30.7%
.so (native C/C++ code) 202162479 bytes 7.3%
.oat files/.art images 163892188 bytes 5.9%
Fonts 38952361 bytes 1.4%
icu locale data 27468687 bytes 0.9%

These figures are similar for other devices too, so on Nexus/Pixel devices, odex files take up roughly half of the system partition. This meant that we could continue to use ext4 but write the odex files to the B partition at the factory and then copy them to /data on first boot. The actual storage used on Marlin/Sailfish with ext4 A/B is identical to SquashFS A/B, because if we'd used SquashFS we would have shipped the preopted odex files on system_a instead of system_b.

Doesn't copying odex files to /data mean that the space saved on system is lost on data?

Not exactly. On Pixel, most of the space taken by odex files is for apps. These typically exist on /data anyway. Since apps take Google Play updates, the apk and odex files on the system image are unused for most of the life of the device. So these files can be excluded entirely and replaced by small profile-driven odex files that are generated when the user actually uses each app, which requires no space for apps the user doesn't use. (This was discussed at Google I/O 2016 in The Evolution of Art.)

These are the key reasons why comparison is difficult:

  • Apps updated by Google Play have always had their odex files on /data as soon as they receive their first update.
  • Apps that the user doesn't run don't need an odex file at all.
  • Profile-driven compilation generates smaller odex files than ahead-of-time compilation (because the former optimizes only performance-critical code).

The Configuring ART documentation explains the tuning options available to the OEM.

Aren't there two copies of the odex files on /data?

It's a little more complicated than that...
After the new system image has been written, the new version of dex2oat is run against the new dex files to generate the new odex files. This happens while the old system is still running, and so the old and new odex files are both on /data at the same time.

The code in OtaDexoptService (frameworks/base/+/nougat-mr1-release/services/core/java/com/android/server/pm/OtaDexoptService.java#200) calls getAvailableSpace before optimizing each package to avoid over-filling /data. Note that available here is still conservative: it's the amount of space left before hitting the usual system low space threshold (measured as both a percentage and a byte count). So if /data is full, there won't be two copies of every odex file.
The same code also has a BULK_DELETE_THRESHOLD: if the device gets that close to filling the available space (as just described), the odex files belonging to apps that aren't used are removed. That's another case without two copies of every odex file.

In the worst case where /data is completely full, the update waits until the device has rebooted into the new system and no longer needs the old system's odex files.

The PackageManager handles this: (frameworks/base/+/nougat-mr1-release/services/core/java/com/android/server/pm/PackageManagerService.java#7215).

Once the new system has successfully booted, installd (frameworks/native/+/nougat-mr1-release/cmds/installd/commands.cpp#2192) can remove the odex files that were used by the old system, returning the device back to the steady state where there's only one copy.

So to return to the original question: it is possible that /data contains two copies of all the odex files, but (a) only temporarily and (b) only if you had plenty of free space on /data anyway. Except during an update, there's only one copy. And as part of ART's general robustness features, it will never fill /data with odex files anyway (because that would be a problem on a non-A/B system too).

Doesn't all this writing/copying increase flash wear?

Only a small portion of flash is rewritten: a full Pixel system update writes about 2.3GiB. (Apps are also recompiled, but that's true of non-A/B too.) Traditionally, block-based full OTAs wrote a similar amount of data, so flash wear rates should be similar.

Does flashing two system partitions increase factory flashing time?

Pixel didn't increase in system image size (it merely divided the space across two partitions). So no, factory flashing time did not grow.

Doesn't keeping odex files on B make rebooting after factory data reset slow?

Yes. If you've actually used a device, taken an OTA, and then performed a factory data reset, the first reboot will be slower than it would otherwise be (1m40s vs. 40s on a Pixel XL we just tested) because the odex files will have been lost from B after the first OTA and so can't be copied to /data. That's the trade-off.

Factory data reset should be a rare operation - certainly compared to regular boot - so the time taken is less important. (This doesn't affect users or reviewers who get their device from the factory, because in that case the B partition is available.) Thanks to the JIT compiler, we also don't need to recompile everything, so it's not as bad as you might think. It's also possible to mark apps as requiring ahead-of-time compilation using coreApp="true" in the manifest: (frameworks/base/+/nougat-mr1-release/packages/SystemUI/AndroidManifest.xml#23)

This is currently used by system_server because it's not allowed to JIT for security reasons.

Doesn't keeping odex files on /data rather than /system make rebooting after an OTA slow?

No. As explained above, the new dex2oat is run while the old system image is still running to generate the files that will be needed by the new system. The update isn't considered available until that work has been done.

Can (should) we ship a 32GiB A/B device? 16GiB? 8GiB?

32GiB works well, as proven on Pixel. 320MiB out of 16GiB is a reduction of 2%. Similarly, 320MiB out of 8GiB is a reduction of 4%. Obviously, A/B would not be the recommended choice on devices with 4GiB, as the 320MiB overhead is almost 10% of the total available space.
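
The percentages can be checked with a few lines of arithmetic. The sketch below measures the 320MiB figure quoted above against the nominal flash size; the function name is hypothetical. Because the space actually available on a device is smaller than its nominal capacity, the real share is somewhat higher than these numbers.

```python
OVERHEAD_MIB = 320  # A/B overhead figure quoted in the text


def overhead_percent(total_gib, overhead_mib=OVERHEAD_MIB):
    """A/B overhead as a percentage of nominal storage size."""
    return 100.0 * overhead_mib / (total_gib * 1024)


for total_gib in (32, 16, 8, 4):
    print(f"{total_gib:>2} GiB: {overhead_percent(total_gib):.1f}%")
```

Against nominal capacity this gives roughly 1.0%, 2.0%, 3.9%, and 7.8% for 32, 16, 8, and 4GiB respectively.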

Does AVB2.0 require A/B OTAs?

No. Android Verified Boot has always required block-based updates, but not necessarily A/B updates.

Do A/B OTAs require AVB2.0?

No.

Do A/B OTAs break AVB2.0's rollback protection?

No. There's some confusion here because if an A/B system fails to boot into the new system image it will (after some number of retries determined by your bootloader) automatically revert to the "previous" system image. The key point here though is that "previous" in the A/B sense is actually still the "current" system image. As soon as the device successfully boots a new image, rollback protection kicks in and ensures that you can't go back. But until you've actually successfully booted the new image, rollback protection doesn't consider it to be the current system image.
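
The distinction can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the bootloader or AVB implementation: the Device class, its fields, the retry count, and the integer version numbers are all hypothetical. It shows that falling back after failed boot attempts returns to the image that is still "current", and that the rollback floor only moves once the new image has booted successfully.

```python
class Device:
    """Toy model of two slots plus a rollback-protection floor."""

    def __init__(self, current_version, max_tries=3):
        self.slots = {"A": current_version, "B": None}
        self.active = "A"                   # last slot that booted successfully
        self.min_version = current_version  # rollback-protection floor
        self.max_tries = max_tries          # retry count is bootloader-defined

    def apply_update(self, new_version):
        """Write the new image to the inactive slot and return its name."""
        target = "B" if self.active == "A" else "A"
        self.slots[target] = new_version
        return target

    def reboot_into(self, target, try_boot):
        """Attempt to boot target; fall back to the active slot on failure.

        try_boot is a callable taking the slot name and returning True
        if that boot attempt succeeded.
        """
        if self.slots[target] < self.min_version:
            raise ValueError("rollback protection: image is too old")
        for _ in range(self.max_tries):
            if try_boot(target):
                # Only a successful boot makes the new image "current".
                self.active = target
                self.min_version = self.slots[target]
                return target
        return self.active  # still the current image, not a rollback
```

For example, a device on version 1 that fails to boot version 2 stays on slot A with the floor at 1. Once version 2 boots, the floor becomes 2 and an attempt to boot the version 1 image is refused.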

If you're installing an update while the system is running, isn't that slow?

With non-A/B updates, the aim is to install the update as quickly as possible because the user is waiting and unable to use their device while the update is applied. With A/B updates, the opposite is true: because the user is still using their device, the goal is to have as little impact as possible, so the update is deliberately slow. Android also tries (via logic in the Java system update client, in Google’s case GmsCore, the core package provided by GMS) to choose a time when users aren't using their devices at all. The platform supports pausing/resuming the update, and the client can use that to pause the update if the user starts to use the device and resume it when the device is idle again.
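
The pause/resume policy described above could look roughly like this in an update client. This is a hedged sketch, not GmsCore's logic: FakeUpdateEngine and on_device_state_changed are hypothetical names, and only the suspend and resume method names mirror those on android.os.UpdateEngine.

```python
class FakeUpdateEngine:
    """Stand-in for the real update engine; it only records calls."""

    def __init__(self):
        self.suspended = False
        self.calls = []

    def suspend(self):
        self.suspended = True
        self.calls.append("suspend")

    def resume(self):
        self.suspended = False
        self.calls.append("resume")


def on_device_state_changed(engine, user_active):
    """Pause the update while the user is active; resume when idle."""
    if user_active and not engine.suspended:
        engine.suspend()
    elif not user_active and engine.suspended:
        engine.resume()


engine = FakeUpdateEngine()
for user_active in (False, True, True, False):
    on_device_state_changed(engine, user_active)
```

Checking the engine's state before each call keeps repeated "user active" events from issuing redundant suspend requests.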

There are two phases while taking an OTA, shown clearly in the UI as Step 1 of 2 and Step 2 of 2 under the progress bar. Step 1 corresponds with writing the data blocks, while step 2 is pre-compiling the .dex files. These two phases are quite different in terms of performance impact. The first phase is simple I/O. This requires little in the way of resources (RAM, CPU, I/O) because it's just slowly copying blocks around.

The second phase runs dex2oat to precompile the new system image. This obviously has less clear bounds on its requirements because it compiles actual apps. And there's obviously much more work involved in compiling a large and complex app than a small and simple app; whereas in phase 1 there are no disk blocks that are larger or more complex than others.

The process is similar to when Google Play installs an app update in the background before showing the "5 apps updated" notification, as has been done for years.

What if a user is actually waiting for the update?

The current implementation in GmsCore doesn't distinguish between background updates and user-initiated updates but may do so in the future. In the case where the user explicitly asked for the update to be installed or is watching the update progress screen, we'll prioritize the update work on the assumption that they're actively waiting for it to finish.

What happens if there's a failure to apply an update?

With non-A/B updates, if an update failed to apply, the user was usually left with an unusable device. The only exception was if the failure occurred before application of the update had even started (because the package failed to verify, say). With A/B updates, a failure to apply an update does not affect the currently running system. The update can simply be retried later.

What does GmsCore do?

In Google's A/B implementation, the platform APIs and update_engine provide the mechanism while GmsCore provides the policy. That is, the platform knows how to apply an A/B update, and all that code is in AOSP (as mentioned above); but it's GmsCore that decides what and when to apply.

If you’re not using GmsCore, you can write your own replacement using the same platform APIs. The platform Java API for controlling update_engine is android.os.UpdateEngine:

frameworks/base/core/java/android/os/UpdateEngine.java

Callers can provide an UpdateEngineCallback to be notified of status updates:

frameworks/base/+/master/core/java/android/os/UpdateEngineCallback.java

See the reference files for the core classes to use the interface.

Which systems on a chip (SoCs) support A/B?

As of 2017-03-15, we have the following information:

SoC vendor  N Release                  OC Release
Qualcomm    Depending on OEM requests  All chipsets will get support
Mediatek    Depending on OEM requests  All chipsets will get support

Please check with your SoC contacts for more details on their schedules. If your SoC is not listed here, please reach out to your SoC vendor directly.