Android 8.0 ART Improvements

The Android runtime (ART) has been improved significantly in the Android 8.0 release. The sections below summarize the enhancements device manufacturers can expect in ART.

Loop optimizations

ART in Android 8.0 employs a wide variety of loop optimizations:

  • Bounds check elimination
    • Static: ranges are proven to be within bounds at compile time
    • Dynamic: run-time tests ensure loops stay within bounds (deoptimizing otherwise)
  • Induction variable elimination
    • Removal of dead induction variables
    • Replacement of induction variables used only after the loop with closed-form expressions
  • Dead code elimination inside the loop body, including removal of whole loops that become dead
  • Strength reduction
  • Loop transformations: reversal, interchanging, splitting, unrolling, unimodular, etc.
  • SIMDization (also called vectorization)
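
The induction variable items above can be illustrated at the source level. The C++ sketch below is hypothetical: ART performs this transformation on its own intermediate representation, and the function names are made up for illustration. It shows a loop whose induction variables are used only after the loop. It then shows the equivalent code that remains once each variable is replaced by its closed-form exit value and the now-dead loop is removed.

```cpp
#include <cassert>
#include <cstdint>

// Before: the loop only computes values that are consumed after it exits.
// Both `i` and `sum` are linear induction variables (sum == 3 * i at each test).
int64_t AfterLoopBefore(int64_t n) {
  int64_t sum = 0;
  int64_t i = 0;
  for (; i < n; ++i) {
    sum += 3;
  }
  return sum + i;  // induction values used only after the loop
}

// After: each induction variable is replaced by its closed-form value at loop
// exit. Nothing inside the loop is needed anymore, so the loop is removed.
int64_t AfterLoopAfter(int64_t n) {
  int64_t trip_count = n > 0 ? n : 0;  // the loop ran max(n, 0) times
  int64_t sum = 3 * trip_count;        // closed form of `sum` at exit
  int64_t i = trip_count;              // closed form of `i` at exit
  return sum + i;
}

// Checks that the rewrite preserves the result for every n in [lo, hi].
void CheckEquivalent(int64_t lo, int64_t hi) {
  for (int64_t n = lo; n <= hi; ++n) {
    assert(AfterLoopBefore(n) == AfterLoopAfter(n));
  }
}
```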

The loop optimizer resides in its own optimization pass in the ART compiler. Most loop optimizations resemble optimizations and simplifications performed elsewhere. Challenges arise with optimizations that rewrite the control-flow graph (CFG) more elaborately than usual, because most CFG utilities (see nodes.h) focus on building a CFG, not rewriting one.
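
Dynamic bounds check elimination is one transformation that restructures the CFG rather than simplifying a single instruction. The loop gets a guard and, in effect, two versions. The C++ sketch below is a hypothetical source-level analogue, and its function names are illustrative. Where this sketch falls back to a checked copy of the loop, ART instead deoptimizes to the interpreter, as noted in the list of loop optimizations.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Stand-in for a checked array access: throws when the index is out of range,
// similar to Java's ArrayIndexOutOfBoundsException.
int CheckedLoad(const std::vector<int>& a, long index) {
  if (index < 0 || static_cast<size_t>(index) >= a.size()) {
    throw std::out_of_range("array index out of range");
  }
  return a[static_cast<size_t>(index)];
}

// Original loop: one bounds check per iteration.
long SumChecked(const std::vector<int>& a, long begin, long end) {
  long sum = 0;
  for (long i = begin; i < end; ++i) sum += CheckedLoad(a, i);
  return sum;
}

// Versioned loop: a single test before the loop proves that every index in
// [begin, end) is valid, so the fast copy needs no per-iteration checks.
// When the test fails, execution falls back to the checked loop, which keeps
// the original exception behavior (ART deoptimizes to the interpreter instead).
long SumVersioned(const std::vector<int>& a, long begin, long end) {
  if (begin >= 0 && end <= static_cast<long>(a.size())) {
    long sum = 0;
    for (long i = begin; i < end; ++i) {
      assert(static_cast<size_t>(i) < a.size());  // guaranteed by the guard above
      sum += a[static_cast<size_t>(i)];
    }
    return sum;
  }
  return SumChecked(a, begin, end);
}
```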

Class hierarchy analysis

ART in Android 8.0 uses Class Hierarchy Analysis (CHA), a compiler optimization that devirtualizes virtual calls into direct calls based on information generated by analyzing class hierarchies. Virtual calls are expensive because they are implemented around a vtable lookup, which takes a couple of dependent loads. Virtual calls also cannot be inlined.
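
The cost difference can be pictured with a hand-rolled vtable. This C++ sketch uses a simplified, hypothetical object model rather than ART's real object layout, and names such as `CallVirtual` and `CallDevirtualized` are illustrative only. The virtual call has to chase a chain of dependent loads before it can branch to a target known only at run time. The devirtualized call has a fixed target that a compiler could inline.

```cpp
#include <cassert>

// Simplified, hypothetical object model (not ART's real layout): every object
// points to its class, and every class points to a table of method pointers.
struct Object;
using Method = int (*)(const Object*);

struct Class {
  const Method* vtable;  // one entry per virtual method slot
};

struct Object {
  const Class* klass;
  int value;
};

int Twice(const Object* self) { return 2 * self->value; }
int Square(const Object* self) { return self->value * self->value; }

// Virtual call: each load depends on the result of the previous one, and the
// target is only known at run time, so the compiler cannot inline it.
int CallVirtual(const Object* receiver, int slot) {
  assert(slot >= 0);
  const Class* klass = receiver->klass;  // load the receiver's class
  const Method* table = klass->vtable;   // load the vtable pointer
  Method target = table[slot];           // load the method pointer
  return target(receiver);               // indirect call
}

// Direct call after devirtualization: the target is fixed, so the call can be
// emitted directly or replaced by the callee's body (2 * receiver->value).
int CallDevirtualized(const Object* receiver) { return Twice(receiver); }
```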

Here is a summary of related enhancements:

  • Dynamic single-implementation method status updating - At the end of class linking, when the vtable has been populated, ART compares it entry by entry with the vtable of the superclass. An entry that differs from the superclass's entry means the superclass method in that slot has been overridden, so it no longer has a single implementation.
  • Compiler optimization - The compiler takes advantage of a method's single-implementation information. If method A.foo has the single-implementation flag set, the compiler devirtualizes the virtual call into a direct call and then tries to inline the direct call.
  • Compiled code invalidation - Also at the end of class linking, when single-implementation information is updated, if method A.foo previously had a single implementation but that status is now invalidated, all compiled code that depends on the assumption that A.foo has a single implementation must be invalidated.
  • Deoptimization - For live compiled code that is on the stack, deoptimization is initiated to force the invalidated compiled code into interpreter mode, guaranteeing correctness. A new deoptimization mechanism, a hybrid of synchronous and asynchronous deoptimization, is used.
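
The bookkeeping in the list above can be modeled in a few lines. The C++ sketch below is hypothetical: `ChaRegistry` and strings such as `A.foo` and `Caller.bar` are illustrative, not ART's API. The compiler devirtualizes only methods that still carry the single-implementation flag, and it records which compiled code depends on that assumption. Linking a class that overrides such a method clears the flag and reports the dependent code for invalidation. Deoptimizing frames that are already on the stack is not modeled.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of CHA bookkeeping; names are illustrative, not ART's API.
class ChaRegistry {
 public:
  // Called at the end of linking a class. `declared` lists methods the class
  // introduces; `overridden` lists superclass methods whose vtable entry this
  // class replaces. Returns compiled code that must be invalidated.
  std::vector<std::string> OnClassLinked(const std::vector<std::string>& declared,
                                         const std::vector<std::string>& overridden) {
    for (const std::string& m : declared) {
      assert(single_impl_.count(m) == 0);  // each method is linked once
      single_impl_.insert(m);              // a new method starts with one implementation
    }
    std::vector<std::string> invalidated;
    for (const std::string& m : overridden) {
      single_impl_.erase(m);  // m now has more than one implementation
      auto it = dependents_.find(m);
      if (it == dependents_.end()) continue;
      invalidated.insert(invalidated.end(), it->second.begin(), it->second.end());
      dependents_.erase(it);  // these dependencies are handled by invalidation
    }
    return invalidated;
  }

  // Called by the compiler. Devirtualizes only while `method` keeps its
  // single-implementation flag, recording that `code` relies on it.
  bool TryDevirtualize(const std::string& method, const std::string& code) {
    if (single_impl_.count(method) == 0) return false;
    dependents_[method].insert(code);
    return true;
  }

 private:
  std::set<std::string> single_impl_;                        // methods with the flag set
  std::map<std::string, std::set<std::string>> dependents_;  // method -> dependent code
};
```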

Inline caches in .oat files

ART now employs inline caches and optimizes the call sites for which enough data exists. The inline caches feature records additional runtime information, such as the receiver types observed at a call site, into profiles and uses it to add dynamic optimizations to ahead-of-time compilation.
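
As a rough illustration of what an inline cache records, the hypothetical C++ sketch below keeps the distinct receiver classes seen at one call site and classifies the site. The class name, the capacity of four, and the classification names are assumptions for illustration, not ART's profile format.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical per-call-site inline cache; the capacity and names are
// illustrative assumptions, not ART's actual profile format.
class InlineCache {
 public:
  static constexpr size_t kCapacity = 4;
  enum class Kind { kUninitialized, kMonomorphic, kPolymorphic, kMegamorphic };

  // Called at run time each time the call site executes.
  void Record(const std::string& receiver_class) {
    if (megamorphic_) return;
    for (size_t i = 0; i < size_; ++i) {
      if (classes_[i] == receiver_class) return;  // already seen
    }
    if (size_ == kCapacity) {
      megamorphic_ = true;  // too many receiver types: stop tracking
      return;
    }
    classes_[size_++] = receiver_class;
  }

  // Consulted by the ahead-of-time compiler when optimizing the call site.
  Kind Classify() const {
    if (megamorphic_) return Kind::kMegamorphic;
    if (size_ == 0) return Kind::kUninitialized;
    return size_ == 1 ? Kind::kMonomorphic : Kind::kPolymorphic;
  }

  const std::string& ClassAt(size_t i) const {
    assert(i < size_);
    return classes_[i];
  }

 private:
  std::array<std::string, kCapacity> classes_;
  size_t size_ = 0;
  bool megamorphic_ = false;
};
```

In a model like this, a compiler reading a monomorphic entry could guard an inlined copy of the single observed target with a class check. A megamorphic site would remain an ordinary virtual call.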

Dexlayout

Dexlayout is a library introduced in Android 8.0 to analyze dex files and reorder them according to a profile. Dexlayout uses runtime profiling information to reorder sections of the dex file during idle maintenance compilation on the device. Parts of the dex file that are often accessed together are grouped together. The improved locality gives programs better memory access patterns, saving RAM and shortening startup time.

Because profile information is available only after apps have been run, dexlayout is integrated into dex2oat's on-device compilation during idle maintenance.
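
The locality idea behind dexlayout can be sketched as a profile-guided reordering. This hypothetical C++ function treats a dex section as a list of method names and moves the profiled ones together so that they tend to share pages. The function name and the policy of placing profiled methods first, in the order they were first used, are assumptions for illustration. Dexlayout itself works on real dex file sections, and its exact policy is not described here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical model: a dex section as a list of method names, plus a profile
// listing methods in the order they were first used.
std::vector<std::string> ReorderByProfile(
    const std::vector<std::string>& methods,
    const std::vector<std::string>& profiled_in_use_order) {
  std::unordered_map<std::string, size_t> rank;
  for (size_t i = 0; i < profiled_in_use_order.size(); ++i) {
    rank.emplace(profiled_in_use_order[i], i);  // emplace keeps the first occurrence
  }
  std::vector<std::string> layout = methods;
  // Profiled ("hot") methods come first, in use order, so they share pages.
  // Unprofiled methods follow in their original relative order.
  std::stable_sort(layout.begin(), layout.end(),
                   [&rank](const std::string& a, const std::string& b) {
                     auto ra = rank.find(a);
                     auto rb = rank.find(b);
                     bool hot_a = ra != rank.end();
                     bool hot_b = rb != rank.end();
                     if (hot_a != hot_b) return hot_a;  // hot before cold
                     if (!hot_a) return false;          // both cold: keep order
                     return ra->second < rb->second;    // both hot: use order
                   });
  assert(layout.size() == methods.size());
  return layout;
}
```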

Dex cache removal

Up to Android 7.0, the DexCache object owned four large arrays, with sizes proportional to the number of corresponding elements in the DexFile:

  • strings (one reference per DexFile::StringId),
  • types (one reference per DexFile::TypeId),
  • methods (one native pointer per DexFile::MethodId),
  • fields (one native pointer per DexFile::FieldId).

These arrays were used for fast retrieval of objects that had previously been resolved. In Android 8.0, all arrays have been removed except the methods array.
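
To see why these arrays were costly, consider a sketch of a per-id resolution cache like the ones listed above. The class and method names are hypothetical, and the real DexCache holds references and native pointers managed by the runtime. Every slot is allocated up front. Memory therefore scales with the number of ids in the dex file rather than with what the app actually resolves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical sketch of a pre-Android 8.0 style resolution cache: one slot per
// id in the dex file, filled lazily the first time that id is resolved.
template <typename T>
class PerIdCache {
 public:
  explicit PerIdCache(size_t num_ids) : slots_(num_ids, nullptr) {}

  // Returns the cached object for `id`, resolving it on the first request.
  T* GetOrResolve(uint32_t id, const std::function<T*(uint32_t)>& resolve) {
    assert(id < slots_.size());
    if (slots_[id] == nullptr) slots_[id] = resolve(id);  // slow path, once per id
    return slots_[id];
  }

  // Memory grows with the number of ids, whether or not they are ever used.
  size_t FootprintBytes() const { return slots_.size() * sizeof(T*); }

 private:
  std::vector<T*> slots_;
};
```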

Interpreter performance

Interpreter performance improved significantly in the Android 7.0 release with the introduction of "mterp", an interpreter whose core fetch/decode/interpret mechanism is written in assembly language. Mterp is modeled after the fast Dalvik interpreter and supports arm, arm64, x86, x86_64, mips, and mips64. For computational code, ART's mterp is roughly comparable to Dalvik's fast interpreter. However, in some situations it performs significantly, and even dramatically, worse:

  1. Invoke performance.
  2. String manipulation and other code that makes heavy use of methods recognized as intrinsics in Dalvik.
  3. Higher stack memory usage.

Android 8.0 addresses these issues.
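
The fetch/decode/interpret cycle at the heart of mterp can be illustrated with a toy switch-based interpreter. The opcodes, the `Insn` layout, and `Interpret` are hypothetical. Mterp itself implements its handlers in hand-written assembly for the real dex instruction set. This sketch also does not model invokes, intrinsics, or stack frames, which are the weak spots listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy bytecode; real dex instructions and mterp's handlers differ.
enum Opcode : uint8_t { kConst, kAdd, kMul, kReturn };

struct Insn {
  Opcode op;
  uint8_t dst, a, b;  // register operands
  int32_t literal;    // used by kConst
};

// Fetch/decode/execute loop over a register file, written as a switch-based
// interpreter. mterp implements this core in assembly language instead.
int32_t Interpret(const std::vector<Insn>& code, size_t num_regs) {
  std::vector<int32_t> regs(num_regs, 0);
  size_t pc = 0;
  for (;;) {
    assert(pc < code.size());
    const Insn& insn = code[pc++];  // fetch
    switch (insn.op) {              // decode and dispatch
      case kConst: regs[insn.dst] = insn.literal; break;
      case kAdd: regs[insn.dst] = regs[insn.a] + regs[insn.b]; break;
      case kMul: regs[insn.dst] = regs[insn.a] * regs[insn.b]; break;
      case kReturn: return regs[insn.a];
    }
  }
}
```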

More inlining

Since Android 6.0, ART has been able to inline any call within the same dex file, but it could inline only leaf methods from other dex files. There were two reasons for this limitation:

  1. Inlining from another dex file requires using the dex cache of that other dex file. Same-dex-file inlining, by contrast, can simply reuse the caller's dex cache. Compiled code needs the dex cache for a few instructions, such as static calls, string loads, and class loads.
  2. Stack maps encode only a method index within the current dex file.

To address these limitations, Android 8.0:

  1. Removes dex cache access from compiled code (also see section "Dex cache removal")
  2. Extends stack map encoding.
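
The following hypothetical C++ model shows why a dex-file-relative method index blocks cross-dex-file inlining. All types and field names are assumptions, and ART's real stack map encoding, including its Android 8.0 extension, differs. With only a method index, the runtime must interpret the index against the caller's dex file. That gives the wrong method when the inlined callee came from another dex file.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of a dex file: just its method table.
struct DexFileModel {
  std::vector<std::string> method_names;
};

// Hypothetical inline-info record holding only a method index. The index is
// implicitly interpreted relative to the outer (caller's) dex file.
struct InlineInfoOld {
  uint32_t method_index;
};

// Hypothetical extended record: also says which dex file the index refers to.
struct InlineInfoExtended {
  uint32_t dex_file_index;
  uint32_t method_index;
};

std::string DecodeOld(const InlineInfoOld& info, const DexFileModel& caller_dex) {
  assert(info.method_index < caller_dex.method_names.size());
  return caller_dex.method_names[info.method_index];
}

std::string DecodeExtended(const InlineInfoExtended& info,
                           const std::vector<DexFileModel>& dex_files) {
  assert(info.dex_file_index < dex_files.size());
  const DexFileModel& dex = dex_files[info.dex_file_index];
  assert(info.method_index < dex.method_names.size());
  return dex.method_names[info.method_index];
}
```

For example, if method 0 of a library dex file is inlined into code from the app's dex file, the old record decodes to the app's method 0 instead.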

Synchronization improvements

The ART team tuned the MonitorEnter/MonitorExit code paths and reduced their reliance on traditional memory barriers on ARMv8, replacing them with newer acquire/release instructions where possible.
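
The acquire/release idea can be seen in a minimal C++ spinlock built on `std::atomic`. This is a hypothetical illustration, not ART's monitor implementation, which uses thin locks that can inflate to full monitors. Taking the lock needs only acquire ordering and releasing it needs only release ordering, so no full two-way fence is required. On AArch64, mainstream compilers usually lower these operations to load-acquire/store-release instructions such as LDAXR and STLR rather than separate DMB barriers.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Minimal spinlock sketch; ART's monitors are considerably more involved.
class SpinLock {
 public:
  void Lock() {
    // Acquire: memory accesses after Lock() cannot be reordered before it.
    while (locked_.exchange(true, std::memory_order_acquire)) {
      // spin until the holder releases the lock
    }
  }

  void Unlock() {
    assert(locked_.load(std::memory_order_relaxed));  // must hold the lock
    // Release: writes made while holding the lock become visible to the next
    // thread that acquires it.
    locked_.store(false, std::memory_order_release);
  }

 private:
  std::atomic<bool> locked_{false};
};

// Several threads increment a plain counter, protected only by the lock.
long CountWithThreads(int num_threads, int iterations) {
  SpinLock lock;
  long counter = 0;
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&] {
      for (int i = 0; i < iterations; ++i) {
        lock.Lock();
        ++counter;  // non-atomic write, ordered by acquire/release
        lock.Unlock();
      }
    });
  }
  for (std::thread& th : threads) th.join();
  return counter;
}
```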