This page describes how to debug Android Runtime (ART) garbage collection (GC) correctness and performance issues. It explains how to use GC verification options, identify solutions for GC verification failures, and measure and address GC performance problems.
To work with ART, see the pages in this ART and Dalvik section, and the Dalvik Executable format. For additional help verifying app behavior, see Verifying app behavior on the Android runtime (ART).
ART GC overview
ART has a few different GC plans that consist of running different garbage collectors. Starting with Android 8 (Oreo), the default plan is Concurrent Copying (CC). The other GC plan is Concurrent Mark Sweep (CMS).
Some of the main characterstics of Concurrent Copying GC are:
- CC enables use of a bump-pointer allocator called RegionTLAB. This allocates a thread-local allocation buffer (TLAB) to each app thread, which can then allocate objects out of its TLAB by bumping the "top" pointer, without any synchronization.
- CC performs heap defragmentation by concurrently copying objects without pausing app threads. This is achieved with the help of a read-barrier which intercepts reference reads from the heap, without the need of any intervention from the app developer.
- GC only has one small pause, which is constant in time with regards to the heap size.
- CC extends to be a generational GC in Android 10 and higher. It enables collecting young objects, which often become unreachable fairly quickly, with little effort. This helps by increasing GC throughput and considerably delaying the need to perform a full-heap GC.
The other GC that ART still supports is CMS. This GC also supports compaction, but not concurrently. Compaction is avoided until the app goes into the background, at which time the app threads are paused to perform compaction. Compaction also becomes necessary when an object allocation fails due to fragmentation. In this case the app may potentially become unresponsive for some time.
Since CMS rarely compacts, and thus free objects may not be contiguous, it uses a free-list-based allocator called RosAlloc. It has a higher allocation cost as compared to RegionTLAB. Finally, because of internal fragmentation the memory usage for the Java heap can be higher for CMS than CC.
GC verification and performance options
Change the GC type
OEMs can change the GC type. The change process
involves setting the ART_USE_READ_BARRIER
environment variable at build time.
The default value is true, which enables the CC collector as it uses the read-barrier. For CMS this
variable should be explicitly set to false.
By default, the CC collector runs in generational mode in Android 10 and higher. To
disable generational mode, the -Xgc:nogenerational_cc
command-line argument can be
used. Alternatively, the system property can be set as follows:
adb shell setprop dalvik.vm.gctype nogenerational_cc
Verify the heap
Heap verification is probably the most useful GC option for debugging
GC-related errors or heap corruption. Enabling heap verification causes the GC
to check the correctness of the heap at a few points during the garbage
collection process. Heap verification shares the same options as the ones that
change the GC type. If enabled, heap verification verifies the roots and
ensures that reachable objects reference only other reachable objects. GC
verification is enabled by passing in the following -Xgc
values:
- If enabled,
[no]preverify
performs heap verification before starting the GC. - If enabled,
[no]presweepingverify
performs heap verification before starting the garbage collector sweeping process. - If enabled,
[no]postverify
performs heap verification after the GC finishes sweeping. [no]preverify_rosalloc
,[no]postsweepingverify_rosalloc
, and[no]postverify_rosalloc
are additional GC options that verify only the state of RosAlloc's internal accounting. Therefore, they are only applicable with the CMS collector, which uses RosAlloc allocator. The main things verified are that magic values match expected constants, and free blocks of memory are all registered in thefree_page_runs_
map.
Performance
There are two main tools to measure GC performance, GC timing dumps and Systrace. There is also an advanced version of Systrace, called Perfetto. The visual way to measure GC performance problems is to use Systrace and Perfetto to determine which GCs are causing long pauses or preempting app threads. Although the ART GC has significantly improved over time, bad mutator behavior, such as excessive allocation, can still cause performance problems
Collection strategy
The CC GC collects by either running a young GC or a full-heap GC. Ideally the young GC is run more often. The GC does young CC collections until the throughput (calculated by bytes freed/second of GC duration) of the just finished collection cycle is less than the mean throughput of full-heap CC collections. When this occurs, the full-heap CC is chosen for the next concurrent GC instead of young CC. After the full-heap collection completes, the next GC is switched back to young CC. One key factor that makes this strategy work is that young CC doesn't adjust the heap footprint limit after it completes. This causes young CC to happen more and more often until the throughput is lower than full-heap CC, which ends up growing the heap.
Use SIGQUIT to get GC performance info
To get GC performance timings for apps, send SIGQUIT
to
already running apps or pass in -XX:DumpGCPerformanceOnShutdown
to dalvikvm
when starting a command line program. When an app gets
the ANR request signal (SIGQUIT
), it dumps information related to its locks,
thread stacks, and GC performance.
To get GC timing dumps, use:
adb shell kill -s QUIT PID
This creates a file (with the date and time in the name such as anr_2020-07-13-19-23-39-817)
in /data/anr/
. This
file contains some ANR dumps as well as GC timings. You can locate the
GC timings by searching for Dumping cumulative Gc timings. These timings
show a few things that may be of interest, including the histogram info for
each GC type's phases and pauses. The pauses are usually more important to look
at. For example:
young concurrent copying paused: Sum: 5.491ms 99% C.I. 1.464ms-2.133ms Avg: 1.830ms Max: 2.133ms
This shows that the average pause was 1.83 ms, which should be low enough that it doesn't cause missed frames in most apps and shouldn't be a concern.
Another area of interest is time to suspend, which measures how long it takes a thread to reach a suspend point after the GC requests that it suspends. This time is included in the GC pauses, so it's useful to determine if long pauses are caused by the GC being slow or the thread suspending slowly. Here's an example of a normal time to suspend on a Nexus 5:
suspend all histogram: Sum: 1.513ms 99% C.I. 3us-546.560us Avg: 47.281us Max: 601us
There are other areas of interest, including total time spent and GC throughput. Examples:
Total time spent in GC: 502.251ms Mean GC size throughput: 92MB/s Mean GC object throughput: 1.54702e+06 objects/s
Here's an example of how to dump the GC timings of an already running app:
adb shell kill -s QUIT PID
adb pull /data/anr/anr_2020-07-13-19-23-39-817
At this point the GC timings are inside of anr_2020-07-13-19-23-39-817
. Here's example output
from Google Maps:
Start Dumping histograms for 2195 iterations for concurrent copying MarkingPhase: Sum: 258.127s 99% C.I. 58.854ms-352.575ms Avg: 117.651ms Max: 641.940ms ScanCardsForSpace: Sum: 85.966s 99% C.I. 15.121ms-112.080ms Avg: 39.164ms Max: 662.555ms ScanImmuneSpaces: Sum: 79.066s 99% C.I. 7.614ms-57.658ms Avg: 18.014ms Max: 546.276ms ProcessMarkStack: Sum: 49.308s 99% C.I. 6.439ms-81.640ms Avg: 22.464ms Max: 638.448ms ClearFromSpace: Sum: 35.068s 99% C.I. 6.522ms-40.040ms Avg: 15.976ms Max: 633.665ms SweepSystemWeaks: Sum: 14.209s 99% C.I. 3.224ms-15.210ms Avg: 6.473ms Max: 201.738ms CaptureThreadRootsForMarking: Sum: 11.067s 99% C.I. 0.835ms-13.902ms Avg: 5.044ms Max: 25.565ms VisitConcurrentRoots: Sum: 8.588s 99% C.I. 1.260ms-8.547ms Avg: 1.956ms Max: 231.593ms ProcessReferences: Sum: 7.868s 99% C.I. 0.002ms-8.336ms Avg: 1.792ms Max: 17.376ms EnqueueFinalizerReferences: Sum: 3.976s 99% C.I. 0.691ms-8.005ms Avg: 1.811ms Max: 16.540ms GrayAllDirtyImmuneObjects: Sum: 3.721s 99% C.I. 0.622ms-6.702ms Avg: 1.695ms Max: 14.893ms SweepLargeObjects: Sum: 3.202s 99% C.I. 0.032ms-6.388ms Avg: 1.458ms Max: 549.851ms FlipOtherThreads: Sum: 2.265s 99% C.I. 0.487ms-3.702ms Avg: 1.031ms Max: 6.327ms VisitNonThreadRoots: Sum: 1.883s 99% C.I. 45us-3207.333us Avg: 429.210us Max: 27524us InitializePhase: Sum: 1.624s 99% C.I. 231.171us-2751.250us Avg: 740.220us Max: 6961us ForwardSoftReferences: Sum: 1.071s 99% C.I. 215.113us-2175.625us Avg: 488.362us Max: 7441us ReclaimPhase: Sum: 490.854ms 99% C.I. 32.029us-6373.807us Avg: 223.623us Max: 362851us EmptyRBMarkBitStack: Sum: 479.736ms 99% C.I. 11us-3202.500us Avg: 218.558us Max: 13652us CopyingPhase: Sum: 399.163ms 99% C.I. 24us-4602.500us Avg: 181.851us Max: 22865us ThreadListFlip: Sum: 295.609ms 99% C.I. 15us-2134.999us Avg: 134.673us Max: 13578us ResumeRunnableThreads: Sum: 238.329ms 99% C.I. 5us-2351.250us Avg: 108.578us Max: 10539us ResumeOtherThreads: Sum: 207.915ms 99% C.I. 1.072us-3602.499us Avg: 94.722us Max: 14179us RecordFree: Sum: 188.009ms 99% C.I. 64us-312.812us Avg: 85.653us Max: 2709us MarkZygoteLargeObjects: Sum: 133.301ms 99% C.I. 12us-734.999us Avg: 60.729us Max: 10169us MarkStackAsLive: Sum: 127.554ms 99% C.I. 13us-417.083us Avg: 58.111us Max: 1728us FlipThreadRoots: Sum: 126.119ms 99% C.I. 1.028us-3202.499us Avg: 57.457us Max: 11412us SweepAllocSpace: Sum: 117.761ms 99% C.I. 24us-400.624us Avg: 53.649us Max: 1541us SwapBitmaps: Sum: 56.301ms 99% C.I. 10us-125.312us Avg: 25.649us Max: 1475us (Paused)GrayAllNewlyDirtyImmuneObjects: Sum: 33.047ms 99% C.I. 9us-49.931us Avg: 15.055us Max: 72us (Paused)SetFromSpace: Sum: 11.651ms 99% C.I. 2us-49.772us Avg: 5.307us Max: 71us (Paused)FlipCallback: Sum: 7.693ms 99% C.I. 2us-32us Avg: 3.504us Max: 32us (Paused)ClearCards: Sum: 6.371ms 99% C.I. 250ns-49753ns Avg: 207ns Max: 188000ns Sweep: Sum: 5.793ms 99% C.I. 1us-49.818us Avg: 2.639us Max: 93us UnBindBitmaps: Sum: 5.255ms 99% C.I. 1us-31us Avg: 2.394us Max: 31us Done Dumping histograms concurrent copying paused: Sum: 315.249ms 99% C.I. 49us-1378.125us Avg: 143.621us Max: 7722us concurrent copying freed-bytes: Avg: 34MB Max: 54MB Min: 2062KB Freed-bytes histogram: 0:4,5120:5,10240:19,15360:69,20480:167,25600:364,30720:529,35840:405,40960:284,46080:311,51200:38 concurrent copying total time: 569.947s mean time: 259.657ms concurrent copying freed: 1453160493 objects with total size 74GB concurrent copying throughput: 2.54964e+06/s / 134MB/s per cpu-time: 157655668/s / 150MB/s Average major GC reclaim bytes ratio 0.486928 over 2195 GC cycles Average major GC copied live bytes ratio 0.0894662 over 2199 major GCs Cumulative bytes moved 6586367960 Cumulative objects moved 127490240 Peak regions allocated 376 (94MB) / 2048 (512MB) Start Dumping histograms for 685 iterations for young concurrent copying ScanCardsForSpace: Sum: 26.288s 99% C.I. 8.617ms-77.759ms Avg: 38.377ms Max: 432.991ms ProcessMarkStack: Sum: 21.829s 99% C.I. 2.116ms-71.119ms Avg: 31.868ms Max: 98.679ms ClearFromSpace: Sum: 19.420s 99% C.I. 5.480ms-50.293ms Avg: 28.351ms Max: 507.330ms ScanImmuneSpaces: Sum: 9.968s 99% C.I. 8.155ms-30.639ms Avg: 14.552ms Max: 46.676ms SweepSystemWeaks: Sum: 6.741s 99% C.I. 3.655ms-14.715ms Avg: 9.841ms Max: 22.142ms GrayAllDirtyImmuneObjects: Sum: 4.466s 99% C.I. 0.584ms-14.315ms Avg: 6.519ms Max: 24.355ms FlipOtherThreads: Sum: 3.672s 99% C.I. 0.631ms-16.630ms Avg: 5.361ms Max: 18.513ms ProcessReferences: Sum: 2.806s 99% C.I. 0.001ms-9.459ms Avg: 2.048ms Max: 11.951ms EnqueueFinalizerReferences: Sum: 1.857s 99% C.I. 0.424ms-8.609ms Avg: 2.711ms Max: 24.063ms VisitConcurrentRoots: Sum: 1.094s 99% C.I. 1.306ms-5.357ms Avg: 1.598ms Max: 6.831ms SweepArray: Sum: 711.032ms 99% C.I. 0.022ms-3.502ms Avg: 1.038ms Max: 7.307ms InitializePhase: Sum: 667.346ms 99% C.I. 303us-2643.749us Avg: 974.227us Max: 3199us VisitNonThreadRoots: Sum: 388.145ms 99% C.I. 103.911us-1385.833us Avg: 566.635us Max: 5374us ThreadListFlip: Sum: 202.730ms 99% C.I. 18us-2414.999us Avg: 295.956us Max: 6780us EmptyRBMarkBitStack: Sum: 132.934ms 99% C.I. 8us-1757.499us Avg: 194.064us Max: 8495us ResumeRunnableThreads: Sum: 109.593ms 99% C.I. 6us-4719.999us Avg: 159.989us Max: 11106us ResumeOtherThreads: Sum: 86.733ms 99% C.I. 3us-4114.999us Avg: 126.617us Max: 19332us ForwardSoftReferences: Sum: 69.686ms 99% C.I. 14us-2014.999us Avg: 101.731us Max: 4723us RecordFree: Sum: 58.889ms 99% C.I. 0.500us-185.833us Avg: 42.984us Max: 769us FlipThreadRoots: Sum: 58.540ms 99% C.I. 1.034us-4314.999us Avg: 85.459us Max: 10224us CopyingPhase: Sum: 52.227ms 99% C.I. 26us-728.749us Avg: 76.243us Max: 2060us ReclaimPhase: Sum: 37.207ms 99% C.I. 7us-2322.499us Avg: 54.316us Max: 3826us (Paused)GrayAllNewlyDirtyImmuneObjects: Sum: 23.859ms 99% C.I. 11us-98.917us Avg: 34.830us Max: 128us FreeList: Sum: 20.376ms 99% C.I. 2us-188.875us Avg: 29.573us Max: 998us MarkZygoteLargeObjects: Sum: 18.970ms 99% C.I. 4us-115.749us Avg: 27.693us Max: 122us (Paused)SetFromSpace: Sum: 12.331ms 99% C.I. 3us-94.226us Avg: 18.001us Max: 109us SwapBitmaps: Sum: 11.761ms 99% C.I. 5us-49.968us Avg: 17.169us Max: 67us ResetStack: Sum: 4.317ms 99% C.I. 1us-64.374us Avg: 6.302us Max: 190us UnBindBitmaps: Sum: 3.803ms 99% C.I. 4us-49.822us Avg: 5.551us Max: 70us (Paused)ClearCards: Sum: 3.336ms 99% C.I. 250ns-7000ns Avg: 347ns Max: 7000ns (Paused)FlipCallback: Sum: 3.082ms 99% C.I. 1us-30us Avg: 4.499us Max: 30us Done Dumping histograms young concurrent copying paused: Sum: 229.314ms 99% C.I. 37us-2287.499us Avg: 334.764us Max: 6850us young concurrent copying freed-bytes: Avg: 44MB Max: 50MB Min: 9132KB Freed-bytes histogram: 5120:1,15360:1,20480:6,25600:1,30720:1,35840:9,40960:235,46080:427,51200:4 young concurrent copying total time: 100.823s mean time: 147.187ms young concurrent copying freed: 519927309 objects with total size 30GB young concurrent copying throughput: 5.15683e+06/s / 304MB/s per cpu-time: 333152554/s / 317MB/s Average minor GC reclaim bytes ratio 0.52381 over 685 GC cycles Average minor GC copied live bytes ratio 0.0512109 over 685 minor GCs Cumulative bytes moved 1542000944 Cumulative objects moved 28393168 Peak regions allocated 376 (94MB) / 2048 (512MB) Total time spent in GC: 670.771s Mean GC size throughput: 159MB/s per cpu-time: 177MB/s Mean GC object throughput: 2.94152e+06 objects/s Total number of allocations 1974199562 Total bytes allocated 104GB Total bytes freed 104GB Free memory 10MB Free memory until GC 10MB Free memory until OOME 442MB Total memory 80MB Max memory 512MB Zygote space size 2780KB Total mutator paused time: 544.563ms Total time waiting for GC to complete: 117.494ms Total GC count: 2880 Total GC time: 670.771s Total blocking GC count: 1 Total blocking GC time: 86.373ms Histogram of GC count per 10000 ms: 0:259879,1:2828,2:24,3:1 Histogram of blocking GC count per 10000 ms: 0:262731,1:1 Native bytes total: 30599192 registered: 8947416 Total native bytes at last GC: 30344912
Tools for analyzing GC correctness problems
Various things can cause crashes inside of ART. Crashes that occur reading or writing to object fields may indicate heap corruption. If the GC crashes when it's running, it could also point to heap corruption. The most common cause of heap corruption is incorrect app code. Fortunately, there are tools to debug GC and heap-related crashes, including the heap verification options specified above, and CheckJNI.
CheckJNI
CheckJNI is a mode that adds JNI checks to verify app behavior; these aren't enabled by default for performance reasons. The checks catch a few errors that could cause heap corruption, such as using invalid/stale local and global references. To enable CheckJNI:
adb shell setprop dalvik.vm.checkjni true
CheckJNI's forcecopy mode is useful for detecting writes past the end of array regions. When enabled, forcecopy causes the array access JNI functions to return copies with red zones. A red zone is a region at the end/start of the returned pointer that has a special value, which is verified when the array is released. If the values in the red zone don't match what's expected, a buffer overrun or underrun occurred. This causes CheckJNI to abort. To enable forcecopy mode:
adb shell setprop dalvik.vm.jniopts forcecopy
An example of an error that CheckJNI should catch is writing past the end of
an array obtained from GetPrimitiveArrayCritical
. This operation
can corrupt the Java heap. If the write is
within the CheckJNI red zone area, then CheckJNI catches the issue when the
corresponding ReleasePrimitiveArrayCritical
is called. Otherwise,
the write corrupts some random object in
the Java heap and can cause a future GC crash. If the corrupted memory
is a reference field, then the GC may catch the error and print the error Tried to
mark <ptr> not contained by any spaces.
This error occurs when the GC attempts to mark an object that it can't find a space for. After this check fails, the GC traverses the roots and tries to see if the invalid object is a root. From here, there are two options: The object is a root or a nonroot object.
Invalid root example
In the case where the object is an invalid root, it prints some
useful information:
art E 5955 5955 art/runtime/gc/collector/mark_sweep.cc:383] Tried to mark 0x2
not contained by any spaces
art E 5955 5955 art/runtime/gc/collector/mark_sweep.cc:384] Attempting see if it's a bad root art E 5955 5955 art/runtime/gc/collector/mark_sweep.cc:485] Found invalid root: 0x2 art E 5955 5955 art/runtime/gc/collector/mark_sweep.cc:486] Type=RootJavaFrame thread_id=1 location=Visiting method 'java.lang.Object com.google.gwt.collections.JavaReadableJsArray.get(int)' at dex PC 0x0002 (native PC 0xf19609d9) vreg=1
In this case, vreg=1
inside of
com.google.gwt.collections.JavaReadableJsArray.get
is
supposed to contain a heap reference, but contains an invalid pointer
of address 0x2
. This is an invalid root. To
debug this issue, use oatdump
on the oat file and look
at the method with the invalid root. In this case, the error turned out to
be a compiler bug in the x86 backend. Here's the changelist that fixed it: https://android-review.googlesource.com/#/c/133932/
Corrupted object example
If the object isn't a root, output similar to the following prints:
01-15 12:38:00.196 1217 1238 E art : Attempting see if it's a bad root 01-15 12:38:00.196 1217 1238 F art : art/runtime/gc/collector/mark_sweep.cc:381] Can't mark invalid object
When heap corruption isn't an invalid root, it's hard to debug. This error message indicates that there was at least one object in the heap that was pointing to the invalid object.