Understanding MTE reports

SIGSEGV crashes with code 9 (SEGV_MTESERR) or code 8 (SEGV_MTEAERR) are memory tagging faults. Memory Tagging Extension (MTE) is an Armv9 feature supported in Android 12 and later. MTE is a hardware implementation of tagged memory that provides fine-grained memory protection for detecting and mitigating memory safety bugs.

In C/C++, a pointer returned from a call to malloc(), operator new(), or a similar function can only be used to access memory within the bounds of that allocation, and only while the allocation is alive (not yet freed or deleted). MTE is used in Android to detect violations of this rule, which crash reports refer to as "Buffer Overflow"/"Buffer Underflow" and "Use After Free" issues.

MTE has two modes: synchronous (or "sync") and asynchronous (or "async"). The former runs more slowly but provides more accurate diagnostics. The latter runs faster, but can only give approximate details. We'll cover both separately, since the diagnostics are slightly different.

Synchronous mode MTE

In MTE's synchronous ("sync") mode, a tag-check fault is reported as a SIGSEGV crash with code 9 (SEGV_MTESERR):

pid: 13935, tid: 13935, name: sanitizer-statu  >>> sanitizer-status <<<
uid: 0
tagged_addr_ctrl: 000000000007fff3
signal 11 (SIGSEGV), code 9 (SEGV_MTESERR), fault addr 0x800007ae92853a0
Cause: [MTE]: Use After Free, 0 bytes into a 32-byte allocation at 0x7ae92853a0
x0  0000007cd94227cc  x1  0000007cd94227cc  x2  ffffffffffffffd0  x3  0000007fe81919c0
x4  0000007fe8191a10  x5  0000000000000004  x6  0000005400000051  x7  0000008700000021
x8  0800007ae92853a0  x9  0000000000000000  x10 0000007ae9285000  x11 0000000000000030
x12 000000000000000d  x13 0000007cd941c858  x14 0000000000000054  x15 0000000000000000
x16 0000007cd940c0c8  x17 0000007cd93a1030  x18 0000007cdcac6000  x19 0000007fe8191c78
x20 0000005800eee5c4  x21 0000007fe8191c90  x22 0000000000000002  x23 0000000000000000
x24 0000000000000000  x25 0000000000000000  x26 0000000000000000  x27 0000000000000000
x28 0000000000000000  x29 0000007fe8191b70
lr  0000005800eee0bc  sp  0000007fe8191b60  pc  0000005800eee0c0  pst 0000000060001000

backtrace:
      #00 pc 00000000000010c0  /system/bin/sanitizer-status (test_crash_malloc_uaf()+40) (BuildId: 953fc93301472d0b72709b2b9a9f6f30)
      #01 pc 00000000000014a4  /system/bin/sanitizer-status (test(void (*)())+132) (BuildId: 953fc93301472d0b72709b2b9a9f6f30)
      #02 pc 00000000000019cc  /system/bin/sanitizer-status (main+1032) (BuildId: 953fc93301472d0b72709b2b9a9f6f30)
      #03 pc 00000000000487d8  /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+96) (BuildId: 6ab39e35a2fae7efbe9a04e9bbb14331)

deallocated by thread 13935:
      #00 pc 000000000004643c  /apex/com.android.runtime/lib64/bionic/libc.so (scudo::Allocator<scudo::AndroidConfig, &(scudo_malloc_postinit)>::quarantineOrDeallocateChunk(scudo::Options, void*, scudo::Chunk::UnpackedHeader*, unsigned long)+688) (BuildId: 6ab39e35a2fae7efbe9a04e9bbb14331)
      #01 pc 00000000000421e4  /apex/com.android.runtime/lib64/bionic/libc.so (scudo::Allocator<scudo::AndroidConfig, &(scudo_malloc_postinit)>::deallocate(void*, scudo::Chunk::Origin, unsigned long, unsigned long)+212) (BuildId: 6ab39e35a2fae7efbe9a04e9bbb14331)
      #02 pc 00000000000010b8  /system/bin/sanitizer-status (test_crash_malloc_uaf()+32) (BuildId: 953fc93301472d0b72709b2b9a9f6f30)
      #03 pc 00000000000014a4  /system/bin/sanitizer-status (test(void (*)())+132) (BuildId: 953fc93301472d0b72709b2b9a9f6f30)

allocated by thread 13935:
      #00 pc 0000000000042020  /apex/com.android.runtime/lib64/bionic/libc.so (scudo::Allocator<scudo::AndroidConfig, &(scudo_malloc_postinit)>::allocate(unsigned long, scudo::Chunk::Origin, unsigned long, bool)+1300) (BuildId: 6ab39e35a2fae7efbe9a04e9bbb14331)
      #01 pc 0000000000042394  /apex/com.android.runtime/lib64/bionic/libc.so (scudo_malloc+36) (BuildId: 6ab39e35a2fae7efbe9a04e9bbb14331)
      #02 pc 000000000003cc9c  /apex/com.android.runtime/lib64/bionic/libc.so (malloc+36) (BuildId: 6ab39e35a2fae7efbe9a04e9bbb14331)
      #03 pc 00000000000010ac  /system/bin/sanitizer-status (test_crash_malloc_uaf()+20) (BuildId: 953fc93301472d0b72709b2b9a9f6f30)
      #04 pc 00000000000014a4  /system/bin/sanitizer-status (test(void (*)())+132) (BuildId: 953fc93301472d0b72709b2b9a9f6f30)

All MTE crash reports contain the usual register dump and backtrace for the point where the issue was detected. The "Cause:" line for an error detected by MTE contains "[MTE]", as in the example above, along with more detail. In this case, the specific kind of error detected was a "Use After Free". The text "0 bytes into a 32-byte allocation at 0x7ae92853a0" gives the size and address of the allocation, and the offset into the allocation that the program tried to access.
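When triaging many tombstones, it can help to pull the fields of the "Cause:" line apart programmatically. The following is a minimal sketch that assumes the line has exactly the shape of the examples in this document; the struct MteCause and the function parse_mte_cause are hypothetical names, not part of any Android tool.

```cpp
#include <cassert>
#include <cstdint>
#include <regex>
#include <string>

// Fields of an MTE "Cause:" line. Hypothetical helper type.
struct MteCause {
  std::string kind;       // e.g. "Use After Free"
  std::uint64_t offset;   // byte count printed before "bytes"
  std::string relation;   // "into", "right of", or "left of"
  std::uint64_t size;     // allocation size in bytes
  std::uint64_t address;  // allocation address as printed (no tag)
};

// Parses a line shaped like the examples in this document.
// Returns false if the line does not match that assumed shape.
bool parse_mte_cause(const std::string& line, MteCause& out) {
  static const std::regex pattern(
      R"(Cause: \[MTE\]: (.+), (\d+) bytes (into|right of|left of) )"
      R"(a (\d+)-byte allocation at 0x([0-9a-fA-F]+))");
  std::smatch m;
  if (!std::regex_search(line, m, pattern)) return false;
  assert(m.size() == 6);  // whole match plus five capture groups
  out.kind = m[1].str();
  out.offset = std::stoull(m[2].str());
  out.relation = m[3].str();
  out.size = std::stoull(m[4].str());
  out.address = std::stoull(m[5].str(), nullptr, 16);
  return true;
}
```

For the example report above, this sketch would yield the kind "Use After Free", offset 0, relation "into", size 32, and address 0x7ae92853a0.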

MTE crash reports also include extra backtraces, not just the one from the point of detection.

"Use After Free" errors add "deallocated by" and "allocated by" sections to the crash dump, showing the stack traces at the time this memory was deallocated (before it was used!), and the time it was previously allocated. These also tell you which thread did the allocating/deallocating. In this simple example the detecting, allocating, and deallocating threads are all the same, but in more complex real-world cases this isn't necessarily true, and knowing that they differ can be an important clue in finding a concurrency-related bug.

"Buffer Overflow" and "Buffer Underflow" errors only provide an additional "allocated by" stack trace, since by definition the memory hasn't been deallocated yet (or the error would show up as a "Use After Free"):

Cause: [MTE]: Buffer Overflow, 0 bytes right of a 32-byte allocation at 0x7ae92853a0
[...]
backtrace:
[...]
allocated by thread 13949:

Note the use of the word "right" here: this means we're telling you how many bytes past the end of the allocation the incorrect access was; an underflow would say "left", and be a number of bytes before the start of the allocation.
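The arithmetic behind "into", "right of", and "left of" can be sketched as follows. This is an illustration of the offsets described above, not the code Android uses; describe_access is a hypothetical name, and the sketch assumes that "0 bytes right of" means the first byte past the end of the allocation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Describes where an access landed relative to one allocation.
// fault_addr may carry a tag in its top byte; it is cleared before comparing.
std::string describe_access(std::uint64_t fault_addr,
                            std::uint64_t alloc_base,
                            std::uint64_t alloc_size) {
  assert(alloc_size > 0);
  const std::uint64_t addr = fault_addr & ~(0xffULL << 56);
  const std::uint64_t alloc_end = alloc_base + alloc_size;  // one past the end
  std::string where;
  if (addr < alloc_base) {
    // Before the start of the allocation: underflow.
    where = std::to_string(alloc_base - addr) + " bytes left of";
  } else if (addr >= alloc_end) {
    // At or past the end of the allocation: overflow.
    where = std::to_string(addr - alloc_end) + " bytes right of";
  } else {
    // Inside the bounds of the allocation.
    where = std::to_string(addr - alloc_base) + " bytes into";
  }
  return where + " a " + std::to_string(alloc_size) + "-byte allocation";
}
```

Under these assumptions, an access to 0x7ae92853c0 against the 32-byte allocation at 0x7ae92853a0 is described as "0 bytes right of a 32-byte allocation", because 0x7ae92853a0 + 32 = 0x7ae92853c0.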

Multiple potential causes

Sometimes SEGV_MTESERR reports contain the following line:

Note: multiple potential causes for this crash were detected, listing them in decreasing order of likelihood.

This happens when there are several good candidates for the error origin, and we can't tell which is the actual cause. We print up to 3 such candidates in approximate order of likelihood, and leave analysis up to the user.

signal 11 (SIGSEGV), code 9 (SEGV_MTESERR), fault addr 0x400007b43063db5
backtrace:
    [stack...]

Note: multiple potential causes for this crash were detected, listing them in decreasing order of likelihood.

Cause: [MTE]: Use After Free, 5 bytes into a 10-byte allocation at 0x7b43063db0
deallocated by thread 6663:
    [stack...]
allocated by thread 6663:
    [stack...]

Cause: [MTE]: Use After Free, 5 bytes into a 6-byte allocation at 0x7b43063db0
deallocated by thread 6663:
    [stack...]

allocated by thread 6663:
    [stack...]

In the above example, we've detected two recent allocations at the same memory address that could have been the intended target of the invalid memory access. This can happen when allocations reuse freed memory - for example, if you have a sequence such as new, free, new, free, new, free, access. The more recent allocation is printed first.

Detailed cause determination heuristics

The "Cause" of a crash should show the memory allocation that the accessed pointer was originally derived from. Unfortunately, MTE hardware has no way to translate from a pointer with a mismatched tag to an allocation. To explain a SEGV_MTESERR crash, Android analyzes the following data:

  • The fault address (including the pointer tag).
  • A list of recent heap allocations with stack traces and memory tags.
  • Nearby current (live) allocations and their memory tags.

Any recently deallocated memory at the fault address where the memory tag matches the fault address tag is a potential "Use After Free" cause.

Any nearby live memory where the memory tag matches the fault address tag is a potential "Buffer Overflow" (or "Buffer Underflow") cause.

Allocations that are closer to the fault - either in time or in space - are considered more likely than the ones that are far away.

Since deallocated memory is often reused, and the number of different tag values is small (less than 16), it is not uncommon to find several likely candidates, and there is no way to automatically find the true cause. This is the reason why sometimes MTE reports list multiple potential causes.

App developers should examine the potential causes starting with the most likely one. It is often easy to filter out unrelated causes based on the stack trace.
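The matching rules above can be sketched in a few lines. This is only an illustration of the tag and address comparisons, not Android's actual implementation: the record layout, the function find_potential_causes, the "most recent first" input order, and the window parameter that defines "nearby" are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One heap allocation record. All names here are hypothetical.
struct AllocationRecord {
  std::uint64_t base;  // untagged start address
  std::uint64_t size;  // size in bytes
  unsigned tag;        // 4-bit memory tag
  bool live;           // false once deallocated
};

struct PotentialCause {
  std::string kind;
  std::uint64_t base;
  std::uint64_t size;
};

// `records` is assumed to be ordered most recent first, so the output keeps
// that order. `window` is an assumed limit on what counts as "nearby".
std::vector<PotentialCause> find_potential_causes(
    std::uint64_t fault_addr, const std::vector<AllocationRecord>& records,
    std::uint64_t window) {
  const unsigned fault_tag = static_cast<unsigned>((fault_addr >> 56) & 0xf);
  const std::uint64_t addr = fault_addr & ~(0xffULL << 56);
  std::vector<PotentialCause> causes;
  for (const AllocationRecord& r : records) {
    assert(r.tag <= 0xf);
    if (r.tag != fault_tag) continue;  // memory tag must match the pointer tag
    const std::uint64_t end = r.base + r.size;
    if (!r.live) {
      // Deallocated memory that covers the fault address.
      if (addr >= r.base && addr < end)
        causes.push_back({"Use After Free", r.base, r.size});
    } else if (addr >= end && addr - end <= window) {
      // Live memory ending shortly before the fault address.
      causes.push_back({"Buffer Overflow", r.base, r.size});
    } else if (addr < r.base && r.base - addr <= window) {
      // Live memory starting shortly after the fault address.
      causes.push_back({"Buffer Underflow", r.base, r.size});
    }
  }
  return causes;
}
```

Fed the fault address 0x0400007b43063db5 and two deallocated records at 0x7b43063db0 with tag 4 (10 bytes and 6 bytes), this sketch reports both as "Use After Free" candidates, which mirrors the two-cause report shown earlier.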

Asynchronous mode MTE

In MTE's asynchronous ("async") mode, a tag-check fault is reported as a SIGSEGV crash with code 8 (SEGV_MTEAERR).

SEGV_MTEAERR faults do not happen immediately when a program performs an invalid memory access. The issue is detected shortly after the event, and the program is terminated at that point instead. This point is typically the next system call, but it can also be a timer interrupt - in short, any userspace-to-kernel transition.

SEGV_MTEAERR faults do not preserve the memory address (it is always shown as "-------"). The backtrace corresponds to the moment the condition was detected (i.e. at the next system call or other context switch), and not when the invalid access was performed.

This means that the "main" backtrace in an asynchronous MTE crash is usually not relevant. Async mode failures are thus a lot more difficult to debug than sync mode failures. They are best understood as showing the existence of a memory bug in the nearby code in the given thread. Logs at the bottom of the tombstone file may provide a hint of what actually happened. Otherwise, the recommended course of action is to reproduce the error in sync mode and use the better diagnostics that sync mode provides!

Advanced topics

Under the hood, memory tagging works by assigning a random 4-bit (0..15) tag value to every heap allocation. This value is stored in a special metadata region that corresponds to the allocated heap memory. The same value is assigned to the most significant byte of the heap pointer returned from functions such as malloc() or operator new().

When tag checking is enabled in the process, the CPU automatically compares the top byte of the pointer with the memory tag for every memory access. If the tags don't match, the CPU signals an error that leads to a crash.
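To make the pointer layout concrete, here is a small sketch that splits a tagged address into its tag and its plain address, and models the comparison the CPU performs. The function names are hypothetical; the only fact relied on beyond the text is that the 4-bit tag sits in bits 56 to 59 of the pointer.

```cpp
#include <cassert>
#include <cstdint>

// Returns the 4-bit tag held in bits 56..59 of a tagged address.
inline unsigned pointer_tag(std::uint64_t tagged_addr) {
  return static_cast<unsigned>((tagged_addr >> 56) & 0xf);
}

// Returns the address with the whole top byte cleared.
inline std::uint64_t untagged_address(std::uint64_t tagged_addr) {
  return tagged_addr & ~(0xffULL << 56);
}

// Models the hardware check: the access is allowed only if the pointer tag
// equals the tag stored for the memory being accessed.
inline bool tag_check_passes(std::uint64_t tagged_addr, unsigned memory_tag) {
  assert(memory_tag <= 0xf);
  return pointer_tag(tagged_addr) == memory_tag;
}
```

For the fault address 0x0800007ae92853a0 from the sync-mode example, pointer_tag gives 8 and untagged_address gives 0x7ae92853a0, the allocation address printed in the "Cause:" line.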

Because of the limited number of possible tag values, this approach is probabilistic. Any memory location that should not be accessed with a given pointer - such as out of bounds, or after deallocation ("dangling pointer") - is likely to have a different tag value, and cause a crash. There is a ~7% chance of not detecting any single occurrence of a bug. Because the tag values are assigned randomly, there is an independent ~93% chance of detecting the bug next time it happens.
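As a worked example of that independence, the chance of catching a bug at least once grows quickly with the number of times it occurs. The sketch below simply takes the ~7% per-occurrence miss chance quoted above as its default; detection_probability is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>

// Probability that at least one of `occurrences` independent occurrences of
// a bug is detected, given the chance of missing any single occurrence.
double detection_probability(int occurrences, double miss_chance = 0.07) {
  assert(occurrences >= 0);
  assert(miss_chance >= 0.0 && miss_chance <= 1.0);
  // All occurrences are missed with probability miss_chance^occurrences.
  return 1.0 - std::pow(miss_chance, occurrences);
}
```

With these numbers, a bug that is triggered twice escapes detection both times with probability 0.07 * 0.07 = 0.0049, so it is caught at least once about 99.5% of the time.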

The tag values can be seen in the fault address field as well as in the register dump. In the example below, the top byte "08" of both the fault address and register x8 carries the tag value 8. This information can be used to check that the tags are set in a sane way, as well as to see other nearby memory allocations with the same tag value, which can be potential causes of the error beyond the ones listed in the report. We expect this to be mainly useful for people working on the implementation of MTE itself or other low-level system components, rather than for app developers.

signal 11 (SIGSEGV), code 9 (SEGV_MTESERR), fault addr 0x0800007ae92853a0
Cause: [MTE]: Use After Free, 0 bytes into a 32-byte allocation at 0x7ae92853a0
    x0  0000007cd94227cc  x1  0000007cd94227cc  x2  ffffffffffffffd0  x3  0000007fe81919c0
    x4  0000007fe8191a10  x5  0000000000000004  x6  0000005400000051  x7  0000008700000021
    x8  0800007ae92853a0  x9  0000000000000000  x10 0000007ae9285000  x11 0000000000000030
    x12 000000000000000d  x13 0000007cd941c858  x14 0000000000000054  x15 0000000000000000
    x16 0000007cd940c0c8  x17 0000007cd93a1030  x18 0000007cdcac6000  x19 0000007fe8191c78
    x20 0000005800eee5c4  x21 0000007fe8191c90  x22 0000000000000002  x23 0000000000000000
    x24 0000000000000000  x25 0000000000000000  x26 0000000000000000  x27 0000000000000000
    x28 0000000000000000  x29 0000007fe8191b70
    lr  0000005800eee0bc  sp  0000007fe8191b60  pc  0000005800eee0c0  pst 0000000060001000

A special "Memory tags" section also appears in the crash report that shows memory tags around the fault address. In the example below, the pointer tag "4" did not match the memory tag "a".

Memory tags around the fault address (0x0400007b43063db5), one tag per 16 bytes:
  0x7b43063500: 0  f  0  2  0  f  0  a  0  7  0  8  0  7  0  e
  0x7b43063600: 0  9  0  8  0  5  0  e  0  f  0  c  0  f  0  4
  0x7b43063700: 0  b  0  c  0  b  0  2  0  1  0  4  0  7  0  8
  0x7b43063800: 0  b  0  c  0  3  0  a  0  3  0  6  0  b  0  a
  0x7b43063900: 0  3  0  4  0  f  0  c  0  3  0  e  0  0  0  c
  0x7b43063a00: 0  3  0  2  0  1  0  8  0  9  0  4  0  3  0  4
  0x7b43063b00: 0  5  0  2  0  5  0  a  0  d  0  6  0  d  0  2
  0x7b43063c00: 0  3  0  e  0  f  0  a  0  0  0  0  0  0  0  4
=>0x7b43063d00: 0  0  0  a  0  0  0  e  0  d  0 [a] 0  f  0  e
  0x7b43063e00: 0  7  0  c  0  9  0  a  0  d  0  2  0  0  0  c
  0x7b43063f00: 0  0  0  6  0  b  0  8  0  3  0  0  0  5  0  e
  0x7b43064000: 0  d  0  2  0  7  0  a  0  7  0  a  0  d  0  8
  0x7b43064100: 0  b  0  2  0  b  0  4  0  1  0  6  0  d  0  4
  0x7b43064200: 0  1  0  6  0  f  0  2  0  f  0  6  0  5  0  c
  0x7b43064300: 0  1  0  4  0  d  0  6  0  f  0  e  0  1  0  8
  0x7b43064400: 0  f  0  4  0  3  0  2  0  1  0  2  0  5  0  6
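Finding the bracketed entry by hand is a matter of simple address arithmetic. The sketch below assumes the layout seen in the example dump, with one tag per 16 bytes and 16 tags per row, so each row covers 256 bytes; TagDumpPosition and locate_in_tag_dump are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Position of a fault address within a "Memory tags" dump.
struct TagDumpPosition {
  std::uint64_t row_base;  // address printed at the start of the row
  unsigned column;         // zero-based index of the tag within the row
};

TagDumpPosition locate_in_tag_dump(std::uint64_t fault_addr) {
  const std::uint64_t kGranuleBytes = 16;  // one tag per 16 bytes
  const std::uint64_t kTagsPerRow = 16;    // as in the example dump
  const std::uint64_t row_bytes = kGranuleBytes * kTagsPerRow;
  // Drop the pointer tag in the top byte before doing address arithmetic.
  const std::uint64_t addr = fault_addr & ~(0xffULL << 56);
  TagDumpPosition pos;
  pos.row_base = addr - (addr % row_bytes);
  pos.column = static_cast<unsigned>((addr % row_bytes) / kGranuleBytes);
  assert(pos.column < kTagsPerRow);
  return pos;
}
```

For the fault address 0x0400007b43063db5 this gives row 0x7b43063d00 and column 11, which is where the bracketed tag "a" appears in the dump above.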

Sections of a tombstone that show memory contents around all register values also display their tag values.

memory near x10 ([anon:scudo:primary]):
0000007b4304a000 7e82000000008101 000003e9ce8b53a0  .......~.S......
0700007b4304a010 0000200000006001 0000000000000000  .`... ..........
0000007b4304a020 7c03000000010101 000003e97c61071e  .......|..a|....
0200007b4304a030 0c00007b4304a270 0000007ddc4fedf8  p..C{.....O.}...
0000007b4304a040 84e6000000008101 000003e906f7a9da  ................
0300007b4304a050 ffffffff00000042 0000000000000000  B...............
0000007b4304a060 8667000000010101 000003e9ea858f9e  ......g.........
0400007b4304a070 0000000100000001 0000000200000002  ................
0000007b4304a080 f5f8000000010101 000003e98a13108b  ................
0300007b4304a090 0000007dd327c420 0600007b4304a2b0   .'.}......C{...
0000007b4304a0a0 88ca000000010101 000003e93e5e5ac5  .........Z^>....
0a00007b4304a0b0 0000007dcc4bc500 0300007b7304cb10  ..K.}......s{...
0000007b4304a0c0 0f9c000000010101 000003e9e1602280  ........."`.....
0900007b4304a0d0 0000007dd327c780 0700007b7304e2d0  ..'.}......s{...
0000007b4304a0e0 0d1d000000008101 000003e906083603  .........6......
0a00007b4304a0f0 0000007dd327c3b8 0000000000000000  ..'.}...........