Android uses a wide variety of audio data formats internally, and exposes a subset of these in public APIs, file formats, and the Hardware Abstraction Layer (HAL).
Properties
The audio data formats are classified by their properties:
 Compression
 Uncompressed, lossless compressed, or lossy compressed. PCM is the most common uncompressed audio format. FLAC is a lossless compressed format, while MP3 and AAC are lossy compressed formats.
 Bit depth
 Number of significant bits per audio sample.
 Container size
 Number of bits used to store or transmit a sample. Usually this is the same as the bit depth, but sometimes additional padding bits are allocated for alignment. For example, a 24-bit sample could be contained within a 32-bit word.
 Alignment
 If the container size is exactly equal to the bit depth, the representation is called packed. Otherwise the representation is unpacked. The significant bits of the sample are typically aligned with either the leftmost (most significant) or rightmost (least significant) bit of the container. It is conventional to use the terms packed and unpacked only when the bit depth is not a power of two.
 Signedness
 Whether samples are signed or unsigned.
 Representation
 Either fixed point or floating point; see below.
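To make container size and alignment concrete, the sketch below places a 24-bit signed sample in a 32-bit container using both alignments. The function names are hypothetical and not part of any Android API, and sign-extension padding for the right-justified case is an assumption.

```python
def left_justify_24_in_32(sample: int) -> int:
    """Return the 32-bit container pattern with the sample in the top 24 bits."""
    assert -(1 << 23) <= sample < (1 << 23), "sample must fit in 24 signed bits"
    # The low 8 bits of the container are zero-filled padding.
    return (sample << 8) & 0xFFFFFFFF


def right_justify_24_in_32(sample: int) -> int:
    """Return the 32-bit container pattern with the sample in the low 24 bits."""
    assert -(1 << 23) <= sample < (1 << 23), "sample must fit in 24 signed bits"
    # Masking a Python int yields its two's complement pattern, so the
    # top 8 bits become copies of the sign bit (sign extension).
    return sample & 0xFFFFFFFF


print(hex(left_justify_24_in_32(0x123456)))
print(hex(right_justify_24_in_32(-1)))
```

In both cases the bit depth is 24 and the container size is 32, so the representation is unpacked.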
Fixed point representation
Fixed point is the most common representation for uncompressed PCM audio data, especially at hardware interfaces.
A fixed-point number has a fixed (constant) number of digits before and after the radix point. All of our representations use base 2, so we substitute bit for digit, and binary point or simply point for radix point. The bits to the left of the point are the integer part, and the bits to the right of the point are the fractional part.
We speak of integer PCM, because fixed-point values are usually stored and manipulated as integer values. The interpretation as fixed-point is implicit.
We use two's complement for all signed fixed-point representations, so the following holds where all values are in units of one LSB:
|largest negative value| = |largest positive value| + 1
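A quick check of this asymmetry for a two's complement integer of a given width (the helper name is hypothetical):

```python
def signed_range(bits: int) -> tuple[int, int]:
    """Return (most negative, most positive) values of a two's complement integer."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


lowest, highest = signed_range(16)
# In units of one LSB, the magnitude of the most negative value
# is one larger than the most positive value.
print(lowest, highest, abs(lowest) == highest + 1)
```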
Q and U notation
There are various notations for fixed-point representation in an integer. We use Q notation: Qm.n means m integer bits and n fractional bits. The "Q" counts as one bit, though the value is expressed in two's complement. The total number of bits is m + n + 1.
Um.n is for unsigned numbers: m integer bits and n fractional bits, and the "U" counts as zero bits. The total number of bits is m + n.
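The bit counts above determine the width, range, and resolution of a format. The following sketch computes them; the helper names `q_format` and `u_format` are hypothetical, and the range formulas follow from two's complement and unsigned binary with n fractional bits.

```python
def q_format(m: int, n: int) -> tuple[int, float, float, float]:
    """Return (total bits, minimum, maximum, LSB size) for signed Qm.n."""
    lsb = 2.0 ** -n
    # The "Q" contributes the sign bit, hence the + 1.
    return m + n + 1, -(2.0 ** m), 2.0 ** m - lsb, lsb


def u_format(m: int, n: int) -> tuple[int, float, float, float]:
    """Return (total bits, minimum, maximum, LSB size) for unsigned Um.n."""
    lsb = 2.0 ** -n
    return m + n, 0.0, 2.0 ** m - lsb, lsb


print(q_format(0, 15))  # 16-bit format with range -1.0 to +1.0 minus one LSB
print(q_format(4, 27))  # 32-bit format with 4 integer bits
print(u_format(8, 8))   # 16-bit unsigned format
```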
The integer part may be used in the final result, or be temporary. In the latter case, the bits that make up the integer part are called guard bits. The guard bits permit an intermediate calculation to overflow, as long as the final value is within range or can be clamped to be within range. Note that fixed-point guard bits are at the left, while floating-point unit guard digits are used to reduce roundoff error and are on the right.
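The effect of guard bits can be sketched by mixing Q0.15 samples in a wider accumulator and clamping once at the end, compared with clamping after every addition. The function names and sample values are illustrative assumptions; Python integers are unbounded, so the accumulator here simply stands in for a wider fixed-point word.

```python
Q15_MIN, Q15_MAX = -32768, 32767


def clamp_q15(value: int) -> int:
    """Clamp an integer to the Q0.15 range."""
    return max(Q15_MIN, min(Q15_MAX, value))


def mix_with_guard_bits(samples: list[int]) -> int:
    """Sum in a wider accumulator; clamp only the final value."""
    accumulator = 0
    for sample in samples:
        accumulator += sample  # may temporarily exceed the Q0.15 range
    return clamp_q15(accumulator)


def mix_clamping_each_step(samples: list[int]) -> int:
    """Sum with no guard bits: every intermediate result is clamped."""
    accumulator = 0
    for sample in samples:
        accumulator = clamp_q15(accumulator + sample)
    return accumulator


samples = [30000, 30000, -30000, -10000]
print(mix_with_guard_bits(samples))
print(mix_clamping_each_step(samples))
```

The true sum is within range, so only the version with guard bits returns it exactly; the other loses information when the intermediate sum is clamped.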
Floating point representation
Floating point is an alternative to fixed point, in which the location of the point can vary. The primary advantages of floating point include:
 Greater headroom and dynamic range; floating-point arithmetic tolerates exceeding nominal ranges during intermediate computation, and only clamps values at the end
 Support for special values such as infinities and NaN
 Easier to use in many cases
Historically, floating-point arithmetic was slower than integer or fixed-point arithmetic, but now it is common for floating-point to be faster, provided control flow decisions aren't based on the value of a computation.
Android formats for audio
The major Android formats for audio are listed in the table below:
Property                         Q0.15  Q0.7 ^{1}  Q0.23          Q0.31          float
Container bits                   16     8          24 or 32 ^{2}  32             32
Significant bits including sign  16     8          24             24 or 32 ^{2}  25 ^{3}
Headroom in dB                   0      0          0              0              126 ^{4}
Dynamic range in dB              90     42         138            138 to 186     900 ^{5}
All fixed-point formats above have a nominal range of -1.0 to +1.0 minus one LSB. There is one more negative value than positive value due to the two's complement representation.
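The fixed-point dynamic range figures in the table appear consistent with roughly 6 dB per significant bit, not counting the sign bit. That derivation is an assumption on our part rather than something the table states; the sketch below computes it and rounds down. The function name is hypothetical.

```python
import math


def fixed_point_dynamic_range_db(significant_bits: int) -> float:
    """Ratio of full scale to one LSB, in dB, excluding the sign bit."""
    return 20.0 * math.log10(2.0 ** (significant_bits - 1))


for bits in (16, 8, 24, 32):
    print(bits, math.floor(fixed_point_dynamic_range_db(bits)))
```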
Footnotes:
 1. All formats above express signed sample values. The 8-bit format is commonly called "unsigned", but it is actually a signed value with a bias of 0.10000000.
 2. Q0.23 may be packed into 24 bits (three 8-bit bytes), or unpacked in 32 bits. If unpacked, the significant bits are either right-justified towards the LSB with sign extension padding towards the MSB (Q8.23), or left-justified towards the MSB with zero fill towards the LSB (Q0.31). Q0.31 theoretically permits up to 32 significant bits, but hardware interfaces that accept Q0.31 rarely use all the bits.
 3. Single-precision floating point has 23 explicit bits plus one hidden bit and a sign bit, resulting in 25 significant bits total. Denormal numbers have fewer significant bits.
 4. Single-precision floating point can express values up to ±1.7e+38, which explains the large headroom.
 5. The dynamic range shown is for denormals up to the nominal maximum value ±1.0. Note that some architecture-specific floating-point implementations such as NEON don't support denormals.
Conversions
This section discusses data conversions between various representations.
Floating point conversions
To convert a value from Qm.n format to floating point:
 Convert the value to floating point as if it were an integer (by ignoring the point).
 Multiply by 2^{-n}.
For example, to convert a Q4.27 internal value to floating point, use:
float = integer * (2 ^ -27)
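A minimal sketch of this conversion, scaling the raw integer by 2 to the power -n so that the binary point lands after the integer bits. The function name is hypothetical, and Python's `float` is double precision, so this illustrates the arithmetic and not single-precision rounding.

```python
def q_to_float(raw: int, n: int) -> float:
    """Interpret a raw two's complement integer as Qm.n and return its value."""
    return raw * 2.0 ** -n


print(q_to_float(1 << 27, 27))     # Q4.27 raw value for 1.0
print(q_to_float(-(1 << 31), 27))  # most negative Q4.27 value
print(q_to_float(16384, 15))       # Q0.15 raw value for 0.5
```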
Conversions from floating point to fixed point follow these rules:
 Single-precision floating point has a nominal range of ±1.0, but the full range for intermediate values is ±1.7e+38. Conversion between floating point and fixed point for external representation (such as output to audio devices) will consider only the nominal range, with clamping for values that exceed that range. In particular, when +1.0 is converted to a fixed-point format, it is clamped to +1.0 minus one LSB.
 Denormals (subnormals) and both +/- 0.0 are allowed in representation, but may be silently converted to 0.0 during processing.
 Infinities will either pass through operations or will be silently hard-limited to +/- 1.0. Generally the latter is for conversion to a fixed-point format.
 NaN behavior is undefined: a NaN may propagate as an identical NaN, or may be converted to a Default NaN, may be silently hard-limited to +/- 1.0, or silently converted to 0.0, or result in an error.
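The rules above can be sketched for a float to Q0.15 conversion. This is one possible implementation under stated assumptions, not the platform's: it rounds to nearest, and it maps NaN to 0, which is only one of the behaviors the text permits. The function name is hypothetical.

```python
import math


def float_to_q15(value: float) -> int:
    """Convert a float with nominal range [-1.0, +1.0) to a Q0.15 integer."""
    if math.isnan(value):
        return 0  # assumption: NaN is silently converted to 0
    scaled = value * 32768.0
    # Clamp to the nominal range; this also hard-limits infinities.
    if scaled >= 32767.0:
        return 32767  # +1.0 becomes +1.0 minus one LSB
    if scaled <= -32768.0:
        return -32768
    return int(round(scaled))  # assumption: round to nearest


print(float_to_q15(1.0), float_to_q15(-1.0), float_to_q15(0.5))
print(float_to_q15(math.inf), float_to_q15(-math.inf), float_to_q15(math.nan))
```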
Fixed point conversions
Conversions between different Qm.n formats follow these rules:
 When m is increased, sign extend the integer part at left.
 When m is decreased, clamp the integer part.
 When n is increased, zero extend the fractional part at right.
 When n is decreased, either dither, round, or truncate the excess fractional bits at right.
For example, to convert a Q4.27 value to Q0.15 (without dither or rounding), right shift the Q4.27 value by 12 bits, and clamp any results that exceed the 16-bit signed range. This aligns the point of the Q representation.
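A minimal sketch of that Q4.27 to Q0.15 conversion, truncating the 12 excess fractional bits and clamping the integer part. The function name is hypothetical; Python's `>>` on a negative integer is an arithmetic (sign-preserving) shift, which is what the conversion needs.

```python
def q4_27_to_q0_15(raw: int) -> int:
    """Convert a raw Q4.27 integer to Q0.15 without dither or rounding."""
    shifted = raw >> 12  # drop 27 - 15 = 12 fractional bits
    return max(-32768, min(32767, shifted))  # clamp to 16-bit signed range


print(q4_27_to_q0_15(1 << 26))     # 0.5 stays in range
print(q4_27_to_q0_15(1 << 27))     # +1.0 clamps to +1.0 minus one LSB
print(q4_27_to_q0_15(-(1 << 31)))  # -16.0 clamps to -1.0
```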
To convert Q7.24 to Q7.23, do a signed divide by 2, or equivalently add the sign bit to the Q7.24 integer quantity, and then signed right shift by 1. Note that a simple signed right shift is not equivalent to a signed divide by 2.
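The difference between the two operations shows up for odd negative values. In the sketch below, "signed divide by 2" is assumed to truncate toward zero as integer division does in C, while a plain arithmetic right shift rounds toward negative infinity. The function name is hypothetical.

```python
import math


def q7_24_to_q7_23(raw: int) -> int:
    """Halve a raw Q7.24 integer, truncating toward zero."""
    sign_bit = 1 if raw < 0 else 0
    return (raw + sign_bit) >> 1


for raw in (3, -3, -4):
    # Columns: input, add-sign-bit-then-shift, truncating divide, plain shift
    print(raw, q7_24_to_q7_23(raw), math.trunc(raw / 2), raw >> 1)
```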
Lossy and lossless conversions
A conversion is lossless if it is invertible: a conversion from A to B to C results in A = C.
Otherwise the conversion is lossy.
Lossless conversions permit round-trip format conversion.
Conversions from fixed point representation with 25 or fewer significant bits to floating point are lossless. Conversions from floating point to any common fixed point representation are lossy.
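The 25-bit limit can be checked with a round trip through single precision. The sketch uses Python's `struct` module to round a value to a 32-bit float, and treats fixed-point values as raw integers, since scaling by a power of two does not affect exactness. The helper names are hypothetical.

```python
import struct


def to_float32(value: float) -> float:
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def round_trips_through_float32(raw: int) -> bool:
    """Return True if integer -> float32 -> integer gives back the same integer."""
    return int(to_float32(float(raw))) == raw


# Every 16-bit sample survives the round trip.
print(all(round_trips_through_float32(v) for v in range(-32768, 32768)))
# 2^24 fits in 25 significant bits including sign; 2^24 + 1 does not.
print(round_trips_through_float32(1 << 24), round_trips_through_float32((1 << 24) + 1))
```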