Concurrent Capture

Android 10 improves the user experience for scenarios that require more than one active audio capture to happen simultaneously, for example, when the user wants to control a VoIP call or a video recorder with voice commands provided by an accessibility service.

The audio framework implements a policy that allows only certain privileged apps to capture concurrently with regular apps.

The concurrency policy is implemented by silencing an application's captured audio rather than by preventing the application from starting to capture. This lets the framework dynamically handle changes in the number and types of active capture use cases, without blocking an app from starting capture in a case where it can recover full access to the microphone after another app has finished capturing.
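
As a minimal sketch of this behavior (plain Java for illustration; the real arbitration lives in the native audio policy service, and all names below are hypothetical, not framework APIs), every client keeps its stream open and keeps receiving buffers, but a client that loses the concurrency arbitration is handed zeroed frames:

```java
import java.util.List;

// Illustrative model of Android 10's concurrent-capture silencing policy.
// CaptureClient and deliver() are hypothetical names, not actual framework
// APIs: the point is that losing clients are silenced, never stopped.
public class SilencingPolicy {

    public static class CaptureClient {
        public final String name;
        public final boolean privileged; // e.g. an app allowed to capture concurrently
        public CaptureClient(String name, boolean privileged) {
            this.name = name;
            this.privileged = privileged;
        }
    }

    // Deliver audio to every active client; in this simplified model, a
    // regular app receives zeroed frames while a privileged app captures.
    public static short[] deliver(CaptureClient client,
                                  List<CaptureClient> active,
                                  short[] captured) {
        boolean privilegedActive = false;
        for (CaptureClient c : active) {
            if (c.privileged) privilegedActive = true;
        }
        if (privilegedActive && !client.privileged) {
            return new short[captured.length]; // silenced, stream stays open
        }
        return captured; // full access to the microphone
    }
}
```

Because the stream is never torn down, the regular app automatically recovers non-silent audio as soon as the privileged capture ends, with no restart on the app's side.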

The consequence for the audio HAL and audio subsystem is that they must support several simultaneously active input streams, even if, in some cases, only one stream provides non-silent audio to an active client.

CDD requirements

See the CDD for the requirements for concurrent capture support.

Capture situations from audio HAL

A concurrent capture scenario can result in different situations in terms of the number of active input streams, input device selection, or preprocessing configuration.

Concurrency can happen between the following:

  • Several input streams from the application processor (AP)
  • Input streams and a voice call
  • Input streams and an audio DSP implementing a low-power hotword detection

Concurrent activity of AP input streams

The audio policy configuration file audio_policy_configuration.xml is used by the audio framework to determine how many input streams can be opened and active simultaneously.

At a minimum, the audio HAL must support one instance of each input profile (mixPort of role sink) listed in the configuration file being open and active simultaneously.
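
For illustration, a mixPort of role sink in audio_policy_configuration.xml can bound how many instances of the profile may be open and active at once with the maxOpenCount and maxActiveCount attributes (the values and profile contents below are examples, not a recommended configuration):

```xml
<!-- Illustrative fragment, not a complete audio_policy_configuration.xml.
     maxOpenCount/maxActiveCount limit how many instances of this input
     profile can be open and active simultaneously. -->
<mixPort name="primary input" role="sink"
         maxOpenCount="2" maxActiveCount="2">
    <profile name="" format="AUDIO_FORMAT_PCM_16_BIT"
             samplingRates="8000,16000,48000"
             channelMasks="AUDIO_CHANNEL_IN_MONO,AUDIO_CHANNEL_IN_STEREO"/>
</mixPort>
```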

Device selection

When several active clients are attached to the same HAL input stream, the framework selects the appropriate device for this input stream based on use case priority.

When several input streams are active, each stream can have a different device selection.

Where technically feasible, it's recommended that the audio HAL and audio subsystem allow different streams to capture from different devices, such as a Bluetooth headset and the built-in microphone.

If there's an incompatibility (for instance, two devices share the same digital audio interface or back end), the audio HAL must choose which stream controls the device selection.

In this case:

  • The resulting state must be consistent and offer the same device selection when the same scenario is repeated.
  • When the concurrency state ends, the remaining active stream must be routed to the initially requested device on this stream.

If the audio HAL defines a priority order between active use cases, it should follow the same order as source_priority() in frameworks/av/services/audiopolicy/common/include/policy.h.
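
The tie-break can be sketched as follows. The ranking values below are illustrative placeholders only; the authoritative ordering must be mirrored from source_priority() in policy.h, and the enum here covers just a few capture sources for brevity:

```java
import java.util.Comparator;
import java.util.List;

// Sketch of a HAL-side tie-break between streams competing for one device.
// Priority values are HYPOTHETICAL; the authoritative order is
// source_priority() in frameworks/av/services/audiopolicy/common/include/policy.h.
public class SourcePriority {

    public enum Source { HOTWORD, MIC, CAMCORDER, VOICE_COMMUNICATION }

    // Higher value wins. Illustrative numbers, to be taken from policy.h.
    public static int priority(Source s) {
        switch (s) {
            case VOICE_COMMUNICATION: return 4;
            case CAMCORDER: return 3;
            case MIC: return 2;
            case HOTWORD: return 1;
            default: return 0;
        }
    }

    // The stream whose use case has the highest priority controls
    // device selection for the shared back end.
    public static Source controllingSource(List<Source> activeSources) {
        return activeSources.stream()
                .max(Comparator.comparingInt(SourcePriority::priority))
                .orElseThrow();
    }
}
```

Because the ranking is a pure function of the use case, repeating the same scenario yields the same device selection, satisfying the consistency requirement above.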

Preprocessing selection

The audio framework can request preprocessing on an input stream using the addEffect() or removeEffect() HAL methods.

For preprocessing on a given input stream, the audio framework enables only the configuration corresponding to the highest-priority active use case on the input stream. However, there might be some overlap during use case activation and deactivation, causing two simultaneous active processes (for example, two instances of echo canceller) to run on the same input stream. In this case, the HAL implementation chooses which request is accepted; it keeps track of the active requests and restores the correct state when either process is disabled.
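The bookkeeping described above, accepting one of two overlapping requests and restoring the correct state when either is disabled, can be sketched as a small tracker. The class below is hypothetical and does not reproduce the HAL's actual addEffect()/removeEffect() signatures:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical tracker for overlapping preprocessing requests (e.g. two
// echo-canceller instances briefly enabled on the same input stream).
// Only one instance actually runs; disabling either request restores
// whatever request is still outstanding.
public class EffectRequestTracker {

    private final Deque<String> requests = new ArrayDeque<>();

    // Models addEffect(): record the request, run the newest one.
    public String addEffect(String effect) {
        requests.push(effect);
        return active();
    }

    // Models removeEffect(): forget one request, restore the remaining state.
    public String removeEffect(String effect) {
        requests.remove(effect);
        return active();
    }

    // The effect instance actually applied to the stream, or null if none.
    public String active() {
        return requests.peek();
    }
}
```

A last-requested-wins policy is only one possible choice; the point is that the HAL tracks every active request so that disabling one never leaves the stream in a stale state.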

When several capture streams are active simultaneously, different preprocessing requests might be run on different streams.

The HAL and audio subsystem implementations should allow different preprocessing to be applied to different streams, even if they share the same input device. That is, preprocessing should be applied after the streams are demultiplexed from the primary capture source.

If this isn't possible for technical reasons on a given audio subsystem, the audio HAL should apply priority rules similar to those listed in Device selection.

Concurrent voice call and capture from AP

Capture from the AP can happen while a voice call is active. This situation isn't new in Android 10 and isn't directly related to the concurrent capture feature, but it's worth restating the guidelines for this scenario.

Two different types of capture from the AP are needed during a call.

Capturing call RX and TX

Capturing call RX and TX is triggered by the use of audio source AudioSource.VOICE_UPLINK or AudioSource.VOICE_DOWNLINK, and/or device AudioDevice.IN_TELEPHONY_RX.

Audio HALs should expose an input profile (mixPort of role sink) with an available route from device AudioDevice.IN_TELEPHONY_RX.

When a call is connected (audio mode is AudioMode.IN_CALL), it should be possible to have at least one active capture stream from device AudioDevice.IN_TELEPHONY_RX.

Capturing from input devices when a call is active

When a call is active (audio mode is AudioMode.IN_CALL), it should be possible to open and activate input streams from the AP as specified in section Concurrent activity of AP input streams.

However, device selection and preprocessing priority should always be driven by the voice call if there's a conflict with requests from the AP input streams.

Concurrent capture from DSP and AP

When the audio subsystem contains a DSP supporting low-power audio context or hotword detection functions, the implementation should support concurrent capture from the AP and the audio DSP. This includes both capture by the DSP during the initial detection phase and capture by the AP with AudioSource.HOTWORD after detection is triggered by the DSP.

This should be reflected by the concurrent capture flag reported by the sound trigger HAL via the implementation descriptor: ISoundTriggerHw.Properties.concurrentCapture = true.

The audio HAL should also expose an input profile specific to hotword capture, identified by the flag AudioInputFlag.HW_HOTWORD. The implementation should support opening and activating at least as many streams on this profile as the number of sound models that can be loaded concurrently by the sound trigger HAL.

Capture from this input profile should be possible while other input profiles are active.
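
The two guidelines above can be reduced to a simple invariant check; the class and parameter names below are hypothetical and stand in for values an implementation would read from its own HAL descriptors:

```java
// Hypothetical sanity check for the DSP/AP concurrent capture guidelines:
// ISoundTriggerHw.Properties.concurrentCapture must report true, and the
// HW_HOTWORD input profile must support at least as many active streams
// as the sound trigger HAL can load sound models concurrently.
public class HotwordCapacityCheck {
    public static boolean meetsGuidelines(boolean concurrentCapture,
                                          int maxHotwordStreams,
                                          int maxLoadedSoundModels) {
        return concurrentCapture && maxHotwordStreams >= maxLoadedSoundModels;
    }
}
```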

Implications for Assistant implementations

Requirements on data usage and user notification

Because concurrent microphone usage, if abused, can leak private user data, the following conditions and guarantees must apply to the privileged preloaded apps that request to hold the Assistant role.

  • Data collected through the microphone should not leave the device unless the user is interacting with the Assistant (for example, after the hotword is triggered).
  • Applications listening concurrently should provide visual cues to the user after the hotword is detected. This helps users understand that further conversations will go to a different app, such as the Assistant.
  • Users should have the ability to turn off the microphone or the Assistant triggers.
  • When audio recordings are stored, users should have the ability to access, review, and delete recordings at any time.

Functional improvements for Android 10

Assistants not blocking each other

On Android 9 or lower, when there are two always-on Assistants on a device, only one of them could be listening for its hotword, forcing users to switch between the two. In Android 10, the default Assistant can listen concurrently with the other Assistant, resulting in a much smoother experience for users with both Assistants.

Apps holding mic open

When apps like Shazam or Waze hold the mic open, the default Assistant can still be listening for the hotword.

For non-default Assistant apps, there is no change in behavior for Android 10.

Sample audio HAL implementation

An example of an audio HAL implementation complying with the guidelines in this document can be found in AOSP.