Audio chimes

This content describes chime playback in the high availability renderer (HAR). An Audio crate exposes AudioManager, which controls chime playback, to the HAR app.

To keep latency low, playback threads run throughout the lifetime of the app, idling and yielding when no audio plays.

Terminology

asset
An AudioAsset represents playable audio. Assets are known in advance and exist in the app runtime.
device
AudioDevice refers to a separate bus for playback of audio. The device is the most granular unit relating to hardware accessed by the system. In the standard SDVM implementation, AudioDevice refers to a single Advanced Linux Sound Architecture (ALSA) PCM.
stream
A single playback of an asset on a device. A stream exists from the moment it's scheduled until it completes, is canceled, or ends in an error.

Components

Figure 1 displays the component diagram for chime:


Figure 1. Component diagram.

Audio device and PCM

Audio hardware configuration follows the standard HAR platform abstraction layer (PAL) design and is contained in the har-platform-api crate.

The HAR Audio crate defines a new AudioDevice structure, with fields for all the data that affects the internal behavior of the HAR Audio crate and playback. AudioDevice also uses generics to wrap additional platform-specific parameters. In the case of tinyalsa, PlatformAudioDevice contains the descriptors and properties of an ALSA PCM.

/// NOTE: The following code is a sample definition to aid understanding; it doesn't
/// represent the final code or implementation.

AudioDevice<PlatformAudioDevice> {
  /// Internal HAR Identifier for the device.
  AudioDeviceID,

  /// The size (in bytes) for chunks of audio data to stream to the device.
  ChunkSize,

  /// Properties necessary to control volume (details in "Mixer control" section).
  VolumeControl,

  /// Properties necessary to control spatialization (details in "Mixer control"
  /// section).
  SpatialControl,

  /// Platform specific data for the AudioDevice.
  /// E.g. ALSA properties and reference to opened PCM.
  PlatformAudioDevice
}

/// Elaboration of the previously mentioned VolumeControl
VolumeControl {
  /// Identifier for the control used to change volume.
  ControlID,

  /// Mapping between millibel and control values (see the "Mixer control" section).
  VolumeOutputIndex
}
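For readers who prefer concrete types, the AudioDevice and VolumeControl definitions above can be sketched in Rust as follows. This is a minimal sketch, not the HAR implementation: the newtype IDs, the list of `(millibel, control value)` pairs used for VolumeOutputIndex, the single-control SpatialControl, and the `TinyAlsaPcmInfo` platform type are all hypothetical.

```rust
/// Hypothetical internal HAR identifier for a device.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct AudioDeviceId(pub u32);

/// Hypothetical identifier of a hardware mixer control.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ControlId(pub u32);

/// Volume control: which mixer control to use and how millibels map to control values.
#[derive(Debug, Clone)]
pub struct VolumeControl {
    pub control_id: ControlId,
    /// Reference (millibel, control value) pairs, sorted by millibel.
    pub volume_output_index: Vec<(i32, i64)>,
}

/// Spatialization control, assumed here to be a single mixer control.
#[derive(Debug, Clone)]
pub struct SpatialControl {
    pub control_id: ControlId,
}

/// Device description, generic over platform-specific data `P`.
#[derive(Debug, Clone)]
pub struct AudioDevice<P> {
    pub id: AudioDeviceId,
    /// Size in bytes of each chunk of audio data streamed to the device.
    pub chunk_size: usize,
    pub volume_control: VolumeControl,
    pub spatial_control: SpatialControl,
    /// Platform-specific data, e.g. ALSA properties of an opened PCM.
    pub platform: P,
}

/// Hypothetical tinyalsa platform data: card and device number of a PCM.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TinyAlsaPcmInfo {
    pub card: u32,
    pub device: u32,
}
```

Because the platform data is a type parameter, the HAR Audio crate can handle any `AudioDevice<P>` without knowing what `P` contains.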

Audio assets

This section describes how audio assets are configured and implemented.

Configuration

The initial HAR audio implementation supports statically configured audio assets. A JSON config defines which assets are available, with the assets defined as WAV files.

The implementation also supports synthesized and streamed audio assets through a more generic asset implementation, which accepts a function to generate audio data.

Implementation

Assets are implemented using two separate constructs: AudioAsset and AudioStream.

AudioAsset defines the static properties of an asset and acts as a container for any internal data related to the asset. An AudioStream, which is a single streamable instance of the asset, is derived from an AudioAsset. AudioStream holds the internal state of that single stream playback.

/// NOTE: The following code is a sample definition to aid understanding; it doesn't
/// represent the final code or implementation.

/// Static properties and definition of an Asset.
AudioAsset {
  /// Perform optional initialization steps, e.g. load bytes from file into memory.
  /// Can also define lazy loading, to load data at first playback instead.
  fn initialize(LazyLoad);

  /// Create a new AudioStream from the asset.
  fn create_stream() -> AudioStream;

  /// More functions for metadata etc. of the asset.
  ...
}

/// Single streamable instance of an AudioAsset
AudioStream {
  /// Gets the next bytes to play from the asset, together with any control signals
  /// (e.g. fade-out) contained in the current chunk of bytes.
  fn get_playback(num_bytes: usize) -> (Vec<u8>, ControlSignals);

  /// Gets the playback mode details used to handle special states of playback,
  /// e.g. when a chime is interrupted and put in "fade-out" mode.
  fn playback_mode() -> PlaybackMode;

  /// [0.0, 1.0] indication of how much of the stream was played.
  fn progress() -> f32;

  /// Reset the stream, e.g. if it should play again.
  fn reset();

  /// Time at which the stream was created.
  fn created_at() -> Instant;

  /// Additional metadata etc. for the stream.
  ...
}
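A minimal Rust sketch of the AudioAsset/AudioStream split, assuming an asset whose PCM bytes are already in memory (for example, a WAV file loaded during initialization). `MemoryAsset`, `MemoryStream`, and the fade-out-only `ControlSignals` are hypothetical names; `playback_mode` and lazy loading are omitted. The asset shares its bytes with every stream it creates, while each stream keeps its own read position.

```rust
use std::sync::Arc;
use std::time::Instant;

/// Control signals attached to a chunk (hypothetical: only fade-out is modeled).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct ControlSignals {
    pub fade_out: bool,
}

/// Static definition of an asset; `create_stream` derives a playable instance.
pub trait AudioAsset {
    fn create_stream(&self) -> Box<dyn AudioStream>;
}

/// A single playback instance of an asset.
pub trait AudioStream {
    /// Returns up to `num_bytes` of audio; an empty Vec means the stream is exhausted.
    fn get_playback(&mut self, num_bytes: usize) -> (Vec<u8>, ControlSignals);
    /// Fraction of the stream played so far, in [0.0, 1.0].
    fn progress(&self) -> f32;
    /// Rewinds the stream so it can play again.
    fn reset(&mut self);
    fn created_at(&self) -> Instant;
}

/// Asset whose PCM bytes are held in memory.
pub struct MemoryAsset {
    data: Arc<[u8]>,
}

impl MemoryAsset {
    pub fn new(data: Vec<u8>) -> Self {
        Self { data: data.into() }
    }
}

impl AudioAsset for MemoryAsset {
    fn create_stream(&self) -> Box<dyn AudioStream> {
        Box::new(MemoryStream {
            data: Arc::clone(&self.data), // streams share the asset bytes
            position: 0,
            created_at: Instant::now(),
        })
    }
}

struct MemoryStream {
    data: Arc<[u8]>,
    position: usize,
    created_at: Instant,
}

impl AudioStream for MemoryStream {
    fn get_playback(&mut self, num_bytes: usize) -> (Vec<u8>, ControlSignals) {
        let end = (self.position + num_bytes).min(self.data.len());
        let chunk = self.data[self.position..end].to_vec();
        self.position = end;
        (chunk, ControlSignals::default())
    }

    fn progress(&self) -> f32 {
        if self.data.is_empty() {
            1.0
        } else {
            self.position as f32 / self.data.len() as f32
        }
    }

    fn reset(&mut self) {
        self.position = 0;
    }

    fn created_at(&self) -> Instant {
        self.created_at
    }
}
```

Sharing the bytes through `Arc<[u8]>` keeps stream creation cheap and lets several streams of the same asset play independently.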

Chime playback

This section describes the API and procedure for playback of a chime. A single chime playback is referred to as a stream.

Lifecycle of a stream

Figure 2 illustrates the lifecycle of a stream:


Figure 2. Stream playback and events.

Figure 2 describes these steps:

  1. Play: Schedule stream to play.

  2. Prioritize: Playback prioritization decides whether to:

    • Play chime now (started event when the first bytes are written)
    • Play chime later (paused or resumed event)
    • Deprioritize chime (canceled event)
  3. Mixer controls: If needed, update mixer controls based on configured behaviors.

  4. Write bytes: Write a chunk of bytes to AudioDevice.

  5. More data: If the stream has more data, return to Step 2.

  6. Repeat: If the stream should be repeated, reset and return to Step 2 (restarted event).

  7. Completed: The stream completed successfully (FinishedSuccessfully event).

A chime can be paused, resumed, or stopped at any time.
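The write, repeat, and completion steps (4 to 7) can be sketched as a single loop. This is an illustrative sketch only: prioritization and mixer controls (steps 2 and 3) are skipped, the `write` callback stands in for writing to an AudioDevice, and `StreamEvent` and `run_stream` are hypothetical names, with `std::sync::mpsc::Sender` assumed as the event transmitter.

```rust
use std::sync::mpsc::{channel, Sender};

/// Subset of the lifecycle events (hypothetical enum).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StreamEvent {
    Started,
    Restarted,
    FinishedSuccessfully,
}

/// Plays `data` in chunks of `chunk_size` bytes and repeats it `repeats` extra times.
pub fn run_stream<W: FnMut(&[u8])>(
    data: &[u8],
    chunk_size: usize,
    repeats: u32,
    events: &Sender<StreamEvent>,
    mut write: W,
) {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut remaining_repeats = repeats;
    let mut position = 0;
    let mut started = false;
    loop {
        // Steps 2 and 3 (prioritization, mixer controls) are omitted in this sketch.
        if position < data.len() {
            let end = (position + chunk_size).min(data.len());
            write(&data[position..end]); // Step 4: write one chunk
            if !started {
                started = true;
                let _ = events.send(StreamEvent::Started);
            }
            position = end; // Step 5: loop again while data remains
        } else if remaining_repeats > 0 {
            remaining_repeats -= 1;
            position = 0; // Step 6: reset and restart
            let _ = events.send(StreamEvent::Restarted);
        } else {
            let _ = events.send(StreamEvent::FinishedSuccessfully); // Step 7
            return;
        }
    }
}
```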

Chime priorities

Chime priority is determined by the following rules, in order:

  1. Playback mode overrides. For example, a chime in fade-out mode is always granted top priority until the fade-out completes.

  2. Specified priority.

  3. Recency. If chimes have equal priority, the more recent chime plays first.

How ties between chimes of equal priority are resolved is configured with an enum value when AudioManager is instantiated.
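The priority rules can be expressed as an `Ord` implementation so that a standard max-heap yields the next stream to play. This is a sketch under assumptions: `QueuedStream`, its fields, and the monotonic `seq` counter used for recency are hypothetical, and only the "more recent first" tie-break is modeled, not the configurable enum.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PlaybackMode {
    Normal,
    FadeOut,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct QueuedStream {
    pub id: u32,
    pub mode: PlaybackMode,
    pub priority: u8,
    /// Sequence number assigned when scheduled; higher means more recent.
    pub seq: u64,
}

impl Ord for QueuedStream {
    fn cmp(&self, other: &Self) -> Ordering {
        let fading = |s: &Self| s.mode == PlaybackMode::FadeOut;
        fading(self)
            .cmp(&fading(other)) // 1. fade-out overrides everything else
            .then(self.priority.cmp(&other.priority)) // 2. specified priority
            .then(self.seq.cmp(&other.seq)) // 3. more recent plays first
            .then(self.id.cmp(&other.id)) // keeps Ord consistent with Eq
    }
}

impl PartialOrd for QueuedStream {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Returns stream IDs in the order they would be played.
pub fn playback_order(streams: Vec<QueuedStream>) -> Vec<u32> {
    let mut heap = BinaryHeap::from(streams);
    std::iter::from_fn(|| heap.pop().map(|s| s.id)).collect()
}
```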

API

Events

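If an event channel is provided when the chime starts, HAR Audio emits events during playback, such as the started, paused, resumed, canceled, restarted, and FinishedSuccessfully events described in the stream lifecycle. The playback API is shown in this example: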

/// NOTE: The following code is a sample definition to aid understanding; it doesn't
/// represent the final code or implementation.

StreamBehaviors<PlatformStreamBehaviors> {
  /// What should happen if the stream is interrupted for a higher priority stream,
  /// e.g. pause-and-resume or cancel. Also defines the preference for fade-out.
  OverrunBehavior,

  /// Urgency: whether interrupted streams are allowed to "fade-out", or whether the
  /// stream should urgently disrupt any other playback.
  Option<Urgency>,

  /// Priority for the stream (minimum priority if not specified).
  Option<StreamPriority>,
  /// Describes whether the stream should be played on repeat.
  Option<RepeatBehavior>,
  /// Volume, if the stream should play at a specific volume.
  Option<Volume>,
  /// Spatialization, if the stream should play with a specific spatialization.
  Option<Spatialization>,

  /// Optional generic for future expandability of the API, or pass-through of
  /// platform-specific stream behaviors.
  Option<PlatformStreamBehaviors>
}

/// Plays a chime on the specified device with the given behaviors. StreamEvents are
/// delivered using the provided event transmitter. This method doesn't wait for any events.
fn play(AudioDeviceID, AssetID, StreamBehaviors, Option<EventTransmitter>) -> StreamController

/// Object used to control a Stream.
StreamController {
  /// Gets the current state/metadata of a stream (e.g. ID, progress, playback_state).
  fn metadata() -> StreamMetadata

  /// Stops the stream.
  fn stop()

  /// Pauses the stream. If the specified timeout expires before the stream is resumed,
  /// the stream is canceled. The timeout ensures that no paused streams are left
  /// pending resumption indefinitely.
  fn pause(TimeoutDuration)

  /// Resumes a paused stream.
  fn resume()

  /// Updates the spatialization of a playing stream.
  fn set_spatialization(Spatialization)

  /// Updates the volume of a playing stream.
  fn set_volume(Volume)
}
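The pause timeout described for StreamController can be sketched with a shared, mutex-protected state that the playback thread checks on each cycle. This is a hypothetical sketch: `PlaybackState`, `expire_if_due`, and the `AudioError::InvalidState` variant are assumptions, and the methods return `Result` following the convention described under Rust implementation details.

```rust
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PlaybackState {
    Playing,
    Paused { since: Instant, timeout: Duration },
    Stopped,
    Canceled,
}

#[derive(Debug, PartialEq, Eq)]
pub enum AudioError {
    InvalidState,
}

/// Cloneable handle; the playback thread holds another clone of the same state.
#[derive(Clone)]
pub struct StreamController {
    state: Arc<Mutex<PlaybackState>>,
}

impl StreamController {
    pub fn new() -> Self {
        Self { state: Arc::new(Mutex::new(PlaybackState::Playing)) }
    }

    pub fn state(&self) -> PlaybackState {
        *self.state.lock().unwrap()
    }

    pub fn pause(&self, timeout: Duration) -> Result<(), AudioError> {
        let mut state = self.state.lock().unwrap();
        if *state == PlaybackState::Playing {
            *state = PlaybackState::Paused { since: Instant::now(), timeout };
            Ok(())
        } else {
            Err(AudioError::InvalidState)
        }
    }

    pub fn resume(&self) -> Result<(), AudioError> {
        let mut state = self.state.lock().unwrap();
        if matches!(*state, PlaybackState::Paused { .. }) {
            *state = PlaybackState::Playing;
            Ok(())
        } else {
            Err(AudioError::InvalidState)
        }
    }

    pub fn stop(&self) -> Result<(), AudioError> {
        *self.state.lock().unwrap() = PlaybackState::Stopped;
        Ok(())
    }

    /// Called by the playback thread: cancels a paused stream whose timeout expired.
    pub fn expire_if_due(&self, now: Instant) -> bool {
        let mut state = self.state.lock().unwrap();
        let current = *state;
        if let PlaybackState::Paused { since, timeout } = current {
            if now.duration_since(since) >= timeout {
                *state = PlaybackState::Canceled;
                return true;
            }
        }
        false
    }
}
```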

Mixer control

This section describes how volume and spatialization are controlled.

Volume

HAR defines volume consistently in millibels. The har-platform-api crate handles conversion from millibels to control values.

The relation between millibels and hardware power output is logarithmic and varies greatly between different hardware and speaker setups. As a result, the mapping between millibel and control values is provided as part of the AudioDevice configuration (see Audio device and PCM), and the conversion must take place before the platform-specific volume control is called.

The PAL API therefore defines two functions:

fn set_volume_millibel(AudioDeviceID, Millibel) {
  /// Default implementation with conversion using DeviceConfig.
}

fn set_volume_control(AudioDeviceID, ControlValue);

The default implementation of set_volume_millibel uses the AudioDevice configuration, which includes a set of reference key-value pairs mapping millibel values to control values, to transform the millibel value into a control value. It then calls set_volume_control with the converted value.

This design provides a default and enables subsequent implementations to override the default mapping.
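The default set_volume_millibel behavior can be sketched as a trait with a provided method. This sketch assumes linear interpolation between neighboring reference points and clamping outside the configured range; the design only states that reference key-value pairs are used, so both choices are assumptions. `AudioPal`, `volume_index`, `millibel_to_control`, and `MockPal` (a test double recording control writes) are hypothetical names.

```rust
/// A millibel value (1 mB = 0.01 dB).
pub type Millibel = i32;
pub type ControlValue = i64;

#[derive(Debug)]
pub enum AudioError {
    EmptyVolumeIndex,
}

/// Converts millibels to a control value using sorted (millibel, control) pairs.
pub fn millibel_to_control(
    index: &[(Millibel, ControlValue)],
    mb: Millibel,
) -> Result<ControlValue, AudioError> {
    let (first, last) = match (index.first(), index.last()) {
        (Some(f), Some(l)) => (*f, *l),
        _ => return Err(AudioError::EmptyVolumeIndex),
    };
    // Clamp values outside the configured range.
    if mb <= first.0 {
        return Ok(first.1);
    }
    if mb >= last.0 {
        return Ok(last.1);
    }
    // Linear interpolation between the two surrounding reference points.
    for w in index.windows(2) {
        let ((m0, c0), (m1, c1)) = (w[0], w[1]);
        if mb <= m1 {
            let span = (m1 - m0) as i64;
            return Ok(c0 + (c1 - c0) * (mb - m0) as i64 / span);
        }
    }
    Ok(last.1)
}

/// Hypothetical PAL trait: platforms implement the raw control write and inherit
/// the default millibel conversion, which they may override.
pub trait AudioPal {
    fn volume_index(&self, device: u32) -> &[(Millibel, ControlValue)];
    fn set_volume_control(&mut self, device: u32, value: ControlValue) -> Result<(), AudioError>;

    fn set_volume_millibel(&mut self, device: u32, mb: Millibel) -> Result<(), AudioError> {
        let value = millibel_to_control(self.volume_index(device), mb)?;
        self.set_volume_control(device, value)
    }
}

/// Test double that records control writes instead of touching hardware.
pub struct MockPal {
    pub index: Vec<(Millibel, ControlValue)>,
    pub written: Vec<(u32, ControlValue)>,
}

impl AudioPal for MockPal {
    fn volume_index(&self, _device: u32) -> &[(Millibel, ControlValue)] {
        &self.index
    }

    fn set_volume_control(&mut self, device: u32, value: ControlValue) -> Result<(), AudioError> {
        self.written.push((device, value));
        Ok(())
    }
}
```

A platform that needs a different mapping overrides `set_volume_millibel`; most platforms only implement `set_volume_control`.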


Figure 3. HAR audio flow.

Spatialization

The Audio API exposes functionality to control the spatial area in which audio data plays. These parameters are passed through to the PAL and applied downstream using hardware controls. The options are defined as part of the PAL API as:

/// NOTE: The following code is a sample definition to aid understanding; it doesn't
/// represent the final code or implementation.

enum Spatialization {
  Front,
  FrontLeft,
  FrontRight,
  Center, // No spatialization
  Rear,
  RearLeft,
  RearRight,
  Right,
  Left
}

Mixer control tiers

You can define volume and spatialization on an asset and for a stream. When they're defined for a stream, the stream values take priority over the controls defined by the asset.
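Resolving the two tiers amounts to preferring the stream value and falling back to the asset value. A minimal sketch, assuming the override applies per setting (so a stream that sets only volume still inherits the asset's spatialization); `MixerSettings` and `effective_settings` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Spatialization {
    Front,
    FrontLeft,
    FrontRight,
    Center, // No spatialization
    Rear,
    RearLeft,
    RearRight,
    Right,
    Left,
}

/// Mixer controls that can be set on either tier.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct MixerSettings {
    pub volume_mb: Option<i32>,
    pub spatialization: Option<Spatialization>,
}

/// Stream-level settings override asset-level settings, one field at a time.
pub fn effective_settings(asset: MixerSettings, stream: MixerSettings) -> MixerSettings {
    MixerSettings {
        volume_mb: stream.volume_mb.or(asset.volume_mb),
        spatialization: stream.spatialization.or(asset.spatialization),
    }
}
```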

Thread management

The audio manager maintains one thread per AudioDevice instance. Each thread operates independently. Interaction between AudioManager and the playback thread uses a shared stream queue sorted by priority.

ALSA calls use ASYNC writes with polling to determine when data is digested.


Figure 4. Thread management sequence.

Control signals during polling

While waiting for the sound card to digest bytes, control signals can be issued, for example to change the fade or spatialization of the audio. The polling interval for the audio device state is either configured at the AudioManager level or defaults to 1 millisecond. After each polling cycle, the playback thread digests and issues any timed control commands.
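A sketch of the per-cycle bookkeeping, assuming timed control commands are kept as `(Instant, command)` pairs: after each poll, every command whose time has arrived is taken in time order. `take_due`, `poll_interval`, and `DEFAULT_POLL_INTERVAL` are hypothetical names; only the 1 millisecond default comes from this design.

```rust
use std::time::{Duration, Instant};

/// Default polling interval when AudioManager doesn't configure one.
pub const DEFAULT_POLL_INTERVAL: Duration = Duration::from_millis(1);

pub fn poll_interval(configured: Option<Duration>) -> Duration {
    configured.unwrap_or(DEFAULT_POLL_INTERVAL)
}

/// Removes and returns, ordered by due time, all commands due at or before `now`.
pub fn take_due<T>(pending: &mut Vec<(Instant, T)>, now: Instant) -> Vec<T> {
    pending.sort_by_key(|(at, _)| *at);
    let split = pending.partition_point(|(at, _)| *at <= now);
    pending.drain(..split).map(|(_, cmd)| cmd).collect()
}
```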

Buffer management

To minimize interruption latency, the buffer sizes written to the device are kept small. When TinyALSA is used (the default), the buffer size is configured to match the startup_threshold parameter, which TinyALSA defaults to the entire allocated device buffer divided by two.

Stream interruption

When a stream is interrupted, it keeps priority on its playback thread until the data it has written to the card is drained. As a result, there's a transition period between the interruption and the start of the new stream.

For example, assume an audio configuration in HAR with a:

  • Buffer size of 3,072 frames
  • Rate of 48,000 Hz
  • Sample size of two

The pending buffer is then between 3,072 and 6,144 frames, which results in an interruption delay of 64 to 128 milliseconds (3,072 / 48,000 s = 64 ms). A production implementation would require a smaller buffer.
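The delay figures follow from dividing pending frames by the sample rate. A small sketch of that arithmetic, assuming (as in the example) that between one and two buffers are pending at the moment of interruption; the function names are hypothetical.

```rust
/// Time in milliseconds for the device to drain `pending_frames` at `rate_hz`.
pub fn pending_delay_ms(pending_frames: u64, rate_hz: u64) -> f64 {
    pending_frames as f64 * 1000.0 / rate_hz as f64
}

/// Interruption delay range when one to two buffers are pending.
pub fn interruption_delay_range_ms(buffer_frames: u64, rate_hz: u64) -> (f64, f64) {
    (
        pending_delay_ms(buffer_frames, rate_hz),
        pending_delay_ms(2 * buffer_frames, rate_hz),
    )
}
```

For the example above, `interruption_delay_range_ms(3072, 48_000)` gives `(64.0, 128.0)`; halving the buffer halves both bounds.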

Error management and risks

This section describes how errors are managed and potential risks.

Stale streams and queue starvation

Because AudioStream instances can be paused, and playback occurs only from the top-priority AudioStream instance, a growing queue risks starving low-priority streams.

To avoid this occurrence, each queue is capped at a configurable size. When this value is exceeded, the lowest-priority stream is discarded.
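A sketch of a capped queue that discards the lowest-priority stream when the cap is exceeded. The tie-break (the oldest stream is discarded among equal priorities) is an assumption chosen to mirror the "more recent plays first" rule; `CappedQueue` and `Entry` are hypothetical names.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Entry {
    pub id: u32,
    pub priority: u8,
    /// Higher means scheduled more recently.
    pub seq: u64,
}

pub struct CappedQueue {
    cap: usize,
    entries: Vec<Entry>,
}

impl CappedQueue {
    pub fn new(cap: usize) -> Self {
        assert!(cap > 0, "queue cap must be positive");
        Self { cap, entries: Vec::new() }
    }

    /// Adds a stream; if the cap is exceeded, removes and returns the
    /// lowest-priority stream (which may be the one just added).
    pub fn push(&mut self, entry: Entry) -> Option<Entry> {
        self.entries.push(entry);
        if self.entries.len() <= self.cap {
            return None;
        }
        // min_by_key returns the first minimum; ordering by (priority, seq)
        // picks the lowest priority, and the oldest among equals.
        let (idx, _) = self
            .entries
            .iter()
            .enumerate()
            .min_by_key(|(_, e)| (e.priority, e.seq))?;
        Some(self.entries.remove(idx))
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```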

Monitor and alert

In production, the safety monitor tracks audio features to verify that playback takes place as expected.

AudioManager tracks internal latency statistics, and a flag enables performance logging. When these thresholds are set, warning logs are generated in all debug builds when:

  • Duration between scheduling and starting playback exceeds x milliseconds.
  • (For a non-disrupted stream) asset length and playback time differ by more than y percent.
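The two warning conditions can be sketched as a pure check that returns warnings for the caller to log. `Thresholds`, `LatencyWarning`, and `check_latency` are hypothetical; `max_start_delay` stands for x and `max_length_deviation_pct` for y, both of which this design leaves unspecified.

```rust
use std::time::Duration;

pub struct Thresholds {
    /// x: maximum delay between scheduling and start of playback.
    pub max_start_delay: Duration,
    /// y: maximum allowed difference between asset length and playback time, in percent.
    pub max_length_deviation_pct: f64,
}

#[derive(Debug, PartialEq)]
pub enum LatencyWarning {
    SlowStart(Duration),
    LengthMismatch { deviation_pct: f64 },
}

pub fn check_latency(
    th: &Thresholds,
    start_delay: Duration,
    asset_len: Duration,
    played: Duration,
    disrupted: bool,
) -> Vec<LatencyWarning> {
    let mut warnings = Vec::new();
    if start_delay > th.max_start_delay {
        warnings.push(LatencyWarning::SlowStart(start_delay));
    }
    // The length check only applies to streams that weren't disrupted.
    if !disrupted && !asset_len.is_zero() {
        let expected = asset_len.as_secs_f64();
        let deviation_pct = (played.as_secs_f64() - expected).abs() / expected * 100.0;
        if deviation_pct > th.max_length_deviation_pct {
            warnings.push(LatencyWarning::LengthMismatch { deviation_pct });
        }
    }
    warnings
}
```

Restricting logging to debug builds could then be done by wrapping the code that logs the returned warnings in `if cfg!(debug_assertions)`.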

Device blocked

There's always a small risk of an audio device becoming unresponsive, for example, if it's allocated and written to by another process in the system. Because playback runs asynchronously in separate threads, and chimes can be queued to play later, such a failure would otherwise be invisible to the calling app.

To detect this, a thread health check runs whenever a new chime is scheduled to be played. It returns an error if a playback thread has a non-empty queue but hasn't digested any new bytes in the last second.

In the future it might be necessary to attempt to restart or reopen devices, but for the initial implementation, the goal is that such errors aren't invisible.
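The health check can be sketched as a function evaluated when a new chime is scheduled. `ThreadHealth`, `check_health`, `STALL_LIMIT`, and `AudioError::DeviceUnresponsive` are hypothetical; the one-second limit comes from the text above.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Eq)]
pub enum AudioError {
    DeviceUnresponsive,
}

/// Snapshot of a playback thread's progress, updated by the thread itself.
pub struct ThreadHealth {
    pub queue_len: usize,
    /// Last time the device digested new bytes.
    pub last_progress: Instant,
}

pub const STALL_LIMIT: Duration = Duration::from_secs(1);

/// Fails only if work is queued and no bytes were digested within STALL_LIMIT.
pub fn check_health(health: &ThreadHealth, now: Instant) -> Result<(), AudioError> {
    let stalled = now.duration_since(health.last_progress) >= STALL_LIMIT;
    if health.queue_len > 0 && stalled {
        Err(AudioError::DeviceUnresponsive)
    } else {
        Ok(())
    }
}
```

An idle thread with an empty queue is healthy however long it has been since it last wrote, which avoids false alarms between chimes.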

Code structure

At a high level, the code related to chime playback exists across the following crates:

CRATE: display-safety/crates/(harry-app|harry)

The existing HAR app, which issues calls to play chimes.

NEW CRATE: display-safety/crates/audio

NEW: Crate to manage audio control and playback (this is where most of the functionality exists).

CRATE: display-safety/crates/har-platform-api/audio

PAL including all system calls required for audio.

CRATE: display-safety/crates/har-platform-(android|linux)/audio

Calls to tinyalsa-rs for playback using TinyALSA. QNX support isn't implemented in the initial solution; this set of platform crates will grow as more platforms are supported.

TINYALSA PAL: display-safety/crates/tinyalsa-audio

TinyALSA-specific code for playback. This is used by the Android and Linux platform implementations.

CRATE: display-safety/crates/tinyalsa-rs

Rust bindings for TinyALSA C implementation

Rust implementation details

Some specific implementation details:

  • All API functions return Result<X, AudioError> where X is either () or a return value.
  • No API functions are marked as unsafe.
  • Mutex and synchronization mechanisms are internal and aren't exposed in the AudioManager API.

Ownership model and AudioManager

  • All app interaction with the audio system takes place through AudioManager or objects returned from AudioManager.

  • AudioManager is thread safe.

  • AudioManager is instantiated once in the HARry app and moved into Looper, which takes ownership of it.

  • AudioManager uses a tokio_util::sync::CancellationToken to manage the playback threads it starts, ensuring that the threads are terminated and resources are released when AudioManager is dropped.

  • AudioManager doesn't explicitly prevent multiple instances from being created. If more than one instance exists, it logs a message at the warn level.
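A std-only sketch of these ownership rules: dropping AudioManager signals its playback threads to stop and joins them, and creating a second instance logs a warning. The design uses a tokio_util CancellationToken; this sketch substitutes an `Arc<AtomicBool>` flag, and `eprintln!` for the warn-level log, so it has no external dependencies. The instance counter, `thread_count`, and `live_instances` are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

static INSTANCES: AtomicUsize = AtomicUsize::new(0);

pub struct AudioManager {
    cancel: Arc<AtomicBool>, // stand-in for a CancellationToken
    threads: Vec<JoinHandle<()>>,
}

impl AudioManager {
    /// Starts one playback thread per device.
    pub fn new(device_count: usize) -> Self {
        if INSTANCES.fetch_add(1, Ordering::SeqCst) > 0 {
            eprintln!("warn: more than one AudioManager instance exists");
        }
        let cancel = Arc::new(AtomicBool::new(false));
        let threads: Vec<JoinHandle<()>> = (0..device_count)
            .map(|_| {
                let cancel = Arc::clone(&cancel);
                thread::spawn(move || {
                    // Placeholder for the per-device playback loop.
                    while !cancel.load(Ordering::SeqCst) {
                        thread::sleep(Duration::from_millis(1));
                    }
                })
            })
            .collect();
        Self { cancel, threads }
    }

    pub fn thread_count(&self) -> usize {
        self.threads.len()
    }
}

impl Drop for AudioManager {
    fn drop(&mut self) {
        // Signal cancellation, then wait for every playback thread to exit.
        self.cancel.store(true, Ordering::SeqCst);
        for handle in self.threads.drain(..) {
            let _ = handle.join();
        }
        INSTANCES.fetch_sub(1, Ordering::SeqCst);
    }
}

pub fn live_instances() -> usize {
    INSTANCES.load(Ordering::SeqCst)
}
```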

Shared ownership

Several objects have shared ownership and are wrapped in synchronization mechanisms that provide exclusive access. These mechanisms aren't exposed in the AudioManager API, but are internal to the audio and PAL implementations.

  • AudioDevice - Each hardware reference (for example, TinyALSA PCM) that is opened (has a handle) has exclusive access. See SMP Design.

  • AudioStream instances have exclusive access after they're scheduled for playback because they can be controlled by the app and simultaneously accessed by the playback thread.

    The playback thread doesn't hold locks during playback, but makes an immutable snapshot of the next buffer to play, and doesn't consider changes until the next buffer is digested.

  • Each playback thread has a playback queue that is shared between AudioManager and the playback thread. As a result, mutations of the queue require exclusive access.

  • Threads with no streams become idle and wait on a Condvar, which wakes them when new data is queued. This mechanism has shared ownership.
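A sketch of a playback thread that idles on a `Condvar` until AudioManager queues data. Each chunk is popped as an owned value, so the queue lock isn't held while writing. `SharedQueue`, `wait_next`, and `spawn_playback_thread` are hypothetical names, and a `Vec<u8>` chunk stands in for a scheduled AudioStream.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Queue shared (through Arc) between AudioManager and one playback thread.
pub struct SharedQueue {
    state: Mutex<(VecDeque<Vec<u8>>, bool)>, // (pending chunks, shutdown flag)
    wakeup: Condvar,
}

impl SharedQueue {
    pub fn new() -> Arc<Self> {
        Arc::new(Self { state: Mutex::new((VecDeque::new(), false)), wakeup: Condvar::new() })
    }

    /// Called by AudioManager: enqueue data and wake the idle thread.
    pub fn push(&self, chunk: Vec<u8>) {
        let mut guard = self.state.lock().unwrap();
        guard.0.push_back(chunk);
        self.wakeup.notify_one();
    }

    pub fn shutdown(&self) {
        self.state.lock().unwrap().1 = true;
        self.wakeup.notify_all();
    }

    /// Blocks while the queue is empty; returns None after shutdown once drained.
    pub fn wait_next(&self) -> Option<Vec<u8>> {
        let mut guard = self.state.lock().unwrap();
        loop {
            if let Some(chunk) = guard.0.pop_front() {
                return Some(chunk);
            }
            if guard.1 {
                return None;
            }
            // Releases the lock while idle; re-acquires it when woken.
            guard = self.wakeup.wait(guard).unwrap();
        }
    }
}

pub fn spawn_playback_thread(
    queue: Arc<SharedQueue>,
    mut write: impl FnMut(&[u8]) + Send + 'static,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        while let Some(chunk) = queue.wait_next() {
            write(&chunk); // no lock is held during the write
        }
    })
}
```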

Dependencies

The audio crate and related crates are designed to minimize dependencies on crates that aren't approved to be built in the Android source tree. See this list of included crates.

Downstream platform implementations for Android and Linux depend on TinyALSA and the existing display safety tinyalsa-rs crate.

Quality attributes

Reliability

While audio playback is safety critical, this design doesn't cover the implementation of safety monitoring. Safety monitoring is to be implemented in a separate effort, to verify audio playback reliability on hardware and in production.

Scalability

The one thread per device approach is intended to scale to different hardware setups. Given that each thread is primarily idling, waiting for data, or waiting for the device to digest written data, it shouldn't be demanding on the processor or performance intensive on the system.

The design decision to play data to a single device only, combined with mixer control commands for all further output control, ensures that the exact output is handled by the sound hardware and should scale to future systems.

Latency

Latency is critical for the audio system, so a set of latency service-level objectives (SLOs) is to be defined after implementation. To continuously monitor latency health, the system logs any failure to meet the defined SLOs in all debug builds.

For production versions, monitoring data is passed to a system external to the audio implementation, rather than relying on logs.

Test and test strategy

The audio crate and related crates are designed for test coverage. We added a mock platform implementation so that all capabilities can be tested.

The complexity of the hardware and bindings precludes extensive test coverage for platform implementations. We provide sample implementations to manually test the solution on hardware and on the Cuttlefish emulator.

Documentation

The README.md file in crates/audio describes how to use AudioManager. crates/audio/examples contains examples for:

  • Implementing a platform.
  • Creating an instance of AudioManager.
  • Playing WavAsset.
  • Playing a custom function asset on repeat.
  • Logging playback events.