Quality of service

Starting from Android 11, the NNAPI offers better quality of service (QoS) by allowing an app to indicate the relative priorities of its models, the maximum amount of time expected for a given model to be prepared, and the maximum amount of time expected for a given execution to be completed. Further, Android 11 introduces additional NNAPI error values enabling a service to more accurately indicate what went wrong when a failure occurs so that the client app can better react and recover.

Priority

For Android 11 or higher, models are prepared with a priority in the NN HAL 1.3. This priority is relative to other prepared models owned by the same app. Higher-priority executions can use more compute resources than lower-priority executions, and can preempt or starve lower-priority executions.

The NN HAL 1.3 call that includes Priority as an explicit argument is IDevice::prepareModel_1_3. Note that IDevice::prepareModelFromCache_1_3 implicitly includes Priority in the cache arguments.

There are many possible strategies for supporting priorities depending on the capabilities of the driver and accelerator. Here are several strategies:

  • For drivers that have built-in priority support, directly propagate the Priority field to the accelerator.
  • Use a per-app priority queue to support different priorities even before an execution reaches the accelerator.
  • Pause or cancel low-priority models that are being executed to free the accelerator to execute high-priority models. Do this either by inserting checkpoints in low-priority models that, when reached, query a flag to determine whether the current execution should be halted prematurely, or by partitioning the model into submodels and querying the flag between submodel executions. Note that the use of checkpoints or submodels in models prepared with a priority can introduce additional overhead that isn't present for models without a priority in versions lower than NN HAL 1.3.

    • To support preemption, preserve the execution context, including the next operation or submodel to be executed and any relevant intermediate operand data. Use this execution context to resume the execution at a later time.
    • Full preemption support isn't necessary, so the execution context doesn't need to be preserved. Because NNAPI model executions are deterministic, executions can be restarted from scratch at a later time.
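
The checkpoint strategy can be sketched roughly as follows. This is a minimal illustration, not driver code: RunResult, runWithCheckpoints, and the representation of a submodel as a std::function are all hypothetical names chosen for the example and are not part of the NN HAL.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical types for illustration; not part of the NN HAL.
enum class RunResult { COMPLETED, HALTED };

// Runs submodels in order and queries a halt flag at a checkpoint before each one.
// On HALTED, *nextStep holds the index of the first submodel that did not run, so
// the caller can resume from there (preemption) or reset it to 0 (restart).
RunResult runWithCheckpoints(const std::vector<std::function<void()>>& submodels,
                             const std::atomic<bool>& haltRequested,
                             std::size_t* nextStep) {
    assert(nextStep != nullptr);
    for (std::size_t i = *nextStep; i < submodels.size(); ++i) {
        // Checkpoint: stop early if a higher-priority execution needs the accelerator.
        if (haltRequested.load(std::memory_order_acquire)) {
            *nextStep = i;
            return RunResult::HALTED;
        }
        submodels[i]();
    }
    *nextStep = submodels.size();
    return RunResult::COMPLETED;
}
```

In this sketch the saved index stands in for the execution context. A driver that restarts from scratch would reset the index to 0 and discard intermediate results instead of resuming.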

Android enables services to differentiate between calling apps through the use of an AID (Android UID). HIDL has a built-in mechanism to retrieve the calling app's UID: the method ::android::hardware::IPCThreadState::getCallingUid. A list of AIDs can be found in libcutils/include/cutils/android_filesystem_config.h.
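
Because priority is relative to other models owned by the same app, one way to combine the calling UID with the priority-queue strategy is to keep a separate queue per UID. The sketch below assumes the UID has already been obtained and is passed in as a plain integer. PerAppQueues, Request, and the local Priority enum are hypothetical stand-ins for illustration, not HAL types.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <queue>
#include <vector>

// Hypothetical stand-in; a real driver would use the HAL's Priority type.
enum class Priority { LOW = 0, MEDIUM = 1, HIGH = 2 };

struct Request {
    Priority priority;
    std::uint64_t sequence;  // arrival order, keeps FIFO order within one priority
    int id;
};

// Orders requests so that the highest priority, then the earliest arrival, is on top.
struct LowerUrgency {
    bool operator()(const Request& a, const Request& b) const {
        if (a.priority != b.priority) return a.priority < b.priority;
        return a.sequence > b.sequence;
    }
};

class PerAppQueues {
  public:
    // Queues a request for the app identified by uid. Ids must be non-negative
    // because pop() uses -1 to signal an empty queue.
    void push(std::uint32_t uid, Priority priority, int id) {
        assert(id >= 0);
        mQueues[uid].push(Request{priority, mNextSequence++, id});
    }

    // Returns the id of the most urgent pending request for uid, or -1 if none.
    int pop(std::uint32_t uid) {
        auto it = mQueues.find(uid);
        if (it == mQueues.end() || it->second.empty()) return -1;
        const int id = it->second.top().id;
        it->second.pop();
        return id;
    }

  private:
    using Queue = std::priority_queue<Request, std::vector<Request>, LowerUrgency>;
    std::map<std::uint32_t, Queue> mQueues;
    std::uint64_t mNextSequence = 0;
};
```

How the driver chooses which app's queue to serve next is a separate policy decision that this sketch leaves open.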

Deadlines

Starting from Android 11, model preparation and executions can be launched with an OptionalTimePoint deadline argument. For drivers that can estimate how long a task takes, this deadline allows the driver to abort the task before it starts if the driver estimates that the task can't be completed before the deadline. Similarly, the deadline allows the driver to abort an ongoing task that it estimates won't be completed before the deadline. The deadline argument doesn't force a driver to abort a task if the task isn't complete by the deadline or if the deadline has passed. The deadline argument can be used to free up compute resources within the driver and return control to the app faster than is possible without the deadline.

The NN HAL 1.3 calls that include OptionalTimePoint deadlines as an argument are:

  • IDevice::prepareModel_1_3
  • IDevice::prepareModelFromCache_1_3
  • IPreparedModel::execute_1_3
  • IPreparedModel::executeSynchronously_1_3
  • IPreparedModel::executeFenced

For a reference implementation of the deadline feature for each of these methods, see the NNAPI sample driver at frameworks/ml/nn/driver/sample/SampleDriver.cpp.
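
As a rough illustration of the deadline check itself, independent of the sample driver, a driver that can estimate remaining work might decide as follows. The Deadline struct is a hypothetical stand-in for OptionalTimePoint, and Decision and checkDeadline are names invented for this sketch.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for the HAL's OptionalTimePoint.
struct Deadline {
    bool hasValue = false;
    Clock::time_point value{};
};

enum class Decision { RUN, ABORT_MISSED_DEADLINE };

// Decides whether to start or continue a task, given the driver's own estimate
// of how much work remains. A task without a deadline is never aborted here.
Decision checkDeadline(const Deadline& deadline, Clock::time_point now,
                       Clock::duration estimatedRemaining) {
    assert(estimatedRemaining >= Clock::duration::zero());
    if (!deadline.hasValue) return Decision::RUN;
    if (now + estimatedRemaining > deadline.value) {
        return Decision::ABORT_MISSED_DEADLINE;
    }
    return Decision::RUN;
}
```

Calling the same check before launch and again at intervals during execution covers both cases described above: aborting before the task starts and aborting an ongoing task.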

Error codes

Android 11 adds four error code values to ErrorStatus in NN HAL 1.3 to improve error reporting, allowing drivers to better communicate their state and apps to recover more gracefully. The new values are:

  • MISSED_DEADLINE_TRANSIENT
  • MISSED_DEADLINE_PERSISTENT
  • RESOURCE_EXHAUSTED_TRANSIENT
  • RESOURCE_EXHAUSTED_PERSISTENT

In Android 10 or lower, a driver could only indicate a failure through the GENERAL_FAILURE error code. From Android 11, the two MISSED_DEADLINE error codes can be used to indicate that the workload was aborted because the deadline was reached or because the driver predicted the workload wouldn't complete by the deadline. The two RESOURCE_EXHAUSTED error codes can be used to indicate that the task failed because of a resource limitation within the driver, such as the driver not having enough memory for the call.

The TRANSIENT version of both errors indicates that the problem is temporary, and that future calls to the same task might succeed after a short delay. For example, this error code should be returned when the driver is busy with prior long-running or resource-intensive work, but the new task would complete successfully if the driver weren't busy with that work. The PERSISTENT version of both errors indicates that future calls to the same task are always expected to fail. For example, this error code should be returned when the driver estimates that the task wouldn't complete by the deadline even under perfect conditions, or when the model is inherently too large and exceeds the driver's resources.
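
One way a client could act on the TRANSIENT and PERSISTENT distinction is sketched below. The ErrorStatus enum here is a trimmed, illustrative copy that reuses the names from the text. Its numeric values are not the HAL's, and isWorthRetrying and runWithRetry are hypothetical helpers.

```cpp
#include <cassert>

// Illustrative subset only; the numeric values do not match the HAL definition.
enum class ErrorStatus {
    NONE,
    GENERAL_FAILURE,
    MISSED_DEADLINE_TRANSIENT,
    MISSED_DEADLINE_PERSISTENT,
    RESOURCE_EXHAUSTED_TRANSIENT,
    RESOURCE_EXHAUSTED_PERSISTENT,
};

// Only TRANSIENT failures suggest that the same task might succeed later.
bool isWorthRetrying(ErrorStatus status) {
    switch (status) {
        case ErrorStatus::MISSED_DEADLINE_TRANSIENT:
        case ErrorStatus::RESOURCE_EXHAUSTED_TRANSIENT:
            return true;
        default:
            return false;
    }
}

// Calls attempt() up to maxAttempts times and stops at the first result that is
// not a TRANSIENT failure. A real client would also wait between attempts.
template <typename Attempt>
ErrorStatus runWithRetry(Attempt attempt, int maxAttempts) {
    assert(maxAttempts > 0);
    ErrorStatus status = ErrorStatus::GENERAL_FAILURE;
    for (int i = 0; i < maxAttempts; ++i) {
        status = attempt();
        if (!isWorthRetrying(status)) break;
    }
    return status;
}
```

On a PERSISTENT result the loop exits after a single attempt, which leaves the client free to fall back to another device or a smaller model.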

Validation

The quality of service functionality is tested in the NNAPI VTS tests (VtsHalNeuralnetworksV1_3Target). This includes a set of tests for validation (TestGenerated/ValidationTest#Test/) to ensure that the driver rejects invalid priorities and a set of tests called DeadlineTest (TestGenerated/DeadlineTest#Test/) to ensure that the driver handles deadlines correctly.