Quality of service
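Deprecated: Starting in Android 15, the NNAPI (NDK API) is deprecated. The Neural Networks HAL interface continues to be supported. For more information, see the NNAPI Migration Guide (https://developer.android.com/ndk/guides/neuralnetworks/migration-guide).
Starting in Android 11, NNAPI improves quality of service (QoS) by letting an app indicate the relative priorities of its models, the maximum time expected for a given model to be prepared, and the maximum time expected for a given execution to complete. Android 11 also introduces additional NNAPI error values that let a service indicate more accurately what went wrong when a failure occurs, so that the client app can react and recover better.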
Priority
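In Android 11 or higher, models are prepared with a priority in NN HAL 1.3. This priority is relative to other prepared models owned by the same app. Higher-priority executions can use more compute resources than lower-priority executions, and can preempt or starve lower-priority executions.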
The NN HAL 1.3 call that takes Priority as an explicit argument is IDevice::prepareModel_1_3. Note that IDevice::prepareModelFromCache_1_3 implicitly includes Priority in the cache arguments.
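There are many possible strategies for supporting priorities, depending on the capabilities of the driver and the accelerator. Here are several strategies:
- For drivers that have built-in priority support, propagate the Priority field directly to the accelerator.
- Use a per-app priority queue to support different priorities even before an execution reaches the accelerator.
- Pause or cancel low-priority models that are executing to free the accelerator for high-priority models. Do this either by inserting checkpoints in low-priority models that, when reached, query a flag to determine whether the current execution should be halted prematurely, or by partitioning the model into submodels and querying the flag between submodel executions. Note that using checkpoints or submodels in models prepared with a priority can introduce overhead that isn't present for models without a priority in versions lower than NN HAL 1.3.
  - To support preemption, preserve the execution context, including the next operation or submodel to be executed and any relevant intermediate operand data. Use this execution context to resume the execution at a later time.
  - Full preemption support isn't necessary, so the execution context doesn't need to be preserved. Because NNAPI model executions are deterministic, executions can be restarted from scratch at a later time.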
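One way a driver might let low-priority work yield to high-priority work is to split a model into submodels and poll a stop flag between them. The following is a minimal sketch of that idea, not taken from any real driver: Submodel, RunResult, and runWithCheckpoints are hypothetical names, and a submodel is modeled as a plain callable.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for one partition ("submodel") of a prepared model.
using Submodel = std::function<void()>;

enum class RunResult { kCompleted, kPreempted };

// Runs the submodels in order and polls `stopRequested` before each one
// (the checkpoint). `nextIndex` is the preserved execution context: when the
// run is preempted, it holds the index of the next submodel to execute, so a
// later call with the same `nextIndex` resumes where the run stopped.
RunResult runWithCheckpoints(const std::vector<Submodel>& submodels,
                             const std::atomic<bool>& stopRequested,
                             std::size_t& nextIndex) {
    while (nextIndex < submodels.size()) {
        if (stopRequested.load(std::memory_order_acquire)) {
            return RunResult::kPreempted;  // Yield to higher-priority work.
        }
        submodels[nextIndex]();
        ++nextIndex;
    }
    return RunResult::kCompleted;
}
```

To restart from scratch instead of resuming, a caller would reset nextIndex to 0 before calling again. A real driver would also need to preserve intermediate operand data when resuming, which this sketch leaves out.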
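Android lets services distinguish between calling apps by using an AID (Android UID). HIDL has a built-in mechanism to retrieve the calling app's UID through the ::android::hardware::IPCThreadState::getCallingUid method. A list of AIDs can be found in libcutils/include/cutils/android_filesystem_config.h.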
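Because priorities are relative to other models owned by the same app, a driver could keep one priority queue per calling UID. The sketch below illustrates this under stated assumptions: PerAppScheduler, Task, and TaskOrder are hypothetical names, the Priority enum is a local stand-in whose ordering is assumed, and the uid argument stands in for the value a real service would obtain from the calling context.

```cpp
#include <cstdint>
#include <map>
#include <queue>
#include <vector>

// Local stand-in for the NN HAL 1.3 Priority type (ordering is an assumption).
enum class Priority : int32_t { LOW = 0, MEDIUM = 1, HIGH = 2 };

struct Task {
    Priority priority;
    uint64_t sequence;  // Arrival order, keeps equal priorities first-in first-out.
    int id;             // Hypothetical task identifier.
};

// Orders tasks so that the highest priority, then the earliest arrival, is on top.
struct TaskOrder {
    bool operator()(const Task& a, const Task& b) const {
        if (a.priority != b.priority) return a.priority < b.priority;
        return a.sequence > b.sequence;
    }
};

class PerAppScheduler {
  public:
    // Queues a task for the app identified by `uid`.
    void submit(uint32_t uid, Priority priority, int id) {
        queues_[uid].push(Task{priority, nextSequence_++, id});
    }

    // Pops the next task for `uid` into `id`. Returns false if none is queued.
    bool next(uint32_t uid, int& id) {
        auto it = queues_.find(uid);
        if (it == queues_.end() || it->second.empty()) return false;
        id = it->second.top().id;
        it->second.pop();
        return true;
    }

  private:
    using Queue = std::priority_queue<Task, std::vector<Task>, TaskOrder>;
    std::map<uint32_t, Queue> queues_;
    uint64_t nextSequence_ = 0;
};
```

Keeping the queues separate per UID means one app's HIGH requests never reorder another app's work. How the driver arbitrates between apps is a separate policy decision that this sketch doesn't address.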
Deadlines
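Starting in Android 11, model preparation and executions can be launched with an OptionalTimePoint deadline argument. For drivers that can estimate how long a task takes, this deadline lets the driver abort the task before it starts if the driver estimates that the task can't be completed before the deadline. Similarly, the deadline lets the driver abort an ongoing task that it estimates won't be completed before the deadline. The deadline argument doesn't force a driver to abort a task if the task isn't complete by the deadline or if the deadline has passed. The deadline argument can be used to free up compute resources within the driver and return control to the app faster than is possible without a deadline.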
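The pre-launch check described above can be sketched as follows for a driver that chooses to abort tasks it expects to miss their deadline. This is an illustration only: the Deadline struct is a hypothetical stand-in for an optional time point (it isn't the HAL's OptionalTimePoint type), and checkBeforeLaunch and its runtime estimate are assumptions.

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for an optional deadline: `hasValue` is false when
// the caller supplied no deadline, in which case `value` is ignored.
struct Deadline {
    bool hasValue;
    Clock::time_point value;
};

enum class Decision { kRun, kAbort };

// Decides, before a task is launched, whether it should start at all.
// `estimatedRuntime` is the driver's own estimate of how long the task takes.
Decision checkBeforeLaunch(const Deadline& deadline, Clock::time_point now,
                           Clock::duration estimatedRuntime) {
    if (!deadline.hasValue) return Decision::kRun;        // No deadline given.
    if (now >= deadline.value) return Decision::kAbort;   // Deadline already passed.
    if (now + estimatedRuntime > deadline.value) {
        return Decision::kAbort;  // Estimated to finish after the deadline.
    }
    return Decision::kRun;
}
```

The same comparison could be repeated at intervals during a running task, with the remaining runtime in place of the full estimate, to abort work that has fallen behind.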
The NN HAL 1.3 calls that take an OptionalTimePoint deadline as an argument are:
IDevice::prepareModel_1_3
IDevice::prepareModelFromCache_1_3
IPreparedModel::execute_1_3
IPreparedModel::executeSynchronously_1_3
IPreparedModel::executeFenced
For a reference implementation of the deadline feature for each of the above methods, see the NNAPI sample driver at frameworks/ml/nn/driver/sample/SampleDriver.cpp.
Error codes
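Android 11 includes four error code values in NN HAL 1.3 to improve error reporting, letting drivers communicate their state better and letting apps recover more gracefully. These are the error code values in ErrorStatus: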
MISSED_DEADLINE_TRANSIENT
MISSED_DEADLINE_PERSISTENT
RESOURCE_EXHAUSTED_TRANSIENT
RESOURCE_EXHAUSTED_PERSISTENT
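In Android 10 or lower, a driver could indicate a failure only through the GENERAL_FAILURE error code. Starting in Android 11, the two MISSED_DEADLINE error codes can be used to indicate that the workload was aborted because the deadline was reached or because the driver predicted that the workload wouldn't complete by the deadline. The two RESOURCE_EXHAUSTED error codes can be used to indicate that the task failed because of a resource limitation within the driver, such as the driver not having enough memory for the call.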
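The TRANSIENT version of both errors indicates that the problem is temporary and that future calls to the same task might succeed after a short delay. For example, this error code should be returned when the driver is busy with prior long-running or resource-intensive work, but the new task would complete successfully if the driver weren't busy with that work. The PERSISTENT version of both errors indicates that future calls to the same task are always expected to fail. For example, this error code should be returned when the driver estimates that the task wouldn't complete by the deadline even under perfect conditions, or when the model is inherently too large and exceeds the driver's resources.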
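The TRANSIENT and PERSISTENT distinction can be illustrated with a short sketch. The ErrorStatus enum below is a local mirror of the four values named above plus NONE, not the HAL definition. The Estimate struct and the two classify functions are hypothetical, and the inputs are estimates the driver is assumed to be able to produce.

```cpp
#include <chrono>
#include <cstddef>

// Local mirror of the ErrorStatus values discussed above (not the HAL type).
enum class ErrorStatus {
    NONE,
    MISSED_DEADLINE_TRANSIENT,
    MISSED_DEADLINE_PERSISTENT,
    RESOURCE_EXHAUSTED_TRANSIENT,
    RESOURCE_EXHAUSTED_PERSISTENT,
};

// Hypothetical driver-side timing estimates for one task.
struct Estimate {
    std::chrono::milliseconds timeLeft;         // Time remaining until the deadline.
    std::chrono::milliseconds bestCaseRuntime;  // Runtime on an otherwise idle driver.
    std::chrono::milliseconds currentRuntime;   // Runtime under the present load.
};

// PERSISTENT: the deadline can't be met even under perfect conditions.
// TRANSIENT: it can't be met only because of the current load.
ErrorStatus classifyDeadline(const Estimate& e) {
    if (e.bestCaseRuntime > e.timeLeft) return ErrorStatus::MISSED_DEADLINE_PERSISTENT;
    if (e.currentRuntime > e.timeLeft) return ErrorStatus::MISSED_DEADLINE_TRANSIENT;
    return ErrorStatus::NONE;
}

// PERSISTENT: the request exceeds what the driver could ever provide.
// TRANSIENT: the memory exists but is in use by other work right now.
ErrorStatus classifyMemory(std::size_t requiredBytes, std::size_t totalBytes,
                           std::size_t freeBytes) {
    if (requiredBytes > totalBytes) return ErrorStatus::RESOURCE_EXHAUSTED_PERSISTENT;
    if (requiredBytes > freeBytes) return ErrorStatus::RESOURCE_EXHAUSTED_TRANSIENT;
    return ErrorStatus::NONE;
}
```

A client that receives a TRANSIENT value might retry after a short delay, while a PERSISTENT value suggests changing the request, for example by relaxing the deadline or using a smaller model.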
Validation
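The quality of service functionality is tested in the NNAPI VTS tests (VtsHalNeuralnetworksV1_3Target). These include a set of validation tests (TestGenerated/ValidationTest#Test/) that ensure the driver rejects invalid priorities, and a set of tests called DeadlineTest (TestGenerated/DeadlineTest#Test/) that ensure the driver handles deadlines correctly.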
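On the driver side, rejecting an invalid priority starts with a range check on the incoming value. The sketch below shows one way to write that check. The Priority enum is a local stand-in with assumed values, isValidPriority is a hypothetical helper, and how a driver reports the rejection to the caller is outside the scope of this example.

```cpp
#include <cstdint>

// Local stand-in for the NN HAL 1.3 Priority type (values are assumptions).
enum class Priority : int32_t { LOW = 0, MEDIUM = 1, HIGH = 2 };

// Returns true only if `raw` corresponds to one of the known Priority values.
// Any other integer, such as one sent by a malformed request, is rejected.
bool isValidPriority(int32_t raw) {
    switch (static_cast<Priority>(raw)) {
        case Priority::LOW:
        case Priority::MEDIUM:
        case Priority::HIGH:
            return true;
    }
    return false;
}
```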
Last updated 2025-03-26 UTC.