NNAPI driver implementation best practices

This page describes best practices for implementing Neural Networks API (NNAPI) drivers to allow for broad adoption of the NNAPI by app developers.

Keep startup times short

If your driver transforms the weights of a model on first use, make sure the driver supports compilation caching, which reduces compilation time when an app starts. This matters because apps might avoid hardware acceleration if startup times are too long. For example, some apps have more than 100 MB of weights, and transforming them every time the app launches is wasteful.
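
The idea behind caching transformed weights can be illustrated with a minimal sketch. It does not use the actual NN HAL interfaces; the helper `loadOrTransform`, the `token` string, the `.bin` file naming, and the `transform` callback are all hypothetical names chosen for illustration. The sketch assumes the transformed weights can be stored as a flat byte buffer and that a token uniquely identifies a model.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <functional>
#include <ios>
#include <iterator>
#include <string>
#include <vector>

namespace fs = std::filesystem;
using Bytes = std::vector<uint8_t>;

// Hypothetical helper (not part of the NN HAL): returns the transformed
// weights for a model, reusing a copy cached on disk when one exists.
// `token` stands in for a caching token that identifies the model, and
// `transform` stands in for the driver's costly weight transformation.
Bytes loadOrTransform(const fs::path& cacheDir, const std::string& token,
                      const Bytes& rawWeights,
                      const std::function<Bytes(const Bytes&)>& transform,
                      bool* cacheHit = nullptr) {
    const fs::path cacheFile = cacheDir / (token + ".bin");

    // Fast path: an earlier launch already stored the transformed weights.
    if (std::ifstream in(cacheFile, std::ios::binary); in) {
        Bytes cached((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());
        if (cacheHit) *cacheHit = true;
        return cached;
    }

    // Slow path: transform once, then persist the result for later launches.
    Bytes transformed = transform(rawWeights);
    fs::create_directories(cacheDir);
    std::ofstream out(cacheFile, std::ios::binary | std::ios::trunc);
    out.write(reinterpret_cast<const char*>(transformed.data()),
              static_cast<std::streamsize>(transformed.size()));
    if (cacheHit) *cacheHit = false;
    return transformed;
}
```

With this shape, the first call for a given token pays the transformation cost and later calls only pay for a file read. A real driver would also need to validate the cached data and invalidate it when the driver or model changes, which this sketch omits.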

Reduce minimal latency

To ensure that models use hardware acceleration, it's important to reduce the minimal latency in drivers. Many apps use small models that are executed multiple times. If the minimal latency to execute a workload is too high, such as a few milliseconds, apps might run the workload on the CPU, which takes only one or two milliseconds, instead of using hardware acceleration. Be careful of costly thread synchronization, which adds to this minimal latency.
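
One way to reason about this tradeoff is to measure the fastest observed time for a small workload on each path and compare them. The following is a minimal sketch under that assumption; `minLatencyMicros` and `acceleratorIsWorthwhile` are hypothetical helpers for illustration, not NNAPI functions, and the comparison rule is deliberately simplistic.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <limits>

// Hypothetical helper: runs `workload` `iterations` times and returns the
// fastest observed wall-clock time in microseconds. Taking the minimum
// approximates the fixed overhead of a path when the workload itself is tiny.
// Returns +infinity if `iterations` is not positive.
double minLatencyMicros(const std::function<void()>& workload, int iterations) {
    using Clock = std::chrono::steady_clock;
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < iterations; ++i) {
        const auto start = Clock::now();
        workload();
        const auto end = Clock::now();
        const double elapsed =
            std::chrono::duration<double, std::micro>(end - start).count();
        best = std::min(best, elapsed);
    }
    return best;
}

// Hypothetical decision rule: offloading only pays off when the accelerator
// path, including driver overhead, is faster than running on the CPU.
bool acceleratorIsWorthwhile(double acceleratorMicros, double cpuMicros) {
    return acceleratorMicros < cpuMicros;
}
```

For example, with an accelerator path that takes 3000 microseconds and a CPU path that takes 1500 microseconds, `acceleratorIsWorthwhile(3000.0, 1500.0)` returns `false`, which mirrors the situation described above where an app falls back to the CPU.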

Use the NN HAL SchedTune group

In Android 11 or higher, AOSP includes a dedicated NN HAL SchedTune group that allows interprocess NN HAL processes to use big cores, similar to a same-process implementation within the predefined top-app cgroup. Using this SchedTune group reduces driver overhead, especially for small models.

To use the SchedTune group, add the following line to the init.rc file of the NN HAL process:

writepid /dev/stune/nnapi-hal/tasks