Neural Networks HAL 1.2 introduces the concept of burst executions. Burst executions are sequences of executions of the same prepared model that occur in rapid succession, such as those operating on frames of a camera capture or successive audio samples. A burst object controls a set of burst executions and preserves resources between executions, which lowers the overhead of each execution. Burst objects enable three optimizations:
- A burst object is created before a sequence of executions, and freed when the sequence has ended. Because of this, the lifetime of the burst object hints to a driver how long it should remain in a high-performance state.
- A burst object can preserve resources between executions. For example, a driver can map a memory object on the first execution and cache the mapping in the burst object for reuse in subsequent executions. Any cached resource can be released when the burst object is destroyed or when the NNAPI runtime notifies the burst object that the resource is no longer required.
- A burst object uses fast message queues (FMQs) to communicate between app and driver processes. This can reduce latency because the FMQ bypasses HIDL and passes data directly to another process through an atomic circular FIFO in shared memory. The consumer process knows to dequeue an item and begin processing either by polling the number of elements in the FIFO or by waiting on the FMQ's event flag, which is signaled by the producer. This event flag is a fast userspace mutex (futex).
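The atomic circular FIFO mentioned in the last item can be pictured with a minimal sketch. This is not the real FMQ implementation: the class name `SpscRing` and its methods are hypothetical, the buffer lives in ordinary process memory rather than shared memory, and only the polling path is shown (no event flag). It assumes exactly one producer thread and one consumer thread.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Hypothetical single-producer/single-consumer ring buffer. It only
// illustrates the idea of an atomic circular FIFO that a consumer can poll;
// it is not android::hardware::MessageQueue.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

  public:
    // Producer side: returns false when the ring is full.
    bool tryWrite(const T& item) {
        const std::size_t w = mWrite.load(std::memory_order_relaxed);
        const std::size_t r = mRead.load(std::memory_order_acquire);
        if (w - r == Capacity) return false;  // full
        mSlots[w % Capacity] = item;
        mWrite.store(w + 1, std::memory_order_release);  // publish the item
        return true;
    }

    // Consumer side: returns std::nullopt when the ring is empty.
    std::optional<T> tryRead() {
        const std::size_t r = mRead.load(std::memory_order_relaxed);
        const std::size_t w = mWrite.load(std::memory_order_acquire);
        if (w == r) return std::nullopt;  // empty
        T item = mSlots[r % Capacity];
        mRead.store(r + 1, std::memory_order_release);  // free the slot
        return item;
    }

    // Number of elements the consumer could read right now (used for polling).
    std::size_t availableToRead() const {
        return mWrite.load(std::memory_order_acquire) -
               mRead.load(std::memory_order_acquire);
    }

  private:
    std::array<T, Capacity> mSlots{};
    std::atomic<std::size_t> mWrite{0};  // total items ever written
    std::atomic<std::size_t> mRead{0};   // total items ever read
};
```

A consumer that polls would loop on `availableToRead()` or `tryRead()` until data appears. Waiting on an event flag instead avoids spinning, at the cost of a wake-up when the producer signals.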
An FMQ is a low-level data structure that offers no lifetime guarantees across processes and has no built-in mechanism for determining if the process on the other end of the FMQ is running as expected. Consequently, if the producer for the FMQ dies, the consumer may be stuck waiting for data that never arrives. One solution to this problem is for the driver to associate FMQs with the higher-level burst object to detect when the burst execution has ended.
Because burst executions operate on the same arguments and return the same
results as other execution paths, the underlying FMQs must pass the same data to
and from the NNAPI service drivers. However, FMQs can only transfer
plain-old-data types. Transferring complex data is accomplished by serializing
and deserializing nested buffers (vector types) directly in the FMQs, and using
HIDL callback objects to transfer memory pool handles on demand. The producer
side of the FMQ must send the request or result messages to the consumer
atomically by using MessageQueue::writeBlocking if the queue is blocking, or
by using MessageQueue::write if the queue is nonblocking.
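As an illustration of how nested buffers can be flattened into plain-old-data elements, the following sketch serializes a request into a flat sequence of tagged datums and reads it back. The `Datum`, `Tag`, and `Request` types and the count-prefixed layout are assumptions made for this example; the actual `FmqRequestDatum` layout is defined in types.hal and differs.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical POD element; not the real FmqRequestDatum.
enum class Tag : uint32_t { kCount, kValue };
struct Datum {
    Tag tag;
    uint32_t value;
};
static_assert(std::is_trivially_copyable_v<Datum>, "queue elements must be POD");

// Hypothetical nested request: one shape per operand plus memory slot ids.
struct Request {
    std::vector<std::vector<uint32_t>> shapes;
    std::vector<uint32_t> slots;
};

// Flattens the request: every vector becomes a count followed by its values.
std::vector<Datum> serialize(const Request& request) {
    std::vector<Datum> out;
    out.push_back({Tag::kCount, static_cast<uint32_t>(request.shapes.size())});
    for (const auto& shape : request.shapes) {
        out.push_back({Tag::kCount, static_cast<uint32_t>(shape.size())});
        for (uint32_t dim : shape) out.push_back({Tag::kValue, dim});
    }
    out.push_back({Tag::kCount, static_cast<uint32_t>(request.slots.size())});
    for (uint32_t slot : request.slots) out.push_back({Tag::kValue, slot});
    return out;
}

// Rebuilds the request, or returns std::nullopt if the data is malformed.
std::optional<Request> deserialize(const std::vector<Datum>& data) {
    std::size_t pos = 0;
    // Reads one datum with the expected tag; fails on truncation or mismatch.
    auto next = [&](Tag expected) -> std::optional<uint32_t> {
        if (pos >= data.size() || data[pos].tag != expected) return std::nullopt;
        return data[pos++].value;
    };
    // Reads a count-prefixed run of values.
    auto readRun = [&]() -> std::optional<std::vector<uint32_t>> {
        const auto count = next(Tag::kCount);
        if (!count) return std::nullopt;
        std::vector<uint32_t> run;
        for (uint32_t i = 0; i < *count; ++i) {
            const auto value = next(Tag::kValue);
            if (!value) return std::nullopt;
            run.push_back(*value);
        }
        return run;
    };

    Request request;
    const auto shapeCount = next(Tag::kCount);
    if (!shapeCount) return std::nullopt;
    for (uint32_t i = 0; i < *shapeCount; ++i) {
        auto shape = readRun();
        if (!shape) return std::nullopt;
        request.shapes.push_back(std::move(*shape));
    }
    auto slots = readRun();
    if (!slots || pos != data.size()) return std::nullopt;  // reject trailing data
    request.slots = std::move(*slots);
    return request;
}
```

Because the whole message is a single contiguous sequence of datums, the producer can hand it to the queue in one write, which is what makes the atomic send described above possible.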
Burst interfaces
The burst interfaces for the Neural Networks HAL are found in
hardware/interfaces/neuralnetworks/1.2/
and are described below. For more information on burst interfaces in the NDK
layer, see
frameworks/ml/nn/runtime/include/NeuralNetworks.h.
types.hal
types.hal
defines the type of data that is sent across the FMQ.
- FmqRequestDatum: A single element of a serialized representation of an execution Request object and a MeasureTiming value, which is sent across the fast message queue.
- FmqResultDatum: A single element of a serialized representation of the values returned from an execution (ErrorStatus, OutputShapes, and Timing), which is returned through the fast message queue.
IBurstContext.hal
IBurstContext.hal
defines the HIDL interface object that lives in the Neural Networks service.
- IBurstContext: Context object to manage the resources of a burst.
IBurstCallback.hal
IBurstCallback.hal
defines the HIDL interface object for a callback that is created by the Neural
Networks runtime and used by the Neural Networks service to retrieve hidl_memory
objects corresponding to slot identifiers.
- IBurstCallback: Callback object used by a service to retrieve memory objects.
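The slot-based lookup can be sketched as a small cache that asks a callback only for slots it has not seen before. Everything here is hypothetical: `Memory` stands in for a mapped hidl_memory object, `SlotCache` and its `Fetch` function stand in for the service-side cache and the callback, and none of these names come from the HAL.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a mapped memory pool.
struct Memory {
    int32_t slot;
};

// Hypothetical service-side cache keyed by slot identifier.
class SlotCache {
  public:
    // Stand-in for the callback the runtime provides to resolve a slot.
    using Fetch = std::function<std::shared_ptr<Memory>(int32_t slot)>;

    explicit SlotCache(Fetch fetch) : mFetch(std::move(fetch)) {}

    // Returns the cached memory for a slot, fetching it on first use.
    std::shared_ptr<Memory> get(int32_t slot) {
        const auto it = mCache.find(slot);
        if (it != mCache.end()) return it->second;  // recognized slot
        std::shared_ptr<Memory> memory = mFetch(slot);  // unrecognized slot
        if (memory != nullptr) mCache.emplace(slot, memory);
        return memory;
    }

    // Drops a cached entry once the runtime says the slot is no longer needed.
    void release(int32_t slot) { mCache.erase(slot); }

    std::size_t size() const { return mCache.size(); }

  private:
    Fetch mFetch;
    std::unordered_map<int32_t, std::shared_ptr<Memory>> mCache;
};
```

With this shape, only the first execution that references a slot pays for the callback round trip; later executions in the same burst reuse the cached entry until it is released.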
IPreparedModel.hal
IPreparedModel.hal
is extended in HAL 1.2 with a method to create an IBurstContext object from a
prepared model.
- configureExecutionBurst: Configures a burst object used to execute multiple inferences on a prepared model in rapid succession.
Support burst executions in a driver
The simplest way to support burst objects in a HIDL NNAPI service is to use the
burst utility function ::android::nn::ExecutionBurstServer::create, which is
found in
ExecutionBurstServer.h
and packaged in the libneuralnetworks_common and libneuralnetworks_util
static libraries. This factory function has two overloads:
- One overload accepts a pointer to an IPreparedModel object. This utility function uses the executeSynchronously method in an IPreparedModel object to execute the model.
- One overload accepts a customizable IBurstExecutorWithCache object, which can be used to cache resources (such as hidl_memory mappings) that persist across multiple executions.
Each overload returns an IBurstContext object (which represents the burst
object) that contains and manages its own dedicated listener thread. This thread
receives requests from the requestChannel FMQ, performs the inference, then
returns the results through the resultChannel FMQ. This thread and all other
resources contained in the IBurstContext object are automatically released
when the burst's client loses its reference to IBurstContext.
Alternatively, you can create your own implementation of IBurstContext that
understands how to send and receive messages over the requestChannel and
resultChannel FMQs passed to IPreparedModel::configureExecutionBurst.
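The receive, execute, and reply loop that a burst context runs on its listener thread can be sketched as follows. This is a simplified, in-process illustration and not the ExecutionBurstServer implementation: `Channel` is a hypothetical mutex-based stand-in for an FMQ, `BurstWorker` is a hypothetical stand-in for the burst context, and requests and results are plain integers.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical in-process stand-in for an FMQ.
template <typename T>
class Channel {
  public:
    void send(T value) {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mQueue.push(std::move(value));
        }
        mCondition.notify_one();
    }

    // Blocks until a value arrives, or returns std::nullopt once the channel
    // is closed and drained.
    std::optional<T> receive() {
        std::unique_lock<std::mutex> lock(mMutex);
        mCondition.wait(lock, [this] { return mClosed || !mQueue.empty(); });
        if (mQueue.empty()) return std::nullopt;
        T value = std::move(mQueue.front());
        mQueue.pop();
        return value;
    }

    void close() {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mClosed = true;
        }
        mCondition.notify_all();
    }

  private:
    std::mutex mMutex;
    std::condition_variable mCondition;
    std::queue<T> mQueue;
    bool mClosed = false;
};

// Hypothetical burst context: owns a listener thread for its whole lifetime.
class BurstWorker {
  public:
    using Infer = std::function<int(int)>;

    BurstWorker(Channel<int>& requests, Channel<int>& results, Infer infer)
        : mRequests(requests),
          mResults(results),
          mInfer(std::move(infer)),
          mThread([this] { run(); }) {}

    // Destroying the worker stops the listener thread and releases it.
    ~BurstWorker() {
        mRequests.close();
        mThread.join();
    }

  private:
    // Receives a request, performs the inference, and returns the result.
    void run() {
        while (auto request = mRequests.receive()) {
            mResults.send(mInfer(*request));
        }
    }

    Channel<int>& mRequests;
    Channel<int>& mResults;
    Infer mInfer;
    std::thread mThread;  // declared last so the other members are ready first
};
```

Tying the thread to the object's lifetime mirrors the behavior described above: when the last reference to the burst context goes away, the listener thread and everything it holds are released together.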
The two overloads of the factory function are declared in
ExecutionBurstServer.h as follows.
/**
 * Create automated context to manage FMQ-based executions.
 *
 * This function is intended to be used by a service to automatically:
 * 1) Receive data from a provided FMQ
 * 2) Execute a model with the given information
 * 3) Send the result to the created FMQ
 *
 * @param callback Callback used to retrieve memories corresponding to
 *     unrecognized slots.
 * @param requestChannel Input FMQ channel through which the client passes the
 *     request to the service.
 * @param resultChannel Output FMQ channel from which the client can retrieve
 *     the result of the execution.
 * @param executorWithCache Object which maintains a local cache of the
 *     memory pools and executes using the cached memory pools.
 * @result IBurstContext Handle to the burst context.
 */
static sp<ExecutionBurstServer> create(
        const sp<IBurstCallback>& callback, const FmqRequestDescriptor& requestChannel,
        const FmqResultDescriptor& resultChannel,
        std::shared_ptr<IBurstExecutorWithCache> executorWithCache);
/**
 * Create automated context to manage FMQ-based executions.
 *
 * This function is intended to be used by a service to automatically:
 * 1) Receive data from a provided FMQ
 * 2) Execute a model with the given information
 * 3) Send the result to the created FMQ
 *
 * @param callback Callback used to retrieve memories corresponding to
 *     unrecognized slots.
 * @param requestChannel Input FMQ channel through which the client passes the
 *     request to the service.
 * @param resultChannel Output FMQ channel from which the client can retrieve
 *     the result of the execution.
 * @param preparedModel PreparedModel that the burst object was created from.
 *     IPreparedModel::executeSynchronously will be used to perform the
 *     execution.
 * @result IBurstContext Handle to the burst context.
 */
static sp<ExecutionBurstServer> create(
        const sp<IBurstCallback>& callback, const FmqRequestDescriptor& requestChannel,
        const FmqResultDescriptor& resultChannel,
        IPreparedModel* preparedModel);
The following is a reference implementation of a burst interface found in the
Neural Networks sample driver at
frameworks/ml/nn/driver/sample/SampleDriver.cpp.
Return<void> SamplePreparedModel::configureExecutionBurst(
        const sp<V1_2::IBurstCallback>& callback,
        const MQDescriptorSync<V1_2::FmqRequestDatum>& requestChannel,
        const MQDescriptorSync<V1_2::FmqResultDatum>& resultChannel,
        configureExecutionBurst_cb cb) {
    NNTRACE_FULL(NNTRACE_LAYER_DRIVER, NNTRACE_PHASE_EXECUTION,
                 "SampleDriver::configureExecutionBurst");
    // Alternatively, the burst could be configured via:
    // const sp<V1_2::IBurstContext> burst =
    //         ExecutionBurstServer::create(callback, requestChannel,
    //                                      resultChannel, this);
    //
    // However, this alternative representation does not include a memory map
    // caching optimization, and adds overhead.
    const std::shared_ptr<BurstExecutorWithCache> executorWithCache =
            std::make_shared<BurstExecutorWithCache>(mModel, mDriver, mPoolInfos);
    const sp<V1_2::IBurstContext> burst = ExecutionBurstServer::create(
            callback, requestChannel, resultChannel, executorWithCache);
    if (burst == nullptr) {
        cb(ErrorStatus::GENERAL_FAILURE, {});
    } else {
        cb(ErrorStatus::NONE, burst);
    }
    return Void();
}
