Summary

Arm NN 21.08 focused on providing new capabilities and improving performance:

Added the ability to import protected DMA buffers, allowing Arm NN to run inferences that are in Protected GPU Memory, and provided a Custom Memory Allocator which supports importing malloc, Dma_buf and protected DMA buffers.
Users with multi-core NPUs have been given the ability to pin inferences to selected cores, letting them balance parallel workloads across the NPU and increase throughput.
Boost has been completely removed from the code base making Arm NN easier to integrate into other software stacks.
Added support for non-constant weights and biases on FullyConnected, which lays the groundwork for supporting more models.
More operators supported on Arm NN, TfLite Parser, TfLite Delegate and Android NNAPI driver.

New Features

Moved unit tests from BOOST to doctest.
UNIDIRECTIONAL_SEQUENCE_LSTM Operator support added on CpuRef backend.
Changed weights layout for Depthwise Convolution Operator from [M,I,H,W] to [1,H,W,I*M].
Reduce Operator can now support multiple axes.
Optimisation added to fuse PAD Operator into Depthwise Convolution Operator.
Added SIN and LOG support to ElementWiseUnary Operator on CpuRef, CpuAcc (LOG only) and GpuAcc backends.
Added SHAPE Operator support on CpuRef backend.
Moved useful test utilities to new static library (libarmnnTestUtils.a).
Added ability to create multiple LoadedNetworks from one OptimizedNetwork.
Arm NN TfLite Delegate Image Classification sample application added to samples directory.
Added fully comprehensive Arm NN Operator list page to Doxygen.
Added support to allow Arm NN to run inferences that are in Protected GPU Memory.
- Creation of Protected Memory is handled via a Custom Memory Allocator which supports importing malloc, Dma_buf and protected DMA buffers.
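
The Depthwise Convolution weights layout change listed above, from [M,I,H,W] to [1,H,W,I*M], can be pictured as a permutation of a flat buffer. The following is a minimal sketch, not Arm NN code: the function name ConvertMIHWTo1HWO is hypothetical, and the ordering of the merged channel dimension (index i * M + m) is an assumption that should be checked against the Arm NN sources before relying on it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: re-lays out depthwise weights from [M, I, H, W]
// to [1, H, W, I*M]. Assumption: the merged channel index is i * M + m.
std::vector<float> ConvertMIHWTo1HWO(const std::vector<float>& src,
                                     std::size_t M, std::size_t I,
                                     std::size_t H, std::size_t W)
{
    assert(src.size() == M * I * H * W);
    std::vector<float> dst(src.size());
    for (std::size_t m = 0; m < M; ++m)
    {
        for (std::size_t i = 0; i < I; ++i)
        {
            for (std::size_t h = 0; h < H; ++h)
            {
                for (std::size_t w = 0; w < W; ++w)
                {
                    // Row-major offset in the old [M, I, H, W] layout.
                    const std::size_t srcIdx = ((m * I + i) * H + h) * W + w;
                    // Row-major offset in the new [1, H, W, I*M] layout.
                    const std::size_t dstIdx = (h * W + w) * (I * M) + (i * M + m);
                    dst[dstIdx] = src[srcIdx];
                }
            }
        }
    }
    return dst;
}
```

Applications that build Depthwise Convolution weights by hand in the old layout would need an equivalent re-layout step when moving to 21.08.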

TfLite Parser

EXPAND_DIMS Operator support added.
PRELU Operator support added.
SHAPE Operator support added.
Comparison Operator support added (EQUAL, GREATER, GREATER_EQUAL, LESS, LESS_EQUAL and NOT_EQUAL).
Changed weights layout for Depthwise Convolution Operator from [M,I,H,W] to [1,H,W,I*M].
Added support for shape_signature, which will now be the preferred way to detect dynamic tensors.
- If creating an instance of the ITfLiteParser and the model used is dynamic, then please ensure that m_InferAndValidate is set in the TfLiteParserOptions and m_shapeInferenceMethod is set to InferAndValidate in the OptimizerOptions.
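
As an illustration of how shape_signature marks a dynamic tensor: in the TfLite schema an unknown dimension is recorded as -1 in shape_signature. The sketch below shows that check in isolation. IsDynamic is a hypothetical helper written for this note, not the parser's actual implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: a tensor is treated as dynamic when any entry of
// its shape_signature is -1 (the TfLite marker for an unknown dimension).
bool IsDynamic(const std::vector<std::int32_t>& shapeSignature)
{
    return std::any_of(shapeSignature.begin(), shapeSignature.end(),
                       [](std::int32_t dim)
                       {
                           assert(dim >= -1); // -1 is the only negative value expected
                           return dim == -1;
                       });
}
```

For example, a signature of {1, -1, -1, 3} would be reported as dynamic, while {1, 224, 224, 3} would not.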

ArmNN Serializer/Deserializer

Changed weights layout for Depthwise Convolution Operator from [M,I,H,W] to [1,H,W,I*M].
Added SIN and LOG support to ElementWiseUnary Operator.
UNIDIRECTIONAL_SEQUENCE_LSTM Operator support added.

ExecuteNetwork App Changes

Added option to specify what size Arm NN thread pool to use when running inferences asynchronously.
Added support for qasymms8 (int8) and added qasymmu8 (uint8) as an alias for qasymm8.
Added option to specify different input data for every iteration of ExecuteNetwork.
Added option to print additional information such as the TensorInfo, Descriptor and Convolution method when profiling is enabled.
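
To illustrate the difference between the signed (qasymms8) and unsigned (qasymmu8) asymmetric 8-bit types mentioned above, here is a minimal sketch of the usual affine quantization formula, q = round(x / scale) + offset, clamped to the range of the target type. The Quantize template is hypothetical and is not taken from ExecuteNetwork; the exact rounding used by Arm NN may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: affine quantization of one float to an 8-bit type.
// QType is expected to be std::int8_t (signed) or std::uint8_t (unsigned).
template <typename QType>
QType Quantize(float value, float scale, int offset)
{
    assert(scale > 0.0f);
    const long q  = std::lround(value / scale) + offset;
    const long lo = std::numeric_limits<QType>::lowest();
    const long hi = std::numeric_limits<QType>::max();
    // Saturate instead of wrapping when the value is out of range.
    return static_cast<QType>(std::min(std::max(q, lo), hi));
}
```

The same real value therefore maps to different stored integers depending on the type and offset chosen, which is why input data for ExecuteNetwork has to match the type requested.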

NOTE: To run dynamic models through ExecuteNetwork the --infer-output-shape flag should be set.

Bug Fixes

Removed duplicate check for Dequantize input type when checking if operator is supported.
Fixed undefined behaviour in PolymorphicDowncast.
Fixed binding of reference to null pointer in RefFullyConnectedWorkload.
Fixed PermutationVector.end() to cope with dimensions < 5 in PermutationVector class.
Fixed cl_ext.h include path in CL backend.
Fixed bugs in PreCompiledLayer. For example, a new shared_ptr was being created instead of allowing std::move to convert the unique_ptr into a shared_ptr.
Fixed gcc 9.3.0 compiler warning in TfLiteParser.
Fixed issue so that the BackendRegistry is cleaned up correctly following negative tests.
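
The PreCompiledLayer fix above relies on a standard C++ idiom: a std::unique_ptr can be moved directly into a std::shared_ptr, which transfers ownership of the existing object rather than creating a second one. A minimal sketch follows; PreCompiledObject and ShareOwnership are hypothetical names used only for illustration and do not appear in Arm NN.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for the object owned by a pre-compiled layer.
struct PreCompiledObject
{
    int id;
};

// Converts unique ownership into shared ownership of the same object.
std::shared_ptr<PreCompiledObject> ShareOwnership(std::unique_ptr<PreCompiledObject> owned)
{
    PreCompiledObject* raw = owned.get();
    // The converting move constructor takes over the pointer; nothing is copied.
    std::shared_ptr<PreCompiledObject> shared = std::move(owned);
    assert(owned == nullptr);    // the unique_ptr no longer owns anything
    assert(shared.get() == raw); // same object, now reference counted
    return shared;
}
```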

Other Changes

Print Elementwise and Comparison Operator descriptors in a dot graph.
Added IsConstant flag to TensorInfo. This should be set if using the new AddFullyConnectedLayer Graph API when weights and bias are constant. An example of this can be found in samples/SimpleSample.cpp.
Added support for qasymms8 (int8) and added qasymmu8 (uint8) as an alias for qasymm8 to ImageTensorGenerator.

ABI/API Changes

The following front-end API changes have occurred during the implementation of 21.08 that users should be aware of before upgrading. Due to these changes we have bumped our ARMNN_VERSION to 26.0.0 while also bumping our Parsers and Delegate to 24.2.0 following Semantic Versioning guidelines.

Feature	SHA	Gerrit Review	Resultant ABI/API changes
Rework the async threadpool	`f364d53`	https://review.mlplatform.org/c/ml/armnn/+/5801	Be aware that these classes are in the experimental namespace and should be treated as such. struct INetworkProperties: Field m_NumThreads has been removed from the middle position of this structural type. Size of this type has been changed from 32 bytes to 24 bytes. class IWorkingMemHandle: Pure virtual method GetInferenceId ( ) has been removed from this class. class IAsyncExecutionCallback: The following methods have been removed: GetEndTime ( ) const, GetStartTime ( ) const, Wait ( ) const, GetStatus ( ) const.
Add IsConstant flag to TensorInfo	`b082ed0`	https://review.mlplatform.org/c/ml/armnn/+/5842	class TensorInfo: Size of this class has been increased from 80 bytes to 88 bytes. This is due to the addition of private member bool m_IsConstant. An object of this class can be allocated by applications, in which case the old size is hardcoded at their original compile time. Calling any exported constructor will then corrupt the memory of neighboring objects on the stack or heap. struct BindingPointInfo: Size of field m_TensorInfo has been changed from 80 bytes to 88 bytes. The fields or parameters of such data type may be incorrectly initialized or accessed by old client applications.
Add protected mode to ArmNN CreationOptions	`15fcc7e`	https://review.mlplatform.org/c/ml/armnn/+/5963	struct IRuntime::CreationOptions: Field m_ProtectedMode has been added at the middle position of this structural type. Size of the inclusive type has been changed. Layout of structure fields has been changed and therefore fields at higher positions of the structure definition may be incorrectly accessed by applications.
Add the Custom Memory Allocator interface definition	`801e2d5`	https://review.mlplatform.org/c/ml/armnn/+/5967	struct IRuntime::CreationOptions: Field m_CustomAllocator has been added at the middle position of this structural type. Size of the inclusive type has been changed. Layout of structure fields has been changed and therefore fields at higher positions of the structure definition may be incorrectly accessed by applications.
Add front end support for UnidirectionalSequenceLstm on ArmNN	`8ed39ae`	https://review.mlplatform.org/c/ml/armnn/+/5956	struct LstmDescriptor: Field m_TimeMajor has been added to this type. This field will not be initialized by old clients. Size of the inclusive type has been changed.
JSON profiling output	`554fa09`	https://review.mlplatform.org/c/ml/armnn/+/5968	struct INetworkProperties: Field m_ProfilingEnabled has been added to this type. This field will not be initialized by old clients.
ConstTensorsAsInput: FullyConnected	`81beae3`	https://review.mlplatform.org/c/ml/armnn/+/5942	class ILayerVisitor: Pure virtual method VisitFullyConnectedLayer ( IConnectableLayer const*, struct FullyConnectedDescriptor const&, char const* ) has been added to this class. The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications. The following previously deprecated functions have been removed: INetwork::AddFullyConnectedLayer(struct FullyConnectedDescriptor const& fullyConnectedDescriptor, ConstTensor const& weights, ConstTensor const& biases, char const* name) and INetwork::AddFullyConnectedLayer(struct FullyConnectedDescriptor const& fullyConnectedDescriptor, ConstTensor const& weights, char const* name).
Adds CustomAllocator interface and Sample App	`c1c872f`	https://review.mlplatform.org/c/ml/armnn/+/5987	struct IRuntime::CreationOptions: Field m_CustomAllocatorMap has been added at the middle position of this structural type. Size of the inclusive type has been changed. Layout of structure fields has been changed and therefore fields at higher positions of the structure definition may be incorrectly accessed by applications. class BackendRegistry: Field m_CustomMemoryAllocatorMap has been added to this type. Size of this type has been changed from 80 bytes to 136 bytes.
Allow profiling details to be switched off during profiling	`f487486`	https://review.mlplatform.org/c/ml/armnn/+/6069	struct INetworkProperties: Field m_OutputNetworkDetails has been added at the middle position of this structural type. Layout of structure fields has been changed and therefore fields at higher positions of the structure definition may be incorrectly accessed by applications.
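
Several rows above warn that adding a field in the middle of a struct changes its layout. The sketch below shows why this matters for an application compiled against the older headers: it still reads the last field at the old offset and so picks up the newly inserted field instead. OptionsOld, OptionsNew and ReadLastAsOldClient are hypothetical and only mimic the pattern; they are not the real IRuntime::CreationOptions or INetworkProperties definitions.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical layout an old client was compiled against.
struct OptionsOld
{
    int m_First;
    int m_Last;
};

// Hypothetical layout after a field is added in the middle position.
struct OptionsNew
{
    int m_First;
    int m_Inserted;
    int m_Last;
};

static_assert(offsetof(OptionsNew, m_Last) > offsetof(OptionsOld, m_Last),
              "m_Last moved to a higher offset");
static_assert(sizeof(OptionsNew) > sizeof(OptionsOld),
              "the struct grew");

// Simulates an old client reading m_Last from a new-layout object using
// the offset that was baked in at its original compile time.
int ReadLastAsOldClient(const OptionsNew& options)
{
    int value = 0;
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&options);
    std::memcpy(&value, bytes + offsetof(OptionsOld, m_Last), sizeof(value));
    return value;
}
```

With an OptionsNew holding {1, 2, 3}, the old offset lands on m_Inserted on typical platforms, so the simulated old client sees 2 rather than 3. Rebuilding against the 21.08 headers avoids this class of problem.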

The following back-end API changes have occurred during the implementation of 21.08 that users should be aware of before upgrading.

Feature	SHA	Gerrit Review	Resultant ABI/API changes
Refactor the reporting of capabilities from backends	`b9af86e`	https://review.mlplatform.org/c/ml/armnn/+/5728	class IBackendInternal: virtual function GetCapabilities() const has been added, replacing the now deprecated HasCapability() function. The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
Add protected mode to ArmNN CreationOptions	`15fcc7e`	https://review.mlplatform.org/c/ml/armnn/+/5963	class IBackendInternal: virtual function UseCustomMemoryAllocator() has been added. The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.

TfLite Delegate

New Features

PRELU Operator support added.
SHAPE Operator support added.
Added Asynchronous Network Execution.
Changed weights layout for Depthwise Convolution Operator from [M,I,H,W] to [1,H,W,I*M].

Build Dependencies

Tools	Supported Version
Git	2.17.1 or later
SCons	2.4.1 (Ubuntu) and 2.5.1 (Debian)
Cmake	3.5.1 (Ubuntu) and 3.7.2 (Debian)
Tensorflow	2.3.1
Onnx	1.6.0
Flatbuffer	1.12.0
Protobuf	3.12.0
Android NDK	r20b
mapbox/variant	1.2.0

Android 12 Compatibility Testing was performed using the following:

Android Tag	Android Build ID	Mali Driver	Android Compatibility Test Suite	Android Vendor Test Suite
android-12	SP1A.210812.003	r32p1_01eac0	12_r1 (eng.upr473.20210901.005349)¹	12_r1 (eng.upr473.20210901.024841)

1: CtsNNAPITestCases with Mali Driver r32p1_01eac0. The following test is known to be failing: AddTwoWithHardwareBufferInputWithGPUUsage. Investigations indicate this failure is due to the Android NN HAL utilizing Gralloc functionality not required by the Gralloc API. This issue has been raised with the Google Android team, and is tracked as https://partnerissuetracker.corp.google.com/issues/202025253. Please quote Arm reference MIDCET-3783 when discussing this issue.

Android 11 Compatibility Testing was performed using the following:

Android Tag	Android Build ID	Mali Driver	Android Compatibility Test Suite	Android Vendor Test Suite
android-11.0.0_r1	RP1A.200720.009	r31p0_01eac0	11_r4 (7352019)	11_r4 (7337463)
android-11.0.0_r6	RPM1.210413.002	r32p0_01eac0	11_r4 (7352019)	11_r4 (7337463)
android-11.0.0_r6	RPM1.210413.002	r33p0_01eac0	11_r4 (7352019)	11_r4 (7337463)

Android 10 Compatibility Testing was performed using the following:

Android Tag	Android Build ID	Mali Driver
android-10.0.0_r39	QQ3A.200605.002.A1	R23P0_01REL0

ARM-software/armnn v21.08 Release 21.08 on GitHub