Summary

The 21.05 Release of Arm NN focused on providing new capabilities that allow users to attain higher performance by:

Making the Arm NN Core thread safe, which opens the possibility of running multiple inferences on the same model in parallel software threads.
Allowing graphs on the GPU backend to import their input and output buffers either from correctly aligned main memory or from kernel memory exposed as a dma_buf, thus reducing memory usage and saving the time involved in copying data into and out of the GPU memory space.
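
As an illustration of what "correctly aligned main memory" can look like on the application side, the following is a minimal sketch that carves an aligned region out of an ordinary heap allocation. The names `AlignedBuffer`, `MakeAlignedBuffer`, `IsAligned` and `kAssumedAlignment` are hypothetical helpers and not part of the Arm NN API, and the 64-byte value is only an assumption standing in for whatever alignment the target GPU backend actually requires.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Assumption: 64 bytes stands in for the alignment required by the backend.
// Check the requirement of the actual target before relying on this value.
constexpr std::size_t kAssumedAlignment = 64;

// Hypothetical helper type, not part of the Arm NN API.
// Owns an over-allocated byte buffer and exposes an aligned pointer into it.
struct AlignedBuffer
{
    AlignedBuffer() = default;
    AlignedBuffer(AlignedBuffer&&) = default;
    AlignedBuffer& operator=(AlignedBuffer&&) = default;
    // Copying would leave `data` pointing into the source object's storage.
    AlignedBuffer(const AlignedBuffer&) = delete;
    AlignedBuffer& operator=(const AlignedBuffer&) = delete;

    std::vector<std::uint8_t> storage; // backing allocation (size + alignment bytes)
    void* data = nullptr;              // aligned pointer inside storage
    std::size_t size = 0;              // usable bytes starting at data
};

// Returns true if ptr is a multiple of alignment.
inline bool IsAligned(const void* ptr, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// Allocates `size` usable bytes whose start address is aligned to `alignment`.
// `alignment` must be a power of two.
inline AlignedBuffer MakeAlignedBuffer(std::size_t size,
                                       std::size_t alignment = kAssumedAlignment)
{
    AlignedBuffer buffer;
    buffer.storage.resize(size + alignment);

    void* ptr = buffer.storage.data();
    std::size_t space = buffer.storage.size();

    // std::align moves ptr forward to the first suitably aligned address.
    buffer.data = std::align(alignment, size, ptr, space);
    assert(buffer.data != nullptr);
    buffer.size = size;
    return buffer;
}
```

A buffer created with `MakeAlignedBuffer(numBytes)` keeps its aligned `data` pointer valid for as long as the returned object is alive, which is the property an imported input or output buffer would need.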

In addition to this, support was added to allow the MobileBERT network to be parsed and run.

Finally, three deprecated components were removed: the Tensorflow Parser, the Caffe Parser and the Arm NN Quantizer tool.

New Features

CAST Operator support added on CpuRef, CpuAcc, GpuAcc Backends.
Non-const weights support added on FULLY_CONNECTED layer for CpuRef Backend.
Enable Input and Output Memory Import on GPU (Malloc and DmaBuf).
Asynchronous Network Execution for CpuRef Backend.
Optimisation added to fuse PAD into Pooling2d if possible.
ASR sample application added to samples directory.

TfLite Parser

ABS Operator Support added.
ARG_MIN Operator Support added.
CAST Operator Support added.
LOGICAL_NOT Operator Support added.
RSQRT Operator Support added.
Non-const weights support added on FULLY_CONNECTED layer.
Turn off Biases when data location is -1 (Added to support MobileBERT).

ArmNN Serializer/Deserializer

Added Signed64 support to Serializer and Deserializer.
Added QAsymmS8 support to Serializer.
Added L2 Pooling algorithm to Deserializer.

ExecuteNetwork App Changes

Asynchronous Network Execution support (Currently for CpuRef Backend).
Re-enabled GPU profiling in ExecuteNetwork.

Deprecated features

Removed the previously deprecated Caffe Parser.
Removed the previously deprecated Tensorflow Parser.
Removed the previously deprecated Arm NN Quantizer tool.
Deprecated m_Output_Type from the ArgMinMaxDescriptor: the output type is solely determined by the data type of the output tensor.

Bug Fixes

Fix CheckProfilingObjectUids test failing on Ubuntu 21.04.
Fix added to Serializer to handle situations where a shape has some unspecified dimensions.
Fix added to AddBroadcastReshapeLayer optimisation to prevent modification to constant layers with multiple connections.
Fix added to use CMake value ${CMAKE_THREAD_LIBS_INIT} throughout instead of 'pthread'.
Fix added to handle negative axis correctly in ARG_MAX (TfLiteParser) and SPLIT (TfLiteParser & TfLiteDelegate) operators.
Fixed TfLiteDelegate Normalization & Softmax for Android if NDK is less than r21.
Fixed Deserializer issue where layer bindings were incorrectly assigning the tensor info of one output to all 4 outputs.
Fixed x86_64 ArmNN DockerFile.
Fixed TuningLevel enumeration values to be consistent.
Fixed YoloV3 test application's incorrect use of std::abs.
Improved performance on SqueezeNet v1.1.
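
The negative-axis fix listed above for ARG_MAX and SPLIT concerns the usual convention that an axis of -1 refers to the last dimension. A minimal sketch of that normalisation follows; `NormalizeAxis` is a hypothetical helper written for illustration and is not the code used in the TfLiteParser or TfLiteDelegate.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical helper, not the Arm NN implementation.
// Maps an axis in the range [-rank, rank) onto [0, rank), so that
// -1 selects the last dimension, -2 the one before it, and so on.
inline unsigned int NormalizeAxis(std::int32_t axis, unsigned int rank)
{
    const std::int32_t signedRank = static_cast<std::int32_t>(rank);
    if (axis < -signedRank || axis >= signedRank)
    {
        throw std::out_of_range("axis is outside [-rank, rank)");
    }
    // Negative values count back from the end of the shape.
    return static_cast<unsigned int>(axis < 0 ? axis + signedRank : axis);
}
```

For a 4-dimensional tensor, `NormalizeAxis(-1, 4)` yields 3 and `NormalizeAxis(2, 4)` yields 2. Rejecting out-of-range values up front avoids silently wrapping an invalid axis into a valid-looking one.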

Other Changes

Removed cross-wiring in DepthwiseConvolution2d. The permutation of the full tensor info is now performed in armnnUtils::Permuted.
Moved doctest third-party library to armnn from delegate.
Updated TfLiteDelegate Python Integration guide with new links. Also added information about the TFLite Model Benchmark Tool.
Updated Cross Compiling Guide.
Improved Graph memory usage.

Known Issues

Intermittent issue on Dma Buf memory import on GPU. This is fixed in Mali Driver r30p0.
There might be performance regressions against v20.08 in Inception v3 using int8 data types on Arm Mali-G77 GPUs. Currently under investigation.

ABI/API Changes

The following front-end API changes have occurred during the implementation of 21.05 that users should be aware of before upgrading. Due to these changes we have bumped our ARMNN_VERSION to 25.0.0 while also bumping our Parsers and Delegate to 24.1.0 following Semantic Versioning guidelines.

Feature	SHA	Gerrit Review	Resultant ABI/API changes
Add Async Queue to IRuntime	`e813d67`	https://review.mlplatform.org/c/ml/armnn/+/5493	For struct INetworkProperties the member variable size_t m_NumThreads has been added resulting in the change of size of the inclusive type.
Add front-end support for CAST + Add TfLiteParser support for CAST	`b392e98`	https://review.mlplatform.org/c/ml/armnn/+/5374	For enum class LayerType a new enum for Cast has been added which changes the class member LastLayer to equate to Cast rather than the previous Unmap. We advise against the usage of armnn::LayerType::LastLayer where stability is required.
Add MemorySourceFlags to TensorHandleFactoryRegistry::GetFactory	`73d3e2e`	https://review.mlplatform.org/c/ml/armnn/+/5481	For struct INetworkProperties the member variable MemorySource m_InputSource has been added resulting in the change of size of the inclusive type. For struct INetworkProperties the member variable MemorySource m_OutputSource has been added resulting in the change of size of the inclusive type.
Move ILayerSupport.hpp to backends folder	`cae4568`	https://review.mlplatform.org/c/ml/armnn/+/5500	include/armnn/ILayerSupport.hpp has been moved to include/armnn/backends/ILayerSupport.hpp to reflect the fact that ILayerSupport is a back-end interface. Front-end users should move to using the ABI-stable GetILayerSupportByBackendId().
NonConstWeights: Update front-end and TfLiteDelegate support for FullyConnected Operator	`f0a6dec`	https://review.mlplatform.org/c/ml/armnn/+/5180	For class LayerSupportHandle the member variable BackendId m_BackendId has been added resulting in the change of size of the inclusive type. For struct FullyConnectedDescriptor the member variable bool m_ConstantWeights has been added resulting in the change of size of the inclusive type.
Refactor Async Network API	`55a8ffd`	https://review.mlplatform.org/c/ml/armnn/+/5365	For struct INetworkProperties the member variable bool m_AsyncEnabled has been added resulting in the change of size of the inclusive type.
Remove cross-wiring in depthwise	`7612bd6`	https://review.mlplatform.org/c/ml/armnn/+/5411	For method armnnUtils::Permuted() the argument bool perChannelPermute which was defaulted to false has been removed.
Remove Quantizer	`4a621c4`	https://review.mlplatform.org/c/ml/armnn/+/5486	The formerly deprecated class INetworkQuantizer has been removed and so any code making use of it must be altered.
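
Two kinds of change recur in the table above: a struct grows because a member was added, and `LayerType::LastLayer` changes value because a new enumerator was added. The following sketch shows both effects using heavily shortened stand-ins. The namespaces `before` and `after`, the member `m_ExistingFlag`, the shortened enumerator lists and the `MemorySource` stand-in are assumptions made for illustration and do not reproduce the real Arm NN headers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the memory source type named in the table.
enum class MemorySource : unsigned int { Undefined, Malloc, DmaBuf };

namespace before
{
// Shortened stand-in for the enum before CAST was added.
enum class LayerType { Input, Output, Unmap, FirstLayer = Input, LastLayer = Unmap };

// Shortened stand-in; m_ExistingFlag is a placeholder for the existing members.
struct NetworkProperties
{
    bool m_ExistingFlag;
};
} // namespace before

namespace after
{
// Adding Cast moves LastLayer from Unmap to Cast.
enum class LayerType { Input, Output, Unmap, Cast, FirstLayer = Input, LastLayer = Cast };

// The members listed in the table are appended, so the struct grows.
struct NetworkProperties
{
    bool m_ExistingFlag;
    bool m_AsyncEnabled;
    std::size_t m_NumThreads;
    MemorySource m_InputSource;
    MemorySource m_OutputSource;
};
} // namespace after

// Value a client would have baked in when compiled against the old header.
constexpr int OldLastLayer() { return static_cast<int>(before::LayerType::LastLayer); }

// Value seen by code compiled against the new header.
constexpr int NewLastLayer() { return static_cast<int>(after::LayerType::LastLayer); }

// A range check compiled against the old header rejects the new Cast layer.
constexpr bool IsKnownToOldClient(int rawLayerType)
{
    return rawLayerType >= 0 && rawLayerType <= OldLastLayer();
}

static_assert(OldLastLayer() != NewLastLayer(),
              "LastLayer changes when an enumerator is added");
static_assert(sizeof(after::NetworkProperties) > sizeof(before::NetworkProperties),
              "adding members changes the size of the struct");
```

Code that was compiled against the older declarations keeps the old size and the old `LastLayer` value until it is rebuilt, which is why a rebuild against the 21.05 headers is needed and why relying on `LastLayer` is discouraged where stability matters.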

The following back-end API changes have occurred during the implementation of 21.05 that users should be aware of before upgrading.

Feature	SHA	Gerrit Review	Resultant ABI/API changes
NonConstWeights: Update front-end and TfLiteDelegate support for FullyConnected Operator	`16fb1a2`	https://review.mlplatform.org/c/ml/armnn/+/5180	For class IBackendInternal the virtual method HasCapability ( enum BackendCapability ) const has been added. As a result the layout of v-table has been changed. Calls of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
Move ILayerSupport.hpp to backends folder	`cae4568`	https://review.mlplatform.org/c/ml/armnn/+/5500	include/armnn/ILayerSupport.hpp has been moved to include/armnn/backends/ILayerSupport.hpp to reflect the fact that ILayerSupport is a back-end interface.
Generalise ConstCpuTensorHandle	`1f58f03`	https://review.mlplatform.org/c/ml/armnn/+/5515	include/armnn/backends/CpuTensorHandleFwd.hpp has been deprecated and replaced with include/armnn/backends/TensorHandleFwd.hpp and the forward declarations it contained have also been renamed to remove "Cpu".
Enable import on GPU	`e5f0b24`	https://review.mlplatform.org/c/ml/armnn/+/5605	For class IBackendInternal the virtual method CreateWorkloadFactory with MemorySourceFlags inputFlags/outputFlags arguments has been added. As a result the layout of v-table has been changed. Calls of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications. For class IBackendInternal the virtual method RegisterTensorHandleFactories with MemorySourceFlags inputFlags/outputFlags arguments has been added. As a result the layout of v-table has been changed. Calls of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications. For class ITensorHandleFactory the method SupportsMapUnmap() is no longer final.

TfLite Delegate

New features

Non-const weights support added on FULLY_CONNECTED layer
CAST operator support
PACK operator support
UNPACK operator support
Added program options to armnn_external_delegate.cpp
- enable-fast-math
- number-of-threads
- save-cached-networks
- cached-network-filepath
Signed64 support added

Bug Fixes

Fix added to set the correct index for connecting constant layers.
Fix added to handle negative axis correctly in SPLIT operator.

Build Dependencies

Tools	Supported Version
Git	2.17.1 or later
SCons	2.4.1 (Ubuntu), 2.5.1 (Debian)
CMake	3.7.2 or later
boost	1.64
Tensorflow	2.3.1
Onnx	1.6.0
Flatbuffer	1.12.0
Protobuf	3.12.0
Android NDK	r20b
mapbox/variant	1.2.0

Android 11 Compatibility Testing was performed using the following:

Android Tag	Android Build ID	Mali Driver	Android Compatibility Test Suite	Android Vendor Test Suite
android-11.0.0_r1	RP1A.200720.009	R30P0_01EAC0	11_r3 (7127450)	11_r3 (7137996)
android-11.0.0_r1	RP1A.200720.009	R31P0_01EAC0	11_r3 (7127450)	11_r3 (7137996)
android-11.0.0_r6	RPM1.210413.002	R32P0_01EAC0	11_r4 (7352019)	11_r4 (7337463)

Android 10 Compatibility Testing was performed using the following:

Android Tag	Android Build ID	Mali Driver
android-10.0.0_r39	QQ3A.200605.002.A1	R23P0_01REL0

ARM-software/armnn v21.05 Release 21.05 on GitHub
