github ARM-software/armnn v21.05
Release 21.05


Summary

The 21.05 Release of Arm NN focused on providing new capabilities that allow users to attain higher performance by:

  • Making the Arm NN Core thread safe, which opens the possibility of running multiple inferences on the same model in parallel software threads.
  • Allowing graphs on the GPU backend to import their input and output buffers either from correctly aligned main memory or from kernel memory exposed as a dma_buf, thus reducing memory usage and saving the time involved in copying data into and out of the GPU memory space.
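
The "correctly aligned main memory" condition for buffer import can be illustrated with a small check. This is a minimal sketch, assuming a 64-byte alignment requirement; these notes do not state the actual value, which depends on the backend and driver. The names kAssumedAlignment, IsAlignedForImport and AllocateImportable are hypothetical helpers for illustration and are not part of the Arm NN API.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Assumption: 64 bytes (a typical cache-line size). The real requirement
// depends on the GPU backend and driver and is not given in the release notes.
constexpr std::size_t kAssumedAlignment = 64;

// Returns true if ptr is non-null and its address is a multiple of alignment.
bool IsAlignedForImport(const void* ptr, std::size_t alignment = kAssumedAlignment)
{
    return ptr != nullptr &&
           reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// Allocates at least `bytes` bytes aligned to `alignment`.
// std::aligned_alloc requires the size to be a multiple of the alignment,
// so the request is rounded up. Release the buffer with std::free.
void* AllocateImportable(std::size_t bytes, std::size_t alignment = kAssumedAlignment)
{
    const std::size_t rounded = ((bytes + alignment - 1) / alignment) * alignment;
    return std::aligned_alloc(alignment, rounded);
}
```

A caller could run such a check on its input and output buffers before asking for them to be imported, and fall back to a normal copy when the check fails.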

In addition to this, support was added to allow the MobileBERT network to be parsed and run.

Finally, three deprecated components were removed: the Tensorflow Parser, the Caffe Parser and the Arm NN Quantizer tool.

New Features

  • CAST Operator support added on CpuRef, CpuAcc, GpuAcc Backends.
  • Non-const weights support added on FULLY_CONNECTED layer for CpuRef Backend.
  • Enable Input and Output Memory Import on GPU (Malloc and DmaBuf).
  • Asynchronous Network Execution for CpuRef Backend.
  • Optimisation added to fuse PAD into Pooling2d if possible.
  • ASR sample application added to samples directory.

TfLite Parser

  • ABS Operator Support added.
  • ARG_MIN Operator Support added.
  • CAST Operator Support added.
  • LOGICAL_NOT Operator Support added.
  • RSQRT Operator Support added.
  • Non-const weights support added on FULLY_CONNECTED layer.
  • Turn off Biases when data location is -1 (Added to support MobileBERT).

ArmNN Serializer/Deserializer

  • Added Signed64 support to Serializer and Deserializer.
  • Added QAsymmS8 support to Serializer.
  • Added L2 Pooling algorithm to Deserializer.

ExecuteNetwork App Changes

  • Asynchronous Network Execution support (Currently for CpuRef Backend).
  • Re-enabled GPU profiling in ExecuteNetwork.

Deprecated features

  • Deprecated the Caffe Parser.
  • Deprecated the Tensorflow Parser.
  • Deprecated the Arm NN Quantizer tool.
  • Deprecated m_Output_Type from the ArgMinMaxDescriptor: the output type is solely determined by the data type of the output tensor.

Bug Fixes

  • Fix CheckProfilingObjectUids test failing on Ubuntu 21.04.
  • Fix added to Serializer to handle situations where a shape has some unspecified dimensions.
  • Fix added to AddBroadcastReshapeLayer optimisation to prevent modification to constant layers with multiple connections.
  • Fix added to use CMake value ${CMAKE_THREAD_LIBS_INIT} throughout instead of 'pthread'.
  • Fix added to handle negative axis correctly in ARG_MAX (TfLiteParser) and SPLIT (TfLiteParser & TfLiteDelegate) operators.
  • Fixed TfLiteDelegate Normalization & Softmax for Android if NDK is less than r21.
  • Fixed Deserializer issue where layer bindings were incorrectly assigning the tensor info of one output to all 4 outputs.
  • Fixed x86_64 ArmNN DockerFile.
  • Fixed TuningLevel enumeration values to be consistent.
  • Fixed YoloV3 test application's incorrect use of std::abs.
  • Improved performance on SqueezeNet v1.1.
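
The negative-axis fix for ARG_MAX and SPLIT listed above relies on the usual convention that an axis of -1 refers to the last dimension. A minimal sketch of that normalisation follows; NormalizeAxis is a hypothetical helper written for illustration and is not the Arm NN implementation.

```cpp
#include <cstdint>
#include <stdexcept>

// Maps an axis in [-rank, rank) to its non-negative equivalent in [0, rank).
// Hypothetical helper for illustration; not taken from the Arm NN sources.
unsigned int NormalizeAxis(std::int32_t axis, unsigned int rank)
{
    const std::int64_t signedRank = static_cast<std::int64_t>(rank);
    if (axis < -signedRank || axis >= signedRank)
    {
        throw std::out_of_range("axis is outside [-rank, rank)");
    }
    // Negative axes count from the end, so add the rank to wrap them around.
    const std::int64_t wrapped = axis < 0 ? axis + signedRank : axis;
    return static_cast<unsigned int>(wrapped);
}
```

For a 4-dimensional tensor, an axis of -1 maps to 3 and an axis of -4 maps to 0, while non-negative axes are returned unchanged.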

Other Changes

  • Removed cross-wiring in DepthwiseConvolution2d. The permutation of the full tensor info is now performed in armnnUtils::Permuted.
  • Moved doctest third-party library to armnn from delegate.
  • Updated TfLiteDelegate Python Integration guide with new links. Also added information about the TFLite Model Benchmark Tool.
  • Updated Cross Compiling Guide.
  • Improved Graph memory usage.

Known Issues

  • Intermittent issue with Dma Buf memory import on GPU. This is fixed in Mali Driver r30p0.
  • There might be performance regressions against v20.08 in Inception v3 using int8 data types on Arm Mali-G77 GPUs. Currently under investigation.

ABI/API Changes

The following front-end API changes have occurred during the implementation of 21.05 that users should be aware of before upgrading. Due to these changes we have bumped our ARMNN_VERSION to 25.0.0 while also bumping our Parsers and Delegate to 24.1.0 following Semantic Versioning guidelines.

Feature SHA Gerrit Review Resultant ABI/API changes
Add Async Queue to IRuntime e813d67 https://review.mlplatform.org/c/ml/armnn/+/5493
  • For struct INetworkProperties the member variable size_t m_NumThreads has been added resulting in the change of size of the inclusive type.
Add front-end support for CAST + Add TfLiteParser support for CAST b392e98 https://review.mlplatform.org/c/ml/armnn/+/5374
  • For enum class LayerType a new enum for Cast has been added which changes the class member LastLayer to equate to Cast rather than the previous Unmap. We advise against the usage of armnn::LayerType::LastLayer where stability is required.
Add MemorySourceFlags to TensorHandleFactoryRegistry::GetFactory 73d3e2e https://review.mlplatform.org/c/ml/armnn/+/5481
  • For struct INetworkProperties the member variable MemorySource m_InputSource has been added resulting in the change of size of the inclusive type.
  • For struct INetworkProperties the member variable MemorySource m_OutputSource has been added resulting in the change of size of the inclusive type.
Move ILayerSupport.hpp to backends folder cae4568 https://review.mlplatform.org/c/ml/armnn/+/5500
  • include/armnn/ILayerSupport.hpp has been moved to include/armnn/backends/ILayerSupport.hpp to reflect the fact that ILayerSupport is a back-end interface. Front-end users should move to using the ABI-stable GetILayerSupportByBackendId().
NonConstWeights: Update front-end and TfLiteDelegate support for FullyConnected Operator f0a6dec https://review.mlplatform.org/c/ml/armnn/+/5180
  • For class LayerSupportHandle the member variable BackendId m_BackendId has been added resulting in the change of size of the inclusive type.
  • For struct FullyConnectedDescriptor the member variable bool m_ConstantWeights has been added resulting in the change of size of the inclusive type.
Refactor Async Network API 55a8ffd https://review.mlplatform.org/c/ml/armnn/+/5365
  • For struct INetworkProperties the member variable bool m_AsyncEnabled has been added resulting in the change of size of the inclusive type.
Remove cross-wiring in depthwise 7612bd6 https://review.mlplatform.org/c/ml/armnn/+/5411
  • For method armnnUtils::Permuted() the argument bool perChannelPermute which was defaulted to false has been removed.
Remove Quantizer 4a621c4 https://review.mlplatform.org/c/ml/armnn/+/5486
  • The formerly deprecated class INetworkQuantizer has been removed and so any code making use of it must be altered.
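
Two of the change types in the table above, an enum alias that moves and a struct that grows, can be illustrated as follows. This is a minimal sketch using shortened, hypothetical stand-ins (LayerTypeBefore, LayerTypeAfter, PropertiesBefore, PropertiesAfter, LayerCount); the real armnn::LayerType and INetworkProperties contain many more members.

```cpp
#include <cstddef>

// Hypothetical, heavily shortened stand-in for the enum before the change.
enum class LayerTypeBefore
{
    Activation,
    Map,
    Unmap,
    FirstLayer = Activation,
    LastLayer  = Unmap   // alias for the final real entry
};

// The same stand-in after a Cast entry is appended: LastLayer now aliases Cast.
enum class LayerTypeAfter
{
    Activation,
    Map,
    Unmap,
    Cast,
    FirstLayer = Activation,
    LastLayer  = Cast
};

// Hypothetical stand-ins for a struct that gains a member variable.
struct PropertiesBefore
{
    bool m_Existing;
};

struct PropertiesAfter
{
    bool        m_Existing;
    std::size_t m_NumThreads;   // newly added member changes the struct size
};

// Number of entries implied by LastLayer, as code sizing a lookup table might compute it.
template <typename E>
constexpr std::size_t LayerCount()
{
    return static_cast<std::size_t>(E::LastLayer) + 1;
}

// A binary built against the old header bakes in the old values.
static_assert(LayerCount<LayerTypeBefore>() == 3, "old count");
static_assert(LayerCount<LayerTypeAfter>() == 4, "new count");
static_assert(sizeof(PropertiesAfter) > sizeof(PropertiesBefore), "struct grew");
```

Because such values are fixed at compile time, code built against the older headers needs to be recompiled against 21.05 rather than only relinked.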

The following back-end API changes have occurred during the implementation of 21.05 that users should be aware of before upgrading.

Feature SHA Gerrit Review Resultant ABI/API changes
NonConstWeights: Update front-end and TfLiteDelegate support for FullyConnected Operator 16fb1a2 https://review.mlplatform.org/c/ml/armnn/+/5180
  • For class IBackendInternal the virtual method HasCapability ( enum BackendCapability ) const has been added. As a result the layout of v-table has been changed. Calls of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
Move ILayerSupport.hpp to backends folder cae4568 https://review.mlplatform.org/c/ml/armnn/+/5500
  • include/armnn/ILayerSupport.hpp has been moved to include/armnn/backends/ILayerSupport.hpp to reflect the fact that ILayerSupport is a back-end interface.
Generalise ConstCpuTensorHandle 1f58f03 https://review.mlplatform.org/c/ml/armnn/+/5515
  • include/armnn/backends/CpuTensorHandleFwd.hpp has been deprecated and replaced with include/armnn/backends/TensorHandleFwd.hpp and the forward declarations it contained have also been renamed to remove "Cpu".
Enable import on GPU e5f0b24 https://review.mlplatform.org/c/ml/armnn/+/5605
  • For class IBackendInternal the virtual method CreateWorkloadFactory with MemorySourceFlags inputFlags/outputFlags arguments has been added. As a result the layout of v-table has been changed. Calls of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
  • For class IBackendInternal the virtual method RegisterTensorHandleFactories with MemorySourceFlags inputFlags/outputFlags arguments has been added. As a result the layout of v-table has been changed. Calls of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
  • For class ITensorHandleFactory the method SupportsMapUnmap() is no longer final.
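
The practical effect of SupportsMapUnmap() no longer being final is that a derived factory may now override it. A minimal sketch follows, using hypothetical stand-in classes (TensorHandleFactoryBase, ImportOnlyFactory) rather than the real Arm NN headers; the default return value of true is an assumption.

```cpp
// Hypothetical stand-in for a tensor handle factory interface.
struct TensorHandleFactoryBase
{
    virtual ~TensorHandleFactoryBase() = default;

    // Without `final`, subclasses may override this method.
    // Assumption: the default implementation reports true.
    virtual bool SupportsMapUnmap() const { return true; }
};

// Hypothetical factory for imported buffers that cannot be mapped or unmapped.
struct ImportOnlyFactory : TensorHandleFactoryBase
{
    bool SupportsMapUnmap() const override { return false; }
};

// Queries the capability through the base interface, as a caller would.
bool QuerySupportsMapUnmap(const TensorHandleFactoryBase& factory)
{
    return factory.SupportsMapUnmap();
}
```

With the method declared final in the base class, the override in ImportOnlyFactory would not compile.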

TfLite Delegate

New features

  • Non-const weights support added on FULLY_CONNECTED layer
  • CAST operator support
  • PACK operator support
  • UNPACK operator support
  • Added program options to armnn_external_delegate.cpp
    • enable-fast-math
    • number-of-threads
    • save-cached-networks
    • cached-network-filepath
  • Signed64 support added

Bug Fixes

  • Fix added to set the correct index for connecting constant layers.
  • Fix added to handle negative axis correctly in SPLIT operator.

Build Dependencies

Tools           Supported Version
Git             2.17.1 or later
SCons           2.4.1 (Ubuntu), 2.5.1 (Debian)
CMake           3.7.2 or later
boost           1.64
Tensorflow      2.3.1
Onnx            1.6.0
Flatbuffer      1.12.0
Protobuf        3.12.0
Android NDK     r20b
mapbox/variant  1.2.0

Android 11 Compatibility Testing was performed using the following:

Android Tag        Android Build ID  Mali Driver   Android Compatibility Test Suite  Android Vendor Test Suite
android-11.0.0_r1  RP1A.200720.009   R30P0_01EAC0  11_r3 (7127450)                   11_r3 (7137996)
android-11.0.0_r1  RP1A.200720.009   R31P0_01EAC0  11_r3 (7127450)                   11_r3 (7137996)
android-11.0.0_r6  RPM1.210413.002   R32P0_01EAC0  11_r4 (7352019)                   11_r4 (7337463)

Android 10 Compatibility Testing was performed using the following:

Android Tag         Android Build ID    Mali Driver
android-10.0.0_r39  QQ3A.200605.002.A1  R23P0_01REL0
