github ARM-software/armnn v21.02
Release 21.02


Summary

The 21.02 release provides two major pieces of functionality. The first is performance related: the ability to cache compiled OpenCL kernels when running on the GPU backend. Cached kernel files can be loaded into the runtime, eliminating the cost of compiling their associated graphs and resulting in a significant performance uplift on the first execution of a newly loaded graph. The second is that the operators which were not added to the Arm NN TensorFlow Lite delegate in the 20.11 release are now present, giving the delegate the same level of operator support as the android-nn-driver.
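
As a rough illustration of how this caching can be driven from the C++ API, the sketch below passes the GpuAcc options that control kernel caching through OptimizerOptions when optimizing a network. The option names "SaveCachedNetwork" and "CachedNetworkFilePath", and their placement in the model options, are assumptions to check against the GpuAcc backend documentation for this release.

```cpp
// Minimal sketch: enable OpenCL kernel caching on the GpuAcc backend.
// The option names "SaveCachedNetwork" and "CachedNetworkFilePath" are assumptions;
// check the GpuAcc backend (ClBackend) model options for this release.
#include <armnn/ArmNN.hpp>
#include <string>

armnn::IOptimizedNetworkPtr OptimizeWithKernelCache(const armnn::INetwork& network,
                                                    armnn::IRuntime& runtime,
                                                    const std::string& cachePath,
                                                    bool firstRun)
{
    armnn::OptimizerOptions options;
    options.m_ModelOptions.push_back(armnn::BackendOptions("GpuAcc",
    {
        // First run: compile the kernels and write them to the cache file.
        // Later runs: pass the same path so the cached kernels are loaded instead.
        { "SaveCachedNetwork", firstRun },
        { "CachedNetworkFilePath", cachePath }
    }));

    return armnn::Optimize(network, { armnn::Compute::GpuAcc }, runtime.GetDeviceSpec(), options);
}
```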

The other features of the 21.02 release are an update of the TensorFlow Lite parser to work with TensorFlow Lite v2.3.1 and changes to the public APIs to make binary compatibility between releases easier to maintain. Each group of public interfaces (SDK, backend, TfLiteDelegate, etc.) has been separately versioned and will have its version updated independently in subsequent releases to indicate changes in its Application Binary Interface (ABI).

Support has also been added for the SSD-MobileNetv2 and SSD-MobileNetv3 models. The models have been verified to execute correctly with good performance. Work to generate accuracy figures for the models using the TensorFlow Lite coco_object_detection tool is ongoing and will be published when complete.

Two backend configuration options have been added: one for the CpuAcc backend to specify the number of threads to use when executing ML workloads on the CPU, and one for the GpuAcc backend to load an MLGO tuning file to increase the performance of GEMM operations.
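
As a hedged illustration of how these options might be wired up in C++, the sketch below sets a thread count for CpuAcc through the model options and points GpuAcc at an MLGO tuning file through the runtime creation options. The option names ("NumberOfThreads", "MLGOTuningFilePath") and where each option is passed are assumptions to verify against the backend documentation for this release.

```cpp
// Minimal sketch of the two new backend configuration options.
// The option names "NumberOfThreads" and "MLGOTuningFilePath", and whether each is a
// runtime creation option or a model option, are assumptions.
#include <armnn/ArmNN.hpp>
#include <string>

armnn::IRuntimePtr CreateRuntimeWithMlgoTuning(const std::string& mlgoFilePath)
{
    armnn::IRuntime::CreationOptions runtimeOptions;

    // Point the GpuAcc backend at an MLGO tuning file for GEMM workloads.
    runtimeOptions.m_BackendOptions.push_back(
        armnn::BackendOptions("GpuAcc", { { "MLGOTuningFilePath", mlgoFilePath } }));

    return armnn::IRuntime::Create(runtimeOptions);
}

armnn::OptimizerOptions MakeCpuAccThreadOptions(unsigned int numThreads)
{
    armnn::OptimizerOptions options;

    // Run CpuAcc (Neon) workloads with the requested number of threads.
    options.m_ModelOptions.push_back(
        armnn::BackendOptions("CpuAcc", { { "NumberOfThreads", numThreads } }));

    return options;
}
```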

New Features:

  • Added ability to save and load the ClContext through ExecuteNetwork and the Android-nn-driver.
    • This will remove the time taken for initial compilation of OpenCL kernels and speed up the first execution.
  • Semantic Versioning for ArmNN APIs.
  • Arm NN TfLite Delegate (more extensive details in Arm NN TfLite Delegate section)
    • Further operator support.
    • Add capability to build on Android.
  • Verification of support for SSD-MobileNetv2 & SSD-MobileNetv3.

TfLite Parser

  • Added support for ELU activation.
  • Support Dilation in Conv2D.

ONNX Parser

  • Support Dilation in Conv2D.

Caffe Parser

  • Added Dilation support.
  • Added argmax deconv support.

ArmNN Serializer

  • Serialise ArmNN Model on android-nn-driver.

Public API Changes:

Backend API Changes:

ExecuteNetwork App Changes:

  • Two optimization parameters were added to enable saving and loading of the ClContext.
    • save-cached-network
    • cached-network-filepath

Other changes:

  • Make it easier for backends to traverse the subgraph during optimization by sorting SubgraphView layers on construction.
  • Added CL/NEON implementation of RANK Workload.
  • Added REDUCE layer for REDUCE_MAX, REDUCE_MIN, REDUCE_SUM operators.
  • Added REDUCE_MAX, REDUCE_MIN, and REDUCE_SUM operator support for the CpuRef backend.
  • Added REDUCE_MAX, REDUCE_MIN, and REDUCE_SUM operator support/workloads for the CpuAcc backend.
  • Added REDUCE_MAX, REDUCE_MIN, and REDUCE_SUM operator support/workloads for the GpuAcc backend.
  • Added more Fused Activation unit tests.
  • Handle Neon optionality on 32-bit Linux platforms.
  • Validated MobileNetv2-SSD and MobileNetv3-SSD support.
  • Add CpuAcc specific configuration option numberOfThreads.
  • Add GpuAcc MLGO tuning file configuration argument.

Bug Fixes:

  • Default stride values in depthwise and convolution to 1 instead of 0.
  • Fixed transpose conv InferOutputShape.
  • Fix incorrect padding value for asymmetric quantized type.
  • Fix build breaks for armnnDeserializer test and Threads.cpp for macosx.
    • Further fix for macosx where filenames are case insensitive.
  • Fixed unit test failures on mipsel/s390x/ppc64/powerpc.
  • Fixed ArmnnQuantizer incorrectly quantizing all data types.
  • Fixed TFLite parser not parsing TransposeConvolution.
  • Fix TfLite parser and ExecuteNetwork issues where error was not thrown in some cases.
  • Fix wav2letter not producing correct output for Neon backend.
  • Fixed ReduceLayer InferOutputShape issue so that the correct axis data is read in the TfLiteParser.
  • Fix Reduce workload to allow input tensors of any rank into the validate function.
  • Updated JsonPrinterTestImpl to use CpuLogitsDLogSoftmaxKernel_#.
  • Add missing serializer support for m_DimensionsSpecificity.
  • Removed unnecessary friend function in INetwork and fixed TransformIterator operator= to allow compilation on further compilers.

Known issues:

Deprecation Notification:

The following components have been deprecated and will be removed in the next release (21.05) of Arm NN.

Ubuntu 16.04 LTS is reaching End of Life.

Ubuntu Linux 16.04 LTS will no longer be supported after April 30, 2021.
At that time, Ubuntu 16.04 LTS will no longer receive security patches or other software updates.
Consequently, from the 21.08 release at the end of August 2021, Arm NN will no longer be officially supported on Ubuntu 16.04 LTS and will instead be supported on Ubuntu 18.04 LTS.

TfLite Delegate

New Features:
  • Enabled ELU Activation.
  • Enabled HARD_SWISH Activation.
  • Added GATHER operator support.
  • Added Logical AND, NOT and OR operator support.
  • Added PAD operator support.
  • Added PADV2 operator support.
  • Added SPLIT operator support.
  • Added SPLIT_V operator support.
  • Added ARG_MAX operator support.
  • Added ARG_MIN operator support.
  • Added LOCAL_RESPONSE_NORMALIZATION operator support.
  • Added L2_NORMALIZATION operator support.
  • Added BATCH_TO_SPACE_ND operator support.
  • Added SPACE_TO_BATCH_ND operator support.
  • Added DEPTH_TO_SPACE operator support.
  • Added SPACE_TO_DEPTH operator support.
  • Added SUM operator support.
  • Added REDUCE_MAX, REDUCE_MIN operator support.
  • Added FLOOR operator support.
  • Added OptimizerOptions (a usage sketch follows this list)
    • Reduce Float32 to Float16.
    • Reduce Float32 to BFloat16.
    • Enable debug data.
    • Enable memory import.
  • Added STRIDED_SLICE operator support.
  • Added LSTM operator support.
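
For reference, a minimal sketch of how the delegate's new OptimizerOptions might be used from C++ is shown below: it asks Arm NN to reduce Float32 to Float16 and attaches the delegate to a TensorFlow Lite interpreter. The DelegateOptions constructor used here and the GpuAcc backend choice are assumptions rather than a definitive recipe; the delegate's DelegateOptions.hpp and BuildGuideNative.md describe the supported API.

```cpp
// Minimal sketch: create the Arm NN TfLite delegate with OptimizerOptions that reduce
// Float32 to Float16, then register it with a TfLite interpreter.
// The DelegateOptions constructor taking an OptimizerOptions argument is an assumption;
// see DelegateOptions.hpp in the Arm NN delegate for the exact API.
#include <armnn/ArmNN.hpp>
#include <armnn_delegate.hpp>
#include <tensorflow/lite/interpreter.h>
#include <memory>

// The delegate must outlive the interpreter it is attached to, so it is returned to the caller.
using ArmnnDelegatePtr =
    std::unique_ptr<TfLiteDelegate, decltype(&armnnDelegate::TfLiteArmnnDelegateDelete)>;

ArmnnDelegatePtr AttachArmnnDelegate(tflite::Interpreter& interpreter)
{
    armnn::OptimizerOptions optimizerOptions;
    optimizerOptions.m_ReduceFp32ToFp16 = true; // run in FP16 where the backend supports it

    // Assumed constructor: preferred backend plus optimizer options.
    armnnDelegate::DelegateOptions delegateOptions(armnn::Compute::GpuAcc, optimizerOptions);

    ArmnnDelegatePtr theArmnnDelegate(armnnDelegate::TfLiteArmnnDelegateCreate(delegateOptions),
                                      armnnDelegate::TfLiteArmnnDelegateDelete);

    // Offload the operators Arm NN supports; unsupported ones fall back to the TfLite runtime.
    if (interpreter.ModifyGraphWithDelegate(theArmnnDelegate.get()) != kTfLiteOk)
    {
        return ArmnnDelegatePtr(nullptr, armnnDelegate::TfLiteArmnnDelegateDelete);
    }
    return theArmnnDelegate;
}
```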
Other Changes:
  • Provided Android build.
  • Removed Tensorflow requirement.
Bug Fixes:
  • Fixed fused activation in Fully Connected layer.
  • Fixed TfLiteDelegate Reshape operator failure when running models with 2D shape tensor.
Known Issues:

Note: Pre-built binaries of Arm NN 21.02 have been added to this release (please see the Assets). Please refer to the BuildGuideNative.md guide in armnn/delegate for more information.

Build dependencies:

Tools            Supported Version
Git              2.17.1 or later
Scons            2.4.1 (Ubuntu) and 2.5.1 (Debian)
CMake            3.5.1 (Ubuntu) and 3.7.2 (Debian)
Boost            1.64
Tensorflow       2.3.1
Caffe            tag 1.0
Flatbuffer       1.12.0
Protobuf         3.12.0
Eigen3           3.3
Android NDK      r20b
mapbox/variant   1.2.0

Android 11 Compatibility Testing was performed using the following:
Android Tag         Android Build ID   Mali Driver                  Android Compatibility Test Suite   Android Vendor Test Suite
android-11.0.0_r1   RP1A.200720.009    R26P0_01EAC0, R30P0_01EAC0   11_r2 (6965179)                    11_r2 (6961477)

Android 10 Compatibility Testing was performed using the following:
Android Tag          Android Build ID     Mali Driver
android-10.0.0_r39   QQ3A.200605.002.A1   R23P0_01REL0

Note: Going forward, Arm NN will make documentation updates to the latest release if any have been missed; these will be available on GitHub by selecting the doc tag corresponding to the release. For example, the tag 21.02.doc1 is the 21.02 release plus some documents updated for the 21.02 release; there are no functional changes. These document changes are cherry-picked to branches/armnn_21_02.
