Summary
The 21.05 Release of Arm NN was focused on providing new capabilities to allow users to attain higher performance by:
- Making the Arm NN Core thread-safe, opening the possibility of running multiple inferences on the same model in parallel software threads.
- Allowing graphs on the GPU backend to import their input and output buffers either from correctly aligned main memory or from kernel memory exposed as a dma_buf, thus reducing memory usage and saving the time involved in copying data into and out of the GPU memory space.
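Zero-copy import only works when the caller's buffer satisfies the backend's alignment requirement; otherwise the data still has to be copied. The C++17 sketch below shows that decision for the main-memory (malloc) case only. The 64-byte requirement and the names `IsAligned`, `ChooseTransferMode` and `AllocateImportableBuffer` are hypothetical illustrations, not Arm NN API: the real alignment depends on the backend and driver, and dma_buf import is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical alignment requirement; the real value is backend- and driver-specific.
constexpr std::size_t kAssumedImportAlignment = 64;

// Returns true if `ptr` is a multiple of `alignment` (alignment must be a power of two).
bool IsAligned(const void* ptr, std::size_t alignment)
{
    return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}

enum class TransferMode { Import, Copy };

// Hypothetical policy: import (zero-copy) when aligned, otherwise fall back to copying.
TransferMode ChooseTransferMode(const void* userBuffer)
{
    return IsAligned(userBuffer, kAssumedImportAlignment) ? TransferMode::Import
                                                          : TransferMode::Copy;
}

// Allocates a buffer suitable for import. The size is rounded up to a multiple of the
// alignment because std::aligned_alloc requires it. Release with std::free.
void* AllocateImportableBuffer(std::size_t bytes)
{
    const std::size_t rounded =
        (bytes + kAssumedImportAlignment - 1) / kAssumedImportAlignment * kAssumedImportAlignment;
    return std::aligned_alloc(kAssumedImportAlignment, rounded);
}
```

Allocating input and output buffers with the required alignment up front is what lets the backend skip the copy for every inference.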
In addition to this, support was added to allow the MobileBERT network to be parsed and run.
Finally, three deprecated components were removed: the TensorFlow Parser, the Caffe Parser and the Arm NN Quantizer tool.
New Features
- CAST operator support added on the CpuRef, CpuAcc and GpuAcc backends.
- Non-const weights support added on the FULLY_CONNECTED layer for the CpuRef backend.
- Input and output memory import enabled on GPU (Malloc and DmaBuf).
- Asynchronous network execution added for the CpuRef backend.
- Optimisation added to fuse PAD into Pooling2d if possible.
- ASR sample application added to samples directory.
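Folding a PAD layer into the following pooling layer is only valid when the result is unchanged. For max pooling this holds if the pad value can never win the max, because a pooling layer's own padding simply skips out-of-bounds positions. A minimal 1-D C++ sketch under that assumption; the actual Arm NN optimisation works on 2-D pooling, also considers other pooling types, and may apply different conditions. All names here are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// 1-D max pooling with implicit padding: padded positions are skipped.
std::vector<float> MaxPool1d(const std::vector<float>& in, std::size_t poolSize,
                             std::size_t stride, std::size_t padBefore, std::size_t padAfter)
{
    std::vector<float> out;
    const std::size_t paddedLen = in.size() + padBefore + padAfter;
    for (std::size_t start = 0; start + poolSize <= paddedLen; start += stride)
    {
        float best = std::numeric_limits<float>::lowest();
        for (std::size_t k = start; k < start + poolSize; ++k)
        {
            if (k >= padBefore && k < padBefore + in.size())
            {
                best = std::max(best, in[k - padBefore]);
            }
        }
        out.push_back(best);
    }
    return out;
}

// Explicit PAD layer: materialises the padding with a constant value.
std::vector<float> Pad1d(const std::vector<float>& in, std::size_t before, std::size_t after, float value)
{
    std::vector<float> out(before, value);
    out.insert(out.end(), in.begin(), in.end());
    out.insert(out.end(), after, value);
    return out;
}

struct Pool1dParams { std::size_t poolSize, stride, padBefore, padAfter; };

// Hypothetical folding rule for max pooling: absorb the PAD into the pooling layer's
// padding only when the pad value equals the lowest float, so it can never be the max.
bool TryFoldPadIntoMaxPool(float padValue, std::size_t padBefore, std::size_t padAfter, Pool1dParams& pool)
{
    if (padValue != std::numeric_limits<float>::lowest())
    {
        return false;
    }
    pool.padBefore += padBefore;
    pool.padAfter += padAfter;
    return true;
}
```

When the fold succeeds, the separate PAD layer and its padded intermediate tensor are no longer needed.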
TfLite Parser
- ABS Operator Support added.
- ARG_MIN Operator Support added.
- CAST Operator Support added.
- LOGICAL_NOT Operator Support added.
- RSQRT Operator Support added.
- Non-const weights support added on FULLY_CONNECTED layer.
- Biases are turned off when the bias data location is -1, i.e. when no bias tensor is supplied (added to support MobileBERT).
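In the TensorFlow Lite schema, an optional operator input that is absent is recorded with tensor index -1 (`kTfLiteOptionalTensor` in the TfLite C API). Assuming the "data location" above refers to that bias tensor index, a minimal sketch of the check, with the FULLY_CONNECTED input order assumed to be {input, weights, bias}; `HasBias` is a hypothetical helper, not the parser's actual code:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Index used by TfLite for an omitted optional input.
constexpr std::int32_t kOptionalTensor = -1;

// Hypothetical helper: the bias is used only if a third input exists
// and its tensor index is not -1.
bool HasBias(const std::vector<std::int32_t>& operatorInputs)
{
    return operatorInputs.size() > 2 && operatorInputs[2] != kOptionalTensor;
}
```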
ArmNN Serializer/Deserializer
- Added Signed64 support to Serializer and Deserializer.
- Added QAsymmS8 support to Serializer.
- Added L2 Pooling algorithm to Deserializer.
ExecuteNetwork App Changes
- Asynchronous network execution support (currently for the CpuRef backend).
- Re-enabled GPU profiling in ExecuteNetwork.
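Running several inferences on one model in parallel is safe when the loaded network is read-only and every in-flight inference writes only to its own working memory. A generic C++17 sketch of that pattern with hypothetical `Model` and `WorkingMemory` types; it illustrates the idea behind the thread-safe core and asynchronous execution support and does not reproduce Arm NN's async API.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a loaded network: immutable after construction,
// so concurrent const calls need no locking.
class Model
{
public:
    explicit Model(std::vector<float> weights) : m_Weights(std::move(weights)) {}

    // Per-inference scratch memory, one instance per concurrent execution.
    struct WorkingMemory { std::vector<float> scratch; };

    WorkingMemory CreateWorkingMemory() const { return WorkingMemory{std::vector<float>(m_Weights.size())}; }

    // Dot product of input and weights; intermediates are written only to `mem`.
    float Execute(const std::vector<float>& input, WorkingMemory& mem) const
    {
        float sum = 0.f;
        for (std::size_t i = 0; i < m_Weights.size(); ++i)
        {
            mem.scratch[i] = input[i] * m_Weights[i];
            sum += mem.scratch[i];
        }
        return sum;
    }

private:
    const std::vector<float> m_Weights;
};

// Runs one inference per input on its own thread, each with its own working memory.
// Each thread writes a distinct element of `results`, so there is no data race.
std::vector<float> RunInParallel(const Model& model, const std::vector<std::vector<float>>& inputs)
{
    std::vector<float> results(inputs.size());
    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < inputs.size(); ++t)
    {
        threads.emplace_back([&model, &inputs, &results, t]() {
            Model::WorkingMemory mem = model.CreateWorkingMemory();
            results[t] = model.Execute(inputs[t], mem);
        });
    }
    for (std::thread& th : threads)
    {
        th.join();
    }
    return results;
}
```

Separating the shared, immutable network from per-execution state is what removes the need for a lock around each inference.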
Deprecated features
- Deprecated the Caffe Parser.
- Deprecated the Tensorflow Parser.
- Deprecated the Arm NN Quantizer tool.
- Deprecated m_Output_Type from the ArgMinMaxDescriptor: the output type is solely determined by the data type of the output tensor.
Bug Fixes
- Fixed CheckProfilingObjectUids test failing on Ubuntu 21.04.
- Fix added to Serializer to handle situations where a shape has some unspecified dimensions.
- Fix added to AddBroadcastReshapeLayer optimisation to prevent modification to constant layers with multiple connections.
- Fix added to use CMake value ${CMAKE_THREAD_LIBS_INIT} throughout instead of 'pthread'.
- Fix added to handle negative axis correctly in ARG_MAX (TfLiteParser) and SPLIT (TfLiteParser & TfLiteDelegate) operators.
- Fixed TfLiteDelegate Normalization & Softmax on Android when built with an NDK older than r21.
- Fixed Deserializer issue where layer bindings were incorrectly assigning the tensor info of one output to all 4 outputs.
- Fixed x86_64 ArmNN Dockerfile.
- Fixed TuningLevel enumeration values to be consistent.
- Fixed YoloV3 test application's incorrect use of std::abs.
- Improved performance on SqueezeNet v1.1.
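The negative-axis fixes for ARG_MAX and SPLIT come down to mapping an axis in the range [-rank, rank) onto [0, rank), where -1 denotes the last dimension. A minimal sketch; `NormalizeAxis` is a hypothetical helper, not the parser's or delegate's actual function:

```cpp
#include <cassert>
#include <stdexcept>

// Maps an axis in [-rank, rank) to [0, rank); negative axes count back
// from the last dimension (-1 is the last dimension).
unsigned int NormalizeAxis(int axis, unsigned int rank)
{
    const int r = static_cast<int>(rank);
    if (axis < -r || axis >= r)
    {
        throw std::out_of_range("axis out of range");
    }
    return static_cast<unsigned int>(axis < 0 ? axis + r : axis);
}
```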
Other Changes
- Removed cross-wiring in DepthwiseConvolution2d. The permutation of the full tensor info is now performed in armnnUtils::Permuted.
- Moved the doctest third-party library from the delegate to Arm NN.
- Updated TfLiteDelegate Python Integration guide with new links. Also added information about the TFLite Model Benchmark Tool.
- Updated Cross Compiling Guide.
- Improved Graph memory usage.
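The depthwise change moves tensor-info permutation into armnnUtils::Permuted. As an illustration of what permuting a shape means, here is a stand-alone, hypothetical helper assuming a convention in which `mappings[i]` is the destination dimension of source dimension `i`; check Arm NN's PermutationVector documentation before relying on this convention.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Shape4 = std::array<unsigned int, 4>;

// Hypothetical shape permutation: source dimension i moves to position mappings[i].
Shape4 PermuteShape(const Shape4& src, const Shape4& mappings)
{
    Shape4 out{};
    for (std::size_t i = 0; i < src.size(); ++i)
    {
        out[mappings[i]] = src[i];
    }
    return out;
}
```

For example, under this convention the mapping {0, 2, 3, 1} turns an NHWC shape into NCHW: N stays at 0, H moves to 2, W to 3 and C to 1.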
Known Issues
- Intermittent issue with dma_buf memory import on GPU. This is fixed in Mali driver r30p0.
- There might be performance regressions against v20.08 in Inception v3 using int8 data types on Arm Mali-G77 GPUs. Currently under investigation.
ABI/API Changes
The following front-end API changes occurred during the implementation of 21.05; users should be aware of them before upgrading. Due to these changes, we have bumped ARMNN_VERSION to 25.0.0 and the Parsers and Delegate to 24.1.0, following Semantic Versioning guidelines.
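Under Semantic Versioning 2.0.0, a backwards-incompatible API change increments MAJOR and resets MINOR and PATCH, a backwards-compatible feature increments MINOR and resets PATCH, and bug fixes alone increment PATCH. A minimal sketch of that rule; the `SemVer` and `Bump` names are hypothetical, and the starting version used in the usage below is illustrative, not taken from Arm NN's history.

```cpp
#include <cassert>

struct SemVer { int major; int minor; int patch; };

enum class ChangeKind { BreakingApiChange, BackwardCompatibleFeature, BugFixOnly };

// Applies the Semantic Versioning 2.0.0 bump rule for one kind of change.
SemVer Bump(SemVer v, ChangeKind change)
{
    switch (change)
    {
        case ChangeKind::BreakingApiChange:         return {v.major + 1, 0, 0};
        case ChangeKind::BackwardCompatibleFeature: return {v.major, v.minor + 1, 0};
        case ChangeKind::BugFixOnly:                return {v.major, v.minor, v.patch + 1};
    }
    return v;
}
```

For instance, a breaking change on a hypothetical 24.0.0 yields 25.0.0, while a compatible feature on the same version yields 24.1.0.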
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
Add Async Queue to IRuntime | e813d67 | https://review.mlplatform.org/c/ml/armnn/+/5493 | |
Add front-end support for CAST + Add TfLiteParser support for CAST | b392e98 | https://review.mlplatform.org/c/ml/armnn/+/5374 | |
Add MemorySourceFlags to TensorHandleFactoryRegistry::GetFactory | 73d3e2e | https://review.mlplatform.org/c/ml/armnn/+/5481 | |
Move ILayerSupport.hpp to backends folder | cae4568 | https://review.mlplatform.org/c/ml/armnn/+/5500 | |
NonConstWeights: Update front-end and TfLiteDelegate support for FullyConnected Operator | f0a6dec | https://review.mlplatform.org/c/ml/armnn/+/5180 | |
Refactor Async Network API | 55a8ffd | https://review.mlplatform.org/c/ml/armnn/+/5365 | |
Remove cross-wiring in depthwise | 7612bd6 | https://review.mlplatform.org/c/ml/armnn/+/5411 | |
Remove Quantizer | 4a621c4 | https://review.mlplatform.org/c/ml/armnn/+/5486 | |
The following back-end API changes have occurred during the implementation of 21.05 that users should be aware of before upgrading.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
NonConstWeights: Update front-end and TfLiteDelegate support for FullyConnected Operator | 16fb1a2 | https://review.mlplatform.org/c/ml/armnn/+/5180 | |
Move ILayerSupport.hpp to backends folder | cae4568 | https://review.mlplatform.org/c/ml/armnn/+/5500 | |
Generalise ConstCpuTensorHandle | 1f58f03 | https://review.mlplatform.org/c/ml/armnn/+/5515 | |
Enable import on GPU | e5f0b24 | https://review.mlplatform.org/c/ml/armnn/+/5605 | |
TfLite Delegate
New features
- Non-const weights support added on FULLY_CONNECTED layer
- CAST operator support
- PACK operator support
- UNPACK operator support
- Added program options to armnn_external_delegate.cpp:
  - enable-fast-math
  - number-of-threads
  - save-cached-networks
  - cached-network-filepath
- Signed64 support added
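A TfLite external delegate receives its options as parallel arrays of key and value C strings through the plugin entry point `tflite_plugin_create_delegate(char** options_keys, char** options_values, size_t num_options, void (*report_error)(const char*))`. A hedged C++17 sketch of building those arrays from options such as the ones listed above; the value formats ("true", "4") are assumptions, `DelegateOptionArrays` and `ToOptionArrays` are hypothetical, and the sketch does not load or call the delegate.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Parallel key/value arrays in the shape expected by a plugin entry point.
// The pointers refer into the source map, which must outlive these arrays.
struct DelegateOptionArrays
{
    std::vector<char*> keys;
    std::vector<char*> values;
};

DelegateOptionArrays ToOptionArrays(std::map<std::string, std::string>& options)
{
    DelegateOptionArrays arrays;
    for (auto& [key, value] : options)
    {
        // const_cast only because the plugin API takes non-const char**;
        // the delegate is expected to treat the strings as read-only.
        arrays.keys.push_back(const_cast<char*>(key.c_str()));
        arrays.values.push_back(value.data());
    }
    return arrays;
}
```

The resulting `keys.data()`, `values.data()` and `keys.size()` map directly onto the first three parameters of the entry point.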
Bug Fixes
- Fix added to set the correct index for connecting constant layers.
- Fix added to handle negative axis correctly in SPLIT operator.
Build Dependencies
Tools | Supported Version |
---|---|
Git | 2.17.1 or later |
SCons | 2.4.1 (Ubuntu), 2.5.1 (Debian) |
CMake | 3.7.2 or later |
boost | 1.64 |
Tensorflow | 2.3.1 |
Onnx | 1.6.0 |
Flatbuffer | 1.12.0 |
Protobuf | 3.12.0 |
Android NDK | r20b |
mapbox/variant | 1.2.0 |
Android 11 Compatibility Testing was performed using the following:
Android Tag | Android Build ID | Mali Driver | Android Compatibility Test Suite | Android Vendor Test Suite |
---|---|---|---|---|
android-11.0.0_r1 | RP1A.200720.009 | R30P0_01EAC0 | 11_r3 (7127450) | 11_r3 (7137996) |
android-11.0.0_r1 | RP1A.200720.009 | R31P0_01EAC0 | 11_r3 (7127450) | 11_r3 (7137996) |
android-11.0.0_r6 | RPM1.210413.002 | R32P0_01EAC0 | 11_r4 (7352019) | 11_r4 (7337463) |
Android 10 Compatibility Testing was performed using the following:
Android Tag | Android Build ID | Mali Driver |
---|---|---|
android-10.0.0_r39 | QQ3A.200605.002.A1 | R23P0_01REL0 |