Summary
Arm NN 21.08 focused on providing new capabilities and improving performance:
- Added the ability to import protected DMA buffers and allow Arm NN to run inferences that are in Protected GPU Memory. Also added a Custom Memory Allocator which supports importing malloc, Dma_buf and protected DMA buffers.
- Users with multi-core NPUs have been given the ability to pin inferences to selected cores, allowing them to balance parallel workloads across the NPU and increase throughput.
- Boost has been completely removed from the code base making Arm NN easier to integrate into other software stacks.
- Added support for non-constant weights and biases on FullyConnected, which lays the groundwork for supporting more models.
- More operators supported on Arm NN, TfLite Parser, TfLite Delegate and Android NNAPI driver.
New Features
- Moved unit tests from BOOST to doctest.
- UNIDIRECTIONAL_SEQUENCE_LSTM Operator support added on CpuRef backend.
- Changed weights layout for Depthwise Convolution Operator from [M,I,H,W] to [1,H,W,I*M].
- Reduce Operator can now support multiple axes.
- Optimisation added to fuse PAD Operator into Depthwise Convolution Operator.
- Added SIN and LOG support to ElementWiseUnary Operator on CpuRef, CpuAcc (only LOG is supported) and GpuAcc backends.
- Added SHAPE Operator support on CpuRef backend.
- Moved useful test utilities to new static library (libarmnnTestUtils.a).
- Added ability to create multiple LoadedNetworks from one OptimizedNetwork.
- Arm NN TfLite Delegate Image Classification sample application added to samples directory.
- Added fully comprehensive Arm NN Operator list page to Doxygen.
- Added support to allow Arm NN to run inferences that are in Protected GPU Memory.
- Creation of Protected Memory is handled via a Custom Memory Allocator which supports importing malloc, Dma_buf and protected DMA buffers.
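The Depthwise Convolution weights layout change listed above, from [M,I,H,W] to [1,H,W,I*M], can be illustrated with a small re-indexing sketch. This is not Arm NN code: the function name `mihw_to_1hwim` is hypothetical, the buffers are assumed to be flat and row-major, and the mapping of input channel `i` and multiplier `m` to output channel `i * M + m` is an assumption borrowed from the TensorFlow Lite convention.

```python
def mihw_to_1hwim(weights, m, i, h, w):
    """Re-index a flat row-major [M, I, H, W] buffer as [1, H, W, I*M].

    Illustrative only; assumes output channel o = i_idx * M + m_idx.
    """
    if len(weights) != m * i * h * w:
        raise ValueError("weights length does not match M*I*H*W")
    out = [0] * len(weights)
    for m_idx in range(m):
        for i_idx in range(i):
            for h_idx in range(h):
                for w_idx in range(w):
                    # Offset in the old [M, I, H, W] layout.
                    src = ((m_idx * i + i_idx) * h + h_idx) * w + w_idx
                    # Assumed channel ordering in the new layout.
                    o = i_idx * m + m_idx
                    # Offset in the new [1, H, W, I*M] layout.
                    dst = (h_idx * w + w_idx) * (i * m) + o
                    out[dst] = weights[src]
    return out
```

Code that builds Depthwise Convolution weights by hand for an earlier release would need a comparable conversion before passing them to 21.08.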
TfLite Parser
- EXPAND_DIMS Operator support added.
- PRELU Operator support added.
- SHAPE Operator support added.
- Comparison Operator support added (EQUAL, GREATER, GREATER_EQUAL, LESS, LESS_EQUAL and NOT_EQUAL).
- Changed weights layout for Depthwise Convolution Operator from [M,I,H,W] to [1,H,W,I*M].
- Added support for shape_signature, which will now be the preferred way to detect dynamic tensors.
- If creating an instance of the ITfLiteParser and the model used is dynamic, then please ensure that m_InferAndValidate is set in the TfLiteParserOptions and m_shapeInferenceMethod is set to InferAndValidate in the OptimizerOptions.
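As a rough illustration of detecting dynamic tensors through shape_signature: in the TensorFlow Lite schema, a dimension whose size is unknown until runtime is recorded as -1 in shape_signature. The helper below is a minimal sketch of that check, not the parser's implementation; the function name `is_dynamic_tensor` and the fallback to the static shape when no signature is present are assumptions.

```python
def is_dynamic_tensor(shape, shape_signature=None):
    """Return True if any dimension is marked unknown (-1).

    Illustrative only: prefers shape_signature when it is present and
    falls back to the static shape otherwise (an assumption).
    """
    dims = shape_signature if shape_signature else shape
    return any(d == -1 for d in dims)
```

For example, a tensor with shape `[1, 224, 224, 3]` and shape_signature `[-1, 224, 224, 3]` would be treated as dynamic, because the batch dimension is unknown.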
ArmNN Serializer/Deserializer
- Changed weights layout for Depthwise Convolution Operator from [M,I,H,W] to [1,H,W,I*M].
- Added SIN and LOG support to ElementWiseUnary Operator.
- UNIDIRECTIONAL_SEQUENCE_LSTM Operator support added.
ExecuteNetwork App Changes
- Added option to specify what size Arm NN thread pool to use when running inferences asynchronously.
- Added support for qasymms8 (int8) and added qasymmu8 (uint8) as alias for qasymm8.
- Added option to specify different input data for every iteration of ExecuteNetwork.
- Added option to print additional information such as the TensorInfo, Descriptor and Convolution method when profiling is enabled.
NOTE: To run dynamic models through ExecuteNetwork the --infer-output-shape flag should be set.
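The difference between the qasymms8 (int8) and qasymmu8 (uint8) data types mentioned above can be sketched with the usual asymmetric quantization formula, q = clamp(round(x / scale) + offset). This is a minimal illustration rather than ExecuteNetwork's code: the names `RANGES` and `quantize` are hypothetical, and rounding halves away from zero is an assumption of the sketch.

```python
import math

# Value ranges for each type name; "qasymm8" is kept as an alias of the
# unsigned type, mirroring the alias described in the notes above.
RANGES = {"qasymms8": (-128, 127), "qasymmu8": (0, 255)}
RANGES["qasymm8"] = RANGES["qasymmu8"]


def quantize(value, scale, offset, dtype):
    """Quantize one float to the named 8-bit asymmetric type (sketch)."""
    lo, hi = RANGES[dtype]
    scaled = value / scale
    # Assumption: round half away from zero (unlike Python's round()).
    rounded = int(math.copysign(math.floor(abs(scaled) + 0.5), scaled))
    # Shift by the zero-point offset, then clamp to the type's range.
    return max(lo, min(hi, rounded + offset))
```

With the same scale and offset, a negative input survives in qasymms8 but is clamped to 0 in qasymmu8, which is why the choice of type matters when preparing input data.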
Bug Fixes
- Removed duplicate check for Dequantize input type when checking if operator is supported.
- Fixed undefined behaviour in PolymorphicDowncast.
- Fixed binding of reference to null pointer in RefFullyConnectedWorkload.
- Fixed PermutationVector.end() to cope with dimensions < 5 in PermutationVector class.
- Fixed cl_ext.h include path in CL backend.
- Fixed bugs in PreCompiledLayer, e.g. a new shared_ptr was being created instead of allowing std::move to convert the unique_ptr into a shared_ptr.
- Fixed gcc 9.3.0 compiler warning in TfLiteParser.
- Fixed issue so that the BackendRegistry is cleaned up correctly following negative tests.
Other Changes
- Print Elementwise and Comparison Operator descriptors in a dot graph.
- Added IsConstant flag to TensorInfo. This should be set if using the new AddFullyConnectedLayer Graph API when weights and bias are constant. An example of this can be found in samples/SimpleSample.cpp.
- Added support for qasymms8 (int8) and added qasymmu8 (uint8) as alias for qasymm8 to ImageTensorGenerator.
ABI/API Changes
The following front-end API changes have occurred during the implementation of 21.08 that users should be aware of before upgrading. Due to these changes we have bumped our ARMNN_VERSION to 26.0.0 while also bumping our Parsers and Delegate to 24.2.0 following Semantic Versioning guidelines.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
Rework the async threadpool | f364d53 | https://review.mlplatform.org/c/ml/armnn/+/5801 | struct INetworkProperties: Field m_NumThreads has been removed from the middle position of this structural type. Size of this type has been changed from 32 bytes to 24 bytes. class IWorkingMemHandle: Pure virtual method GetInferenceId() has been removed from this class. class IAsyncExecutionCallback: The following methods have been removed: |
Add IsConstant flag to TensorInfo | b082ed0 | https://review.mlplatform.org/c/ml/armnn/+/5842 | An object of this class can be allocated by applications which the old size will be hardcoded at original compile time. Call of any exported constructor will break the memory of neighboring objects on the stack or heap. struct BindingPointInfo: Size of field m_TensorInfo has been changed from 80 bytes to 88 bytes. The fields or parameters of such data type may be incorrectly initialized or accessed by old client applications. |
Add protected mode to ArmNN CreationOptions | 15fcc7e | https://review.mlplatform.org/c/ml/armnn/+/5963 | |
Add the Custom Memory Allocator interface definition | 801e2d5 | https://review.mlplatform.org/c/ml/armnn/+/5967 | |
Add front end support for UnidirectionalSequenceLstm on ArmNN | 8ed39ae | https://review.mlplatform.org/c/ml/armnn/+/5956 | |
JSON profiling output | 554fa09 | https://review.mlplatform.org/c/ml/armnn/+/5968 | |
ConstTensorsAsInput: FullyConnected | 81beae3 | https://review.mlplatform.org/c/ml/armnn/+/5942 | |
Adds CustomAllocator interface and Sample App | c1c872f | https://review.mlplatform.org/c/ml/armnn/+/5987 | class BackendRegistry: Field m_CustomMemoryAllocatorMap has been added to this type. Size of this type has been changed from 80 bytes to 136 bytes. |
Allow profiling details to be switched off during profiling | f487486 | https://review.mlplatform.org/c/ml/armnn/+/6069 | |
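To see why removing a field from the middle of a struct, as reported for INetworkProperties above, is an ABI break, the sketch below models two versions of a struct with Python's ctypes. The field names and types are hypothetical and do not reproduce the real INetworkProperties layout; the point is only that the size and the offsets of later fields both change.

```python
import ctypes


class OldProps(ctypes.Structure):
    # Hypothetical "before" layout with a field in the middle.
    _fields_ = [
        ("import_enabled", ctypes.c_bool),
        ("num_threads", ctypes.c_uint64),
        ("input_source", ctypes.c_int32),
        ("output_source", ctypes.c_int32),
    ]


class NewProps(ctypes.Structure):
    # Hypothetical "after" layout with the middle field removed.
    _fields_ = [
        ("import_enabled", ctypes.c_bool),
        ("input_source", ctypes.c_int32),
        ("output_source", ctypes.c_int32),
    ]


def layout(struct_type):
    """Return (total size, {field name: byte offset}) for a ctypes struct."""
    offsets = {name: getattr(struct_type, name).offset
               for name, _ in struct_type._fields_}
    return ctypes.sizeof(struct_type), offsets


old_size, old_offsets = layout(OldProps)
new_size, new_offsets = layout(NewProps)
print("old:", old_size, old_offsets)
print("new:", new_size, new_offsets)
```

A client compiled against the old layout would still read `input_source` at its old offset, which is why applications need to be rebuilt against the 21.08 headers.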
The following back-end API changes have occurred during the implementation of 21.08 that users should be aware of before upgrading.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
Refactor the reporting of capabilities from backends | b9af86e | https://review.mlplatform.org/c/ml/armnn/+/5728 | The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications. |
Add protected mode to ArmNN CreationOptions | 15fcc7e | https://review.mlplatform.org/c/ml/armnn/+/5963 | The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications. |
TfLite Delegate
New Features
- PRELU Operator support added.
- SHAPE Operator support added.
- Added Asynchronous Network Execution.
- Changed weights layout for Depthwise Convolution Operator from [M,I,H,W] to [1,H,W,I*M].
Build Dependencies
Tools | Supported Version |
---|---|
Git | 2.17.1 or later |
SCons | 2.4.1 (Ubuntu) and 2.5.1 (Debian) |
CMake | 3.5.1 (Ubuntu) and 3.7.2 (Debian) |
TensorFlow | 2.3.1 |
ONNX | 1.6.0 |
FlatBuffers | 1.12.0 |
Protobuf | 3.12.0 |
Android NDK | r20b |
mapbox/variant | 1.2.0 |
Android 12 Compatibility Testing was performed using the following:
Android Tag | Android Build ID | Mali Driver | Android Compatibility Test Suite | Android Vendor Test Suite |
---|---|---|---|---|
android-12 | SP1A.210812.003 | r32p1_01eac0 | 12_r1 (eng.upr473.20210901.005349) [1] | 12_r1 (eng.upr473.20210901.024841) |
[1] CtsNNAPITestCases with Mali Driver r32p1_01eac0. The following test is known to be failing: AddTwoWithHardwareBufferInputWithGPUUsage. Investigations indicate this failure is due to Android NN HAL utilizing Gralloc functionality not required by the Gralloc API. This issue has been raised with the Google Android team, and is tracked as https://partnerissuetracker.corp.google.com/issues/202025253. Please quote Arm reference MIDCET-3783 when discussing this issue.
Android 11 Compatibility Testing was performed using the following:
Android Tag | Android Build ID | Mali Driver | Android Compatibility Test Suite | Android Vendor Test Suite |
---|---|---|---|---|
android-11.0.0_r1 | RP1A.200720.009 | r31p0_01eac0 | 11_r4 (7352019) | 11_r4 (7337463) |
android-11.0.0_r6 | RPM1.210413.002 | r32p0_01eac0 | 11_r4 (7352019) | 11_r4 (7337463) |
android-11.0.0_r6 | RPM1.210413.002 | r33p0_01eac0 | 11_r4 (7352019) | 11_r4 (7337463) |
Android 10 Compatibility Testing was performed using the following:
Android Tag | Android Build ID | Mali Driver |
---|---|---|
android-10.0.0_r39 | QQ3A.200605.002.A1 | R23P0_01REL0 |