Key Updates
General
- Support for ONNX 1.11 with opset 16
- Updated protobuf version to 3.18.x
- Enable usage of Mimalloc (details)
- Transformer model helper scripts
- On Windows, error strings in OrtStatus are now encoded in UTF-8. To display one on screen, first convert it to a wide-character string using the MultiByteToWideChar Windows API.
Performance
- Memory utilization related performance improvements (e.g. elimination of vectors for small dims)
- Performance variance stability improvement through a dynamic cost model session option (details; a configuration sketch follows this list)
- New quantization data format support: S8S8 in QDQ format (see the quantization sketch after this list)
  - Added S8S8 kernels for ARM64
  - Added support to convert S8S8 to U8S8 automatically for x64
- Improved performance on ARM64 for quantized CNN models through:
  - New kernels for quantized depthwise Conv
  - Improved symmetrically quantized Conv by leveraging an indirect buffer
  - New Gemm kernels for symmetric quantized Conv and MatMul
- General quantization improvements, including new quantized operators (Resize, ArgMax) and quantization tool updates
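The dynamic cost model is enabled through a session configuration entry. Below is a minimal sketch, assuming the config key session.dynamic_block_base (from onnxruntime_session_options_config_keys.h) and a hypothetical model path; the value is workload-dependent.

```python
import onnxruntime as ort

so = ort.SessionOptions()
# Opt into the dynamic cost model for intra-op thread pool work partitioning.
# The key "session.dynamic_block_base" and the value "4" are assumptions; tune
# the value for your workload.
so.add_session_config_entry("session.dynamic_block_base", "4")

sess = ort.InferenceSession("model.onnx", sess_options=so)  # hypothetical path
```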
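S8S8 quantization in QDQ format is driven through the Python quantization tool. A minimal sketch using quantize_static; the model paths, input name, and input shape are assumptions for illustration, and real calibration data should replace the synthetic reader.

```python
import numpy as np
from onnxruntime.quantization import (CalibrationDataReader, QuantFormat,
                                      QuantType, quantize_static)

class RandomDataReader(CalibrationDataReader):
    """Feeds a few synthetic samples for calibration (use real data in practice)."""
    def __init__(self, input_name, shape, count=8):
        self._data = iter([{input_name: np.random.rand(*shape).astype(np.float32)}
                           for _ in range(count)])

    def get_next(self):
        return next(self._data, None)

quantize_static(
    "model.onnx",                     # hypothetical float32 input model
    "model.qdq.onnx",                 # quantized output model
    RandomDataReader("input", (1, 3, 224, 224)),
    quant_format=QuantFormat.QDQ,     # emit QuantizeLinear/DequantizeLinear pairs
    activation_type=QuantType.QInt8,  # S8 activations
    weight_type=QuantType.QInt8,      # S8 weights -> S8S8
)
```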
API
- Java: Only a single OrtEnv can be created in any given execution of the JVM. Previously, the environment could be closed completely and a fresh one created with different parameters (e.g., global thread pool or logging level) (details)
Packages
- Nuget packages
  - C# packages are now tested with .NET 5. .NET Core 2.1 support is deprecated, as it reached end of life on August 21, 2021. We will closely follow .NET's support policy.
  - Removed PDB files from the packages; these are attached as release artifacts below.
- Pypi packages
  - Python 3.6 support is deprecated, as it reached EOL in December 2021. Supported Python versions: 3.7-3.9
  - Note: Mac M1 builds are not yet available in pypi but can be built from source
  - OnnxRuntime with OpenVINO support is available at https://pypi.org/project/onnxruntime-openvino/1.11.0/
Execution Providers
- CUDA
  - Enabled CUDA provider option configuration for C#, including workspace size configuration, and fixed binary compatibility of the CUDAProviderOptions C API
  - Preview support for CUDA Graphs (details; a provider-options sketch follows this section)
- TensorRT
  - TRT 8.2.3 support
  - Memory footprint optimizations
  - Support for protobuf >= 3.11
  - Updated flatbuffers version to 2.0
  - Misc. bug fixes
- DirectML
  - Updated more operators to opset 13 (QuantizeLinear, DequantizeLinear, ReduceSum, Split, Squeeze, Unsqueeze)
- OpenVINO
  - OpenVINO™ version upgraded to 2022.1.0, the biggest OpenVINO™ upgrade in 3.5 years. This provides functional bug fixes, API 2.0 changes, and capability changes from the previous 2021.4.2 LTS release.
  - Performance optimizations of existing supported models
  - Pre-built OnnxRuntime binaries with OpenVINO enabled can be downloaded from https://github.com/intel/onnxruntime/releases/tag/v4.0 and from https://pypi.org/project/onnxruntime-openvino/1.11.0/
- OpenCL (in preview)
  - Introduced the OpenCL EP for use with mobile GPUs
  - Available in the experimental/opencl branch for users to try. Provide feedback through Issues and Discussions in the repo.
  - README is available here.
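As referenced in the CUDA items above, provider options can be passed per provider when creating a session. A minimal Python sketch; the option keys cudnn_conv_use_max_workspace and enable_cuda_graph are assumptions taken from the CUDA EP documentation, and the model path is hypothetical.

```python
import onnxruntime as ort

providers = [
    ("CUDAExecutionProvider", {
        "device_id": 0,
        # Let cuDNN use its maximum convolution workspace (assumed option key).
        "cudnn_conv_use_max_workspace": "1",
        # Preview feature (assumed option key): capturing and replaying a CUDA
        # Graph requires stable input/output addresses, e.g. via IOBinding.
        "enable_cuda_graph": "1",
    }),
    "CPUExecutionProvider",  # fallback for nodes CUDA cannot take
]

sess = ort.InferenceSession("model.onnx", providers=providers)  # hypothetical path
```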
Mobile
- Added general support for converting a model to NHWC layout at runtime
  - An execution provider sets its preferred layout, and shared infrastructure in ORT ensures that the nodes assigned to that execution provider are converted to the preferred layout
- Added support for runtime optimization with minimal binary size impact
  - Relevant optimizations are saved in the ORT format model for replay at runtime if applicable
- Added support for QDQ format models to the NNAPI EP
  - Falls back to the CPU EP's QDQ handling via runtime optimizations if NNAPI is not available
  - Includes updates to the ORT QDQ optimizers so they work better with mobile scenarios
- Added helpers to:
  - Analyze whether a model can be used with the pre-built ORT Mobile package
  - Update the ONNX opset so a model can be used with the pre-built package
  - Convert dynamic inputs into fixed-size inputs so that the model can be used with NNAPI/CoreML (see the sketch at the end of this section)
  - Optimize a QDQ format model for use with ORT
- Added Android and iOS packages with full ORT builds
  - These packages support the full set of opsets and ops for ONNX models at the cost of a larger binary size
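The helper for fixing dynamic input shapes is exposed as the onnxruntime.tools.make_dynamic_shape_fixed command-line tool. A minimal sketch invoking it from Python; the model paths and the "batch" dim_param name are placeholders for illustration.

```python
import subprocess
import sys

# Replace the symbolic dimension named "batch" with a fixed value of 1 so the
# model can be handled by NNAPI/CoreML (paths and dim name are placeholders).
subprocess.run(
    [sys.executable, "-m", "onnxruntime.tools.make_dynamic_shape_fixed",
     "--dim_param", "batch", "--dim_value", "1",
     "model.onnx", "model.fixed.onnx"],
    check=True,
)
```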
Web
- Build option to create ONNX Runtime WebAssembly static library
- Support for concurrent creation of multiple inference sessions
- Upgraded emsdk to 3.1.3 for more stable multi-threading, and enabled LTO for multi-threaded WebAssembly builds
Known issues
- When using tensor sequences/sparse tensors, the generated profile is not valid JSON. (Fixed in #10974)
- There is a bug in the quantization tool's calibration when the percentile algorithm is chosen (Fixed in #10940). To work around it, apply the typo fix in the Python file.
- Mac M1 builds are not yet available in pypi but can be built from source (see the Pypi packages note above)
Contributions
Contributors to ONNX Runtime include members across teams at Microsoft, along with our community members:
snnn, edgchen1, skottmckay, yufenglee, wangyems, yuslepukhin, gwang-msft, iK1D, chilo-ms, fdwr, ytaous, RandySheriffH, hanbitmyths, chenfucn, yihonglyu, ajindal1, fs-eire, souptc, tianleiwu, YUNQIUGUO, hariharans29, oliviajain, xadupre, ashari4, RyanUnderhill, jywu-msft, weixingzhang, baijumeswani, georgen117, natke, Craigacp, jeffdaily, JingqiaoFu, zhanghuanrong, satyajandhyala, smk2007, ryanlai2, askhade, thiagocrepaldi, jingyanwangms, pengwa, scxiao, ashbhandare, BowenBao, SherlockNoMad, sumitsays, sfatimar, mosdav, harshithapv, liqunfu, tiagoshibata, gineshidalgo99, pranavsharma, jcwchen, nkreeger, xkszltl, faxu, suffiank, stevenlix, jeffbloo, feihugis