Announcements

For Execution Provider maintainers/owners: the lightweight compile API is now the default compiler API for all Execution Providers (this was previously only available for the mobile build). If you have an EP using the legacy compiler API, please migrate to the lightweight compile API as soon as possible. The legacy API will be deprecated in next release (ORT 1.13).
netstandard1.1 support is being deprecated in this release and will be removed in the next ORT 1.13 release

Key Updates

ONNX spec support
- onnx opset 17
- onnx-ml opset 3 (TreeEnsemble update)
BeamSearch operator for encoder-decoder transformers models
Support for invoking individual ops without the need to create a separate graph
- For use with custom op development to reuse ORT code
Support for feeding external initializers (for large models) as byte arrays for model inferencing
Build switch to disable usage of abseil library to remove dependency

Python 3.10 support
Mac M1 support in Python and Java packages
.NET 6/MAUI support in Nuget C# package
- Additional target frameworks: net6.0, net6.0-android, net6.0-ios, net6.0-macos
- NOTE: netstandard1.1 support is being deprecated in this release and will be removed in the 1.13 release
onnxruntime-openvino package available on Pypi (from Intel)

Improved C++ APIs that now utilize RAII for better memory management
Operator performance optimizations, including GatherElements
Memory optimizations to support compute-intensive real-time inferencing scenarios (e.g. audio inferencing scenarios)
- CPU usage savings for infrequent inference requests by reducing thread spinning
- Memory usage reduction through use of containers from the abseil library, especially inlined vectors used to store tensor shapes and inlined hash maps
New quantized kernels for weight symmetry to improve performance on ARM64 little core (GEMM and Conv)
Specialized kernel to improve performance of quantized Resize by up to 2x speedup
Improved the thread job partition for QLinearConv, demonstrating up to ~20% perf gain for certain models
Quantization tool: improved ONNX shape inference for large models

TensorRT EP
- TensorRT 8.4 support
- Provide option to share execution context memory between TensorRT subgraphs
- Workaround long CI test time caused by frequent initialization/de-initialization of TensorRT builder
- Improve subgraph partitioning and consolidate TensorRT subgraphs when possible
- Refactor engine cache serialization/deserialization logic
- Miscellaneous bug fixes and performance improvements
OpenVINO EP
- Pre-Built ONNXRuntime binaries with OpenVINO now available on pypi: onnxruntime-openvino
- Performance optimizations of existing supported models
- New runtime configuration option ‘enable_dynamic_shapes’ added to enable dynamic shapes for each iteration
- ORTModule included as part of OVEP Python Package to enable Torch ORT Inference
DirectML EP
- Updated to DirectML 1.9
- Opset 13-15 support: #11827, #11814, #11782, #11772
- Bug fixes: Xbox command list reuse, descriptor heap reset, command allocator memory growth, negative pad counts, node suffix removal
TVM EP - details
- Updated to add model .dll ingestion and execution on Windows
- Updated documentation and CI tests
[New] SNPE EP - details
[Preview] XNNPACK EP - initial infrastructure with limited operator support, for use with ORT Mobile and ORT Web
- Currently supports Conv and MaxPool, with work in progress to add more kernels

Binary size reductions in Android minimal build - 12% reduction in size of base build with no operator kernels
Added new operator support to NNAPI and CoreML EPs to improve ability to run super resolution and BERT models using NPU
- NNAPI: DepthToSpace, PRelu, Gather, Unsqueeze, Pad
- CoreML: DepthToSpace, PRelu
Added Docker file to simplify running a custom minimal build to create an ORT Android package
Initial XNNPACK EP compatibility

[New] ORT Training acceleration is also natively available through HuggingFace Optimum
[New] FusedAdam Optimizer now available through the torch-ort package for easier training integration
FP16_Optimizer Support for more DeepSpeed Versions
Bfloat16 support for AtenOp
Added gradient ops for ReduceMax and ReduceMin
Updates to Min and Max grad ops to use distributed logic
Optimizations
- Optimized perf for Gelu and GeluGrad kernels for mixed precision models
- Enabled fusions for SimplifiedLayerNorm
- Added bitmask versions of Dropout, BiasDropout and DropoutGrad which brings ~8x space savings for the mast output.

The Microsoft.ML.OnnxRuntime.DirectML package on Nuget has an issue and will be fixed in a patch. Fix: #12368
The Maven package has a packaging issue for Mac M1 builds and will be fixed in a patch. Fix: #12335 / Workaround discussion
Windows builds are not compatible with Windows 8.x in this release. Please use v1.11 for now.