pykeio/ort v2.0.0-rc.10 on GitHub

💖 If you find `ort` useful, please consider sponsoring us on Open Collective 💖

🤔 Need help upgrading? Ask questions in GitHub Discussions or in the pyke.io Discord server!

🔗 Tensor Array Views

You can now create a TensorRef directly from an ArrayView. Previously, tensors could only be created via Tensor::from_array (which, in many cases, copied the data if a borrowed array was provided). The new TensorRef::from_array_view method (and the complementary TensorRefMut::from_array_view_mut) allows zero-copy creation of tensors directly from an ArrayView.

Tensor::from_array now only accepts owned data, so you should either refactor your code to use TensorRefs or pass ownership of the array to the Tensor.

⚠️ ndarrays must be in standard/contiguous memory layout to be converted to a TensorRef(Mut); see .as_standard_layout().
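
To make the layout requirement concrete, here is a stdlib-only sketch of what "standard layout" means: row-major strides, where the last axis has stride 1 and each earlier axis steps over the product of the later dimensions. The function names are hypothetical, and this is not how ort or ndarray implement the check; with ndarray you would simply call .as_standard_layout().

```rust
/// Row-major (C-order) strides, counted in elements, for a given shape.
fn standard_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![0; shape.len()];
    let mut step = 1;
    // Walk the axes from last to first, accumulating the element step.
    for (stride, &dim) in strides.iter_mut().zip(shape).rev() {
        *stride = step;
        step *= dim;
    }
    strides
}

/// True when `strides` describe a contiguous row-major view of `shape`.
/// Simplification (assumption): axes of length 1 and empty arrays are not
/// special-cased here, which a real array library may handle more leniently.
fn is_standard_layout(shape: &[usize], strides: &[usize]) -> bool {
    standard_strides(shape).as_slice() == strides
}
```

A transposed 3x2 array viewed as 2x3 has strides [1, 2] rather than [3, 1], so it fails this check and would need to be copied into standard layout first.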

↔️ Copy Tensors

rc.10 now allows you to manually copy tensors between devices using Tensor::to!

// Create our tensor in CUDA memory
let cuda_allocator = Allocator::new(
	&session,
	MemoryInfo::new(AllocationDevice::CUDA, 0, AllocatorType::Device, MemoryType::Default)?
)?;
let cuda_tensor = Tensor::<f32>::new(&cuda_allocator, [1_usize, 3, 224, 224])?;

// Copy it back to CPU
let cpu_tensor = cuda_tensor.to(AllocationDevice::CPU, 0)?;

There's also Tensor::to_async, which replicates the functionality of PyTorch's non_blocking=True. Additionally, Tensors now implement Clone.

⚙️ Alternative Backends

ort is no longer just a wrapper for ONNX Runtime; it's a one-stop shop for inferencing ONNX models in Rust thanks to the addition of the alternative backend API.

Alternative backends wrap other inference engines behind ONNX Runtime's API, which can simply be dropped in and used in ort - all it takes is one line of code:

fn main() -> ort::Result<()> {
    ort::set_api(ort_tract::api()); // <- magic!

    let session = Session::builder()?
        ...
}

Two alternative backends ship alongside rc.10: ort-tract, powered by tract, and ort-candle, powered by candle, with more to come in the future.

Outside of the Rust ecosystem, these alternative backends can also be compiled as standalone libraries that can be dropped directly into applications as a replacement for libonnxruntime. 🦀🦠

✏️ Model Editor

Models can be created entirely programmatically, or edited from an existing ONNX model via the new Model Editor API.

See src/editor/tests.rs for an example of how an ONNX model can be created programmatically. You can combine the Model Editor API with SessionBuilder::with_optimized_model_path to export the model outside Rust.

⚛️ Compiler

Many execution providers internally convert ONNX graphs to a framework-specific graph representation, like CoreML networks/TensorRT engines. This process can take a long time, especially for larger and more complex models. Since these generated artifacts aren't persisted between runs, they have to be created every time a session is loaded.

The new Compiler API allows you to compile an optimized, EP-ready graph ahead-of-time, so subsequent loads are lightning fast! ⚡

ModelCompiler::new(
    Session::builder()?
        .with_execution_providers([
            TensorRTExecutionProvider::default().build()
        ])?
)?
    .with_model_from_file("model.onnx")?
    .compile_to_file("compiled_trt_model.onnx")?;

🪶 `#![no_std]`

🚨 BREAKING: If you previously used ort with default-features = false...

That will now disable ort's std feature, which means you don't get to use APIs that interact with the operating system, like SessionBuilder::commit_from_file - APIs you probably need!
To minimize breakage, manually enable the std feature:
[dependencies]
ort = { version = "=2.0.0-rc.10", default-features = false, features = [ "std", ... ] }

ort no longer depends on std (but does still depend on alloc) - default-features = false will enable #![no_std] for ort.

⚡ Execution Providers

🚨 BREAKING: Boolean options for ArmNN, CANN, CoreML, CPU, CUDA, MIGraphX, NNAPI, OpenVINO, & ROCm...

If you previously used an option setter on one of these EPs that took no parameters (i.e. a boolean option that was false by default), note that these functions now do take a boolean parameter to align with Rust idiom.
Migrating is as simple as passing true to these functions. Affected functions include:

ArmNNExecutionProvider::with_arena_allocator
CANNExecutionProvider::with_dump_graphs
CPUExecutionProvider::with_arena_allocator
CUDAExecutionProvider::with_cuda_graph
CUDAExecutionProvider::with_skip_layer_norm_strict_mode
CUDAExecutionProvider::with_prefer_nhwc
MIGraphXExecutionProvider::with_fp16
MIGraphXExecutionProvider::with_int8
NNAPIExecutionProvider::with_fp16
NNAPIExecutionProvider::with_nchw
NNAPIExecutionProvider::with_disable_cpu
NNAPIExecutionProvider::with_cpu_only
OpenVINOExecutionProvider::with_opencl_throttling
OpenVINOExecutionProvider::with_dynamic_shapes
OpenVINOExecutionProvider::with_npu_fast_compile
ROCmExecutionProvider::with_exhaustive_conv_search
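
As a rough illustration of the new setter shape, the stdlib-only sketch below uses a hypothetical ExampleProvider type (not an ort type) with a single boolean option. It only shows the pattern: the setter takes the value explicitly, so an option can be driven by configuration and also switched back off.

```rust
/// Hypothetical stand-in for an execution provider builder; not an ort type.
#[derive(Debug, Default, PartialEq)]
struct ExampleProvider {
    arena_allocator: bool,
}

impl ExampleProvider {
    /// New-style setter: the caller passes the desired value.
    fn with_arena_allocator(mut self, enable: bool) -> Self {
        self.arena_allocator = enable;
        self
    }
}

/// Builds a provider from a runtime flag.
fn build_provider(use_arena: bool) -> ExampleProvider {
    // Old call sites written as `.with_arena_allocator()` become
    // `.with_arena_allocator(true)`.
    ExampleProvider::default().with_arena_allocator(use_arena)
}
```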

🚨 BREAKING: Renamed enum options for CANN, CUDA, QNN...

The following EP option enums have been renamed to reduce verbosity:

CANNExecutionProviderPrecisionMode -> CANNPrecisionMode
CANNExecutionProviderImplementationMode -> CANNImplementationMode
CUDAExecutionProviderAttentionBackend -> CUDAAttentionBackend
CUDAExecutionProviderCuDNNConvAlgoSearch -> CuDNNConvAlgorithmSearch
QNNExecutionProviderPerformanceMode -> QNNPerformanceMode
QNNExecutionProviderProfilingLevel -> QNNProfilingLevel
QNNExecutionProviderContextPriority -> QNNContextPriority

🚨 BREAKING: Updated CoreML options...

CoreMLExecutionProvider has been updated to use a new registration API, unlocking more options. To migrate old options:

.with_cpu_only() -> .with_compute_units(CoreMLComputeUnits::CPUOnly)
.with_ane_only() -> .with_compute_units(CoreMLComputeUnits::CPUAndNeuralEngine)
.with_subgraphs() -> .with_subgraphs(true)

rc.10 adds support for 3 execution providers:

Azure allows you to call Azure AI models like GPT-4 directly from ort.
WebGPU is powered by Dawn, an implementation of the WebGPU standard, allowing accelerated inference with almost any D3D12/Metal/Vulkan/OpenGL-supported GPU. Binaries with the WebGPU EP are available on Windows & Linux, so you can start testing it straight away!
NV TensorRT RTX is a new execution provider purpose-built for NVIDIA RTX GPUs running with ONNX Runtime on Windows. It's powered by TensorRT for RTX, a specially-optimized inference library built upon TensorRT releasing in June.

All binaries are now statically linked! This means the cuda and tensorrt features no longer use onnxruntime.dll/libonnxruntime.so. The EPs themselves do still require separate DLLs - like libonnxruntime_providers_cuda - but this change should make it significantly easier to set up and use ort with CUDA/TRT.

🧩 Custom Operator Improvements

🚨 BREAKING: Migrating your custom operators...

All methods under Operator now take &self.
The operator's kernel is no longer an associated type - create_kernel is instead expected to return a Box<dyn Kernel> (which can now be created directly from a function!)

 impl Operator for MyCustomOp {
-    type Kernel = MyCustomOpKernel;
 
-    fn name() -> &'static str {
+    fn name(&self) -> &str {
         "MyCustomOp"
     }
 
-    fn inputs() -> Vec<OperatorInput> {
+    fn inputs(&self) -> Vec<OperatorInput> {
         vec![OperatorInput::required(TensorElementType::Float32)]
     }
 
-    fn outputs() -> Vec<OperatorOutput> {
+    fn outputs(&self) -> Vec<OperatorOutput> {
         vec![OperatorOutput::required(TensorElementType::Float32)]
     }
 
-   fn create_kernel(_: &KernelAttributes) -> ort::Result<Self::Kernel> {
-       Ok(MyCustomOpKernel)
-   }
+   fn create_kernel(&self, _: &KernelAttributes) -> ort::Result<Box<dyn Kernel>> {
+       Ok(Box::new(|ctx: &KernelContext| {
+           ...
+       }))
+   }
 }
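
Creating a boxed kernel "directly from a function" is the kind of thing a blanket trait implementation over closures enables in Rust. The sketch below is stdlib-only and uses hypothetical stand-ins (DemoKernel, DemoContext); it illustrates the general pattern under those assumptions, not ort's actual Kernel or KernelContext definitions.

```rust
/// Hypothetical stand-in for a kernel context: one input and one output buffer.
struct DemoContext {
    input: Vec<f32>,
    output: Vec<f32>,
}

/// Hypothetical stand-in for a kernel trait.
trait DemoKernel {
    fn compute(&mut self, ctx: &mut DemoContext) -> Result<(), String>;
}

/// Blanket impl: any closure or function with a matching signature is a kernel.
impl<F> DemoKernel for F
where
    F: FnMut(&mut DemoContext) -> Result<(), String>,
{
    fn compute(&mut self, ctx: &mut DemoContext) -> Result<(), String> {
        (*self)(ctx)
    }
}

/// Returns a boxed kernel built from a closure, with no dedicated kernel struct.
fn create_doubling_kernel() -> Box<dyn DemoKernel> {
    Box::new(|ctx: &mut DemoContext| -> Result<(), String> {
        ctx.output = ctx.input.iter().map(|x| x * 2.0).collect();
        Ok(())
    })
}
```

Because the operator hands back a trait object rather than an associated type, one operator can choose between several kernel closures at runtime.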

To add an operator to an OperatorDomain, you now pass the operator by value instead of as a type parameter:

 let mut domain = OperatorDomain::new("io.pyke")?;
-domain = domain.add::<MyCustomOp>()?;
+domain = domain.add(MyCustomOp)?;

Custom operators have been internally revamped to reduce code size & compilation time, and allow operators to be Sized.

🔷 Miscellaneous changes

Updated to ONNX Runtime v1.22.0.
The minimum supported Rust version (MSRV) is now 1.81.0.
The tracing dependency is now optional (but enabled by default).
- To keep using tracing with default-features = false, enable the tracing feature.
- When disabled, ONNX Runtime will log its messages directly to stdout. The log level defaults to WARN but can be controlled at runtime via the ORT_LOG environment variable by setting it to one of verbose, info, warning, error, or fatal.
The domain serving prebuilt binaries has moved from parcel.pyke.io to cdn.pyke.io, so make sure to update firewall exclusions.
The build.rs hack for Apple platforms is no longer required. (9b31680)
The ureq dependency (used by download-binaries/fetch-models) has been upgraded to v3.0.
- ort with the fetch-models feature will use rustls as the TLS provider.
- ort-sys with the download-binaries feature will use native-tls since that pulls in fewer dependencies (it previously used rustls). No prerequisites are required when building on Windows & macOS, but other platforms now require OpenSSL to be installed.
All ONNX Runtime tensor types are now supported - including Complex64 & Complex128, 4-bit integers, and 8-bit floats!
- Tensors of these types cannot be created from an array or extracted since they don't have de facto Rust equivalents, but you can use DynTensor::new to allocate a tensor and DynTensor::data_ptr to access its data.
Reduce allocations (e136869)
- Session::run can now be zero-alloc (on the Rust side)!
Prebuilt binaries are now powered by KleidiAI on ARM64 - this should make them a fair bit faster!
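
For the ORT_LOG variable mentioned above (used when tracing is disabled), the accepted values and the WARN default can be pictured with the stdlib-only sketch below. The enum and function names are hypothetical, and the case-insensitive matching and fallback for unknown values are assumptions; the notes only state the five values and the default.

```rust
/// Log levels named in the release notes for the `ORT_LOG` variable.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LogLevel {
    Verbose,
    Info,
    Warning,
    Error,
    Fatal,
}

/// Resolves a level from the raw value of `ORT_LOG`.
/// Assumptions: matching ignores case and surrounding whitespace, and
/// unset or unknown values fall back to the WARN default.
fn resolve_log_level(raw: Option<&str>) -> LogLevel {
    match raw.map(|v| v.trim().to_ascii_lowercase()).as_deref() {
        Some("verbose") => LogLevel::Verbose,
        Some("info") => LogLevel::Info,
        Some("warning") => LogLevel::Warning,
        Some("error") => LogLevel::Error,
        Some("fatal") => LogLevel::Fatal,
        _ => LogLevel::Warning,
    }
}

/// Reads the variable from the process environment.
fn log_level_from_env() -> LogLevel {
    resolve_log_level(std::env::var("ORT_LOG").ok().as_deref())
}
```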

⚠️ Breaking

🚨 Session::run now takes &mut self.
- 💡 Tip when using mutexes: You can use SessionOutputs::remove to get an owned session output.
🚨 ort::inputs! no longer outputs a Result, so remove the trailing ? from any invocations of the macro.
🚨 The extract_tensor family, which extracts a tensor to an ndarray, has been renamed to extract_array; extract_raw_tensor now takes over the extract_tensor name.
- DynValue::try_extract_tensor(_mut) -> DynValue::try_extract_array(_mut)
- Tensor::extract_tensor(_mut) -> Tensor::extract_array(_mut)
- DynValue::try_extract_raw_tensor(_mut) -> DynValue::try_extract_tensor(_mut)
- Tensor::extract_raw_tensor(_mut) -> Tensor::extract_tensor(_mut)
Session::run_async now always takes &RunOptions; Session::run_async_with_options has been removed.
Most instances of "dimensions" (i.e. in ValueType::tensor_dimensions) have been replaced with "shape" (so ValueType::tensor_shape) for consistency.
Tensor shapes now use a custom struct, ort::tensor::Shape, instead of a Vec<i64> directly.
- Similarly, ValueType::Tensor.dimension_symbols is its own struct, SymbolicDimensions.
- Both can be converted from their prior forms via ::from()/.into().
SessionBuilder::with_execution_providers now takes AsRef<[EP]> instead of any iterable type.
SessionBuilder::with_external_initializer_file_in_memory requires a Path for the path parameter instead of a regular &str.
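
The mutex tip under the Session::run change can be pictured with the stdlib-only sketch below. DemoSession and DemoOutputs are hypothetical stand-ins, not ort types, and the sketch reflects one reading of the tip: outputs borrow the locked session, so removing an output by value lets the result outlive the lock guard.

```rust
use std::collections::HashMap;
use std::marker::PhantomData;
use std::sync::Mutex;

/// Hypothetical stand-in for a session whose `run` needs `&mut self`.
struct DemoSession {
    runs: usize,
}

/// Hypothetical stand-in for session outputs that borrow the session.
struct DemoOutputs<'s> {
    values: HashMap<String, Vec<f32>>,
    _session: PhantomData<&'s mut DemoSession>,
}

impl DemoSession {
    /// Pretend inference: adds 1.0 to every input element.
    fn run(&mut self, input: &[f32]) -> DemoOutputs<'_> {
        self.runs += 1;
        let mut values: HashMap<String, Vec<f32>> = HashMap::new();
        values.insert("output".to_string(), input.iter().map(|x| x + 1.0).collect());
        DemoOutputs { values, _session: PhantomData }
    }
}

impl DemoOutputs<'_> {
    /// Takes an output out of the map, returning it by value.
    fn remove(&mut self, name: &str) -> Option<Vec<f32>> {
        self.values.remove(name)
    }
}

/// Runs under the lock and returns an owned output, so the guard (and the
/// borrow of the session) ends before the caller uses the result.
fn infer(session: &Mutex<DemoSession>, input: &[f32]) -> Option<Vec<f32>> {
    let mut guard = session.lock().ok()?;
    let mut outputs = guard.run(input);
    outputs.remove("output")
}
```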

🪲 Fixes

Zero out tensors created on the CPU via Tensor::new. (7a95f98)
- In some cases, the memory allocated by ONNX Runtime for new tensors was not initially zeroed. Now, any tensors created in CPU-accessible memory via Tensor::new will be manually zeroed on the Rust side.
IoBinding::synchronize_* now takes &self so synchronize_outputs can actually be used as intended (e8d873a)
Fix XNNPACKExecutionProvider::is_available always returning false (5ad997c)
Fix a memory lifetime issue with AllocationDevice & MemoryInfo (3ca14c2)
Fix OpenVINO EP registration failures by ensuring an environment is available (3e7e8fe)
- and use the new registration API for OpenVINO (5661450)
ort-sys crate now specifies links, hopefully preventing linking conflicts (d2dc7c8)
Correct the internal device name for the DirectML AllocationDevice (46c3376)
ort-sys no longer tries to download binaries when building with --offline (d7d4493)
Dylib symlinks are now properly renewed when the library updates (4b6b163)
ONNX Runtime log levels are now mapped directly to their corresponding tracing level instead of being knocked down a level (d8bcfd7)
Fixed the name of the flag set by TensorRTExecutionProvider::with_context_memory_sharing (#327)
- ...and with_build_heuristics & with_sparsity (b6ddfd8)
Fixed concurrent downloads from commit_from_url or ort-sys (eb51646/#323)
Fix linking XNNPACK on ARM64. (#384)