If you find ort useful, please consider sponsoring us on Open Collective.
Need help upgrading? Ask questions in GitHub Discussions or in the pyke.io Discord server!
Tensor Array Views
You can now create a TensorRef directly from an ArrayView. Previously, tensors could only be created via Tensor::from_array (which, in many cases, performed a copy if borrowed data was provided). The new TensorRef::from_array_view method (and the complementary TensorRefMut::from_array_view_mut) allows for the zero-copy creation of tensors directly from an ArrayView.
Tensor::from_array now only accepts owned data, so you should either refactor your code to use TensorRefs or pass ownership of the array to the Tensor.
Warning: ndarrays must be in standard/contiguous memory layout to be converted to a TensorRef(Mut); see .as_standard_layout().
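To make the layout requirement concrete, here is a minimal, std-only sketch of what "standard layout" means for a strided array: the strides (in elements) must match the row-major (C-order) strides implied by the shape. It does not call ort or ndarray. The helper names `row_major_strides` and `is_standard_layout` are hypothetical, strides are simplified to unsigned values, and the handling of length-1 axes and empty arrays is an assumption modeled on how contiguity checks are commonly written.

```rust
/// Row-major (C-order) strides, in elements, for a given shape.
fn row_major_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![0; shape.len()];
    let mut next = 1;
    // Walk from the innermost axis outwards, accumulating the product of axis lengths.
    for (stride, &len) in strides.iter_mut().zip(shape).rev() {
        *stride = next;
        next *= len;
    }
    strides
}

/// Returns true if `strides` describe a contiguous row-major layout for `shape`.
fn is_standard_layout(shape: &[usize], strides: &[usize]) -> bool {
    if shape.len() != strides.len() {
        return false;
    }
    // An array with no elements has nothing to lay out (assumption: treat as standard).
    if shape.contains(&0) {
        return true;
    }
    let mut expected = 1;
    for (&len, &stride) in shape.iter().zip(strides).rev() {
        // The stride of a length-1 axis is never used to step, so it is ignored.
        if len != 1 && stride != expected {
            return false;
        }
        expected *= len;
    }
    true
}

fn main() {
    // A 2x3 array stored row by row is in standard layout.
    println!("{}", is_standard_layout(&[2, 3], &[3, 1]));
    // Its transposed view (shape 3x2, strides 1 and 3) is not, so it would
    // have to be copied into standard layout before a zero-copy conversion.
    println!("{}", is_standard_layout(&[3, 2], &[1, 3]));
}
```

A transposed or sliced view is the usual way to end up with a non-standard layout, which is the case .as_standard_layout() is meant to handle.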
Copy Tensors
rc.10 now allows you to manually copy tensors between devices using Tensor::to!
    // Create our tensor in CUDA memory
    let cuda_allocator = Allocator::new(
        &session,
        MemoryInfo::new(AllocationDevice::CUDA, 0, AllocatorType::Device, MemoryType::Default)?
    )?;
    let cuda_tensor = Tensor::<f32>::new(&cuda_allocator, [1_usize, 3, 224, 224])?;

    // Copy it back to CPU
    let cpu_tensor = cuda_tensor.to(AllocationDevice::CPU, 0)?;

There's also Tensor::to_async, which replicates the functionality of PyTorch's non_blocking=True. Additionally, Tensors now implement Clone.
Alternative Backends
ort is no longer just a wrapper for ONNX Runtime; it's a one-stop shop for inferencing ONNX models in Rust thanks to the addition of the alternative backend API.
Alternative backends wrap other inference engines behind ONNX Runtime's API, which can simply be dropped in and used in ort - all it takes is one line of code:
    fn main() {
        ort::set_api(ort_tract::api()); // <- magic!

        let session = Session::builder()?
        ...
    }

Two alternative backends ship alongside rc.10: ort-tract, powered by tract, and ort-candle, powered by candle, with more to come in the future.
Outside of the Rust ecosystem, these alternative backends can also be compiled as standalone libraries that can be dropped directly into applications as a replacement for libonnxruntime.
Model Editor
Models can be created entirely programmatically, or edited from an existing ONNX model via the new Model Editor API.
See src/editor/tests.rs for an example of how an ONNX model can be created programmatically. You can combine the Model Editor API with SessionBuilder::with_optimized_model_path to export the model outside Rust.
Compiler
Many execution providers internally convert ONNX graphs to a framework-specific graph representation, like CoreML networks/TensorRT engines. This process can take a long time, especially for larger and more complex models. Since these generated artifacts aren't persisted between runs, they have to be created every time a session is loaded.
The new Compiler API allows you to compile an optimized, EP-ready graph ahead-of-time, so subsequent loads are lightning fast!
    ModelCompiler::new(
        Session::builder()?
            .with_execution_providers([
                TensorRTExecutionProvider::default().build()
            ])?
    )?
        .with_model_from_file("model.onnx")?
        .compile_to_file("compiled_trt_model.onnx")?;

#![no_std]
BREAKING: If you previously used ort with default-features = false...
That will now disable ort's std feature, which means you don't get to use APIs that interact with the operating system, like SessionBuilder::commit_from_file - APIs you probably need!
To minimize breakage, manually enable the std feature:

    [dependencies]
    ort = { version = "=2.0.0-rc.10", default-features = false, features = [ "std", ... ] }

ort no longer depends on std (but does still depend on alloc) - default-features = false will enable #![no_std] for ort.
Execution Providers
BREAKING: Boolean options for ArmNN, CANN, CoreML, CPU, CUDA, MIGraphX, NNAPI, OpenVINO, & ROCm...
If you previously used an option setter on one of these EPs that took no parameters (i.e. a boolean option that was false by default), note that these functions now do take a boolean parameter to align with Rust idiom.
Migrating is as simple as passing true to these functions. Affected functions include:
- ArmNNExecutionProvider::with_arena_allocator
- CANNExecutionProvider::with_dump_graphs
- CPUExecutionProvider::with_arena_allocator
- CUDAExecutionProvider::with_cuda_graph
- CUDAExecutionProvider::with_skip_layer_norm_strict_mode
- CUDAExecutionProvider::with_prefer_nhwc
- MIGraphXExecutionProvider::with_fp16
- MIGraphXExecutionProvider::with_int8
- NNAPIExecutionProvider::with_fp16
- NNAPIExecutionProvider::with_nchw
- NNAPIExecutionProvider::with_disable_cpu
- NNAPIExecutionProvider::with_cpu_only
- OpenVINOExecutionProvider::with_opencl_throttling
- OpenVINOExecutionProvider::with_dynamic_shapes
- OpenVINOExecutionProvider::with_npu_fast_compile
- ROCmExecutionProvider::with_exhaustive_conv_search
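One practical effect of the boolean parameter is that an option can be driven by a runtime value without branching around the builder call. The sketch below uses a hypothetical stand-in builder (`ExampleProvider` is not an ort type) with a setter shaped like the ones listed above; it only illustrates the calling convention, not any real execution provider.

```rust
/// Hypothetical stand-in for an execution provider builder; not an ort type.
#[derive(Debug, Default, Clone, PartialEq)]
struct ExampleProvider {
    arena_allocator: bool,
}

impl ExampleProvider {
    /// New-style setter: takes the desired value instead of always enabling the option.
    fn with_arena_allocator(mut self, enable: bool) -> Self {
        self.arena_allocator = enable;
        self
    }
}

/// Builds a provider from a runtime flag (for example, one read from a config file).
fn provider_from_flag(use_arena: bool) -> ExampleProvider {
    // A zero-argument setter would need an `if` around the call;
    // with a boolean parameter the flag is forwarded directly.
    ExampleProvider::default().with_arena_allocator(use_arena)
}

fn main() {
    // Migrating an old zero-argument call means passing `true`.
    let migrated = ExampleProvider::default().with_arena_allocator(true);
    println!("{migrated:?}");
    println!("{:?}", provider_from_flag(false));
}
```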
BREAKING: Renamed enum options for CANN, CUDA, QNN...
The following EP option enums have been renamed to reduce verbosity:
- CANNExecutionProviderPrecisionMode -> CANNPrecisionMode
- CANNExecutionProviderImplementationMode -> CANNImplementationMode
- CUDAExecutionProviderAttentionBackend -> CUDAAttentionBackend
- CUDAExecutionProviderCuDNNConvAlgoSearch -> CuDNNConvAlgorithmSearch
- QNNExecutionProviderPerformanceMode -> QNNPerformanceMode
- QNNExecutionProviderProfilingLevel -> QNNProfilingLevel
- QNNExecutionProviderContextPriority -> QNNContextPriority
BREAKING: Updated CoreML options...
CoreMLExecutionProvider has been updated to use a new registration API, unlocking more options. To migrate old options:
- .with_cpu_only() -> .with_compute_units(CoreMLComputeUnits::CPUOnly)
- .with_ane_only() -> .with_compute_units(CoreMLComputeUnits::CPUAndNeuralEngine)
- .with_subgraphs() -> .with_subgraphs(true)
rc.10 adds support for 3 execution providers:
- Azure allows you to call Azure AI models like GPT-4 directly from ort.
- WebGPU is powered by Dawn, an implementation of the WebGPU standard, allowing accelerated inference with almost any D3D12/Metal/Vulkan/OpenGL-supported GPU. Binaries with the WebGPU EP are available on Windows & Linux, so you can start testing it straight away!
- NV TensorRT RTX is a new execution provider purpose-built for NVIDIA RTX GPUs running with ONNX Runtime on Windows. It's powered by TensorRT for RTX, a specially-optimized inference library built upon TensorRT releasing in June.
All binaries are now statically linked! This means the cuda and tensorrt features no longer use onnxruntime.dll/libonnxruntime.so. The EPs themselves do still require separate DLLs - like libonnxruntime_providers_cuda - but this change should make it significantly easier to set up and use ort with CUDA/TRT.
Custom Operator Improvements
BREAKING: Migrating your custom operators...
- All methods under Operator now take &self.
- The operator's kernel is no longer an associated type - create_kernel is instead expected to return a Box<dyn Kernel> (which can now be created directly from a function!)

     impl Operator for MyCustomOp {
    -    type Kernel = MyCustomOpKernel;
    -
    -    fn name() -> &'static str {
    +    fn name(&self) -> &str {
             "MyCustomOp"
         }

    -    fn inputs() -> Vec<OperatorInput> {
    +    fn inputs(&self) -> Vec<OperatorInput> {
             vec![OperatorInput::required(TensorElementType::Float32)]
         }

    -    fn outputs() -> Vec<OperatorOutput> {
    +    fn outputs(&self) -> Vec<OperatorOutput> {
             vec![OperatorOutput::required(TensorElementType::Float32)]
         }

    -    fn create_kernel(_: &KernelAttributes) -> ort::Result<Self::Kernel> {
    -        Ok(MyCustomOpKernel)
    -    }
    +    fn create_kernel(&self, _: &KernelAttributes) -> ort::Result<Box<dyn Kernel>> {
    +        Ok(Box::new(|ctx: &KernelContext| {
    +            ...
    +        }))
    +    }
     }

To add an operator to an OperatorDomain, you now pass the operator by value instead of as a type parameter:

     let mut domain = OperatorDomain::new("io.pyke")?;
    -domain = domain.add::<MyCustomOp>()?;
    +domain = domain.add(MyCustomOp)?;

Custom operators have been internally revamped to reduce code size & compilation time, and allow operators to be Sized.
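Returning a boxed trait object built from a closure typically relies on a blanket implementation of the trait for function types. The following std-only sketch shows that general Rust technique with hypothetical stand-ins: the `Kernel` trait, `KernelContext` struct, and `compute` method here are simplified placeholders, not ort's actual definitions, so treat it as an illustration of the idea rather than a description of ort's internals.

```rust
/// Simplified placeholder for the data a kernel receives; not ort's `KernelContext`.
struct KernelContext {
    input: Vec<f32>,
}

/// Simplified placeholder trait; ort's real `Kernel` trait may look different.
trait Kernel {
    fn compute(&mut self, ctx: &KernelContext) -> Result<Vec<f32>, String>;
}

/// Blanket impl: any closure or function with a matching signature is itself a `Kernel`.
impl<F> Kernel for F
where
    F: FnMut(&KernelContext) -> Result<Vec<f32>, String>,
{
    fn compute(&mut self, ctx: &KernelContext) -> Result<Vec<f32>, String> {
        self(ctx)
    }
}

/// Builds a kernel from a closure, with no dedicated kernel struct.
fn create_kernel(scale: f32) -> Box<dyn Kernel> {
    Box::new(move |ctx: &KernelContext| -> Result<Vec<f32>, String> {
        Ok(ctx.input.iter().map(|x| x * scale).collect())
    })
}

/// A plain function works as a kernel too, thanks to the same blanket impl.
fn identity(ctx: &KernelContext) -> Result<Vec<f32>, String> {
    Ok(ctx.input.clone())
}

fn main() {
    let ctx = KernelContext { input: vec![1.0, 2.0, 3.0] };

    let mut scaled = create_kernel(2.0);
    println!("{:?}", scaled.compute(&ctx));

    let mut passthrough: Box<dyn Kernel> = Box::new(identity);
    println!("{:?}", passthrough.compute(&ctx));
}
```

The trade-off of this design is dynamic dispatch on each call in exchange for not having to name the kernel type in the operator's signature.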
Miscellaneous changes
- Updated to ONNX Runtime v1.22.0.
- The minimum supported Rust version (MSRV) is now 1.81.0.
- The tracing dependency is now optional (but enabled by default).
  - To keep using tracing with default-features = false, enable the tracing feature.
  - When disabled, ONNX Runtime will log its messages directly to stdout. The log level defaults to WARN but can be controlled at runtime via the ORT_LOG environment variable by setting it to one of verbose, info, warning, error, or fatal.
- The domain serving prebuilt binaries has moved from parcel.pyke.io to cdn.pyke.io, so make sure to update firewall exclusions.
- The build.rs hack for Apple platforms is no longer required. (9b31680)
- The ureq dependency (used by download-binaries/fetch-models) has been upgraded to v3.0.
  - ort with the fetch-models feature will use rustls as the TLS provider.
  - ort-sys with the download-binaries feature will use native-tls since that pulls in fewer dependencies (it previously used rustls). No prerequisites are required when building on Windows & macOS, but other platforms now require OpenSSL to be installed.
- All ONNX Runtime tensor types are now supported - including Complex64 & Complex128, 4-bit integers, and 8-bit floats!
  - Tensors of these types cannot be created from an array or extracted since they don't have de facto Rust equivalents, but you can use DynTensor::new to allocate a tensor and DynTensor::data_ptr to access its data.
- Reduce allocations (e136869)
  - Session::run can now be zero-alloc (on the Rust side)!
- Prebuilt binaries are now powered by KleidiAI on ARM64 - this should make them a fair bit faster!
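
As an illustration of the ORT_LOG values listed above, the following std-only sketch maps the variable's value to a level, with WARN as the fallback. It is not ort's implementation: the `LogLevel` enum and the `level_from_env_value` helper are hypothetical, and both the exact lowercase matching and the fallback for unrecognized values are assumptions.

```rust
use std::env;

/// Hypothetical enum mirroring the five values the notes list for ORT_LOG.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LogLevel {
    Verbose,
    Info,
    Warning,
    Error,
    Fatal,
}

/// Maps the raw value of ORT_LOG (if set) to a level.
fn level_from_env_value(value: Option<&str>) -> LogLevel {
    match value {
        Some("verbose") => LogLevel::Verbose,
        Some("info") => LogLevel::Info,
        Some("warning") => LogLevel::Warning,
        Some("error") => LogLevel::Error,
        Some("fatal") => LogLevel::Fatal,
        // Unset or unrecognized (assumption): fall back to the stated default of WARN.
        _ => LogLevel::Warning,
    }
}

fn main() {
    // Reads ORT_LOG from the environment, e.g. when run as `ORT_LOG=verbose cargo run`.
    let raw = env::var("ORT_LOG").ok();
    let level = level_from_env_value(raw.as_deref());
    println!("effective log level: {level:?}");
}
```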
Breaking
- Session::run now takes &mut self.
  - Tip when using mutexes: You can use SessionOutputs::remove to get an owned session output.
- ort::inputs! no longer outputs a Result, so remove the trailing ? from any invocations of the macro.
- extract_tensor to extract a tensor to an ndarray has been renamed to extract_array, with extract_raw_tensor now taking the place of extract_tensor.
  - DynValue::try_extract_tensor(_mut) -> DynValue::try_extract_array(_mut)
  - Tensor::extract_tensor(_mut) -> Tensor::extract_array(_mut)
  - DynValue::try_extract_raw_tensor(_mut) -> DynValue::try_extract_tensor(_mut)
  - Tensor::extract_raw_tensor(_mut) -> Tensor::extract_tensor(_mut)
- Session::run_async now always takes &RunOptions; Session::run_async_with_options has been removed.
- Most instances of "dimensions" (i.e. in ValueType::tensor_dimensions) have been replaced with "shape" (so ValueType::tensor_shape) for consistency.
- Tensor shapes now use a custom struct, ort::tensor::Shape, instead of a Vec<i64> directly.
  - Similarly, ValueType::Tensor.dimension_symbols is its own struct, SymbolicDimensions.
  - Both can be converted from their prior forms via ::from()/.into().
- SessionBuilder::with_execution_providers now takes AsRef<[EP]> instead of any iterable type.
- SessionBuilder::with_external_initializer_file_in_memory requires a Path for the path parameter instead of a regular &str.
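
Because Session::run now needs exclusive access, a session shared between threads is typically wrapped in a Mutex, and owned outputs are what let a result leave the locked region. The sketch below shows that pattern using std-only stand-ins. `Session`, `Outputs`, and their methods here are hypothetical simplifications that only mimic the borrowing shape (a `run` taking `&mut self`, outputs assumed to borrow from the session, and a `remove` returning an owned value); they are not ort's types or signatures.

```rust
use std::collections::HashMap;
use std::marker::PhantomData;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a session; `runs` only gives `run` something to mutate.
#[derive(Default)]
struct Session {
    runs: usize,
}

/// Hypothetical stand-in for outputs that borrow from the session that produced them.
struct Outputs<'s> {
    values: HashMap<String, Vec<f32>>,
    _session: PhantomData<&'s mut Session>,
}

impl Session {
    /// Mimics the new receiver: running requires `&mut self`.
    fn run(&mut self, input: &[f32]) -> Outputs<'_> {
        self.runs += 1;
        let mut values = HashMap::new();
        values.insert("output".to_string(), input.iter().map(|x| x * 2.0).collect());
        Outputs { values, _session: PhantomData }
    }
}

impl Outputs<'_> {
    /// Takes an output out of the map, yielding an owned value.
    fn remove(&mut self, name: &str) -> Option<Vec<f32>> {
        self.values.remove(name)
    }
}

/// Locks the shared session, runs it, and returns an owned output.
fn infer(session: &Mutex<Session>, input: &[f32]) -> Vec<f32> {
    let mut guard = session.lock().expect("session mutex poisoned");
    let mut outputs = guard.run(input);
    let owned = outputs.remove("output").expect("missing output");
    // `outputs` and `guard` are dropped on return; `owned` borrows from neither.
    owned
}

fn main() {
    let session = Arc::new(Mutex::new(Session::default()));
    let workers: Vec<_> = (0..2)
        .map(|i| {
            let session = Arc::clone(&session);
            thread::spawn(move || infer(&session, &[i as f32, 1.0]))
        })
        .collect();
    for worker in workers {
        println!("{:?}", worker.join().expect("worker panicked"));
    }
}
```

Returning the borrowed `Outputs` from `infer` would not compile, since it is tied to the lock guard; removing the value first is what makes the function signature possible.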
Fixes
- Zero out tensors created on the CPU via Tensor::new. (7a95f98)
  - In some cases, the memory allocated by ONNX Runtime for new tensors was not initially zeroed. Now, any tensors created in CPU-accessible memory via Tensor::new will be manually zeroed on the Rust side.
- IoBinding::synchronize_* now takes &self so synchronize_outputs can actually be used as intended (e8d873a)
- Fix XNNPACKExecutionProvider::is_available always returning false (5ad997c)
- Fix a memory lifetime issue with AllocationDevice & MemoryInfo (3ca14c2)
- Fix OpenVINO EP registration failures by ensuring an environment is available (3e7e8fe)
  - and use the new registration API for OpenVINO (5661450)
- ort-sys crate now specifies links, hopefully preventing linking conflicts (d2dc7c8)
- Correct the internal device name for the DirectML AllocationDevice (46c3376)
- ort-sys no longer tries to download binaries when building with --offline (d7d4493)
- Dylib symlinks are now properly renewed when the library updates (4b6b163)
- ONNX Runtime log levels are now mapped directly to their corresponding tracing level instead of being knocked down a level (d8bcfd7)
- Fixed the name of the flag set by TensorRTExecutionProvider::with_context_memory_sharing (#327)
  - ...and with_build_heuristics & with_sparisty (b6ddfd8)
- Fixed concurrent downloads from commit_from_url or ort-sys (eb51646/#323)
- Fix linking XNNPACK on ARM64. (#384)
