ONNX Runtime v1.24.1

19 hours ago

📢 Announcements & Breaking Changes

Platform Support Changes

  • Python 3.10 wheels are no longer published; please upgrade to Python 3.11 or later
  • Python 3.14 support added
  • Free-threaded Python (PEP 703): added support for Python 3.13t and 3.14t on Linux (#26786)
  • x86_64 binaries for macOS/iOS are no longer provided, and the minimum macOS version is raised to 14.0
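
As a rough illustration of the Python requirements above, the sketch below checks whether an interpreter meets the 3.11 floor and whether it is a free-threaded build. It uses only the standard library. The helper name check_interpreter is hypothetical and not part of ONNX Runtime, and the check covers only the Python version, not the OS or architecture constraints.

```python
import sys
import sysconfig


def check_interpreter(version_info=None, gil_disabled=None):
    """Return (meets_floor, free_threaded) for the given interpreter details.

    Hypothetical helper: 'meets_floor' reflects only the Python 3.11+
    requirement stated in these release notes.
    """
    if version_info is None:
        version_info = sys.version_info
    if gil_disabled is None:
        # CPython sets Py_GIL_DISABLED to 1 on free-threaded builds
        # (3.13t, 3.14t); it is 0 or None on regular builds.
        gil_disabled = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    meets_floor = tuple(version_info[:2]) >= (3, 11)
    return meets_floor, bool(gil_disabled)


if __name__ == "__main__":
    meets_floor, free_threaded = check_interpreter()
    print(f"Python 3.11+ floor met: {meets_floor}")
    print(f"Free-threaded build: {free_threaded}")
```

Passing explicit values, for example check_interpreter((3, 10), False), makes it possible to test a deployment matrix without running each interpreter.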

API Version

  • ORT_API_VERSION updated to 24 (#26418)

✨ New Features

🤖 Execution Provider (EP) Plugin API

A major infrastructure enhancement enabling plugin-based EPs with dynamic loading:

  • Initial kernel-based EP support (#26206)
  • Weight pre-packing support for plugin EPs (#26754)
  • EP Context model support (#25124)
  • Control flow kernel APIs (#26927)
  • OrtKernelInfo APIs for kernel-based plugin EPs (#26803)

🔧 Core APIs

  • OrtApi::CreateEnvWithOptions() and OrtEpApi::GetEnvConfigEntries() (#26971)
  • EP Device Compatibility APIs (#26922)
  • External Resource Importer API for D3D12 shared resources (#26828)
  • Session config access from KernelInfo (#26589)

📊 Dependencies & Integration

  • ONNX upgraded to 1.20.1 (#26579)
  • Protobuf updated from 3.20.3 → 4.25.8 (#26910)
  • CUDA Graph enabled by default (#26929)

🖥️ Execution Provider Updates

NVIDIA

  • CUDA EP: Flash Attention updates, GQA kernel fusion, BF16 support for MoE/qMoE/MatMulNBits, CUDA 13.0 support
  • TensorRT EP: Upgraded to TensorRT 10.14, automatic plugin loading, NVFP4 custom ops
  • TensorRT RTX EP: RTX runtime caching, CUDA graph support, BFloat16, memory-mapped engines

Qualcomm QNN EP

  • QNN SDK upgraded to 2.42.0 with new ops (RMSNorm, ScatterElements, GatherND, STFT, RandomUniformLike)
  • Gelu pattern fusion, LPBQ quantization support, ARM64 wheel builds, v81 device support

Intel & AMD

  • OpenVINO EP: Upgraded to 2025.4.1
  • VitisAI EP: External EP loader, compiled model compatibility API
  • MIGraphX EP: QuickGelu, multihead attention, QLinear pooling ops
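
Which of the execution providers in this section are usable depends on the installed package and build. The sketch below shows one way to choose providers in order of preference. The helper pick_providers is hypothetical; the only ONNX Runtime call used is onnxruntime.get_available_providers(), and the script degrades gracefully when the package is not installed.

```python
def pick_providers(available, preferred):
    """Return the preferred providers that are available, in preference order.

    Hypothetical helper, not part of ONNX Runtime. The CPU provider is
    appended as a fallback when the build reports it.
    """
    chosen = [name for name in preferred if name in available]
    if "CPUExecutionProvider" in available and "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


if __name__ == "__main__":
    try:
        import onnxruntime as ort
    except ImportError:
        print("onnxruntime is not installed")
    else:
        available = ort.get_available_providers()
        wanted = ["TensorrtExecutionProvider", "CUDAExecutionProvider"]
        print(ort.__version__, pick_providers(available, wanted))
```

The resulting list can be passed as the providers argument when creating an InferenceSession.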

ArmNN EP

Arm is formally deprecating the Arm NN Execution Provider (EP) in ONNX Runtime. The Arm NN EP is still experimental and depends on technology that is no longer actively maintained. Keeping it available now only adds complexity and potential confusion for users.

What to expect:

  • Effective immediately, the Arm NN EP is deprecated and will no longer be maintained
  • All build options, documentation, and examples referencing ArmNN will be removed once the upstream change merges; the removal will appear in the first ONNX Runtime release that includes that change. We will confirm the release number as soon as it is known
  • Builds that still rely on Arm NN-specific options (for example --use_armnn) will fail after the change lands, so please adjust configurations in advance
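
To find configurations that will break, build commands can be scanned for Arm NN options ahead of time. The sketch below is a heuristic and an assumption, not an official tool: it flags any command-line token containing "armnn", which covers --use_armnn and may also catch unrelated paths. The function name find_armnn_options is hypothetical.

```python
import shlex


def find_armnn_options(command_line):
    """Return the tokens in a build command that mention Arm NN.

    Heuristic only: matches any token containing 'armnn'
    (case-insensitive), such as the --use_armnn flag named above.
    """
    tokens = shlex.split(command_line)
    return [token for token in tokens if "armnn" in token.lower()]


if __name__ == "__main__":
    command = "./build.sh --config Release --use_armnn --parallel"
    hits = find_armnn_options(command)
    if hits:
        print("Arm NN options to remove:", ", ".join(hits))
    else:
        print("No Arm NN options found")
```

Running this over CI scripts before upgrading gives a list of commands to adjust.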

🌐 Web & JavaScript

  • WebGPU EP: Flash Attention optimizations, graph capture, Split-K MatMul, qMoE support, WGSL templates
  • WebNN EP: GQA local attention, GatherBlockQuantized, ConvInteger/MatMulInteger
  • Node.js/React Native: Node.js v22, JSI for React Native, JSPI build support

🧠 CPU Improvements

  • KleidiAI: SME1/SME2 Convolution and SGemm kernels, FP32 Gemv, Windows/Arm support
  • New ops: MoE/qMoE kernels, RotaryEmbeddings opset 23, LayerNorm/RMSNorm broadcasting
  • Platform support: S390x SIMD, LoongArch64 4-bit quantization, FP16 inference improvements
  • ARM NCHWc layout support: the NCHWc layout can improve the performance of Conv models. To enable it, build from source with --enable_arm_neon_nchwc (#25580, #26838, #26691, #26171). This feature may be turned on by default in a future release, depending on community feedback.
  • ARM performance improvements: dedicated depthwise conv kernel (#26688) and faster SiLU activation (#26753)

🔌 Language Bindings

C#

  • .NET 9.0 MAUI targets (#26463)

Python

  • add_external_initializers_from_files (#26012)

Java

  • Auto EP and compile model support (#25131)
  • OrtCompiledModelCompatibility (#26028)

🐛 Bug Fixes

Critical Fixes

  • DoS vulnerability in FuseReluClip (#26878)
  • Security issue loading arbitrary files as external data (#26776)
  • Memory leak fix for KernelContext_GetAllocator (#26883)
  • Local Attention off-by-1 bug (#25927)

EP-Specific Fixes

  • [QNN] Clip op with min/max from QDQ (#26601)
  • [CoreML] Gather fp16 support (#26442)

🙏 Contributors

Thanks to our 176 contributors for this release!

@adrianlizarraga, @apsonawane, @apwojcik, @bachelor-dou, @carzh, @chilo-ms, @daijh, @edgchen1, @fanchenkong1, @fs-eire, @HectorSVC, @ishwar-raut1, @jchen10, @jiajia-qin, @kunal-vaishnavi, @owenzhangzhengzhong, @prathikr, @psakhamoori, @qjia7, @qti-hungjuiw, @qti-yuduo, @quic-ashwshan, @quic-calvnguy, @quic-muchhsu, @quic-tirupath, @qwu16, @shaoboyan091, @skottmckay, @snnn, @tianleiwu, @tirupath-qti, @xieofxie, @xiaomsft, @yf711, @yifei410, @yuslepukhin, @Zhaeong, @aciddelgado, @ajindal1, @Ami-zhang, @amogkam, @ashari4, @axinging, @baijumeswani, @BowenBao, @brguru90, @caojilin, @cbourjau, @chaihahaha, @chenfucn, @chengxinlun, @codingl2k1, @csteele-PD, @dependabot, @DiamondGotCat, @dmitriyse, @duanqn, @duncanriach, @eaidova, @Ellested, @fajin-corp, @fdwr, @genminsong, @georgen117, @gerdner, @Gliniac, @gramalingam, @guoqingbao, @guschmue, @gyagp, @hanbitmyths, @hariharans29, @HelloBroBro, @huningxin, @iamhatesz, @jawilk, @jeffbdavenport, @jingyanwang, @jmwil, @jskhu, @justinchuby, @kleiti, @kotoyama-pet, @kreeben, @l1cachefault, @leca-rreb, @liqunfu, @logeshkumaramd, @loic-lopez, @lutzroeder, @maneeshs, @manuelhsitbai, @maoer1, @marko-vasic, @mattmacy, @mayeut, @memoryz, @mhamilton723, @mingruimingrui, @mrodriguezcouture, @mzh2711, @natke, @nicohon, @niuchang, @nkgfirecern, @nomigori, @ociaw, @ofleur, @oliemansm, @Origami-Tobiichi, @p-coder, @PatriceVignola, @pavignol, @pengwa, @peterchen-intel, @philschmid, @pineapple-exe, @pingren, @pkonopacki, @pneerincx, @prabhat00155, @psunn, @ranjitshs, @rinarive, @RishiDesai, @RyanUnderhill, @Raghuraman-S123, @sachinprasad, @sdindigern, @shaahji, @shiyi-intel, @shujahs, @siahuat0727, @smitkothari26, @snadampal, @sophies927, @statelesshz, @stevenlix, @stonebuddha2, @supriyar, @sushiquilting, @svekars, @t-vi, @thewh1teagle, @thiagocrepaldi, @tomwillow, @TryTwo, @vbaddi, @vedantb4, @vickywei43, @vodianyk, @vorobyov, @wangyems, @wejoncy, @wenmingw, @wfuji1, @wujingy1, @xhcao, @xiaolu, @yangzi33, @yanivmo, @ydnar, @yiningweb, 
@yitongh, @yuhonghong66, @yukun0510, @yunchu, @yzhang93, @Zantares, @zhangYiIntel, @zhangzy-nlp, @zhenv5, @zhijxu-MS, @ZhongYuanKang, @zhuzhenbo, @zjp, @zouxiaoliang, @Copilot


Full Changelog: v1.23.2...rel-1.24.1
