n.b. The following was generated via LLM from Git history. Only the contributor list has been verified.
ONNX Runtime Release 1.26.0
Announcement - Breaking Changes
- Support for CUDA 12 will be removed in 1.27.0.
- CUDA 13 will continue to be published as onnxruntime-<os>-<arch>-gpu_cuda13-<version>.<ext>.
- CUDA runtime support will soon move to a dedicated Execution Provider (EP) rather than being published as a package from ORT core.
Highlights
- Added optional memory mapping for .ort model loads (#28164).
- Added RISC-V Vector (RVV) support for CPU EP (#28261).
- OpenVINO EP upgraded for 1.26.0 development release (#28297).
- WebGPU gained GridSample support (#28264) and Split-K improvements (#28151).
- CUDA plugin EP gained graph support (#28002) and a profiling API (#28216).
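
For readers unfamiliar with memory-mapped loading, the following is a minimal sketch of the general technique using only Python's standard `mmap` module. It does not use or reproduce the ONNX Runtime option introduced in #28164; the helper name `read_header_mapped`, the file name, and the placeholder bytes are all hypothetical and chosen only for illustration.

```python
import mmap
import os
import tempfile


def read_header_mapped(path, n=4):
    """Map the file read-only and return (first n bytes, total size).

    With a mapping, the OS pages data in on demand instead of the
    process copying the whole file into a heap buffer up front.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[:n], len(mapped)


# Stand-in file with placeholder contents (not a real model).
with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "example.ort")
    with open(model_path, "wb") as out:
        out.write(b"DEMO" + bytes(1024))
    header, size = read_header_mapped(model_path)
```

The mapping is read-only here, which suits model weights that are never modified after load; whether the runtime maps files the same way is not stated in these notes.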
Security and Reliability Hardening
- Replaced unrestricted Python setattr configuration with an allowlist (#28083).
- Hardened multiple OOB and overflow scenarios across ML and core ops:
  - Attention mask index OOB write (#27789).
  - MaxPoolGrad indices bounds validation (#27903).
  - SVM and TreeEnsemble bounds/security fixes (#27950, #27951, #27952, #27989).
  - RNN sequence_lens OOB read and integer overflow handling (#28052, #28003).
  - GroupQueryAttention seqlens_k bounds validation and compatibility follow-up (#28031, #28259).
  - MatMulBnb4 and ML coefficient SafeInt checks (#27995, #28001).
  - CUDA Gather int32 overflow fix (#28108).
  - GridSample float->int64 cast hardening for NaN/Inf/out-of-range coords (#28302).
- Fixed session logger use-after-free during EP teardown under verbose logging (#28274).
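
Two of the hardening patterns named above, an allowlist in front of `setattr` and a guarded float-to-int64 conversion, can be sketched generically as follows. This is an illustration under stated assumptions, not the code from #28083 or #28302: `SessionConfig`, `ALLOWED_OPTIONS`, `set_option`, and `float_to_int64_checked` are hypothetical names, and the choice to clamp out-of-range values (rather than reject them) and to map NaN to a default is an assumption.

```python
import math

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


class SessionConfig:
    """Hypothetical config object; not an ONNX Runtime class."""

    def __init__(self):
        self.log_level = 2
        self.thread_count = 1


# Only these attribute names may be set from user-supplied input.
ALLOWED_OPTIONS = frozenset({"log_level", "thread_count"})


def set_option(config, name, value):
    """Set an attribute only if its name is on the allowlist."""
    if name not in ALLOWED_OPTIONS:
        raise ValueError(f"option {name!r} is not configurable")
    setattr(config, name, value)


def float_to_int64_checked(x, default=0):
    """Floor a float to an int64-range integer without undefined results.

    NaN maps to `default`; +/-Inf and out-of-range values are clamped
    (an assumed policy, chosen here for illustration).
    """
    if math.isnan(x):
        return default
    if x >= INT64_MAX:  # also covers +Inf
        return INT64_MAX
    if x <= INT64_MIN:  # also covers -Inf
        return INT64_MIN
    return math.floor(x)


cfg = SessionConfig()
set_option(cfg, "log_level", 0)
```

An unrestricted `setattr(config, name, value)` would let a caller overwrite any attribute, including dunder attributes; the allowlist narrows that surface to names the author has reviewed.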
CUDA, Attention, and MLAS
- Filled CUDA opset/operator gaps and extended support:
- Attention/GQA improvements:
  - Fixed ONNX Attention min-bias alignment crash on SM<80 and masked-batch NaN behavior (#27831).
  - Added FP32 QK accumulation path for unfused GQA attention (#28198).
  - Added CUDART_VERSION reduction compatibility in GQA attention (#28296).
  - Fixed CUDA 13 build error in GQA unfused attention (#28309).
  - PagedAttention fallback for SM<80 fp16 (#28200).
- MLAS updates:
WebGPU, WebNN, and JavaScript
- WebGPU feature and correctness updates:
  - Added GridSample (#28264).
  - Split-K support for batch size > 1 (#28151).
  - MatMulNBits refactor and batching improvements (#28109, #28197).
  - MHA correctness fix when present outputs are not requested (#28027).
  - Buffer upload overflow fix (#27948).
  - Position ID bounds validation in WebGPU/JS RotaryEmbedding (#28214).
- WebNN change:
  - Renamed pool2d property roundingType -> outputShapeRounding (#28172).
- JavaScript ecosystem maintenance:
  - Multiple dependency bumps.
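
The position ID bounds validation listed above for RotaryEmbedding (#28214) is the kind of check sketched below. This is a generic illustration, assuming the check compares each position ID against the number of rows in a cos/sin cache; the function names and the tiny cache are hypothetical and do not come from the WebGPU or JS implementation.

```python
def validate_position_ids(position_ids, max_positions):
    """Raise IndexError if any position ID falls outside [0, max_positions)."""
    for i, pos in enumerate(position_ids):
        if not 0 <= pos < max_positions:
            raise IndexError(
                f"position_ids[{i}] = {pos} is outside [0, {max_positions})"
            )


def gather_rows(cache, position_ids):
    """Look up one cache row per position ID after validating every ID."""
    validate_position_ids(position_ids, len(cache))
    return [cache[pos] for pos in position_ids]


# Placeholder 3-row cache; real caches hold cos/sin values per position.
cache = [[1.0, 0.0], [0.5, 0.8], [0.0, 1.0]]
rows = gather_rows(cache, [2, 0])
```

Validating before the lookup matters because an unchecked index can read outside the buffer; in Python specifically, a negative ID would silently wrap to the end of the list rather than fail.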
Plugin EP and EP Ecosystem
- CUDA plugin EP:
- WebGPU plugin EP:
- Other EP updates:
Contributors
@tianleiwu, @yuslepukhin, @edgchen1, @vraspar, @hariharans29, @skottmckay, @eserscor, @xadupre, @sanaa-hamel-microsoft, @claude, @elwhyjay, @Rishi-Dave, @titaiwangms, @adrianlizarraga, @jatinwadhwa921, @jchen10, @Jiawei-Shao, @maxwbuckley, @preetha-intel, @qjia7, @qti-hungjuiw, @RajeevSekar, @umangb-09, @adrastogi, @akote123, @amd-genmingz, @ankitm3k, @apsonawane, @bachelor-dou, @baijumeswani, @bopeng1234, @chilo-ms, @chwarr, @Craigacp, @dccarmo, @derdeljan-msft, @ericcraw, @fdwr, @fs-eire, @gaugarg-nv, @gblong1, @GopalakrishnanN, @Honry, @intbf, @ishwar-raut1, @Jaswanth51, @javier-intel, @JonathanC-ARM, @julia-thorn, @justinchuby, @jwludzik, @Kevin-Taha, @Kotomi-Du, @MayureshV1, @mdvoretc-intel, @miaobin, @milpuz01, @mingyueliuh, @mklimenk, @n1harika, @prathikr, @psakhamoori, @qti-yuduo, @quic-calvnguy, @RyanMetcalfeInt8, @sfatimar, @sgbihu, @ShirasawaSama, @ssam18, @susbhere, @sushraja-msft, @TejalKhade28, @theHamsta, @TomCrypto, @TsofnatMaman, @velonica0, @vthaniel, @wenqinI, @xhan65, @xhcao