ExecuTorch v1.3.1 Release Notes
1.3.1 is the first broadly published 1.3 patch release, following the Maven-only 1.3.0 release.
Highlights
ExecuTorch v1.3.1 expands model and backend coverage across embedded, mobile, and GPU targets. This release adds major Arm, Cortex-M, VGF, NXP, Qualcomm, CUDA, Metal, MLX, Vulkan, and XNNPACK improvements, and continues broadening LLM and multimodal model support.
- Arm backend (Ethos-U / TOSA / VGF / Cortex-M) - Major expansion in TOSA dialect lowering, quantization support, VGF infrastructure, Cortex-M test coverage, Ethos-U compatibility, and CI reliability.
- New and expanded model support - Qwen3.5 MoE, Gemma 4 31B, LFM2.5, Voxtral Realtime/TTS, Llama4 export support, and additional Android LLM runner documentation.
- CUDA backend - Qwen3.5 MoE export/runtime work, fused MoE and INT4 Triton kernels, CUDA graph capture/replay, GPU-side sampling, and memory/stat reporting.
- Metal and MLX backends - Qwen3.5 MoE Metal path, Gemma 4 31B MLX support, MLX delegate progress, integer op support, top-k fallback, gated delta rule kernels, and hardened runtime code signing.
- Vulkan backend - Cooperative matrix dispatch for linear/matmul, safer memory handling, integer overflow and heap-buffer fixes, improved partitioning, and VGF compatibility work.
- Qualcomm AI Engine Direct (QNN) - Expanded ATen op support, LPAI and Direct Mode work, custom op and quantization annotation APIs, debugging improvements, performance dumping, and heap profiling.
- NXP backend - Continued rollout of the new Neutron flow, eIQ Neutron SDK 3.1.1, stricter graph verification, QAT coverage, and additional operator support.
- Runtime and packaging - Shared library install support, Android lifecycle/error handling fixes, and improved wheel/build reliability.
New Models and Model Enablement
- Qwen3.5 MoE export and runner support across CUDA and Metal, including fused MoE kernels, INT4 paths, CUDA graph support, and runtime stats.
- Gemma 4 31B export and runner support with CUDA and MLX backend support.
- LFM2.5 export and runner support for MLX.
- Voxtral Realtime and Voxtral TTS improvements, including streaming fixes, CUDA backend support, and documentation updates.
- Android LLM runner documentation and HuggingFace-oriented usage docs.
Core Components
EXIR / Export
- Safer ETRecord/export deserialization.
- FP8 placeholder support in ExecuTorch serialization.
- Fixes for torch.split alias annotations and partition grouping.
- Migration cleanup away from deprecated capture config paths.
Runtime
- Backend and runtime option handling improvements.
- Better contextual error reporting for debuggability.
Kernels and Operators
- Portable adaptive average pool 2D kernel.
- fp32 accumulation for Half/BFloat16 grid sampler bilinear paths.
- Fused quantized add, mul, relu, hardswish, and bmm kernels.
- Quantized max pool and quantized conv1d fixes.
- Additional quantization and observer fixes across portable and backend kernels.
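The fp32 accumulation item above addresses a general hazard of summing in half precision. The following is a minimal sketch of that hazard in plain Python, not the grid sampler kernel itself. It assumes the standard `struct` module's binary16 format (`"e"`) as a stand-in for Half, and a Python float (which is wider than fp32) as the stand-in for the wider accumulator. The helper names are hypothetical.

```python
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def sum_half_accumulator(values):
    """Accumulate in half precision: round after every addition."""
    acc = 0.0
    for v in values:
        acc = to_half(acc + to_half(v))
    return acc


def sum_wide_accumulator(values):
    """Accumulate in a wider type and round to half only once at the end."""
    acc = 0.0
    for v in values:
        acc += to_half(v)
    return to_half(acc)


values = [1.0] * 4096
half_result = sum_half_accumulator(values)  # stalls once the spacing between halves exceeds 1
wide_result = sum_wide_accumulator(values)  # reaches the exact sum
print(half_result, wide_result)
```

In this sketch the half-precision accumulator stops growing at 2048, because binary16 cannot represent 2049 and the sum rounds back down on every step. The wide accumulator returns 4096.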
Backend Delegates
Arm / Ethos-U / TOSA / VGF / Cortex-M
- Expanded TOSA op coverage including slice, max pool, avg pool, adaptive avg pool, shape ops, resize, LSTM decomposition, and additional dtype/layout handling.
- FP16/BF16 operator coverage is now considered largely complete; most support landed incrementally over prior releases and the 1.3 cycle adds test stabilization for FP16 MobileNetV3 and VGF bilinear paths.
- Refactored dim-order and permute handling. Removed the legacy tosa_dim_order path, added permute-removal for TOSA ops, and improved permute fusion over elementwise ops. This reduced unnecessary transposes in lowered graphs.
- Improved quantization support, including composable quantizer work, constant folding fixes, TOSA/VGF linear quantization modes, and Cortex-M MVE/Helium int16 quantize/dequantize support.
- VGF build/test infrastructure and documentation.
- Cortex-M docs, TinyML validation, CMSIS-NN integration, linting, and CI improvements.
- Ethos-U driver backwards compatibility and more robust FVP/toolchain download handling.
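The permute-removal and permute-fusion items above rest on a simple property: two back-to-back permutes are equivalent to a single permute, and when that single permute is the identity both can be dropped. The sketch below illustrates only that property. It is not the Arm backend's pass, and the function names are hypothetical. It assumes the `torch.permute` convention, where output dimension `i` takes input dimension `perm[i]`.

```python
def compose_permutes(first, second):
    """Return the single permutation equivalent to applying `first`, then `second`."""
    return [first[axis] for axis in second]


def is_identity(perm):
    """True when the permutation leaves the dimension order unchanged."""
    return list(perm) == list(range(len(perm)))


def permuted_shape(shape, perm):
    """Shape after permuting, where output dim i takes input dim perm[i]."""
    return [shape[axis] for axis in perm]


nchw_to_nhwc = [0, 2, 3, 1]
nhwc_to_nchw = [0, 3, 1, 2]

# A layout round trip composes to the identity, so both permutes are removable.
round_trip = compose_permutes(nchw_to_nhwc, nhwc_to_nchw)
print(round_trip, is_identity(round_trip))
```

An elementwise op sitting between the two permutes does not depend on dimension order, which is why a pass can move a permute across it before applying this kind of cancellation.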
CUDA
- Qwen3.5 MoE export/runtime path with fused MoE and INT4 Triton kernels.
- Gemma 4 31B support on CUDA.
- CUDA graph capture/replay and GPU-side Gumbel-max sampling.
- Memory-efficient model export.
- Runtime cross-method weight sharing.
- GPU memory tracking and structured runtime stats.
- CI and build reliability fixes.
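For readers unfamiliar with the Gumbel-max trick named above, here is a minimal CPU sketch in plain Python. It is not the CUDA backend's kernel, and the function name and `temperature` parameter are assumptions made for illustration. The idea is that adding independent Gumbel(0, 1) noise to each logit and taking the argmax draws a sample from the softmax distribution, without computing a normalizing sum.

```python
import math
import random


def gumbel_max_sample(logits, rng, temperature=1.0):
    """Draw one index distributed as softmax(logits / temperature)."""
    best_index, best_score = -1, -math.inf
    for index, logit in enumerate(logits):
        u = rng.random()
        while u == 0.0:  # avoid log(0); random() never returns 1.0
            u = rng.random()
        gumbel_noise = -math.log(-math.log(u))
        score = logit / temperature + gumbel_noise
        if score > best_score:
            best_index, best_score = index, score
    return best_index


rng = random.Random(0)
probs = [0.1, 0.2, 0.7]
logits = [math.log(p) for p in probs]
counts = [0, 0, 0]
for _ in range(20000):
    counts[gumbel_max_sample(logits, rng)] += 1
print([c / 20000 for c in counts])  # empirical frequencies, close to probs
```

Because each position needs only its own random number followed by an argmax reduction, this form of sampling maps naturally onto a GPU, which is presumably why it suits a GPU-side sampler.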
Metal / MLX
- Qwen3.5 MoE Metal export, build target, source transforms, and integration tests.
- MLX delegate progress and integer op support.
- Gemma 4 31B support through MLX.
- Top-k fallback, gated delta rule kernels, SDPA updates, and hardened runtime code signing.
Vulkan
- Cooperative matrix dispatch for linear/matmul.
- Safer memory allocation and heap-buffer handling.
- Integer overflow fixes and improved partitioning behavior.
- VGF compatibility and Parakeet documentation/build fixes.
Qualcomm AI Engine Direct (QNN)
- New QNN op support including isinf, isnan, rand, randn, log2, log10, log1p, trunc, acos, atan2, tan, avg_pool1d, remainder, reflection_pad, and scatter.src.
- LPAI and Direct Mode support.
- Custom op package and quantization annotation APIs.
- Debugging/numeric discrepancy tooling and performance dump improvements.
- Multimodal and LLM quantization guidance updates.
NXP
- eIQ Neutron SDK updated to version 3.1.1. eIQ Neutron SDK 3.1.x introduced a new MLIR-based conversion flow with broader operator support for Neutron-C architectures.
- Operators migrated to the MLIR flow: aten.avg_pool2d, aten.max_pool2d, aten.abs, aten.constant_pad_nd, aten.sigmoid, aten.leaky_relu, and aten.adaptive_avg_pool2d.
- New ops: 1D MaxPool, aten.bmm, and aten.conv_transpose1.
- Unified test pipeline for operator and model testing.
XNNPACK
- Enable weight cache and workspace sharing as runtime options.
- Output padding support for transposed convolution.
- Workspace sharing and weight cache concurrency fixes.
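As background for the output padding item above, the sketch below shows what output padding does to the output size of a transposed convolution. It uses the shape formula documented for PyTorch's `ConvTranspose` modules. The helper names are hypothetical, and nothing here reflects how the XNNPACK delegate implements the feature.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a regular convolution along one spatial dimension."""
    return (input_size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv_transpose_output_size(input_size, kernel_size, stride=1, padding=0,
                               output_padding=0, dilation=1):
    """Output length of a transposed convolution along one spatial dimension."""
    if not 0 <= output_padding < max(stride, dilation):
        raise ValueError("output_padding must be smaller than stride or dilation")
    return ((input_size - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


# With stride 2, inputs of length 7 and 8 both convolve down to length 4,
# so the transposed op needs output_padding to choose which one to produce.
print(conv_output_size(7, 3, stride=2, padding=1),
      conv_output_size(8, 3, stride=2, padding=1))
print(conv_transpose_output_size(4, 3, stride=2, padding=1),
      conv_transpose_output_size(4, 3, stride=2, padding=1, output_padding=1))
```

Output padding only adds to one side of the output size. It does not insert zeros into the input.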
Platforms
Android
- Improved JNI error reporting and lifecycle handling.
- Tensor.copyDataInto added to the Java API.
- LLM module lifecycle/thread-safety fixes and test coverage.
- Modern Clang and ETDump path override fixes.
Apple / CoreML / iOS
- CoreML profiler crash fix.
- CoreML partitioner skips for unsupported random and argmin/argmax patterns.
- iOS 18 quantization error hints.
- Shared library install support for consumers and backends.
Documentation
- Android LLM runner docs.
- Arm, Cortex-M, Ethos-U, and VGF documentation updates.
- NXP QAT and Neutron flow documentation updates.
- Vulkan, Parakeet, Voxtral, and LFM2.5 documentation updates.
Deprecations
The legacy MPS backend remains deprecated and is expected to be removed in a future release.
Contributors
We welcome 62 first-time human contributors to ExecuTorch in this release:
@s09g, @Froskekongen, @MahinAshraful, @qti-mmadhava, @quic-boyuc, @qti-horodnic, @rezaasjd, @ocvh, @twsl, @mvartani-meta, @ifed-ucsd, @tmtrademarked, @berndporr, @leixin, @amov-meta, @xiaodong705, @Hyungkeun-Park-Nota, @AlannaBurke, @kastopia, @zhaoxul-qti, @VasuAgrawal, @JorickvdHoeven, @dveremeevfb, @alexey-sidnev, @Lidang-Jiang, @notaJiminLee, @irtrukhina, @mvsfb, @SAY-5, @Jah-yee, @fgsiveone, @boomitsnoom, @jeanschmidt, @KevinUW114514, @aksharabhardwaj766-commits, @AlessandroVacca, @gkrulce, @AswaniSahoo, @OpenByteDev, @ishangodawatta, @WongJohnson, @ymrohit, @XAheli, @jiawei-lyu, @Nazim-fad, @yadferhad, @darknight054, @xuyanwen2012, @matt-cossins, @zeel2104, @john-rocky, @luhenry, @christine-long-meta, @hboyraz, @telgamal-1, @madhesh60, @omkar-334, @vacu9708, @Vasanthadithya-mundrathi, @ozgecinko, @wliuyx, @tomeuv
Full Changelog
Full Changelog: v1.2.0...v1.3.1