pypi executorch 1.3.1

ExecuTorch v1.3.1 Release Notes

1.3.1 is the first 1.3 patch release to be broadly published; the preceding 1.3.0 release was published to Maven only.

Highlights

ExecuTorch v1.3.1 expands model and backend coverage across embedded, mobile, and GPU targets. This release adds major Arm, Cortex-M, VGF, NXP, Qualcomm, CUDA, Metal, MLX, Vulkan, and XNNPACK improvements, and continues broadening LLM and multimodal model support.

  • Arm backend (Ethos-U / TOSA / VGF / Cortex-M) - Major expansion in TOSA dialect lowering, quantization support, VGF infrastructure, Cortex-M test coverage, Ethos-U compatibility, and CI reliability.
  • New and expanded model support - Qwen3.5 MoE, Gemma 4 31B, LFM2.5, Voxtral Realtime/TTS, Llama4 export support, and additional Android LLM runner documentation.
  • CUDA backend - Qwen3.5 MoE export/runtime work, fused MoE and INT4 Triton kernels, CUDA graph capture/replay, GPU-side sampling, and memory/stat reporting.
  • Metal and MLX backends - Qwen3.5 MoE Metal path, Gemma 4 31B MLX support, MLX delegate progress, integer op support, top-k fallback, gated delta rule kernels, and hardened runtime code signing.
  • Vulkan backend - Cooperative matrix dispatch for linear/matmul, safer memory handling, integer overflow and heap-buffer fixes, improved partitioning, and VGF compatibility work.
  • Qualcomm AI Engine Direct (QNN) - Expanded ATen op support, LPAI and Direct Mode work, custom op and quantization annotation APIs, debugging improvements, performance dumping, and heap profiling.
  • NXP backend - Continued rollout of the new Neutron flow, eIQ Neutron SDK 3.1.1, stricter graph verification, QAT coverage, and additional operator support.
  • Runtime and packaging - Shared library install support, Android lifecycle/error handling fixes, and improved wheel/build reliability.

New Models and Model Enablement

  • Qwen3.5 MoE export and runner support across CUDA and Metal, including fused MoE kernels, INT4 paths, CUDA graph support, and runtime stats.
  • Gemma 4 31B export and runner support with CUDA and MLX backend support.
  • LFM2.5 export and runner support for MLX.
  • Voxtral Realtime and Voxtral TTS improvements, including streaming fixes, CUDA backend support, and documentation updates.
  • Android LLM runner documentation and HuggingFace-oriented usage docs.

Core Components

EXIR / Export

  • Safer ETRecord/export deserialization.
  • FP8 placeholder support in ExecuTorch serialization.
  • Fixes for torch.split alias annotations and partition grouping.
  • Migration cleanup away from deprecated capture config paths.

Runtime

  • Backend and runtime option handling improvements.
  • Better contextual error reporting for debuggability.

Kernels and Operators

  • Portable adaptive average pool 2D kernel.
  • fp32 accumulation for Half/BFloat16 grid sampler bilinear paths.
  • Fused quantized add, mul, relu, hardswish, and bmm kernels.
  • Quantized max pool and quantized conv1d fixes.
  • Additional quantization and observer fixes across portable and backend kernels.
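
For readers unfamiliar with adaptive average pooling, the sketch below shows how the operator is conventionally defined: each output index averages an input window whose bounds are derived from the input and output sizes. This is a pure-Python illustration of the standard definition, not the ExecuTorch portable kernel; the function names `adaptive_windows` and `adaptive_avg_pool2d` are hypothetical.

```python
def adaptive_windows(in_size, out_size):
    """Return the [start, end) input range averaged into each output index."""
    windows = []
    for i in range(out_size):
        start = (i * in_size) // out_size                     # floor(i * in / out)
        end = ((i + 1) * in_size + out_size - 1) // out_size  # ceil((i + 1) * in / out)
        windows.append((start, end))
    return windows


def adaptive_avg_pool2d(image, out_h, out_w):
    """Pool a 2D list of numbers to out_h x out_w by averaging each window."""
    rows = adaptive_windows(len(image), out_h)
    cols = adaptive_windows(len(image[0]), out_w)
    return [
        [
            sum(image[r][c] for r in range(r0, r1) for c in range(c0, c1))
            / ((r1 - r0) * (c1 - c0))
            for (c0, c1) in cols
        ]
        for (r0, r1) in rows
    ]


# A 4x4 ramp pooled to 2x2: each output averages one 2x2 quadrant.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
pooled = adaptive_avg_pool2d(image, 2, 2)
```

When the input size is not a multiple of the output size, neighbouring windows overlap (for example, 5 inputs pooled to 3 outputs use the ranges [0, 2), [1, 4), and [3, 5)), which is why the window bounds are computed per output index rather than from a fixed stride.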

Backend Delegates

Arm / Ethos-U / TOSA / VGF / Cortex-M

  • Expanded TOSA op coverage including slice, max pool, avg pool, adaptive avg pool, shape ops, resize, LSTM decomposition, and additional dtype/layout handling.
  • FP16/BF16 operator coverage is now considered largely complete; most support landed incrementally over prior releases and the 1.3 cycle adds test stabilization for FP16 MobileNetV3 and VGF bilinear paths.
  • Refactored dim-order and permute handling: removed the legacy tosa_dim_order path, added permute removal for TOSA ops, and improved permute fusion over elementwise ops. These changes reduce unnecessary transposes in lowered graphs.
  • Improved quantization support, including composable quantizer work, constant folding fixes, TOSA/VGF linear quantization modes, and Cortex-M MVE/Helium int16 quantize/dequantize support.
  • VGF build/test infrastructure and documentation.
  • Cortex-M docs, TinyML validation, CMSIS-NN integration, linting, and CI improvements.
  • Ethos-U driver backwards compatibility and more robust FVP/toolchain download handling.
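
The permute-removal and permute-fusion work mentioned above rests on a simple property: consecutive permutes compose into a single permutation, and a chain that composes to the identity can be dropped entirely. The sketch below illustrates that property only; it is not the Arm backend's pass, and `compose_permutes`, `fuse_permute_chain`, and `permute_shape` are hypothetical helper names.

```python
def compose_permutes(first, second):
    """Single permutation equivalent to applying `first`, then `second`.

    Uses the torch.permute convention: output axis i reads input axis perm[i].
    """
    return tuple(first[axis] for axis in second)


def fuse_permute_chain(perms):
    """Fold a chain of permutes into one; return None if the chain is a no-op."""
    if not perms:
        return None
    identity = tuple(range(len(perms[0])))
    combined = identity
    for perm in perms:
        combined = compose_permutes(combined, perm)
    return None if combined == identity else combined


def permute_shape(shape, perm):
    """Shape produced by permuting a tensor of `shape` with `perm`."""
    return tuple(shape[axis] for axis in perm)


NCHW_TO_NHWC = (0, 2, 3, 1)
NHWC_TO_NCHW = (0, 3, 1, 2)

# A layout round trip cancels out, so both permutes can be removed.
round_trip = fuse_permute_chain([NCHW_TO_NHWC, NHWC_TO_NCHW])

# Two identical permutes do not cancel, but they fuse into one.
fused = fuse_permute_chain([NCHW_TO_NHWC, NCHW_TO_NHWC])
```

Because elementwise ops do not care about axis order, a permute can in principle be moved across them until it meets its inverse, at which point the pair collapses as in `round_trip`.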

CUDA

  • Qwen3.5 MoE export/runtime path with fused MoE and INT4 Triton kernels.
  • Gemma 4 31B support on CUDA.
  • CUDA graph capture/replay and GPU-side Gumbel-max sampling.
  • Memory-efficient model export.
  • Runtime cross-method weight sharing.
  • GPU memory tracking and structured runtime stats.
  • CI and build reliability fixes.
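
As background on the Gumbel-max sampling mentioned above: adding independent Gumbel noise to temperature-scaled logits and taking the argmax draws a token from the corresponding softmax distribution, without computing the softmax or a cumulative sum. The sketch below is a CPU-side, pure-Python illustration of that trick, not the CUDA backend's kernel; `gumbel_max_sample` and its parameters are hypothetical names.

```python
import math
import random


def gumbel_max_sample(logits, temperature=1.0, rng=random):
    """Draw one index from softmax(logits / temperature) via the Gumbel-max trick."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    best_index, best_score = -1, -math.inf
    for index, logit in enumerate(logits):
        # Clamp away from 0 so that log(u) is always defined.
        u = max(rng.random(), 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        score = logit / temperature + gumbel_noise
        if score > best_score:
            best_index, best_score = index, score
    return best_index


# Logits chosen so the target probabilities are 1/7, 2/7, and 4/7.
logits = [0.0, math.log(2.0), math.log(4.0)]
rng = random.Random(0)
num_draws = 20000
counts = [0] * len(logits)
for _ in range(num_draws):
    counts[gumbel_max_sample(logits, rng=rng)] += 1
frequencies = [count / num_draws for count in counts]
```

The appeal for a GPU implementation is that the whole procedure is one elementwise noise addition followed by an argmax reduction, so the sampled token can stay on the device.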

Metal / MLX

  • Qwen3.5 MoE Metal export, build target, source transforms, and integration tests.
  • MLX delegate progress and integer op support.
  • Gemma 4 31B support through MLX.
  • Top-k fallback, gated delta rule kernels, SDPA updates, and hardened runtime code signing.

Vulkan

  • Cooperative matrix dispatch for linear/matmul.
  • Safer memory allocation and heap-buffer handling.
  • Integer overflow fixes and improved partitioning behavior.
  • VGF compatibility and Parakeet documentation/build fixes.

Qualcomm AI Engine Direct (QNN)

  • New QNN op support including isinf, isnan, rand, randn, log2, log10, log1p, trunc, acos, atan2, tan, avg_pool1d, remainder, reflection_pad, and scatter.src.
  • LPAI and Direct Mode support.
  • Custom op package and quantization annotation APIs.
  • Debugging/numeric discrepancy tooling and performance dump improvements.
  • Multimodal and LLM quantization guidance updates.

NXP

  • eIQ Neutron SDK updated to version 3.1.1. eIQ Neutron SDK 3.1.x introduced a new MLIR-based conversion flow with broader operator support for Neutron-C architectures.
  • Operators migrated to the MLIR flow: aten.avg_pool2d, aten.max_pool2d, aten.abs, aten.constant_pad_nd, aten.sigmoid, aten.leaky_relu, and aten.adaptive_avg_pool2d.
  • New ops: 1D MaxPool, aten.bmm, and aten.conv_transpose1.
  • Unified test pipeline for operator and model testing.

XNNPACK

  • Weight cache and workspace sharing can now be enabled as runtime options.
  • Output padding support for transposed convolution.
  • Workspace sharing and weight cache concurrency fixes.
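
To illustrate why output padding matters for transposed convolution: a strided convolution maps several input sizes to the same output size, so the transposed operation needs an extra parameter to say which size to recover. The sketch below uses the output-size formulas documented for PyTorch's Conv2d and ConvTranspose2d, applied to one spatial dimension; the helper names are hypothetical and this is not XNNPACK delegate code.

```python
def conv_out_size(in_size, kernel, stride=1, padding=0, dilation=1):
    """Output length of a regular convolution along one dimension."""
    return (in_size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def conv_transpose_out_size(in_size, kernel, stride=1, padding=0,
                            output_padding=0, dilation=1):
    """Output length of a transposed convolution along one dimension."""
    # PyTorch requires output_padding to be smaller than stride or dilation.
    if output_padding >= max(stride, dilation):
        raise ValueError("output_padding must be smaller than stride or dilation")
    return ((in_size - 1) * stride - 2 * padding
            + dilation * (kernel - 1) + output_padding + 1)


# With kernel=3, stride=2, padding=1, inputs of length 7 and 8 both shrink to 4.
down_from_7 = conv_out_size(7, kernel=3, stride=2, padding=1)
down_from_8 = conv_out_size(8, kernel=3, stride=2, padding=1)

# output_padding selects which of the two lengths the transposed op restores.
up_default = conv_transpose_out_size(4, kernel=3, stride=2, padding=1)
up_padded = conv_transpose_out_size(4, kernel=3, stride=2, padding=1,
                                    output_padding=1)
```

Here `up_default` restores length 7 and `up_padded` restores length 8, which is the ambiguity that output padding resolves.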

Platforms

Android

  • Improved JNI error reporting and lifecycle handling.
  • Tensor.copyDataInto added to the Java API.
  • LLM module lifecycle/thread-safety fixes and test coverage.
  • Modern Clang and ETDump path override fixes.

Apple / CoreML / iOS

  • CoreML profiler crash fix.
  • CoreML partitioner skips for unsupported random and argmin/argmax patterns.
  • iOS 18 quantization error hints.
  • Shared library install support for consumers and backends.

Documentation

  • Android LLM runner docs.
  • Arm, Cortex-M, Ethos-U, and VGF documentation updates.
  • NXP QAT and Neutron flow documentation updates.
  • Vulkan, Parakeet, Voxtral, and LFM2.5 documentation updates.

Deprecations

The legacy MPS backend remains deprecated and is expected to be removed in a future release.

Contributors

We welcome 62 first-time human contributors to ExecuTorch in this release:

@s09g, @Froskekongen, @MahinAshraful, @qti-mmadhava, @quic-boyuc, @qti-horodnic, @rezaasjd, @ocvh, @twsl, @mvartani-meta, @ifed-ucsd, @tmtrademarked, @berndporr, @leixin, @amov-meta, @xiaodong705, @Hyungkeun-Park-Nota, @AlannaBurke, @kastopia, @zhaoxul-qti, @VasuAgrawal, @JorickvdHoeven, @dveremeevfb, @alexey-sidnev, @Lidang-Jiang, @notaJiminLee, @irtrukhina, @mvsfb, @SAY-5, @Jah-yee, @fgsiveone, @boomitsnoom, @jeanschmidt, @KevinUW114514, @aksharabhardwaj766-commits, @AlessandroVacca, @gkrulce, @AswaniSahoo, @OpenByteDev, @ishangodawatta, @WongJohnson, @ymrohit, @XAheli, @jiawei-lyu, @Nazim-fad, @yadferhad, @darknight054, @xuyanwen2012, @matt-cossins, @zeel2104, @john-rocky, @luhenry, @christine-long-meta, @hboyraz, @telgamal-1, @madhesh60, @omkar-334, @vacu9708, @Vasanthadithya-mundrathi, @ozgecinko, @wliuyx, @tomeuv

Full Changelog

Full Changelog: v1.2.0...v1.3.1
