Introduction

Since the last release, the TVM community has delivered the following improvements.

The main tags are listed below (bold text marks areas with significant progress): Relax, Frontend, TIR, Runtime, etc.

For a complete view, see the full list of commits: v0.24.0...v0.25.0.

Community

None.

RFCs

None.

Arith

#19604 - [REFACTOR][TIR] Phase out ControlFlowGraph, NarrowPredicateExpression, and rename Simplify to StmtSimplify
#19638 - [REFACTOR] Phase out arith/scalable_expression; arith no longer proves over scalable vectors
#19670 - Memoize IntervalSet variable relaxation to avoid exponential blowup
#19669 - Gate canonical-simplify LT Case 2 on extra scale == +1
#19675 - Make Analyzer a tvm-ffi Object
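
The memoization change in #19670 addresses a general pattern: when an expression DAG shares subexpressions, a naive recursive traversal revisits each shared node once per path, and the work can grow exponentially with depth. The sketch below is a generic Python illustration of that pattern, not TVM's IntervalSet code; the `Node` class and the helper functions are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True, eq=False)
class Node:
    """Hypothetical expression node: a leaf, or a binary node with two children."""
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_shared_chain(depth: int) -> Node:
    """Build a DAG in which each level reuses the previous node as both children."""
    node = Node()
    for _ in range(depth):
        node = Node(node, node)
    return node


def count_visits_naive(node: Node) -> int:
    """Walk every path; a shared node is visited once per path that reaches it."""
    if node.left is None:
        return 1
    return 1 + count_visits_naive(node.left) + count_visits_naive(node.right)


def count_visits_memoized(node: Node, seen: Optional[set] = None) -> int:
    """Walk each distinct node once by remembering object identities."""
    if seen is None:
        seen = set()
    if id(node) in seen:
        return 0
    seen.add(id(node))
    if node.left is None:
        return 1
    return (
        1
        + count_visits_memoized(node.left, seen)
        + count_visits_memoized(node.right, seen)
    )
```

For a chain of depth 10, the naive walk performs 2047 visits while the memoized walk performs 11, one per distinct node.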

BugFix

#19502 - [TIR] Skip bool-typed expressions in CSE
#19497 - [Relax] Fix scatter_elements and scatter_nd CUDA compilation
#19498 - [Relax][ONNX] Resolve param Vars in Concat to handle mixed Shape/Tensor inputs
#19511 - [Relax][Torch] Honor multi-axis dims in torch.flip converter
#19512 - [Relax][Torch] Honor correction in std/var converter
#19514 - [S-TIR] Wrap bare scalar bodies in DefaultGPUSchedule to avoid root-block crash
#19527 - [Relax] Handle ONNX ScatterElements reduction
#19535 - [Fix][Relax] ONNX Clip NaN bounds and preserve input NaN (ORT parity)
#19554 - [Fix][CI] Remove astral-sh/setup-uv from lint workflow
#19557 - [Fix][Relax] Lower bool prod as logical all
#19567 - [Target][LLVM] Use libm for asin/acos instead of buggy inline Taylor
#19568 - [Target][LLVM] Route sinh/cosh/atan/asinh/erf through libm extern
#19619 - [Vulkan][CodeGen] Change OpControlBarrier to AcquireRelease
#19643 - [Fix] Stabilize layer_norm variance computation with two-pass reduction
#19650 - [Fix][Relax] Support ND batched matmul chains in AdjustMatmulOrder pass
#19683 - [Fix] Allow CommReduce to handle 0-dim data
#19779 - [Fix] Support dynamic batch_size in nn.attention
#19808 - [Fix] Revert C++20-only lambda captures for C++17 build
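
As background for #19643, the sketch below contrasts a single-pass variance (mean of squares minus squared mean) with a two-pass variance (compute the mean first, then average the squared deviations). It is a plain-Python illustration of why a two-pass reduction is numerically more stable, assuming population variance as used in layer normalization; it is not the TVM implementation, and the function names are hypothetical.

```python
def variance_single_pass(values):
    """E[x^2] - E[x]^2: prone to catastrophic cancellation for large means."""
    n = len(values)
    mean = sum(values) / n
    mean_sq = sum(v * v for v in values) / n
    return mean_sq - mean * mean


def variance_two_pass(values):
    """Compute the mean first, then average the squared deviations from it."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


if __name__ == "__main__":
    # Values with a large mean and a tiny spread; the true variance is 2/3.
    data = [1e9 + 1, 1e9 + 2, 1e9 + 3]
    print("single pass:", variance_single_pass(data))
    print("two pass:   ", variance_two_pass(data))
```

With inputs near 1e9, the squared terms are near 1e18, where adjacent double-precision values are 128 apart, so the single-pass subtraction cannot resolve a variance below 1. The two-pass form only ever squares the small deviations.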

CI

#19629 - Remove tvm-lint from tvm-bot
#19656 - Add cibw-based wheel publishing to PyPI
#19659 - Wheel publishing follow-ups
#19665 - Derive the version from Git tags via setuptools_scm
#19664 - Reformat the macOS repair-wheel-command as a multiline script
#19697 - Target apache-tvm for PyPI wheel publishing
#19775 - Merge PR against its target branch instead of main (#19712)
#19685 - Remove PyPI-only tag ref guard from wheel publishing
#19703 - Pin actions by version tag, trim wheel perms
#19706 - [Tests] Fix s_tir tests using removed T.block API in TIRx script
#19700 - Fix release verification script
#19704 - [Tests] Skip test modules cleanly when optional deps are missing
#19713 - Fix CI script test subprocess environment
#19724 - [Tests][Disco] Skip CCL tests when runtime support is absent
#19725 - [Tests][Relax] Gate multi-GPU VM test on three devices
#19726 - [Tests][Hexagon] Lazily import pytest plugin dependencies
#19730 - [Tests][NNAPI] Skip tests cleanly when remote environment is unavailable
#19729 - [Tests][S-TIR] Fix stale MetaSchedule sketch expectations and migrate let binds to T.let
#19715 - [Tests] Remove test_runtime_ndarray (covered by tvm-ffi)
#19731 - [Script][Tests] Fix dialect redirect module re-execution and stray category-less tirx.intrin_test op
#19735 - [S-TIR][Tests] Fix transform test failures after TIRx bringup
#19740 - [Tests] Check WebGPU volatile allreduce annotation structurally
#19746 - [Tests] Fix flaky popen pool executor test
#19738 - Align cuda-python with PyTorch cuda-bindings
#19745 - [Tests][LLVM] Gate stepvector intrinsic rename on LLVM 20
#19751 - [S-TIR][Tests] Mark test_cp_async_in_if_then_else as xfail
#19737 - Run s_tir/transform tests in the python-unittest stage
#19754 - Update cibw to 4.1.0
#19752 - [Tests][AArch64] Make SVE codegen assertions robust across LLVM versions
#19761 - Drop redundant cmake/ninja install from the Linux wheel CUDA sidecar
#19777 - [Tests] Modernize test gating
#19786 - [Tests] Make TargetCreation.DeduplicateKeys host-agnostic on AArch64
#19787 - [Tests] Replace remaining requires_* helpers with standard pytest
#19793 - Pin GitHub Actions to SHA for ASF INFRA compliance
#19798 - Remove Jenkins PR linter step
#19800 - [Tests][Refactor] Remove unused testing helpers

Docs

#19606 - Reorganize development guide content
#19720 - Clarify loading serialized artifacts requires a trusted source
#19782 - [CI] Bump tlcpack-sphinx-addon to restore search result summaries
#19788 - Modernize test-gating documentation

Frontend

#19590 - [ONNX] Add RMSNormalization converter for ONNX opset 23

Hexagon

#19747 - [Tests] Clean up stale hexagon tests
#19796 - [REFACTOR] Phase out Hexagon app and test wrappers

LLVM

#19716 - [Codegen] Accept splat form in VLA broadcast test
#19744 - [Codegen][Tests] Gate +v9a vscale_range expectation on LLVM version

Relax

#19495 - [Frontend] Add ParameterList and ParameterDict containers
#19491 - [Frontend][TFLite] Add segment operator mappings
#19499 - [Frontend][TFLite] Add test coverage for SPACE_TO_BATCH_ND and BATCH_TO_SPACE_ND
#19516 - [TFLite] Add gather frontend expected IRModule tests
#19488 - [PyTorch] Fix segfault in from_exported_program when model uses index_put_ with tuple output
#19523 - [Frontend][TFLite] Add Conv3D support
#19525 - [ONNX] Normalize negative indices before the take call for Gather operator
#19530 - [Frontend] Add TFLite Frontend Support for CONV_3D_TRANSPOSE
#19536 - [Frontend][TFLite] Add initial StableHLO builtin operator support
#19547 - [ONNX] Set max_output_boxes_per_class default value to 0 for NonMaxSuppression
#19515 - [ONNX] Add ONNX Backend Tests for systematic frontend coverage
#19566 - [ONNX] Prevent Div divide-by-zero crashes
#19573 - [ONNX] Fix TopK scalar K extraction in from_onnx
#19587 - [Frontend][TFLite] Support StableHLO region-based ops and multi-subgraph models
#19588 - Normalize negative concat axis in ReorderPermuteDimsAfterConcat
#19603 - [REFACTOR] Fold CalleeCollector into relax DeadCodeElimination
#19538 - [Frontend][TFLite] Support quantized TFLite import via QDQ decomposition
#19616 - [Frontend][TFLite] Support control-flow multi-subgraph operators
#19601 - [Frontend][TFLite] Add UNIDIRECTIONAL_SEQUENCE_RNN converter
#19637 - [Frontend][TFLite] Add REDUCE_WINDOW support
#19632 - [Frontend][TFLite] Add RNN converter
#19633 - [Frontend][TFLite] Add LSTM and SVDF converter
#19639 - [Frontend][TFLite] Add TFLite Resource Variable and Static Hashtable Import Support
#19634 - [Frontend][TFLite] Support sequence LSTM and RNN operators
#19646 - [Frontend][TFLite] Support STABLEHLO_WHILE
#19644 - [IR] Skip in-place multiply when two operands are views of the same tensor
#19649 - [Frontend][TFLite] Support STABLEHLO_CUSTOM_CALL
#19654 - [Frontend][TFLite] Add HASHTABLE_LOOKUP converter
#19651 - [Frontend][TFLite] Support STABLEHLO_RNG_BIT_GENERATOR
#19645 - [PyTorch] Cast non-bool inputs to bool in logical_not converter
#19652 - [Frontend][TFLite] Add EMBEDDING_LOOKUP_SPARSE converter
#19660 - [PyTorch] Decompose integer pow into repeated multiplication
#19626 - [ONNX] Fix Cast operator float->int NaN/Inf handling
#19674 - [ONNX] Preserve NaN in Sign to align with ONNX Runtime
#19679 - [PyTorch] Cast non-bool inputs to bool in logical_and converter
#19711 - [CoreML] Fix CoreML partition pass
#19732 - [PyTorch][DLight] Fix exported-program CUDA test failures
#19756 - [PyTorch] Add logical_or and logical_xor converters
#19772 - [ONNX] Fix LayerNormalization no-bias zero tensor shape and dtype
#19773 - [ONNX] Support exclusive option in CumSum
#19755 - [ONNX] Make ReduceMax/ReduceMin NaN propagation order-independent (numpy semantics)
#19789 - [TensorRT] Update TensorRT runtime to 10
#19763 - [Frontend][TFLite] Add support for FFT/complex operators: REAL, IMAG, COMPLEX_ABS
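
To make the `exclusive` option from #19773 concrete: in the ONNX CumSum operator, an exclusive sum leaves the current element out of each output, so output position j holds the sum of the elements before it. The sketch below shows both modes on a 1-D Python list. It only illustrates the operator semantics; it is not the converter code, and `cumsum_1d` is a hypothetical helper.

```python
def cumsum_1d(values, exclusive=False):
    """Cumulative sum over a 1-D sequence.

    With exclusive=False, out[j] = values[0] + ... + values[j].
    With exclusive=True,  out[j] = values[0] + ... + values[j - 1],
    so the first output is 0.
    """
    out = []
    running = 0
    for v in values:
        if exclusive:
            out.append(running)  # record the sum before adding the current element
            running += v
        else:
            running += v
            out.append(running)  # record the sum including the current element
    return out


if __name__ == "__main__":
    print(cumsum_1d([1, 2, 3]))                  # [1, 3, 6]
    print(cumsum_1d([1, 2, 3], exclusive=True))  # [0, 1, 3]
```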

Runtime

#19617 - [CMAKE] Link tvm_rpc with all backend runtime libraries
#19620 - [REFACTOR] Phase out tvm::runtime::regex_match
#19622 - [REFACTOR] Remove leftover microTVM/CRT crumbs
#19621 - [REFACTOR] Relocate nvtx.h to tvm/support/cuda and make it header-only
#19628 - [REFACTOR] Structural reorganization: locality moves for thread_map, texture, minrpc, disco, contrib
#19714 - [Tests] Fix contrib wheel tests
#19736 - [Disco] Fix session attribute storage, NVSHMEM build, and test gating
#19748 - [Tests] Drop int4 from random_fill test, fix dtype error message
#19762 - [CoreML] Fix FFI casts in CoreML runtime

TIR

#19581 - [TIRx] Bringup TIRx Infrastructure
#19642 - [TIRx] Fix stale Simplify import in lowering test
#19657 - [TIRx] Post-bringup op-dispatch / codegen / TVMScript follow-ups
#19663 - [REFACTOR][TIRX] Consolidate split host device stages
#19677 - [TIRx] Update scoped ops and CUDA launch bounds
#19728 - [TIRx] Preserve Triton call_kernel compile options
#19739 - [TIRx] Use canonical PTX async script API in s_tir test
#19753 - [TIRX][Tests] Fix LLVM version gate for vectorized lround
#19757 - [TIRx] Post-bringup follow-ups: op-dispatch, namespaces, launch bounds, gemm-async, backend reorg
#19785 - [TIRX][CUDA] Framework support for FA4, CLC intrinsics, and nvfp4 tcgen05 GEMM
#19776 - [TIRx][RISC-V] Use scalable RVV loops for fixed vectorize
#19797 - [REFACTOR][TIRX] Add IntImm common scalar ctor and streamline MakeConst

TVMScript

#19583 - Handle undefined functions when dumping IRModule

cuda & cutlass & tensorrt

#19565 - [RFC][CodeGen][CUDA] Gate fast math intrinsic lowering behind target option
#19596 - [CodeGen][CUDA] Move fast math intrinsic lowering option to PassContext
#19741 - [S-TIR][CUDA] Fix legacy predicated cp.async zero fill
#19768 - [REFACTOR][CUDA] Phase out l2 cache flush preproc test
#19770 - [REFACTOR][CUDA] Phase out cuda_common.h
#19784 - [CUDA] Narrow the cuda extra from cuda-python to cuda-bindings

web

#19494 - Add support for OPFS
#19569 - [COS] Persist URL→hash mapping across page loads
#19673 - Add support for OPFS synchronous access handles and committed records
#19687 - Bump tvmjs version to 0.25.0-dev1
#19790 - Destroy GPUDevice once on buffer creation error
#19780 - Use singular requestFileHandle() instead of requestFileHandles()

Misc

#19446 - [release][Dont Squash] Update version to 0.24.0 and 0.25.0.dev on main branch
#19528 - [REFACTOR][IR] Remove dead AttrFunctor template
#19423 - [TIR] Add cooperative_tensor builtins and metal.cooperative_tensor storage scope
#19539 - [Contrib] Fix CUDA contrib build after FFI/header cleanups
#19594 - [BUILD] Modularize device runtime into per-backend DSOs
#19586 - [RPC][Tracker] Bound msg_size to MAX_TRACKER_MSG_BYTES to prevent unbounded buffer growth
#19597 - [IR] Add annotations to Call nodes
#19602 - Fix PytestUnknownMarkWarning: Unknown pytest.mark.adreno_clml
#19607 - [REFACTOR][IR] Cleanup attrs.h: drop NullValue, AttrsNodeReflAdapter, legacy BaseAttrsNode methods
#19611 - [REFACTOR] Move src/ir/script_printer.cc to src/script/printer/
#19613 - [REFACTOR][IR] Phase out src/ir/structural_{hash,equal}.cc to tvm-ffi
#19612 - [REFACTOR][IR] Inline ApplyPassToFunction into relax decompose_ops, delete the util
#19614 - [REFACTOR][IR] Phase out class Integer and class Bool in Attrs and PassConfig
#19615 - [REFACTOR][IR] attrs.h follow-up cleanup: drop legacy vtable / rename / phase out AttrFieldInfo
#19605 - [REFACTOR][TIR] Tie AnnotateDeviceRegions/SplitHostDevice/LowerDeviceKernelLaunch together
#19618 - [IR] Rename Call annotations to attrs
#19624 - [REFACTOR][PYTHON] Lift compiler/CLI/process modules from tvm.contrib to tvm.support
#19627 - [REFACTOR][IR][FFI] Bump tvm-ffi (+ SEqHashDef migration) and phase out tvm/ir/repr.h
#19625 - [REFACTOR][IR] Inline ReplaceGlobalVars into AttachGlobalSymbol
#19630 - [REFACTOR][PYTHON] Consolidate derived_object into tvm.ir.utils
#19631 - [REFACTOR][SCRIPT] tvmscript streamline: lift printer.h, restore one-way dep, migrate dialect config to extra_config
#19636 - [REFACTOR][IR] Delete class Bool and class Integer boxed-type wrappers
#19653 - [REFACTOR][PYTHON] Revisit lifted support modules from tvm.contrib
#19648 - fix: Security Patch: Fix missing exported flag in AndroidManifest
#19658 - [RPC] Import tvm.testing lazily in rpc.testing
#19662 - [FFI][IR] Route JSON serialization through tvm-ffi
#19661 - [FFI][REFACTOR] Direct structural APIs to tvm-ffi
#19681 - [Bump] tvm-ffi to 59da4c0
#19684 - [RELEASE] Bump web npm version to 0.25.0
#19701 - [Python] Bump apache-tvm-ffi floor to >=0.1.12 on v0.25.0
#19709 - [Refactor][Meta-schedule] Remove meta-schedule as_string mechanism in favor of default representation
#19719 - [REFACTOR][PYTHON] Slim tvm.libinfo to info-only helpers
#19717 - [Codegen][NVPTX] Skip runtime execution in Vulkan codegen tests
#19721 - [REFACTOR][PYTHON] Remove tvm.ffi shim; import tvm_ffi directly
#19722 - [REFACTOR][IR] Phase out diagnostic.h for visit-context-aware pass errors
#19723 - [Python] Refactor pyproject.toml dependencies
#19727 - [PYTHON] Autoload backends; simplify library loading; remove TVMError for native errors
#19742 - [S-TIR] Fix software pipeline offsets for legacy MMA intrinsics
#19758 - [REFACTOR][VM] Move CUDA graph VM builtin back under VM runtime
#19760 - [REFACTOR][DataType] Phase out target custom datatype support
#19759 - [REFACTOR][TARGET] Cleanup backend target registration
#19767 - [MetaScheduler] Improve print info about builder/runner state
#19778 - [CPP_RPC] Fix race conditions and improve printed info
#19734 - [CMAKE] Upgrade TVM build baseline to C++20
#19769 - [REFACTOR][PYTHON] Consolidate backend autoload infra
#19781 - [REFACTOR][IR] Cleanup IR naming utilities
#19783 - [AGENT] Migrate agent instructions to vendor-neutral layout
#19794 - [REFACTOR] Phase out unused queue and rang license entries
#19799 - [REFACTOR][IR] Simplify CallingConv attribute access
#19805 - [CMAKE] Revert build baseline to C++17
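
The bound introduced in #19586 reflects a common defensive pattern for length-prefixed protocols: validate the declared message size against a fixed cap before reading or allocating the payload. The sketch below illustrates that pattern in Python under assumed framing (a 4-byte little-endian signed length followed by the payload). It is not the tracker's code; `MAX_MSG_BYTES`, `frame`, and `read_message` are hypothetical names, and the 1 MiB cap is an arbitrary value chosen for the example.

```python
import io
import struct

# Hypothetical cap chosen for this example only.
MAX_MSG_BYTES = 1 << 20


def frame(payload: bytes) -> bytes:
    """Prefix a payload with its length as a 4-byte little-endian signed int."""
    return struct.pack("<i", len(payload)) + payload


def read_message(recv, max_bytes: int = MAX_MSG_BYTES) -> bytes:
    """Read one framed message using recv(n), rejecting oversized lengths.

    The declared size is checked before any payload is read, so a peer
    cannot force the receiver to buffer an arbitrarily large message.
    """
    header = recv(4)
    if len(header) != 4:
        raise ConnectionError("truncated length header")
    (size,) = struct.unpack("<i", header)
    if size < 0 or size > max_bytes:
        raise ValueError(f"declared message size {size} is outside [0, {max_bytes}]")
    payload = recv(size)
    if len(payload) != size:
        raise ConnectionError("truncated payload")
    return payload


if __name__ == "__main__":
    stream = io.BytesIO(frame(b"hello"))
    print(read_message(stream.read))
```

A real socket's `recv` may return fewer bytes than requested, so production code would loop until the requested count arrives; the in-memory stream used here avoids that detail.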

apache/tvm v0.25.0 on GitHub