Introduction
The TVM community has worked since the last release to deliver the following exciting new improvements!
The main tags are listed below (bold text marks the tags with the most progress): Relax, Frontend, TIR, Runtime, etc.
Please visit the full listing of commits for a complete view: v0.24.0...v0.25.0.
Community
None.
RFCs
None.
Arith
- #19604 - [REFACTOR][TIR]Phase out ControlFlowGraph, NarrowPredicateExpression, and rename Simplify to StmtSimplify
- #19638 - [REFACTOR]Phase out arith/scalable_expression; arith no longer proves over scalable vectors
- #19670 - Memoize IntervalSet variable relaxation to avoid exponential blowup
- #19669 - Gate canonical-simplify LT Case 2 on extra scale == +1
- #19675 - Make Analyzer a tvm-ffi Object
BugFix
- #19502 - [TIR] Skip bool-typed expressions in CSE
- #19497 - [Relax] Fix scatter_elements and scatter_nd CUDA compilation
- #19498 - [Relax][ONNX] Resolve param Vars in Concat to handle mixed Shape/Tensor inputs
- #19511 - [Relax][Torch] Honor multi-axis dims in torch.flip converter
- #19512 - [Relax][Torch] Honor correction in std/var converter
- #19514 - [S-TIR] Wrap bare scalar bodies in DefaultGPUSchedule to avoid root-block crash
- #19527 - [Relax]: handle ONNX ScatterElements reduction
- #19535 - [Fix][Relax]: ONNX Clip NaN bounds and preserve input NaN (ORT parity)
- #19554 - [Fix][CI]: remove astral-sh/setup-uv from lint workflow
- #19557 - [Fix][Relax] Lower bool prod as logical all
- #19567 - [Target][LLVM] Use libm for asin/acos instead of buggy inline Taylor
- #19568 - [Target][LLVM] Route sinh/cosh/atan/asinh/erf through libm extern
- #19619 - [Vulkan][CodeGen] Change OpControlBarrier to AcquireRelease
- #19643 - [Fix] Stabilize layer_norm variance computation with two-pass reduction
- #19650 - [Fix][Relax] Support ND batched matmul chains in AdjustMatmulOrder pass
- #19683 - [Fix] CommReduce could handle 0-dim data
- #19779 - [Fix] nn.attention support dynamic batch_size
- #19808 - [Fix] Revert C++20-only lambda captures for C++17 build
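For readers unfamiliar with the two-pass reduction named in #19643, the following is a minimal pure-Python sketch of the general technique, not the TVM implementation. The function names `one_pass_variance` and `two_pass_variance` are hypothetical and chosen only for illustration. The one-pass form computes E[x^2] - E[x]^2 and can lose all significant digits when the mean is large relative to the spread; the two-pass form computes the mean first and then averages squared deviations.

```python
def one_pass_variance(xs):
    """Population variance as E[x^2] - E[x]^2 (prone to cancellation)."""
    n = len(xs)
    mean = sum(xs) / n
    mean_of_squares = sum(x * x for x in xs) / n
    return mean_of_squares - mean * mean


def two_pass_variance(xs):
    """Population variance via a first pass for the mean, then a second
    pass over squared deviations from that mean."""
    n = len(xs)
    mean = sum(xs) / n                              # pass 1: mean
    return sum((x - mean) ** 2 for x in xs) / n     # pass 2: deviations


# Illustrative input: large offset, small spread. The true variance is 2/3.
data = [1e8, 1e8 + 1.0, 1e8 + 2.0]
stable = two_pass_variance(data)
unstable = one_pass_variance(data)
```

On this input the two-pass result stays close to 2/3, while the one-pass result is dominated by rounding error in the squared terms.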
CI
- #19629 - Remove tvm-lint from tvm-bot
- #19656 - Add cibw-based wheel publishing to PyPI
- #19659 - Wheel publishing follow-ups
- #19665 - Derive the version from Git tags via setuptools_scm
- #19664 - Reformat the macOS repair-wheel-command as a multiline script
- #19697 - Target apache-tvm for PyPI wheel publishing
- #19775 - Merge PR against its target branch instead of main (#19712)
- #19685 - Remove PyPI-only tag ref guard from wheel publishing
- #19703 - Pin actions by version tag, trim wheel perms
- #19706 - [Tests] Fix s_tir tests using removed T.block API in TIRx script
- #19700 - Fix release verification script
- #19704 - [Tests] Skip test modules cleanly when optional deps are missing
- #19713 - Fix CI script test subprocess environment
- #19724 - [Tests][Disco] Skip CCL tests when runtime support is absent
- #19725 - [Tests][Relax] Gate multi-GPU VM test on three devices
- #19726 - [Tests][Hexagon] Lazily import pytest plugin dependencies
- #19730 - [Tests][NNAPI] Skip tests cleanly when remote environment is unavailable
- #19729 - [Tests][S-TIR] Fix stale MetaSchedule sketch expectations and migrate let binds to T.let
- #19715 - [Tests] Remove test_runtime_ndarray (covered by tvm-ffi)
- #19731 - [Script][Tests] Fix dialect redirect module re-execution and stray category-less tirx.intrin_test op
- #19735 - [S-TIR][Tests] Fix transform test failures after TIRx bringup
- #19740 - [Tests] Check WebGPU volatile allreduce annotation structurally
- #19746 - [Tests] Fix flaky popen pool executor test
- #19738 - Align cuda-python with PyTorch cuda-bindings
- #19745 - [Tests][LLVM] Gate stepvector intrinsic rename on LLVM 20
- #19751 - [S-TIR][Tests] Mark test_cp_async_in_if_then_else as xfail
- #19737 - Run s_tir/transform tests in the python-unittest stage
- #19754 - Updated cibw to 4.1.0
- #19752 - [Tests][AArch64] Make SVE codegen assertions robust across LLVM versions
- #19761 - Drop redundant cmake/ninja install from the Linux wheel CUDA sidecar
- #19777 - [Tests] Modernize test gating
- #19786 - [Tests] Make TargetCreation.DeduplicateKeys host-agnostic on AArch64
- #19787 - [Tests] Replace remaining requires_* helpers with standard pytest
- #19793 - Pin GitHub Actions to SHA for ASF INFRA compliance
- #19798 - Remove Jenkins PR linter step
- #19800 - [Tests][Refactor] Remove unused testing helpers
Docs
- #19606 - Reorganize development guide content
- #19720 - Clarify loading serialized artifacts requires a trusted source
- #19782 - [CI] Bump tlcpack-sphinx-addon to restore search result summaries
- #19788 - Modernize test-gating documentation
Frontend
- #19590 - [ONNX] Add RMSNormalization converter for ONNX opset 23
Hexagon
- #19747 - [Tests] Clean up stale hexagon tests
- #19796 - [REFACTOR]Phase out Hexagon app and test wrappers
LLVM
- #19716 - [Codegen]Accept splat form in VLA broadcast test
- #19744 - [Codegen][Tests] Gate +v9a vscale_range expectation on LLVM version
Relax
- #19495 - [Frontend] Add ParameterList and ParameterDict containers
- #19491 - [Frontend][TFLite] Add segment operator mappings
- #19499 - [Frontend][TFLite] Add tests coverage for SPACE_TO_BATCH_ND and BATCH_TO_SPACE_ND
- #19516 - [TFLite] Add gather frontend expected IRModule tests
- #19488 - [PyTorch] Fix segfault in from_exported_program when model uses index_put_ with tuple output
- #19523 - [Frontend][TFLite] Add Conv3D support
- #19525 - [ONNX] Normalize negative indices before the take call for Gather operator
- #19530 - [Frontend] Add TFLite Frontend Support for CONV_3D_TRANSPOSE
- #19536 - [Frontend][TFLite] Add initial StableHLO builtin operator support
- #19547 - [ONNX] Set max_output_boxes_per_class default value to 0 for NonMaxSuppression
- #19515 - [ONNX] Add ONNX Backend Tests for systematic frontend coverage
- #19566 - [ONNX] Prevent Div divide-by-zero crashes
- #19573 - [ONNX] Fix TopK scalar K extraction in from_onnx
- #19587 - [Frontend][TFLite] Support StableHLO region-based ops and multi-subgraph models
- #19588 - Normalize negative concat axis in ReorderPermuteDimsAfterConcat
- #19603 - [REFACTOR]Fold CalleeCollector into relax DeadCodeElimination
- #19538 - [Frontend][TFLite] Support quantized TFLite import via QDQ decomposition
- #19616 - [Frontend][TFLite] Support control-flow multi-subgraph operators
- #19601 - [Frontend][TFLite] Add UNIDIRECTIONAL_SEQUENCE_RNN converter
- #19637 - [Frontend][TFLite] Add REDUCE_WINDOW support
- #19632 - [Frontend][TFLite] Add RNN converter
- #19633 - [Frontend][TFLite] Add LSTM and SVDF converter
- #19639 - [Frontend][TFLite] Add TFLite Resource Variable and Static Hashtable Import Support
- #19634 - [Frontend][TFLite] Support sequence LSTM and RNN operators
- #19646 - [Frontend][TFLite] Support STABLEHLO_WHILE
- #19644 - [IR] Skip in-place multiply when two operands are views of the same tensor
- #19649 - [Frontend][TFLite] Support STABLEHLO_CUSTOM_CALL
- #19654 - [Frontend][TFLite] Add HASHTABLE_LOOKUP converter
- #19651 - [Frontend][TFLite] Support STABLEHLO_RNG_BIT_GENERATOR
- #19645 - [PyTorch] Cast non-bool inputs to bool in logical_not converter
- #19652 - [Frontend][TFLite] Add EMBEDDING_LOOKUP_SPARSE converter
- #19660 - [PyTorch] Decompose integer pow into repeated multiplication
- #19626 - [ONNX] Fix Cast operator float->int NaN/Inf handling
- #19674 - [ONNX] Preserve NaN in Sign to align with ONNX Runtime
- #19679 - [PyTorch] Cast non-bool inputs to bool in logical_and converter
- #19711 - [CoreML] Fix CoreML partition pass
- #19732 - [PyTorch][DLight] Fix exported-program CUDA test failures
- #19756 - [PyTorch] Add logical_or and logical_xor converters
- #19772 - [ONNX] Fix LayerNormalization no-bias zero tensor shape and dtype
- #19773 - [ONNX] Support exclusive option in CumSum
- #19755 - [ONNX] Make ReduceMax/ReduceMin NaN propagation order-independent(numpy semantics)
- #19789 - [TensorRT] Update TensorRT runtime to 10
- #19763 - [Frontend][TFLite] Add support for FFT/complex operators: REAL, IMAG, COMPLEX_ABS
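As an illustration of the negative-index normalization mentioned in #19525, here is a minimal pure-Python sketch, not the converter code itself. ONNX Gather accepts indices in the range [-s, s-1] for an axis of size s, so a negative index i is mapped to i + s before the lookup. The helpers `normalize_indices` and `gather_1d` are hypothetical names used only for this example.

```python
def normalize_indices(indices, axis_size):
    """Map indices in [-axis_size, axis_size - 1] to [0, axis_size - 1]."""
    normalized = []
    for i in indices:
        if not -axis_size <= i < axis_size:
            raise IndexError(f"index {i} out of range for axis of size {axis_size}")
        # Negative indices count from the end of the axis.
        normalized.append(i + axis_size if i < 0 else i)
    return normalized


def gather_1d(data, indices):
    """Gather along the only axis of a 1-D list after normalizing indices."""
    return [data[i] for i in normalize_indices(indices, len(data))]


picked = gather_1d([10, 20, 30], [-1, 0, -3])
```

Normalizing up front keeps the downstream lookup simple, since it only ever sees non-negative indices.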
Runtime
- #19617 - [CMAKE]Link tvm_rpc with all backend runtime libraries
- #19620 - [REFACTOR]Phase out tvm::runtime::regex_match
- #19622 - [REFACTOR]Remove leftover microTVM/CRT crumbs
- #19621 - [REFACTOR]Relocate nvtx.h to tvm/support/cuda and make it header-only
- #19628 - [REFACTOR]Structural reorganization: locality moves for thread_map, texture, minrpc, disco, contrib
- #19714 - [Tests] Fix contrib wheel tests
- #19736 - [Disco] Fix session attribute storage, NVSHMEM build, and test gating
- #19748 - [Tests] Drop int4 from random_fill test, fix dtype error message
- #19762 - [CoreML] Fix FFI casts in CoreML runtime
TIR
- #19581 - [TIRx] Bringup TIRx Infrastructure
- #19642 - [TIRx] Fix stale Simplify import in lowering test
- #19657 - [TIRx] Post-bringup op-dispatch / codegen / TVMScript follow-ups
- #19663 - [REFACTOR][TIRX] Consolidate split host device stages
- #19677 - [TIRx] Update scoped ops and CUDA launch bounds
- #19728 - [TIRx] Preserve Triton call_kernel compile options
- #19739 - [TIRx] Use canonical PTX async script API in s_tir test
- #19753 - [TIRX][Tests] Fix LLVM version gate for vectorized lround
- #19757 - [TIRx] Post-bringup follow-ups: op-dispatch, namespaces, launch bounds, gemm-async, backend reorg
- #19785 - [TIRX][CUDA] Framework support for FA4, CLC intrinsics, and nvfp4 tcgen05 GEMM
- #19776 - [TIRx][RISC-V] Use scalable RVV loops for fixed vectorize
- #19797 - [REFACTOR][TIRX] Add IntImm common scalar ctor and streamline MakeConst
TVMScript
- #19583 - Handle undefined functions when dumping IRModule
cuda & cutlass & tensorrt
- #19565 - [RFC][CodeGen][CUDA]: Gate fast math intrinsic lowering behind target option
- #19596 - [CodeGen][CUDA] Move fast math intrinsic lowering option to PassContext
- #19741 - [S-TIR][CUDA] Fix legacy predicated cp.async zero fill
- #19768 - [REFACTOR][CUDA] Phase out l2 cache flush preproc test
- #19770 - [REFACTOR][CUDA] Phase out cuda_common.h
- #19784 - [CUDA] Narrow the cuda extra from cuda-python to cuda-bindings
web
- #19494 - Add support for OPFS
- #19569 - [COS] Persist URL→hash mapping across page loads
- #19673 - Add support for OPFS synchronous access handles and committed records
- #19687 - Bump tvmjs version to 0.25.0-dev1
- #19790 - Destroy GPUDevice once on buffer creation error
- #19780 - use singular requestFileHandle() instead of requestFileHandles()
Misc
- #19446 - [release][Dont Squash] Update version to 0.24.0 and 0.25.0.dev on main branch
- #19528 - [REFACTOR][IR] Remove dead AttrFunctor template
- #19423 - [TIR] Add cooperative_tensor builtins and metal.cooperative_tensor storage scope
- #19539 - [Contrib] Fix CUDA contrib build after FFI/header cleanups
- #19594 - [BUILD] Modularize device runtime into per-backend DSOs
- #19586 - [RPC][Tracker] Bound msg_size to MAX_TRACKER_MSG_BYTES to prevent unbounded buffer growth
- #19597 - [IR] Add annotations to Call nodes
- #19602 - Fix PytestUnknownMarkWarning: Unknown pytest.mark.adreno_clml
- #19607 - [REFACTOR][IR] Cleanup attrs.h: drop NullValue, AttrsNodeReflAdapter, legacy BaseAttrsNode methods
- #19611 - [REFACTOR] Move src/ir/script_printer.cc to src/script/printer/
- #19613 - [REFACTOR][IR] Phase out src/ir/structural_{hash,equal}.cc to tvm-ffi
- #19612 - [REFACTOR][IR] Inline ApplyPassToFunction into relax decompose_ops, delete the util
- #19614 - [REFACTOR][IR] Phase out class Integer and class Bool in Attrs and PassConfig
- #19615 - [REFACTOR][IR] attrs.h follow-up cleanup: drop legacy vtable / rename / phase out AttrFieldInfo
- #19605 - [REFACTOR][TIR] Tie AnnotateDeviceRegions/SplitHostDevice/LowerDeviceKernelLaunch together
- #19618 - [IR] Rename Call annotations to attrs
- #19624 - [REFACTOR][PYTHON] Lift compiler/CLI/process modules from tvm.contrib to tvm.support
- #19627 - [REFACTOR][IR][FFI] Bump tvm-ffi (+ SEqHashDef migration) and phase out tvm/ir/repr.h
- #19625 - [REFACTOR][IR] Inline ReplaceGlobalVars into AttachGlobalSymbol
- #19630 - [REFACTOR][PYTHON] Consolidate derived_object into tvm.ir.utils
- #19631 - [REFACTOR][SCRIPT] tvmscript streamline: lift printer.h, restore one-way dep, migrate dialect config to extra_config
- #19636 - [REFACTOR][IR] Delete class Bool and class Integer boxed-type wrappers
- #19653 - [REFACTOR][PYTHON] Revisit lifted support modules from tvm.contrib
- #19648 - fix: Security Patch: Fix missing exported flag in AndroidManifest
- #19658 - [RPC] Import tvm.testing lazily in rpc.testing
- #19662 - [FFI][IR] Route JSON serialization through tvm-ffi
- #19661 - [FFI][REFACTOR] Direct structural APIs to tvm-ffi
- #19681 - [Bump] tvm-ffi to 59da4c0
- #19684 - [RELEASE] Bump web npm version to 0.25.0
- #19701 - [Python] Bump apache-tvm-ffi floor to >=0.1.12 on v0.25.0
- #19709 - [Refactor][Meta-schedule] Remove meta-schedule as_string mechanism in favor of default representation
- #19719 - [REFACTOR][PYTHON] Slim tvm.libinfo to info-only helpers
- #19717 - [Codegen][NVPTX] Skip runtime execution in Vulkan codegen tests
- #19721 - [REFACTOR][PYTHON] Remove tvm.ffi shim; import tvm_ffi directly
- #19722 - [REFACTOR][IR] Phase out diagnostic.h for visit-context-aware pass errors
- #19723 - [Python] Refactor pyproject.toml dependencies
- #19727 - [PYTHON] Autoload backends; simplify library loading; remove TVMError for native errors
- #19742 - [S-TIR] Fix software pipeline offsets for legacy MMA intrinsics
- #19758 - [REFACTOR][VM] Move CUDA graph VM builtin back under VM runtime
- #19760 - [REFACTOR][DataType] Phase out target custom datatype support
- #19759 - [REFACTOR][TARGET] Cleanup backend target registration
- #19767 - [MetaScheduler] Improve print info about builder/runner state
- #19778 - [CPP_RPC] Bugfix race conditions and enhance print infos
- #19734 - [CMAKE] Upgrade TVM build baseline to C++20
- #19769 - [REFACTOR][PYTHON] Consolidate backend autoload infra
- #19781 - [REFACTOR][IR] Cleanup IR naming utilities
- #19783 - [AGENT] Migrate agent instructions to vendor-neutral layout
- #19794 - [REFACTOR] Phase out unused queue and rang license entries
- #19799 - [REFACTOR][IR] Simplify CallingConv attribute access
- #19805 - [CMAKE] Revert build baseline to C++17