github apache/tvm v0.25.0

Introduction

Since the last release, the TVM community has worked to deliver the following exciting new improvements!

The main tags are listed below (bold text marks tags with substantial progress): Relax, Frontend, TIR, Runtime, etc.

Please visit the full listing of commits for a complete view: v0.24.0...v0.25.0.

Community

None.

RFCs

None.

Arith

  • #19604 - [REFACTOR][TIR] Phase out ControlFlowGraph, NarrowPredicateExpression, and rename Simplify to StmtSimplify
  • #19638 - [REFACTOR] Phase out arith/scalable_expression; arith no longer proves over scalable vectors
  • #19670 - Memoize IntervalSet variable relaxation to avoid exponential blowup
  • #19669 - Gate canonical-simplify LT Case 2 on extra scale == +1
  • #19675 - Make Analyzer a tvm-ffi Object

BugFix

  • #19502 - [TIR] Skip bool-typed expressions in CSE
  • #19497 - [Relax] Fix scatter_elements and scatter_nd CUDA compilation
  • #19498 - [Relax][ONNX] Resolve param Vars in Concat to handle mixed Shape/Tensor inputs
  • #19511 - [Relax][Torch] Honor multi-axis dims in torch.flip converter
  • #19512 - [Relax][Torch] Honor correction in std/var converter
  • #19514 - [S-TIR] Wrap bare scalar bodies in DefaultGPUSchedule to avoid root-block crash
  • #19527 - [Relax]: handle ONNX ScatterElements reduction
  • #19535 - [Fix][Relax]: ONNX Clip NaN bounds and preserve input NaN (ORT parity)
  • #19554 - [Fix][CI]: remove astral-sh/setup-uv from lint workflow
  • #19557 - [Fix][Relax] Lower bool prod as logical all
  • #19567 - [Target][LLVM] Use libm for asin/acos instead of buggy inline Taylor
  • #19568 - [Target][LLVM] Route sinh/cosh/atan/asinh/erf through libm extern
  • #19619 - [Vulkan][CodeGen] Change OpControlBarrier to AcquireRelease
  • #19643 - [Fix] Stabilize layer_norm variance computation with two-pass reduction
  • #19650 - [Fix][Relax] Support ND batched matmul chains in AdjustMatmulOrder pass
  • #19683 - [Fix] CommReduce could handle 0-dim data
  • #19779 - [Fix] nn.attention support dynamic batch_size
  • #19808 - [Fix] Revert C++20-only lambda captures for C++17 build

CI

  • #19629 - Remove tvm-lint from tvm-bot
  • #19656 - Add cibw-based wheel publishing to PyPI
  • #19659 - Wheel publishing follow-ups
  • #19665 - Derive the version from Git tags via setuptools_scm
  • #19664 - Reformat the macOS repair-wheel-command as a multiline script
  • #19697 - Target apache-tvm for PyPI wheel publishing
  • #19775 - Merge PR against its target branch instead of main (#19712)
  • #19685 - Remove PyPI-only tag ref guard from wheel publishing
  • #19703 - Pin actions by version tag, trim wheel perms
  • #19706 - [Tests] Fix s_tir tests using removed T.block API in TIRx script
  • #19700 - Fix release verification script
  • #19704 - [Tests] Skip test modules cleanly when optional deps are missing
  • #19713 - Fix CI script test subprocess environment
  • #19724 - [Tests][Disco] Skip CCL tests when runtime support is absent
  • #19725 - [Tests][Relax] Gate multi-GPU VM test on three devices
  • #19726 - [Tests][Hexagon] Lazily import pytest plugin dependencies
  • #19730 - [Tests][NNAPI] Skip tests cleanly when remote environment is unavailable
  • #19729 - [Tests][S-TIR] Fix stale MetaSchedule sketch expectations and migrate let binds to T.let
  • #19715 - [Tests] Remove test_runtime_ndarray (covered by tvm-ffi)
  • #19731 - [Script][Tests] Fix dialect redirect module re-execution and stray category-less tirx.intrin_test op
  • #19735 - [S-TIR][Tests] Fix transform test failures after TIRx bringup
  • #19740 - [Tests] Check WebGPU volatile allreduce annotation structurally
  • #19746 - [Tests] Fix flaky popen pool executor test
  • #19738 - Align cuda-python with PyTorch cuda-bindings
  • #19745 - [Tests][LLVM] Gate stepvector intrinsic rename on LLVM 20
  • #19751 - [S-TIR][Tests] Mark test_cp_async_in_if_then_else as xfail
  • #19737 - Run s_tir/transform tests in the python-unittest stage
  • #19754 - Updated cibw to 4.1.0
  • #19752 - [Tests][AArch64] Make SVE codegen assertions robust across LLVM versions
  • #19761 - Drop redundant cmake/ninja install from the Linux wheel CUDA sidecar
  • #19777 - [Tests] Modernize test gating
  • #19786 - [Tests] Make TargetCreation.DeduplicateKeys host-agnostic on AArch64
  • #19787 - [Tests] Replace remaining requires_* helpers with standard pytest
  • #19793 - Pin GitHub Actions to SHA for ASF INFRA compliance
  • #19798 - Remove Jenkins PR linter step
  • #19800 - [Tests][Refactor] Remove unused testing helpers

Docs

  • #19606 - Reorganize development guide content
  • #19720 - Clarify loading serialized artifacts requires a trusted source
  • #19782 - [CI] Bump tlcpack-sphinx-addon to restore search result summaries
  • #19788 - Modernize test-gating documentation

Frontend

  • #19590 - [ONNX] Add RMSNormalization converter for ONNX opset 23

Hexagon

  • #19747 - [Tests] Clean up stale hexagon tests
  • #19796 - [REFACTOR] Phase out Hexagon app and test wrappers

LLVM

  • #19716 - [Codegen] Accept splat form in VLA broadcast test
  • #19744 - [Codegen][Tests] Gate +v9a vscale_range expectation on LLVM version

Relax

  • #19495 - [Frontend] Add ParameterList and ParameterDict containers
  • #19491 - [Frontend][TFLite] Add segment operator mappings
  • #19499 - [Frontend][TFLite] Add tests coverage for SPACE_TO_BATCH_ND and BATCH_TO_SPACE_ND
  • #19516 - [TFLite] Add gather frontend expected IRModule tests
  • #19488 - [PyTorch] Fix segfault in from_exported_program when model uses index_put_ with tuple output
  • #19523 - [Frontend][TFLite] Add Conv3D support
  • #19525 - [ONNX] Normalize negative indices before the take call for Gather operator
  • #19530 - [Frontend] Add TFLite Frontend Support for CONV_3D_TRANSPOSE
  • #19536 - [Frontend][TFLite] Add initial StableHLO builtin operator support
  • #19547 - [ONNX] Set max_output_boxes_per_class default value to 0 for NonMaxSuppression
  • #19515 - [ONNX] Add ONNX Backend Tests for systematic frontend coverage
  • #19566 - [ONNX] Prevent Div divide-by-zero crashes
  • #19573 - [ONNX] Fix TopK scalar K extraction in from_onnx
  • #19587 - [Frontend][TFLite] Support StableHLO region-based ops and multi-subgraph models
  • #19588 - Normalize negative concat axis in ReorderPermuteDimsAfterConcat
  • #19603 - [REFACTOR] Fold CalleeCollector into relax DeadCodeElimination
  • #19538 - [Frontend][TFLite] Support quantized TFLite import via QDQ decomposition
  • #19616 - [Frontend][TFLite] Support control-flow multi-subgraph operators
  • #19601 - [Frontend][TFLite] Add UNIDIRECTIONAL_SEQUENCE_RNN converter
  • #19637 - [Frontend][TFLite] Add REDUCE_WINDOW support
  • #19632 - [Frontend][TFLite] Add RNN converter
  • #19633 - [Frontend][TFLite] Add LSTM and SVDF converter
  • #19639 - [Frontend][TFLite] Add TFLite Resource Variable and Static Hashtable Import Support
  • #19634 - [Frontend][TFLite] Support sequence LSTM and RNN operators
  • #19646 - [Frontend][TFLite] Support STABLEHLO_WHILE
  • #19644 - [IR] Skip in-place multiply when two operands are views of the same tensor
  • #19649 - [Frontend][TFLite] Support STABLEHLO_CUSTOM_CALL
  • #19654 - [Frontend][TFLite] Add HASHTABLE_LOOKUP converter
  • #19651 - [Frontend][TFLite] Support STABLEHLO_RNG_BIT_GENERATOR
  • #19645 - [PyTorch] Cast non-bool inputs to bool in logical_not converter
  • #19652 - [Frontend][TFLite] Add EMBEDDING_LOOKUP_SPARSE converter
  • #19660 - [PyTorch] Decompose integer pow into repeated multiplication
  • #19626 - [ONNX] Fix Cast operator float->int NaN/Inf handling
  • #19674 - [ONNX] Preserve NaN in Sign to align with ONNX Runtime
  • #19679 - [PyTorch] Cast non-bool inputs to bool in logical_and converter
  • #19711 - [CoreML] Fix CoreML partition pass
  • #19732 - [PyTorch][DLight] Fix exported-program CUDA test failures
  • #19756 - [PyTorch] Add logical_or and logical_xor converters
  • #19772 - [ONNX] Fix LayerNormalization no-bias zero tensor shape and dtype
  • #19773 - [ONNX] Support exclusive option in CumSum
  • #19755 - [ONNX] Make ReduceMax/ReduceMin NaN propagation order-independent (numpy semantics)
  • #19789 - [TensorRT] Update TensorRT runtime to 10
  • #19763 - [Frontend][TFLite] Add support for FFT/complex operators: REAL, IMAG, COMPLEX_ABS

Runtime

  • #19617 - [CMAKE] Link tvm_rpc with all backend runtime libraries
  • #19620 - [REFACTOR] Phase out tvm::runtime::regex_match
  • #19622 - [REFACTOR] Remove leftover microTVM/CRT crumbs
  • #19621 - [REFACTOR] Relocate nvtx.h to tvm/support/cuda and make it header-only
  • #19628 - [REFACTOR] Structural reorganization: locality moves for thread_map, texture, minrpc, disco, contrib
  • #19714 - [Tests] Fix contrib wheel tests
  • #19736 - [Disco] Fix session attribute storage, NVSHMEM build, and test gating
  • #19748 - [Tests] Drop int4 from random_fill test, fix dtype error message
  • #19762 - [CoreML] Fix FFI casts in CoreML runtime

TIR

  • #19581 - [TIRx] Bringup TIRx Infrastructure
  • #19642 - [TIRx] Fix stale Simplify import in lowering test
  • #19657 - [TIRx] Post-bringup op-dispatch / codegen / TVMScript follow-ups
  • #19663 - [REFACTOR][TIRX] Consolidate split host device stages
  • #19677 - [TIRx] Update scoped ops and CUDA launch bounds
  • #19728 - [TIRx] Preserve Triton call_kernel compile options
  • #19739 - [TIRx] Use canonical PTX async script API in s_tir test
  • #19753 - [TIRX][Tests] Fix LLVM version gate for vectorized lround
  • #19757 - [TIRx] Post-bringup follow-ups: op-dispatch, namespaces, launch bounds, gemm-async, backend reorg
  • #19785 - [TIRX][CUDA] Framework support for FA4, CLC intrinsics, and nvfp4 tcgen05 GEMM
  • #19776 - [TIRx][RISC-V] Use scalable RVV loops for fixed vectorize
  • #19797 - [REFACTOR][TIRX] Add IntImm common scalar ctor and streamline MakeConst

TVMScript

  • #19583 - Handle undefined functions when dumping IRModule

cuda & cutlass & tensorrt

  • #19565 - [RFC][CodeGen][CUDA]: Gate fast math intrinsic lowering behind target option
  • #19596 - [CodeGen][CUDA] Move fast math intrinsic lowering option to PassContext
  • #19741 - [S-TIR][CUDA] Fix legacy predicated cp.async zero fill
  • #19768 - [REFACTOR][CUDA] Phase out l2 cache flush preproc test
  • #19770 - [REFACTOR][CUDA] Phase out cuda_common.h
  • #19784 - [CUDA] Narrow the cuda extra from cuda-python to cuda-bindings

web

  • #19494 - Add support for OPFS
  • #19569 - [COS] Persist URL→hash mapping across page loads
  • #19673 - Add support for OPFS synchronous access handles and committed records
  • #19687 - Bump tvmjs version to 0.25.0-dev1
  • #19790 - Destroy GPUDevice once on buffer creation error
  • #19780 - use singular requestFileHandle() instead of requestFileHandles()

Misc

  • #19446 - [release][Dont Squash] Update version to 0.24.0 and 0.25.0.dev on main branch
  • #19528 - [REFACTOR][IR] Remove dead AttrFunctor template
  • #19423 - [TIR] Add cooperative_tensor builtins and metal.cooperative_tensor storage scope
  • #19539 - [Contrib] Fix CUDA contrib build after FFI/header cleanups
  • #19594 - [BUILD] Modularize device runtime into per-backend DSOs
  • #19586 - [RPC][Tracker] Bound msg_size to MAX_TRACKER_MSG_BYTES to prevent unbounded buffer growth
  • #19597 - [IR] Add annotations to Call nodes
  • #19602 - Fix PytestUnknownMarkWarning: Unknown pytest.mark.adreno_clml
  • #19607 - [REFACTOR][IR] Cleanup attrs.h: drop NullValue, AttrsNodeReflAdapter, legacy BaseAttrsNode methods
  • #19611 - [REFACTOR] Move src/ir/script_printer.cc to src/script/printer/
  • #19613 - [REFACTOR][IR] Phase out src/ir/structural_{hash,equal}.cc to tvm-ffi
  • #19612 - [REFACTOR][IR] Inline ApplyPassToFunction into relax decompose_ops, delete the util
  • #19614 - [REFACTOR][IR] Phase out class Integer and class Bool in Attrs and PassConfig
  • #19615 - [REFACTOR][IR] attrs.h follow-up cleanup: drop legacy vtable / rename / phase out AttrFieldInfo
  • #19605 - [REFACTOR][TIR] Tie AnnotateDeviceRegions/SplitHostDevice/LowerDeviceKernelLaunch together
  • #19618 - [IR] Rename Call annotations to attrs
  • #19624 - [REFACTOR][PYTHON] Lift compiler/CLI/process modules from tvm.contrib to tvm.support
  • #19627 - [REFACTOR][IR][FFI] Bump tvm-ffi (+ SEqHashDef migration) and phase out tvm/ir/repr.h
  • #19625 - [REFACTOR][IR] Inline ReplaceGlobalVars into AttachGlobalSymbol
  • #19630 - [REFACTOR][PYTHON] Consolidate derived_object into tvm.ir.utils
  • #19631 - [REFACTOR][SCRIPT] tvmscript streamline: lift printer.h, restore one-way dep, migrate dialect config to extra_config
  • #19636 - [REFACTOR][IR] Delete class Bool and class Integer boxed-type wrappers
  • #19653 - [REFACTOR][PYTHON] Revisit lifted support modules from tvm.contrib
  • #19648 - fix: Security Patch: Fix missing exported flag in AndroidManifest
  • #19658 - [RPC] Import tvm.testing lazily in rpc.testing
  • #19662 - [FFI][IR] Route JSON serialization through tvm-ffi
  • #19661 - [FFI][REFACTOR] Direct structural APIs to tvm-ffi
  • #19681 - [Bump] tvm-ffi to 59da4c0
  • #19684 - [RELEASE] Bump web npm version to 0.25.0
  • #19701 - [Python] Bump apache-tvm-ffi floor to >=0.1.12 on v0.25.0
  • #19709 - [Refactor][Meta-schedule] Remove meta-schedule as_string mechanism in favor of default representation
  • #19719 - [REFACTOR][PYTHON] Slim tvm.libinfo to info-only helpers
  • #19717 - [Codegen][NVPTX] Skip runtime execution in Vulkan codegen tests
  • #19721 - [REFACTOR][PYTHON] Remove tvm.ffi shim; import tvm_ffi directly
  • #19722 - [REFACTOR][IR] Phase out diagnostic.h for visit-context-aware pass errors
  • #19723 - [Python] Refactor pyproject.toml dependencies
  • #19727 - [PYTHON] Autoload backends; simplify library loading; remove TVMError for native errors
  • #19742 - [S-TIR] Fix software pipeline offsets for legacy MMA intrinsics
  • #19758 - [REFACTOR][VM] Move CUDA graph VM builtin back under VM runtime
  • #19760 - [REFACTOR][DataType] Phase out target custom datatype support
  • #19759 - [REFACTOR][TARGET] Cleanup backend target registration
  • #19767 - [MetaScheduler] Improve print info about builder/runner state
  • #19778 - [CPP_RPC] Bugfix race conditions and enhance print infos
  • #19734 - [CMAKE] Upgrade TVM build baseline to C++20
  • #19769 - [REFACTOR][PYTHON] Consolidate backend autoload infra
  • #19781 - [REFACTOR][IR] Cleanup IR naming utilities
  • #19783 - [AGENT] Migrate agent instructions to vendor-neutral layout
  • #19794 - [REFACTOR] Phase out unused queue and rang license entries
  • #19799 - [REFACTOR][IR] Simplify CallingConv attribute access
  • #19805 - [CMAKE] Revert build baseline to C++17