Introduction
Since the last release, the TVM community has worked to deliver the following exciting improvements!
The main tags are below (bold text marks areas with significant progress): Relax (especially the PyTorch frontend), FFI, etc.
Please visit the full listing of commits for a complete view: v0.22.dev0...v0.22.0.rc0.
Community
None.
RFCs
None.
BugFix
- #18352 - [Fix] Update ShapeView use in nccl.cc
- #18324 - Fix binding for BERT
- #18296 - [Fix] Add libxml2 dependency to fix Windows CI build failure
- #18294 - [Fix] Set DRefObj and CUDAIPCMemoryObj as mutable
- #18285 - [FFI] Enable `load_inline` on macOS
- #18287 - [Hotfix] Fix the conflicts about ffi-related updated names
- #18281 - [FFI] Fix bug of `ffi.cpp.load_inline` on Windows
- #18262 - [NNAPI] Use kind() instead of type_key() after FFI refactor
- #18244 - [Fix] Update FlashInfer JIT header lookup
- #18237 - [FFI] Fix type_traits on DataType after SmallStr update
- #18232 - [LLVM][Fix] Do not emit debuginfo on vscale or other unknown types
- #18219 - [Fix] Resolve deadlock in PopenPoolExecutor and LocalBuilder
- #18207 - [Fix][ONNX] No precision widening for numpy binary operations
- #18209 - [ONNX][FRONTEND][Fix] Update Resize to accept ShapeExpr
- #18210 - [Bug] Fix core dump in InferLayoutRMSNorm and fix typo
- #18208 - [FFI][Fix] Update datatype registry calls to the new paths
- #18190 - [Fix] Codegen fix for relax cutlass
- #18170 - [Fix] Fix the wrong check for tuple node in #18163
- #18174 - [Misc] Fix missing PadAttrs registration in op_attrs.py
- #18158 - Fix NCCL build with GlobalDef registration
- #18140 - [NNAPI] Fix type mismatch and test_mean annotation
- #18138 - [Fix][ONNX] Fixed constant ROI handling in resize2d when loading onnx models
- #18137 - [Fix][ONNX] Fix CumSum conversion when loading ONNX model
CI
- #18245 - [LLVM][MSWIN] Fix LLVM module build with latest CI update
- #18227 - Exit the build for AbortException
- #18145 - [Test] Use roi_list variable instead of hardcoded values in ROI tensor creation
Docs
- #18279 - [FFI] Initial bringup of cpp docs
- #18264 - Misc docs fix
- #18263 - [FFI] Initial docs scaffolding
- #18261 - [FFI] Add missing files in packaging example
- #18256 - [FFI] Wheel Packaging
- #18128 - [Doc] Visualize the architecture using a UML sequence diagram
Frontend
- #18143 - [ONNX] Extend axes for layer_norm when gamma/beta are multi-dimensional
LLVM
MetaSchedule
- #18243 - [LLVM] Add RISCV V-extension v1.0 kernels to metaschedule
Metal
ROCm
- #18225 - Minor fixes for latest refactor
FFI
- #18375 - [TE] [FFI] Fix broken axis/reduce_axis properties in BaseComputeOp and ScanOp after FFI refactoring
- #18376 - [FFI] Bump tvm-ffi to 0.1.0rc2
- #18370 - [FFI] Bump tvm-ffi dependency
- #18354 - [FFI][ABI] Bump tvm-ffi to latest
- #18349 - [FFI][ABI] Bump tvm-ffi to latest
- #18345 - [FFI][ABI] Bump tvm-ffi version to reflect RC ABI Update
- #18332 - [FFI][ABI] Bump ffi version to latest
- #18314 - [REFACTOR][FFI] Split tvm-ffi into a separate repo
- #18312 - [FFI][REFACTOR] Update TVM_FFI_STATIC_INIT_BLOCK to fn style
- #18311 - [FFI][ABI] Better String and Nested Container handling
- #18308 - [FFI][ABI] Refactor the naming of DLPack speed converter
- #18307 - [FFI] Update `load_inline` interface
- #18306 - [FFI][ABI][REFACTOR] Enhance DLPack Exchange Speed and Behavior
- #18302 - [FFI][REFACTOR] Refactor python ffi call mechanism for perf
- #18298 - [FFI] Fix system library symbol lookup
- #18297 - [FFI] Temp skip windows tests
- #18295 - [FFI][ABI] Introduce generic stream exchange protocol
- #18289 - [FFI][REFACTOR] Streamline Object Declare Macros
- #18284 - [FFI][REFACTOR] Introduce UnsafeInit and enhance ObjectRef null safety
- #18282 - [FFI] Relax default alignment and contiguity requirement
- #18280 - [FFI][REFACTOR] Cleanup namespace
- #18278 - [FFI] Temp skip load_inline tests nonlinux
- #18277 - [FFI][REFACTOR] Cleanup tvm_ffi python API and types
- #18276 - [FFI] Add ffi::Tensor.strides()
- #18275 - [FFI][REFACTOR][ABI] Rename NDArray to Tensor
- #18274 - [FFI] Update the interface of `ffi.load_inline` to match torch
- #18273 - [FFI][ABI] Append symbol prefix for ffi exported functions
- #18272 - [FFI] Construct NDArray.strides by default
- #18271 - [FFI] Support inline module
- #18270 - [FFI] Support Opaque PyObject
- #18266 - [FFI] Update torch stream getter to use native torch c api
- #18259 - [FFI][ABI] Introduce weak rc support
- #18258 - [FFI] Fix two seeming migration issues
- #18254 - [FFI][ABI] ABI updates for future metadata and complex ordering
- #18249 - [FFI][CMAKE] Revert cmake libbacktrace URL and update submodule
- #18246 - [FFI][CMAKE] Add missing download path for libbacktrace
- #18234 - [FFI] Misc fixup for windows
- #18233 - [FFI] Robustify the pyproject setup
- #18226 - [FFI][REFACTOR] Establish tvm_ffi python module
- #18221 - [FFI] Fix JSON parser/writer for the fast-math flag
- #18218 - [FFI][REFACTOR] Cleanup API locations
- #18217 - [FFI] AutoDLPack compatible with torch stream context
- #18216 - [FFI][REFACTOR] Establish Stream Context in ffi
- #18214 - [FFI][REFACTOR] Establish ffi.Module in python
- #18213 - [FFI] Formalize ffi.Module
- #18212 - [FFI] Make JSON Parser/Write fastmath safe
- #18205 - [FFI][REFACTOR] Cleanup entry function to redirect
- #18200 - [FFI][REFACTOR] Update Map ABI to enable flexible smallMap switch
- #18198 - [FFI][REFACTOR] Move Downcast out of ffi for now
- #18192 - [FFI] Phase out ObjectPath in favor of AccessPath
- #18191 - [FFI][REFACTOR] Refactor AccessPath to enable full tree repr
- #18189 - [FFI][REFACTOR] Phase out getattr based attribute handling
- #18188 - [FFI][REFACTOR] Migrate the Save/Load JSON to the new reflection
- #18187 - [FFI][EXTRA] Serialization To/From JSONGraph
- #18186 - [FFI] Lightweight json parser/writer
- #18185 - [FFI] Introduce small string/bytes
- #18184 - [FFI][REFACTOR] Hide StringObj/BytesObj into details
- #18183 - [FFI][REFACTOR] Cleanup to align to latest ffi
- #18172 - [REFACTOR][FFI] Phase out SEqualReduce/SHashReduce
- #18178 - [FFI] Fix SmallMapInit with duplicated keys
- #18177 - [FFI][REFACTOR] Isolate out extra API
- #18176 - [FFI] Improve string equal/hash handling
- #18166 - [FFI][REFACTOR] Migrate StructuralEqual/Hash to new reflection
- #18165 - [FFI][REFACTOR] Enable custom s_hash/equal
- #18160 - [FFI][REFACTOR] Introduce TypeAttr in reflection
- #18156 - [FFI] Structural equal and hash based on reflection
- #18149 - [FFI] Log and throw in function dup registration
- #18148 - [FFI][REFACTOR] Phase out TVM_FFI_REGISTER_GLOBAL in favor of GlobalDef
- #18147 - [FFI][REFACTOR] Modularize reflection
- #18141 - [FFI][PYTHON] Improve the traceback generation in python
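Several items above (#18275's NDArray-to-Tensor rename, #18306's DLPack exchange enhancements) revolve around zero-copy tensor exchange through DLPack. As a minimal sketch of the protocol these changes build on, using NumPy rather than TVM's `ffi.Tensor` (so this illustrates DLPack semantics, not TVM's API):

```python
import numpy as np

# DLPack lets frameworks hand tensors to each other without copying:
# the consumer wraps the producer's buffer exposed via __dlpack__.
a = np.arange(4, dtype="float32")
b = np.from_dlpack(a)  # zero-copy: b aliases a's memory

a[0] = 42.0
print(b[0])  # the write is visible through b, since no copy was made
```

The exchange-speed work in #18306 is about making this handoff cheap on hot paths; the aliasing semantics stay the same.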
Relax
- #18374 - [PyTorch] improve the check for no bias situation
- #18358 - [Frontend][ONNX] Fix `FastGelu` when bias is not set
- #18360 - [PyTorch] Support GRU op for ExportedProgram importer
- #18359 - [PyTorch] Fix the segfault in from_exported_program when model returns (Tensor, None) tuple
- #18321 - [ONNX] Support AllClassNMS Operator for ONNX Frontend
- #18346 - [PyTorch] Support LSTM op for ExportedProgram importer
- #18351 - [Frontend][Torch] Fix parsing error when input dimension of unbind is 1
- #18331 - Update BasePyModule with faster DLPack converter for tensor conversion
- #18343 - [PyTorch] Support MatrixMultiply op for ExportedProgram importer
- #18336 - Operator and RoPE support for Llama4
- #18329 - [Frontend][ONNX] Fix Expand conversion error: broadcast_to expects the input tensor shape to be broadcastable to the target shape
- #18326 - [Backend] Implement R.call_py_func operator for calling Python functions from compiled TVM
- #18313 - Introduce R.call_py_func operator for calling Python functions from Relax IR
- #18301 - Fix RelaxToPyFuncConverter compatibility and improve fallback handling
- #18288 - Add symbolic shape support to BasePyModule for dynamic tensor operations
- #18269 - Add Relax to Python Function Converter
- #18253 - Building TVMScript printer for IRModules with Python functions
- #18229 - Add Python function support and BasePyModule for PyTorch integration
- #18242 - ONNX frontend using relax softplus operator
- #18180 - [ONNX] Parse ONNX Upsample to Relax resize2d
- #18179 - Support Relax Operator PReLU
- #18163 - Fix issue in fuse concat ops by pattern
- #18120 - [Fix] Fix potential out-of-bounds access in `TupleRewriterNode`
- #18061 - [ONNX][Transform] Add mode choice, new mode, and warning for take()
- #18122 - [KVCache] Fix kernel dispatch based on attention kinds
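Two of the items above (#18313, #18326) add an `R.call_py_func` operator so compiled code can call back into Python functions. The real mechanism lives inside TVM; the sketch below is only a conceptual illustration of name-based callback dispatch, and every identifier in it (the registry, `register_py_func`, the `call_py_func` shim) is hypothetical rather than TVM's API:

```python
# Hypothetical sketch of name-based Python callback dispatch, the
# pattern behind an operator like R.call_py_func (not TVM's real code).
_py_funcs = {}

def register_py_func(name, fn):
    """Expose a Python callable to 'compiled' code under a name."""
    _py_funcs[name] = fn

def call_py_func(name, *args):
    """What the compiled side conceptually does at runtime."""
    return _py_funcs[name](*args)

register_py_func("scale", lambda xs, s: [x * s for x in xs])
print(call_py_func("scale", [1, 2, 3], 2))  # → [2, 4, 6]
```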
TIR
- #18319 - Refactor division simplification in RewriteSimplifier
- #18341 - Support sequence comparisons in TVMScript
- #18323 - Add support for conditional expressions in TVMScript
- #18199 - Fix host/device function check for build
- #18154 - Fix trivial index map [] -> [0]
- #18151 - Decouple DeepEqual from StructuralEqual
- #18134 - Add `T.thread_return()` for early thread exit in CUDA kernels
TVMScript
- #17804 - Support continue and break in tvmscript
cuda & cutlass & tensorrt
- #18353 - [CUDA] Update FlashInfer JIT integration
- #18320 - [TIR][CUDA] Preserve float precision in codegen with hexfloat output
- #18300 - [CUDA] Support NVTX in CUDA 13
- #18238 - [CUTLASS] Fix CUTLASS kernel compilation
- #18144 - [CodeGen][CUDA] Add sinhf CUDA Math API for CodeGen
web
- #18327 - [CMake] Install `web/` directory in cmake for Python package
- #18168 - Fix incompatible part after FFI updates
Misc
- #18330 - [Analyzer] Enhance ConstIntBoundAnalyzer and IntervalSet with modular set analysis
- #18372 - Upgrade to CUTLASS 4.2.1
- #18348 - [Python] Add library lookup path for tvm installed as a package
- #18334 - Fix conflicting parameter name promote_dtype in FP8ComputeLegalize
- #18325 - [flashinfer] Support directing JIT to FlashInfer GroupedGemm kernels
- #18328 - Fix datatype error for GPT-2
- #18318 - [3rdparty] Remove dlpack/libbacktrace from 3rdparty
- #18317 - [FlashInfer] Update include path and interface
- #18304 - Clear ext_lib_dll_names for macOS platform
- #18299 - [Python] Fix runtime tensor import
- #18252 - [Build] Complete TVM wheel building migration
- #18236 - Upgrade CUTLASS to v4.2.0, supporting CUDA 13
- #18251 - [Python] Complete Python packaging with scikit-build-core
- #18248 - [Python] Update version.py to bump pyproject.toml automatically
- #18291 - [3rdparty] Bump cutlass_fpA_intB_gemm to fix SM90 build
- #18239 - [Build] Migrate Python packaging to pyproject.toml with scikit-build-core
- #18222 - [NVSHMEM] Fix compatibility with CUDA code without nvshmem use
- #18220 - [Thrust] Fix getting CUDA stream
- #18211 - [TARGET] Add target for NVIDIA RTX 5060 Ti
- #18206 - [CODEGEN][REFACTOR] tir.call_llvm_intrin to remove nargs
- #18193 - Bump cutlass_fpA_intB_gemm to latest commit
- #18197 - [REFACTOR] Update data type rewriter to enable recursive rewrite in Any
- #18181 - [REFACTOR] Upgrade NestedMsg to use new ffi::Any mechanism
- #18142 - [REFACTOR] Migrate TVM_FFI_REGISTER_GLOBAL to new reflection style
- #18130 - Fix compilation warnings of unnecessary `std::move()` calls
- #18129 - Delete redundant imports
- #18055 - [Target] Support CUDA device function calls
- #18127 - Revert "[Refactor] Build cython with isolate environment"
- #18125 - Phase out StackVM runtime support
- #18124 - [Refactor] Build cython with isolate environment
- #18123 - [Codegen] Update LLVM version requirement for `insertDeclare`
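Among the packaging items above, #18248 makes version.py bump pyproject.toml automatically. A minimal sketch of that kind of automation, assuming a top-level `version = "..."` line as pyproject.toml uses (the function name and regex are illustrative, not TVM's actual script):

```python
import re

def bump_pyproject_version(text: str, new_version: str) -> str:
    # Rewrite the first `version = "..."` assignment in a
    # pyproject.toml-style document (illustrative helper).
    return re.sub(r'(?m)^version\s*=\s*"[^"]*"',
                  f'version = "{new_version}"', text, count=1)

sample = '[project]\nname = "tvm"\nversion = "0.21.0"\n'
print(bump_pyproject_version(sample, "0.22.0"))
```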