Introduction
Since the last release, the TVM community has worked to deliver the following exciting improvements!
The main tags are below (bold text marks areas with significant progress): Relax (especially the PyTorch frontend), FFI, etc.
Please visit the full listing of commits for a complete view: v0.22.dev0...v0.22.0.rc0.
Community
None.
RFCs
None.
BugFix
- #18352 - [Fix] Update ShapeView use in nccl.cc
- #18324 - Fix binding for BERT
- #18296 - [Fix] Add libxml2 dependency to fix Windows CI build failure
- #18294 - [Fix] Set DRefObj and CUDAIPCMemoryObj as mutable
- #18285 - [FFI] Enable `load_inline` on macOS
- #18287 - [Hotfix] Fix the conflicts about ffi-related updated names
- #18281 - [FFI] Fix bug of `ffi.cpp.load_inline` on Windows
- #18262 - [NNAPI] Use kind() instead of type_key() after FFI refactor
- #18244 - [Fix] Update FlashInfer JIT header lookup
- #18237 - [FFI] Fix type_traits on DataType after SmallStr update
- #18232 - [LLVM][Fix] Do not emit debuginfo on vscale or other unknown types
- #18219 - [Fix] Resolve deadlock in PopenPoolExecutor and LocalBuilder
- #18207 - [Fix][ONNX] No precision widening for numpy binary operations
- #18209 - [ONNX][FRONTEND][Fix] Update Resize to accept ShapeExpr
- #18210 - [Bug] Fix core dump in InferLayoutRMSNorm and fix typo
- #18208 - [FFI][Fix] Update datatype registry calls to the new paths
- #18190 - [Fix] Codegen fix for relax cutlass
- #18170 - [Fix] Fix the wrong check for tuple node in #18163
- #18174 - [Misc] Fix missing PadAttrs registration in op_attrs.py
- #18158 - Fix NCCL build with GlobalDef registration
- #18140 - [NNAPI] Fix type mismatch and test_mean annotation
- #18138 - [Fix][ONNX] Fixed constant ROI handling in resize2d when loading onnx models
- #18137 - [Fix][ONNX] Fix CumSum conversion when loading ONNX model
CI
- #18245 - [LLVM][MSWIN] Fix LLVM module build with latest CI update
- #18227 - Exit the build for AbortException
- #18145 - [Test] Use roi_list variable instead of hardcoded values in ROI tensor creation
Docs
- #18279 - [FFI] Initial bringup of cpp docs
- #18264 - Misc docs fix
- #18263 - [FFI] Initial docs scaffolding
- #18261 - [FFI] Add missing files in packaging example
- #18256 - [FFI] Wheel Packaging
- #18128 - [Doc] Visualize the architecture using a UML sequence diagram
Frontend
- #18143 - [ONNX] Extend axes for layer_norm when gamma/beta are multi-dimensional
LLVM
MetaSchedule
- #18243 - [LLVM] Add RISCV V-extension v1.0 kernels to metaschedule
Metal
ROCm
- #18225 - Minor fixes for latest refactor
FFI
- #18375 - [TE] [FFI] Fix broken axis/reduce_axis properties in BaseComputeOp and ScanOp after FFI refactoring
- #18376 - [FFI] Bump tvm-ffi to 0.1.0rc2
- #18370 - [FFI] Bump tvm-ffi dependency
- #18354 - [FFI][ABI] Bump tvm-ffi to latest
- #18349 - [FFI][ABI] Bump tvm-ffi to latest
- #18345 - [FFI][ABI] Bump tvm-ffi version to reflect RC ABI Update
- #18332 - [FFI][ABI] Bump ffi version to latest
- #18314 - [REFACTOR][FFI] Split tvm-ffi into a separate repo
- #18312 - [FFI][REFACTOR] Update TVM_FFI_STATIC_INIT_BLOCK to fn style
- #18311 - [FFI][ABI] Better String and Nested Container handling
- #18308 - [FFI][ABI] Refactor the naming of DLPack speed converter
- #18307 - [FFI] Update `load_inline` interface
- #18306 - [FFI][ABI][REFACTOR] Enhance DLPack Exchange Speed and Behavior
- #18302 - [FFI][REFACTOR] Refactor python ffi call mechanism for perf
- #18298 - [FFI] Fix system library symbol lookup
- #18297 - [FFI] Temp skip windows tests
- #18295 - [FFI][ABI] Introduce generic stream exchange protocol
- #18289 - [FFI][REFACTOR] Streamline Object Declare Macros
- #18284 - [FFI][REFACTOR] Introduce UnsafeInit and enhance ObjectRef null safety
- #18282 - [FFI] Relax default alignment and contiguity requirement
- #18280 - [FFI][REFACTOR] Cleanup namespace
- #18278 - [FFI] Temp skip load_inline tests nonlinux
- #18277 - [FFI][REFACTOR] Cleanup tvm_ffi python API and types
- #18276 - [FFI] Add ffi::Tensor.strides()
- #18275 - [FFI][REFACTOR][ABI] Rename NDArray to Tensor
- #18274 - [FFI] Update the interface of `ffi.load_inline` to match torch
- #18273 - [FFI][ABI] Append symbol prefix for ffi exported functions
- #18272 - [FFI] Construct NDArray.strides by default
- #18271 - [FFI] Support inline module
- #18270 - [FFI] Support Opaque PyObject
- #18266 - [FFI] Update torch stream getter to use native torch c api
- #18259 - [FFI][ABI] Introduce weak rc support
- #18258 - [FFI] Fix two seeming migration issues
- #18254 - [FFI][ABI] ABI updates for future metadata and complex ordering
- #18249 - [FFI][CMAKE] Revert cmake libbacktrace URL and update submodule
- #18246 - [FFI][CMAKE] Add missing download path for libbacktrace
- #18234 - [FFI] Misc fixup for windows
- #18233 - [FFI] Robustify the pyproject setup
- #18226 - [FFI][REFACTOR] Establish tvm_ffi python module
- #18221 - [FFI] Fix JSON parser/writer for the fast-math flag
- #18218 - [FFI][REFACTOR] Cleanup API locations
- #18217 - [FFI] AutoDLPack compatible with torch stream context
- #18216 - [FFI][REFACTOR] Establish Stream Context in ffi
- #18214 - [FFI][REFACTOR] Establish ffi.Module in python
- #18213 - [FFI] Formalize ffi.Module
- #18212 - [FFI] Make JSON Parser/Write fastmath safe
- #18205 - [FFI][REFACTOR] Cleanup entry function to redirect
- #18200 - [FFI][REFACTOR] Update Map ABI to enable flexible smallMap switch
- #18198 - [FFI][REFACTOR] Move Downcast out of ffi for now
- #18192 - [FFI] Phase out ObjectPath in favor of AccessPath
- #18191 - [FFI][REFACTOR] Refactor AccessPath to enable full tree repr
- #18189 - [FFI][REFACTOR] Phase out getattr based attribute handling
- #18188 - [FFI][REFACTOR] Migrate the Save/Load JSON to the new reflection
- #18187 - [FFI][EXTRA] Serialization To/From JSONGraph
- #18186 - [FFI] Lightweight json parser/writer
- #18185 - [FFI] Introduce small string/bytes
- #18184 - [FFI][REFACTOR] Hide StringObj/BytesObj into details
- #18183 - [FFI][REFACTOR] Cleanup to align to latest ffi
- #18172 - [REFACTOR][FFI] Phase out SEqualReduce/SHashReduce
- #18178 - [FFI] Fix SmallMapInit with duplicated keys
- #18177 - [FFI][REFACTOR] Isolate out extra API
- #18176 - [FFI] Improve string equal/hash handling
- #18166 - [FFI][REFACTOR] Migrate StructuralEqual/Hash to new reflection
- #18165 - [FFI][REFACTOR] Enable custom s_hash/equal
- #18160 - [FFI][REFACTOR] Introduce TypeAttr in reflection
- #18156 - [FFI] Structural equal and hash based on reflection
- #18149 - [FFI] Log and throw in function dup registration
- #18148 - [FFI][REFACTOR] Phase out TVM_FFI_REGISTER_GLOBAL in favor of GlobalDef
- #18147 - [FFI][REFACTOR] Modularize reflection
- #18141 - [FFI][PYTHON] Improve the traceback generation in python
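Several items above (#18275's NDArray-to-Tensor rename, #18306's DLPack exchange enhancements) revolve around zero-copy tensor exchange through DLPack. As a minimal sketch of the protocol these changes build on, using NumPy rather than TVM's `ffi.Tensor` (so this illustrates DLPack semantics, not TVM's API):

```python
import numpy as np

# DLPack lets frameworks hand tensors to each other without copying:
# the consumer wraps the producer's buffer exposed via __dlpack__.
a = np.arange(4, dtype="float32")
b = np.from_dlpack(a)  # zero-copy: b aliases a's memory

a[0] = 42.0
print(b[0])  # the write is visible through b, since no copy was made
```

The exchange-speed work in #18306 is about making this handoff cheap on hot paths; the aliasing semantics stay the same.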
Relax
- #18374 - [PyTorch] improve the check for no bias situation
- #18358 - [Frontend][ONNX] Fix `FastGelu` when bias is not set
- #18360 - [PyTorch] Support GRU op for ExportedProgram importer
- #18359 - [PyTorch] Fix the segfault in from_exported_program when model returns (Tensor, None) tuple
- #18321 - [ONNX] Support AllClassNMS Operator for ONNX Frontend
- #18346 - [PyTorch] Support LSTM op for ExportedProgram importer
- #18351 - [Frontend][Torch] Fix parsing error when input dimension of unbind is 1
- #18331 - Update BasePyModule with faster DLPack converter for tensor conversion
- #18343 - [PyTorch] Support MatrixMultiply op for ExportedProgram importer
- #18336 - Operator and RoPE support for Llama4
- #18329 - [Frontend][ONNX] Fix Expand conversion error: broadcast_to expects the input tensor shape to be broadcastable to the target shape
- #18326 - [Backend] Implement R.call_py_func operator for calling Python functions from compiled TVM
- #18313 - Introduce R.call_py_func operator for calling Python functions from Relax IR
- #18301 - Fix RelaxToPyFuncConverter compatibility and improve fallback handling
- #18288 - Add symbolic shape support to BasePyModule for dynamic tensor operations
- #18269 - Add Relax to Python Function Converter
- #18253 - Building TVMScript printer for IRModules with Python functions
- #18229 - Add Python function support and BasePyModule for PyTorch integration
- #18242 - ONNX frontend using relax softplus operator
- #18180 - [ONNX] Parse ONNX Upsample to Relax resize2d
- #18179 - Support Relax Operator PReLU
- #18163 - Fix issue in fuse concat ops by pattern
- #18120 - [Fix] Fix potential out-of-bounds access in `TupleRewriterNode`
- #18061 - [ONNX][Transform] Add mode choice, new mode, and warning for take()
- #18122 - [KVCache] Fix kernel dispatch based on attention kinds
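Two of the items above (#18313, #18326) add an `R.call_py_func` operator so compiled code can call back into Python functions. The real mechanism lives inside TVM; the sketch below is only a conceptual illustration of name-based callback dispatch, and every identifier in it (the registry, `register_py_func`, the `call_py_func` shim) is hypothetical rather than TVM's API:

```python
# Hypothetical sketch of name-based Python callback dispatch, the
# pattern behind an operator like R.call_py_func (not TVM's real code).
_py_funcs = {}

def register_py_func(name, fn):
    """Expose a Python callable to 'compiled' code under a name."""
    _py_funcs[name] = fn

def call_py_func(name, *args):
    """What the compiled side conceptually does at runtime."""
    return _py_funcs[name](*args)

register_py_func("scale", lambda xs, s: [x * s for x in xs])
print(call_py_func("scale", [1, 2, 3], 2))  # → [2, 4, 6]
```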
TIR
- #18319 - Refactor division simplification in RewriteSimplifier
- #18341 - Support sequence comparisons in TVMScript
- #18323 - Add support for conditional expressions in TVMScript
- #18199 - Fix host/device function check for build
- #18154 - Fix trivial index map [] -> [0]
- #18151 - Decouple DeepEqual from StructuralEqual
- #18134 - Add `T.thread_return()` for early thread exit in CUDA kernels
TVMScript
- #17804 - Support continue and break in tvmscript
cuda & cutlass & tensorrt
- #18353 - [CUDA] Update FlashInfer JIT integration
- #18320 - [TIR][CUDA] Preserve float precision in codegen with hexfloat output
- #18300 - [CUDA] Support NVTX in CUDA 13
- #18238 - [CUTLASS] Fix CUTLASS kernel compilation
- #18144 - [CodeGen][CUDA] Add sinhf CUDA Math API for CodeGen
web
- #18327 - [CMake] Install `web/` directory in cmake for Python package
- #18168 - Fix incompatible part after FFI updates
Misc
- #18330 - [Analyzer] Enhance ConstIntBoundAnalyzer and IntervalSet with modular set analysis
- #18372 - Upgrade to CUTLASS 4.2.1
- #18348 - [Python] Add library lookup path for tvm installed as a package
- #18334 - Fix conflicting parameter name promote_dtype in FP8ComputeLegalize
- #18325 - [flashinfer] Support directing JIT to FlashInfer GroupedGemm kernels
- #18328 - Fix datatype error for GPT-2
- #18318 - [3rdparty] Remove dlpack/libbacktrace from 3rdparty
- #18317 - [FlashInfer] Update include path and interface
- #18304 - Clear ext_lib_dll_names for macOS platform
- #18299 - [Python] Fix runtime tensor import
- #18252 - [Build] Complete TVM wheel building migration
- #18236 - Upgrade CUTLASS to v4.2.0, supporting CUDA 13
- #18251 - [Python] Complete Python packaging with scikit-build-core
- #18248 - [Python] Update version.py to bump pyproject.toml automatically
- #18291 - [3rdparty] Bump cutlass_fpA_intB_gemm to fix SM90 build
- #18239 - [Build] Migrate Python packaging to pyproject.toml with scikit-build-core
- #18222 - [NVSHMEM] Fix compatibility with CUDA code without nvshmem use
- #18220 - [Thrust] Fix getting CUDA stream
- #18211 - [TARGET] Add target for NVIDIA RTX 5060 Ti
- #18206 - [CODEGEN][REFACTOR] tir.call_llvm_intrin to remove nargs
- #18193 - Bump cutlass_fpA_intB_gemm to latest commit
- #18197 - [REFACTOR] Update data type rewriter to enable recursive rewrite in Any
- #18181 - [REFACTOR] Upgrade NestedMsg to use new ffi::Any mechanism
- #18142 - [REFACTOR] Migrate TVM_FFI_REGISTER_GLOBAL to new reflection style
- #18130 - Fix compilation warnings of unnecessary `std::move()` calls
- #18129 - Delete redundant imports
- #18055 - [Target] Support CUDA device function calls
- #18127 - Revert "[Refactor] Build cython with isolate environment"
- #18125 - Phase out StackVM runtime support
- #18124 - [Refactor] Build cython with isolate environment
- #18123 - [Codegen] Update LLVM version requirement for `insertDeclare`
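Among the packaging items above, #18248 makes version.py bump pyproject.toml automatically. A minimal sketch of that kind of automation, assuming a top-level `version = "..."` line as pyproject.toml uses (the function name and regex are illustrative, not TVM's actual script):

```python
import re

def bump_pyproject_version(text: str, new_version: str) -> str:
    # Rewrite the first `version = "..."` assignment in a
    # pyproject.toml-style document (illustrative helper).
    return re.sub(r'(?m)^version\s*=\s*"[^"]*"',
                  f'version = "{new_version}"', text, count=1)

sample = '[project]\nname = "tvm"\nversion = "0.21.0"\n'
print(bump_pyproject_version(sample, "0.22.0"))
```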