Introduction
The TVM community has worked since the last release to deliver the following exciting improvements!
The main areas are listed below (bold text marks areas with lots of progress): Relax (especially the PyTorch frontend), FFI, etc.
Please visit the full listing of commits for a complete view: v0.21.dev0...v0.21.0.rc0.
Community
None.
RFCs
None.
Arith
- #18067 - Add IsBound method to ConstIntBoundAnalyzer
- #18031 - Canonicalize mul-coefficient to rhs
- #18025 - Fix canonical simplify for LE with incorrect range assumptions
BugFix
- #18115 - [Fix][Serialization] Add support for NaN value serialization
- #18103 - [Fix] Replace dmlc::Error with std::exception in VerifyGPUCode
- #18092 - [Fix] Fix ExecBuilderDeclareFunction method name in exec_builder.py
- #18087 - Fix exception when TVM not built with LLVM support
- #18035 - [CUDA] Fix: Increase FloatImm precision when printing 64 bit values in CUDA codegen
- #17968 - [Relax][Pytorch] Bugfix of conv_transpose1d and conv_transpose2d
- #17950 - [Fix][Relax] Fix dangling reference in GetTargetFunctions()
- #17902 - Fix off-by-one error in the type index range check within Object::IsInstance()
- #17882 - [Relax][Pytorch] Fix incorrect behaviour of % (mod) operator in TVM frontend
- #17875 - [Relax][Pytorch] Incorrect Handling of In-Place Ops in FX-Based TVM Frontend
- #17838 - [TIR] Schedule support reverse-inline with reduction blocks
CI
- #18071 - Update windows to 2025
- #18058 - [TEST] Move temp files into tempdir
- #18037 - Further robustify is_last_build check
- #17981 - Update images to 20250513-063354-70aa3797
- #17891 - Update images to 20250428-080833-03eadc65
- #17905 - Install PyTorch 2.7 compatible with CUDA 11.8
- #17887 - Upgrade pytorch to 2.7.0, torchvision to 0.22.0, and vulkan sdk to 1.4.309
- #17846 - Upgrade ubuntu runner image for GitHub CI
Docker
- #17955 - [CI] Reintroduce NNEF to CI images
Docs
- #18056 - Update installation instructions based on the FFI refactor
Frontend
- #18090 - [Relax][ONNX] Update Reduce ops to support axes as input
- #18072 - [Relax][ONNX] Update ReduceL1 to opset 18
- #18016 - [Relax][ONNX] Replace deprecated mapping.TENSOR_TYPE_TO_NP_TYPE usage
- #18001 - [Relax][ONNX] Fix: bitwise_not misclassified as binary (is …
- #17990 - [Relax] Fix: Output tensor with zero dimension after torch.u…
- #17925 - [Relax][PyTorch] Re-enable test_subgraph_capture in dynamo test
- #17980 - [ONNX] Make bias input optional in LayerNormalization
- #17918 - [Relax][PyTorch] Add ReLU6 Op Support for Exported Program and FX graph
- #17930 - [Relax][PyTorch] Add torch.outer Op Support for Exported Program and FX graph
- #17932 - [Relax][PyTorch] Add UpSample Bicubic Op Support for Exported Program and FX graph
- #17921 - [Relax][PyTorch] Add AvgPool 1D and 3D Op Support for Exported Program and FX graph
- #17922 - [Relax][PyTorch] Add Adaptive AvgPool 1D and 3D Op Support for Exported Program and FX graph
- #17863 - [Relax][PyTorch] CrossEntropyLoss
- #17919 - [Relax][PyTorch] Add MaxPool 1D and 3D Op Support for Exported Program and FX graph
- #17926 - [Relax][PyTorch] Add tests for all the dtypes supported in the PyTorch frontend
- #17924 - [Relax][PyTorch] Add div.Tensor_mode and trunc Op Support for Exported Program and FX graph
- #17904 - [Relax][PyTorch] Add Meshgrid Op Support for Exported Program and FX graph
- #17915 - [Relax][PyTorch] Add support for linspace op in fx graph
- #17886 - [Relax][PyTorch] Add Pixel Shuffle Op Support for Exported Program and FX graph
- #17908 - [Relax][PyTorch] Add support for eye op in fx graph
- #17893 - [Relax][Pytorch] Add fmod support
- #17894 - [Relax][PyTorch] Support torch.bfloat16 dtype in pytorch frontend
- #17878 - [Relax][PyTorch] Add torch.isin Op Support for Exported Program and FX graph
- #17889 - [Relax][PyTorch] Support linspace op for ExportedProgram importer
- #17868 - [Relax][Pytorch] Add support for ones_like, zero_, zeros, type_as, item ops
- #17857 - [Relax][PyTorch] Refactor norm op for ExportedProgram importer
- #17852 - [Relax][PyTorch] Sort.default
- #17871 - [Relax][Pytorch] Add support for bitwise_or op support
- #17836 - [Relax][PyTorch] Add support for index.Tensor
- #17864 - [Relax][PyTorch] Support eye op for ExportedProgram importer
- #17858 - [Relax][PyTorch] Add copy_ op support in fxGraph
- #17851 - [Relax][PyTorch] Support leaky_relu_.default and reshape_as.default in ExportedProgram frontend
- #17843 - [Relax][PyTorch] Add mul_.Tensor, max.default, min.default and pow.Scalar Op Support into Exported Program Frontend
- #17821 - [Relax][PyTorch] Add Pad Op Support for Exported Program and FX graph
- #17819 - [Relax][PyTorch] Add Stack Op Support for Exported Program
- #17849 - [Relax][PyTorch] Add RSub Op Support for Exported Program and FX graph
- #17850 - [Relax][Pytorch] Add masked_fill op support in ExportedProgram
- #17816 - [Relax][PyTorch] Add PReLU Op Support for Exported Program and FX graph
- #17803 - [Relax][PyTorch] Add Logaddexp op support for exported program
- #17841 - [Relax][PyTorch] Add support for norm op
- #17832 - [Relax][PyTorch] full.default, full_like.default, ones.default
- #17830 - [Relax][PyTorch] Support narrow and broadcast_to ops for ExportedProgram importer
LLVM
- #17859 - [Codegen] Enable SVE/VLA for RISCV targets
- #17958 - Fix JIT unknown reloc issue on RISCV
- #17954 - [FFI] Fix compilation errors with clang20
Metal
- #18034 - Fix GetFunction of Metal runtime
ROCm
- #18029 - Fix ROCm build after FFI refactor
Relax
- #18102 - Fix rotary embedding buffer size calculation
- #17928 - [KVCache] Per Layer Sliding Window
- #17840 - Refactor missing op check into shared utility for Torch frontends
- #17826 - Fix Torch frontends to report all the missing ops
Runtime
- #18097 - CutensorMap support
TIR
- #18068 - Extend address_of to support Buffer objects
- #18069 - Fix block access region detection for nested let bindings
- #18057 - Phase out ProducerStore, ProducerRealize and Prefetch
TOPI
- #18039 - [Relax] Support InstanceNorm & Bugfix of InstanceNorm
- #18063 - [NN][Layer_Norm] Fix layer_norm error with reduce-only axes
- #18006 - Fix index handling in expand_like operator for axis expansion
- #18015 - Support integer type input for log10
- #17942 - Add shape validation to prevent negative dimensions in conv operations
Vulkan
- #18005 - Add TIR unary trigonometric/hyperbolic intrinsic definitions
cuda & cutlass & tensorrt
- #18064 - [CUTLASS] Fix CUTLASS kernel build on Hopper
- #18033 - [CUTLASS] Add GeMM kernels for Blackwell GPUs
- #18024 - [CUDA] Fix thrust with latest FFI refactor
- #18118 - bump cutlass_fpA_intB_gemm
- #18113 - [CMake] Refine C++/CUDA standard settings in CMakeLists.txt
FFI
- #18076 - [FFI][REFACTOR] Stabilize container ABI and implementation
- #18091 - [FFI] Provide Field Visit bridge so we can do gradual transition
- #18095 - [FFI][REFACTOR] Migrate attrs to use new reflection
- #18083 - [FFI] Update typeinfo to speedup parent reflection
- #18077 - [FFI] Optimize atomic decref in Object
- #18065 - [FFI] Introduce FFI reflection support in python
- #18062 - [FFI][REFACTOR] Update registry to have complete meta-data
- #18059 - [FFI][REFACTOR] Enhance reflection
- #18050 - [FFI] Enhance FFI Object exception safety during init
- #18121 - Revert "[FFI] Replace Arg2Str with a more powerful for_each"
- #18117 - [FFI] Replace Arg2Str with a more powerful for_each
- #18116 - [FFI] Use fold expression to simplify for_each
- #18114 - [FFI] Replace __attribute__ with C++ standard attributes
- #18112 - [FFI] Cleanup visit_attrs attribute after refactor
- #18111 - [FFI] Introduce GlobalDef for function registration
- #18106 - [REFACTOR][FFI] Phase out old VisitAttrs mechanism
- #18042 - [REFACTOR][FFI] Update symbol name for library module
- #18023 - [FFI] More strict tuple constructor checking
- #18022 - [REFACTOR][FFI] Cleanup PackedFunc redirections
- #18020 - [REFACTOR][PYTHON] Phase out tvm._ffi and Limited API support
- #17979 - [FFI][REFACTOR] Update to distinguish as and cast
- #17983 - [FFI][JVM] Upgrade tvm4j to latest FFI
- #18010 - [REFACTOR][FFI] Phase out legacy C API
- #17943 - [FFI] Variant specialize for all ObjectRef
- #17939 - [REFACTOR] Phase out legacy rust ffi
- #17940 - [REFACTOR] Phase out legacy go ffi
- #17931 - [REFACTOR][FFI][RPC] Migrate RPC to use the latest FFI ABI
- #17929 - [REFACTOR][FFI] Cleanup container redirections
- #17927 - [FFI][FEAT] AutoDLPack for taking external tensor objects
- #17923 - [REFACTOR][FFI] Cleanup PackedFunc related redirection
- #17920 - [REFACTOR] Introduce and modernize ffi system
web
- #17946 - [REFACTOR][FFI] Upgrade Web Runtime to new FFI
- #17917 - [WebGPU][CodeGen] Override PrintVecElemLoad and Store for WebGPU
Misc
- #18104 - Add LLVM Legalization for tir.erf
- #18107 - Fix: guard tensormap with CUDA version check
- #18101 - [REFACTOR] Formalize namespace for all objects
- #18040 - Add support for bucketize
- #18098 - [REFACTOR] Transition VisitAttrs to new reflection mechanism
- #18096 - [REFACTOR] Transition VisitAttrs to new reflection mechanism in tir/ir_builder/meta_schedule
- #18093 - [NVSHMEM] Extend CUDA backend to compile and link TIR modules with NVSHMEM
- #18088 - [Script] Enhance alloc buffer handling in nested frames
- #18086 - [SCRIPT] Bump Python minimum version to 3.9 and update AST compatibility
- #18075 - add support for softsign op
- #18079 - [Script] Add support for merging block annotations
- #18080 - [REFACTOR] Phase out LegacyReprPrinter and improve CommonSubExprElim
- #18078 - [REFACTOR] Phase out the RelaxExpr.checked_type in favor of struct_info
- #18073 - [NVSHMEM] Update NDArray allocation
- #18066 - [Script] Remove deprecated attributes from Constant AST node
- #18060 - Add Python functor support for TIR expressions and statements
- #18054 - [Pytest] Remove obsolete test suite entries
- #18036 - Add support for hamming_window op
- #18049 - [Refactor] Rename relax_vm to vm
- #18046 - [3rdparty] Phasing out FlashInfer AOT from 3rdparty
- #18047 - [Backend] JIT compile FlashInfer kernel with FFI header
- #18041 - [DTYPE] Fix dtype functions after dtype refactor
- #18043 - [REFACTOR] Phase out the relax tuning_api
- #18038 - Resolving inconsistency between attention/attention_bias
- #18027 - [Dtype] Low-precision Blackwell Datatype Support
- #17985 - [Codegen] Resolve issue #17965 where the same model produces different outputs on the LLVM (CPU) and CUDA (GPU) backends
- #17978 - Fix IR generation conflict in topi.nn.simplify by separating Tensor and PrimExpr handling
- #18026 - [Python] Fix library lookup path for pip installed packages
- #18019 - Add op support for slice_scatter
- #17974 - Fix FLOP estimation for EvaluateNode by implementing VisitStmt_ handler
- #18013 - Fix RuntimeError: parallel_for_dynamic
- #18014 - Fix division truncation in window size calculation for small dtypes in average_pool
- #17995 - Fix zero-extent loops in PerStoreFeature to prevent crashes
- #17969 - Add registration for the operators asinh, acosh, atanh in LLVM
- #17972 - Fix g.costs
- #17953 - Fix sqrt/rsqrt Compatibility with Integer Data Types
- #17961 - Fix basic FLOP estimation for WhileNode
- #17945 - Add registration for the operators asin and acos in LLVM
- #17951 - [NODE] Fix structural equality for Array specialization
- #17913 - [Triton] Support latest triton.compile interface
- #17911 - Add op support for new_zeros op in Exported Program and fx graph frontend
- #17909 - Add masked_fill_.scalar, logical_not.default in Exported Program frontend
- #17910 - [RPC] Fix Bug That Change Dict When Iterate The Keys
- #17896 - Add op support for zeros_like and fill_
- #17900 - Fix onnx expand op
- #17865 - Add support for index_put_ op
- #17839 - Add op support for roll op
- #17844 - Fix incorrect docstring in topi softmax
- #17831 - [3rdparty] Bump DLPack to v1.1 for float8/6/4 dtype supports
- #17848 - Fix docstring in batch_to_space_nd and bitpack
- #17845 - Fix incorrect docstring in upsampling.py
- #17808 - [Install] Fix error during python/tvm installation