Introduction
The TVM community has worked since the last release to deliver the following exciting improvements!
The main areas are listed below (bold text marks areas with lots of progress): Relax (especially the PyTorch frontend), FFI, etc.
Please visit the full listing of commits for a complete view: v0.21.dev0...v0.21.0.rc0.
Community
None.
RFCs
None.
Arith
- #18067 - Add IsBound method to ConstIntBoundAnalyzer
- #18031 - Canonicalize mul-coefficient to rhs
- #18025 - Fix canonical simplify for LE with incorrect range assumptions
BugFix
- #18115 - [Fix][Serialization] Add support for NaN value serialization
- #18103 - [Fix] Replace dmlc::Error with std::exception in VerifyGPUCode
- #18092 - [Fix] Fix ExecBuilderDeclareFunction method name in exec_builder.py
- #18087 - Fix exception when TVM not built with LLVM support
- #18035 - [CUDA] Fix: Increase FloatImm precision when printing 64 bit values in CUDA codegen
- #17968 - [Relax][Pytorch] Bugfix of conv_transpose1d and conv_transpose2d
- #17950 - [Fix][Relax] Fix dangling reference in GetTargetFunctions()
- #17902 - Fix off-by-one error in the type index range check within Object::IsInstance()
- #17882 - [Relax][Pytorch] Fix incorrect behaviour of % (mod) operator in TVM frontend
- #17875 - [Relax][Pytorch] Incorrect Handling of In-Place Ops in FX-Based TVM Frontend
- #17838 - [TIR] Schedule support reverse-inline with reduction blocks
CI
- #18071 - Update windows to 2025
- #18058 - [TEST] Move temp files into tempdir
- #18037 - Further robustify is_last_build check
- #17981 - Update images to 20250513-063354-70aa3797
- #17891 - Update images to 20250428-080833-03eadc65
- #17905 - Install PyTorch 2.7 compatible with CUDA 11.8
- #17887 - Upgrade pytorch to 2.7.0, torchvision to 0.22.0, and vulkan sdk to 1.4.309
- #17846 - Upgrade ubuntu runner image for GitHub CI
Docker
- #17955 - [CI] Reintroduce NNEF to CI images
Docs
- #18056 - Update installation instructions based on the FFI refactor
Frontend
- #18090 - [Relax][ONNX] Update Reduce ops to support axes as input
- #18072 - [Relax][ONNX] Update ReduceL1 to opset 18
- #18016 - [Relax][ONNX] Replace deprecated mapping.TENSOR_TYPE_TO_NP_TYPE usage
- #18001 - [Relax][ONNX] Fix: bitwise_not misclassified as binary (is …
- #17990 - [Relax] Fix: Output tensor with zero dimension after torch.u…
- #17925 - [Relax][PyTorch] Re-enable test_subgraph_capture in dynamo test
- #17980 - [ONNX] Make bias input optional in LayerNormalization
- #17918 - [Relax][PyTorch] Add ReLU6 Op Support for Exported Program and FX graph
- #17930 - [Relax][PyTorch] Add torch.outer Op Support for Exported Program and FX graph
- #17932 - [Relax][PyTorch] Add UpSample Bicubic Op Support for Exported Program and FX graph
- #17921 - [Relax][PyTorch] Add AvgPool 1D and 3D Op Support for Exported Program and FX graph
- #17922 - [Relax][PyTorch] Add Adaptive AvgPool 1D and 3D Op Support for Exported Program and FX graph
- #17863 - [Relax][PyTorch] CrossEntropyLoss
- #17919 - [Relax][PyTorch] Add MaxPool 1D and 3D Op Support for Exported Program and FX graph
- #17926 - [Relax][PyTorch] Add tests for all the dtypes supported in the PyTorch frontend
- #17924 - [Relax][PyTorch] Add div.Tensor_mode and trunc Op Support for Exported Program and FX graph
- #17904 - [Relax][PyTorch] Add Meshgrid Op Support for Exported Program and FX graph
- #17915 - [Relax][PyTorch] Add support for linspace op in fx graph
- #17886 - [Relax][PyTorch] Add Pixel Shuffle Op Support for Exported Program and FX graph
- #17908 - [Relax][PyTorch] Add support for eye op in fx graph
- #17893 - [Relax][Pytorch] Add fmod support
- #17894 - [Relax][PyTorch] Support torch.bfloat16 dtype in pytorch frontend
- #17878 - [Relax][PyTorch] Add torch.isin Op Support for Exported Program and FX graph
- #17889 - [Relax][PyTorch] Support linspace op for ExportedProgram importer
- #17868 - [Relax][Pytorch] Add support for ones_like, zero_, zeros, type_as, item ops
- #17857 - [Relax][PyTorch] Refactor norm op for ExportedProgram importer
- #17852 - [Relax][PyTorch] Sort.default
- #17871 - [Relax][Pytorch] Add support for bitwise_or op support
- #17836 - [Relax][PyTorch] Add support for index.Tensor
- #17864 - [Relax][PyTorch] Support eye op for ExportedProgram importer
- #17858 - [Relax][PyTorch] Add copy_ op support in fxGraph
- #17851 - [Relax][PyTorch] Support leaky_relu_.default and reshape_as.default in ExportedProgram frontend
- #17843 - [Relax][PyTorch] Add mul_.Tensor, max.default, min.default and pow.Scalar Op Support into Exported Program Frontend
- #17821 - [Relax][PyTorch] Add Pad Op Support for Exported Program and FX graph
- #17819 - [Relax][PyTorch] Add Stack Op Support for Exported Program
- #17849 - [Relax][PyTorch] Add RSub Op Support for Exported Program and FX graph
- #17850 - [Relax][Pytorch] Add masked_fill op support in ExportedProgram
- #17816 - [Relax][PyTorch] Add PReLU Op Support for Exported Program and FX graph
- #17803 - [Relax][PyTorch] Add Logaddexp op support for exported program
- #17841 - [Relax][PyTorch] Add support for norm op
- #17832 - [Relax][PyTorch] full.default, full_like.default, ones.default
- #17830 - [Relax][PyTorch] Support narrow and broadcast_to ops for ExportedProgram importer
LLVM
- #17859 - [Codegen] Enable SVE/VLA for RISCV targets
- #17958 - Fix JIT unknown reloc issue on RISCV
- #17954 - [FFI] Fix compilation errors with clang20
Metal
- #18034 - Fix GetFunction of Metal runtime
ROCm
- #18029 - Fix ROCm build after FFI refactor
Relax
- #18102 - Fix rotary embedding buffer size calculation
- #17928 - [KVCache] Per Layer Sliding Window
- #17840 - Refactor missing op check into shared utility for Torch frontends
- #17826 - Fix Torch frontends to report all the missing ops
Runtime
- #18097 - CutensorMap support
TIR
- #18068 - Extend address_of to support Buffer objects
- #18069 - Fix block access region detection for nested let bindings
- #18057 - Phase out ProducerStore, ProducerRealize and Prefetch
TOPI
- #18039 - [Relax] Support InstanceNorm & Bugfix of InstanceNorm
- #18063 - [NN][Layer_Norm] Fix layer_norm error with reduce-only axes
- #18006 - Fix index handling in expand_like operator for axis expansion
- #18015 - Support integer type input for log10
- #17942 - Add shape validation to prevent negative dimensions in conv operations
Vulkan
- #18005 - Add TIR unary trigonometric/hyperbolic intrinsic definitions
cuda & cutlass & tensorrt
- #18064 - [CUTLASS] Fix CUTLASS kernel build on Hopper
- #18033 - [CUTLASS] Add GeMM kernels for Blackwell GPUs
- #18024 - [CUDA] Fix thrust with latest FFI refactor
- #18118 - bump cutlass_fpA_intB_gemm
- #18113 - [CMake] Refine C++/CUDA standard settings in CMakeLists.txt
FFI
- #18076 - [FFI][REFACTOR] Stabilize container ABI and implementation
- #18091 - [FFI] Provide Field Visit bridge so we can do gradual transition
- #18095 - [FFI][REFACTOR] Migrate attrs to use new reflection
- #18083 - [FFI] Update typeinfo to speedup parent reflection
- #18077 - [FFI] Optimize atomic decref in Object
- #18065 - [FFI] Introduce FFI reflection support in python
- #18062 - [FFI][REFACTOR] Update registry to have complete meta-data
- #18059 - [FFI][REFACTOR] Enhance reflection
- #18050 - [FFI] Enhance FFI Object exception safety during init
- #18121 - Revert "[FFI] Replace Arg2Str with a more powerful for_each"
- #18117 - [FFI] Replace Arg2Str with a more powerful for_each
- #18116 - [FFI] Use fold expression to simplify for_each
- #18114 - [FFI] Replace __attribute__ with C++ standard attributes
- #18112 - [FFI] Cleanup visit_attrs attribute after refactor
- #18111 - [FFI] Introduce GlobalDef for function registration
- #18106 - [REFACTOR][FFI] Phase out old VisitAttrs mechanism
- #18042 - [REFACTOR][FFI] Update symbol name for library module
- #18023 - [FFI] More strict tuple constructor checking
- #18022 - [REFACTOR][FFI] Cleanup PackedFunc redirections
- #18020 - [REFACTOR][PYTHON] Phase out tvm._ffi and Limited API support
- #17979 - [FFI][REFACTOR] Update to distinguish as and cast
- #17983 - [FFI][JVM] Upgrade tvm4j to latest FFI
- #18010 - [REFACTOR][FFI] Phase out legacy C API
- #17943 - [FFI] Variant specialize for all ObjectRef
- #17939 - [REFACTOR] Phase out legacy rust ffi
- #17940 - [REFACTOR] Phase out legacy go ffi
- #17931 - [REFACTOR][FFI][RPC] Migrate RPC to use the latest FFI ABI
- #17929 - [REFACTOR][FFI] Cleanup container redirections
- #17927 - [FFI][FEAT] AutoDLPack for taking external tensor objects
- #17923 - [REFACTOR][FFI] Cleanup PackedFunc related redirection
- #17920 - [REFACTOR] Introduce and modernize ffi system
web
- #17946 - [REFACTOR][FFI] Upgrade Web Runtime to new FFI
- #17917 - [WebGPU][CodeGen] Override PrintVecElemLoad and Store for WebGPU
Misc
- #18104 - Add LLVM Legalization for tir.erf
- #18107 - Fix: guard tensormap with CUDA version check
- #18101 - [REFACTOR] Formalize namespace for all objects
- #18040 - Add support for bucketize
- #18098 - [REFACTOR] Transition VisitAttrs to new reflection mechanism
- #18096 - [REFACTOR] Transition VisitAttrs to new reflection mechanism in tir/ir_builder/meta_schedule
- #18093 - [NVSHMEM] Extend CUDA backend to compile and link TIR modules with NVSHMEM
- #18088 - [Script] Enhance alloc buffer handling in nested frames
- #18086 - [SCRIPT] Bump Python minimum version to 3.9 and update AST compatibility
- #18075 - add support for softsign op
- #18079 - [Script] Add support for merging block annotations
- #18080 - [REFACTOR] Phase out LegacyReprPrinter and improve CommonSubExprElim
- #18078 - [REFACTOR] Phase out the RelaxExpr.checked_type in favor of struct_info
- #18073 - [NVSHMEM] Update NDArray allocation
- #18066 - [Script] Remove deprecated attributes from Constant AST node
- #18060 - Add Python functor support for TIR expressions and statements
- #18054 - [Pytest] Remove obsolete test suite entries
- #18036 - Add support for hamming_window op
- #18049 - [Refactor] Rename relax_vm to vm
- #18046 - [3rdparty] Phasing out FlashInfer AOT from 3rdparty
- #18047 - [Backend] JIT compile FlashInfer kernel with FFI header
- #18041 - [DTYPE] Fix dtype functions after dtype refactor
- #18043 - [REFACTOR] Phase out the relax tuning_api
- #18038 - Resolving inconsistency between attention/attention_bias
- #18027 - [Dtype] Low-precision Blackwell Datatype Support
- #17985 - [Codegen] Resolve issue #17965 where the same model produces different outputs on the LLVM (CPU) and CUDA (GPU) backends
- #17978 - Fix IR generation conflict in topi.nn.simplify by separating Tensor and PrimExpr handling
- #18026 - [Python] Fix library lookup path for pip installed packages
- #18019 - Add op support for slice_scatter
- #17974 - Fix FLOP estimation for EvaluateNode by implementing VisitStmt_ handler
- #18013 - Fix RuntimeError: parallel_for_dynamic
- #18014 - Fix division truncation in window size calculation for small dtypes in average_pool
- #17995 - Fix zero-extent loops in PerStoreFeature to prevent crashes
- #17969 - Add registration for the operators asinh, acosh, atanh in LLVM
- #17972 - Fix g.costs
- #17953 - Fix sqrt/rsqrt Compatibility with Integer Data Types
- #17961 - Fix basic FLOP estimation for WhileNode
- #17945 - Add registration for the operators asin and acos in LLVM
- #17951 - [NODE] Fix structural equality for Array specialization
- #17913 - [Triton] Support latest triton.compile interface
- #17911 - Add op support for new_zeros op in Exported Program and fx graph frontend
- #17909 - Add masked_fill_.scalar, logical_not.default in Exported Program frontend
- #17910 - [RPC] Fix Bug That Change Dict When Iterate The Keys
- #17896 - Add op support for zeros_like and fill_
- #17900 - Fix onnx expand op
- #17865 - Add support for index_put_ op
- #17839 - Add op support for roll op
- #17844 - Fix incorrect docstring in topi softmax
- #17831 - [3rdparty] Bump DLPack to v1.1 for float8/6/4 dtype supports
- #17848 - Fix docstring in batch_to_space_nd and bitpack
- #17845 - Fix incorrect docstring in upsampling.py
- #17808 - [Install] Fix error during python/tvm installation