Apache TVM v0.21.0


Introduction

The TVM community has worked since the last release to deliver the following exciting improvements!

The main tags are below: Relax (especially the PyTorch frontend) and FFI saw the most progress.

Please visit the full listing of commits for a complete view: v0.21.dev0...v0.21.0.rc0.

Community

None.

RFCs

None.

Arith

  • #18067 - Add IsBound method to ConstIntBoundAnalyzer (see the sketch after this list)
  • #18031 - Canonicalize mul-coefficient to rhs
  • #18025 - Fix canonical simplify for LE with incorrect range assumptions
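
For context on the items above, here is a minimal sketch of constant integer bound queries through the Python arithmetic analyzer; the new IsBound method (#18067) lives on the C++ ConstIntBoundAnalyzer, so this only shows the surrounding API it extends.

```python
# Minimal sketch: constant integer bound analysis via tvm.arith.Analyzer.
import tvm
from tvm import tir

ana = tvm.arith.Analyzer()
x = tir.Var("x", "int32")
# Tell the analyzer that 0 <= x <= 7.
ana.update(x, tvm.arith.ConstIntBound(0, 7))
bound = ana.const_int_bound(x + 1)
print(bound.min_value, bound.max_value)  # expected: 1 8
```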

BugFix

  • #18115 - [Fix][Serialization] Add support for NaN value serialization
  • #18103 - [Fix] Replace dmlc::Error with std::exception in VerifyGPUCode
  • #18092 - [Fix] Fix ExecBuilderDeclareFunction method name in exec_builder.py
  • #18087 - Fix exception when TVM is not built with LLVM support
  • #18035 - [CUDA] Fix: Increase FloatImm precision when printing 64-bit values in CUDA codegen
  • #17968 - [Relax][PyTorch] Bugfix of conv_transpose1d and conv_transpose2d
  • #17950 - [Fix][Relax] Fix dangling reference in GetTargetFunctions()
  • #17902 - Fix off-by-one error in the type index range check within Object::IsInstance()
  • #17882 - [Relax][PyTorch] Fix incorrect behaviour of % (mod) operator in TVM frontend
  • #17875 - [Relax][PyTorch] Fix incorrect handling of in-place ops in FX-based TVM frontend
  • #17838 - [TIR] Schedule support reverse-inline with reduction blocks

CI

  • #18071 - Update Windows CI runner image to 2025
  • #18058 - [TEST] Move temp files into tempdir
  • #18037 - Further robustify is_last_build check
  • #17981 - Update images to 20250513-063354-70aa3797
  • #17891 - Update images to 20250428-080833-03eadc65
  • #17905 - Install PyTorch 2.7 compatible with CUDA 11.8
  • #17887 - Upgrade pytorch to 2.7.0, torchvision to 0.22.0, and vulkan sdk to 1.4.309
  • #17846 - Upgrade ubuntu runner image for GitHub CI

Docker

  • #17955 - [CI] Reintroduce NNEF to CI images

Docs

  • #18056 - Update installation instructions based on the FFI refactor

Frontend

  • #18090 - [Relax][ONNX] Update Reduce ops to support axes as input
  • #18072 - [Relax][ONNX] Update ReduceL1 to opset 18
  • #18016 - [Relax][ONNX] Replace deprecated mapping.TENSOR_TYPE_TO_NP_TYPE usage
  • #18001 - [Relax][ONNX] Fix: bitwise_not misclassified as binary (is …
  • #17990 - [Relax] Fix: Output tensor with zero dimension after torch.u…
  • #17925 - [Relax][PyTorch] Re-enable test_subgraph_capture in dynamo test
  • #17980 - [ONNX] Make bias input optional in LayerNormalization
  • #17918 - [Relax][PyTorch] Add ReLU6 Op Support for Exported Program and FX graph (see the sketch after this list)
  • #17930 - [Relax][PyTorch] Add torch.outer Op Support for Exported Program and FX graph
  • #17932 - [Relax][PyTorch] Add UpSample Bicubic Op Support for Exported Program and FX graph
  • #17921 - [Relax][PyTorch] Add AvgPool 1D and 3D Op Support for Exported Program and FX graph
  • #17922 - [Relax][PyTorch] Add Adaptive AvgPool 1D and 3D Op Support for Exported Program and FX graph
  • #17863 - [Relax][PyTorch] CrossEntropyLoss
  • #17919 - [Relax][PyTorch] Add MaxPool 1D and 3D Op Support for Exported Program and FX graph
  • #17926 - [Relax][PyTorch] Add tests for all the dtypes supported in the PyTorch frontend
  • #17924 - [Relax][PyTorch] Add div.Tensor_mode and trunc Op Support for Exported Program and FX graph
  • #17904 - [Relax][PyTorch] Add Meshgrid Op Support for Exported Program and FX graph
  • #17915 - [Relax][PyTorch] Add support for linspace op in fx graph
  • #17886 - [Relax][PyTorch] Add Pixel Shuffle Op Support for Exported Program and FX graph
  • #17908 - [Relax][PyTorch] Add support for eye op in fx graph
  • #17893 - [Relax][PyTorch] Add fmod support
  • #17894 - [Relax][PyTorch] Support torch.bfloat16 dtype in pytorch frontend
  • #17878 - [Relax][PyTorch] Add torch.isin Op Support for Exported Program and FX graph
  • #17889 - [Relax][PyTorch] Support linspace op for ExportedProgram importer
  • #17868 - [Relax][PyTorch] Add support for ones_like, zero_, zeros, type_as, item ops
  • #17857 - [Relax][PyTorch] Refactor norm op for ExportedProgram importer
  • #17852 - [Relax][PyTorch] Sort.default
  • #17871 - [Relax][PyTorch] Add bitwise_or op support
  • #17836 - [Relax][PyTorch] Add support for index.Tensor
  • #17864 - [Relax][PyTorch] Support eye op for ExportedProgram importer
  • #17858 - [Relax][PyTorch] Add copy_ op support in fxGraph
  • #17851 - [Relax][PyTorch] Support leaky_relu_.default and reshape_as.default in ExportedProgram frontend
  • #17843 - [Relax][PyTorch] Add mul_.Tensor, max.default, min.default and pow.Scalar Op Support into Exported Program Frontend
  • #17821 - [Relax][PyTorch] Add Pad Op Support for Exported Program and FX graph
  • #17819 - [Relax][PyTorch] Add Stack Op Support for Exported Program
  • #17849 - [Relax][PyTorch] Add RSub Op Support for Exported Program and FX graph
  • #17850 - [Relax][PyTorch] Add masked_fill op support in ExportedProgram
  • #17816 - [Relax][PyTorch] Add PReLU Op Support for Exported Program and FX graph
  • #17803 - [Relax][PyTorch] Add Logaddexp op support for exported program
  • #17841 - [Relax][PyTorch] Add support for norm op
  • #17832 - [Relax][PyTorch] full.default, full_like.default, ones.default
  • #17830 - [Relax][PyTorch] Support narrow and broadcast_to ops for ExportedProgram importer
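
Many of the items above grow the coverage of the torch.export-based importer. Below is a minimal sketch of that path, using ReLU6 (#17918) as the example op; from_exported_program is the Relax frontend entry point.

```python
# Minimal sketch: importing a torch.export program into Relax.
import torch
from torch.export import export
from tvm.relax.frontend.torch import from_exported_program


class Model(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu6(x)


# Export with torch.export, then translate into a Relax IRModule.
exported = export(Model(), (torch.randn(1, 8),))
mod = from_exported_program(exported)
mod.show()
```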

LLVM

  • #17859 - [Codegen] Enable SVE/VLA for RISCV targets
  • #17958 - Fix JIT unknown relocation issue for RISCV
  • #17954 - [FFI] Fix compilation errors with Clang 20

Metal

  • #18034 - Fix GetFunction of the Metal runtime

ROCm

  • #18029 - Fix ROCm build after FFI refactor

Relax

  • #18102 - Fix rotary embedding buffer size calculation
  • #17928 - [KVCache] Per Layer Sliding Window
  • #17840 - Refactor missing op check into shared utility for Torch frontends
  • #17826 - Fix Torch frontends to report all the missing ops

Runtime

  • #18097 - Add CUtensorMap support

TIR

  • #18068 - Extend address_of to support Buffer objects (see the sketch after this list)
  • #18069 - Fix block access region detection for nested let bindings
  • #18057 - Phase out ProducerStore, ProducerRealize and Prefetch
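
As referenced in the address_of item above, here is a minimal TVMScript sketch of the long-standing element form; the Buffer-object form added by #18068 appears only as a comment, since its exact spelling is an assumption here.

```python
# Minimal sketch: taking the address of a buffer element in TVMScript.
import tvm
from tvm.script import tir as T


@T.prim_func
def take_address(A: T.Buffer((8,), "float32")):
    # Long-standing usage: address of a single element.
    T.evaluate(T.address_of(A[0]))
    # Per #18068, address_of also accepts the Buffer object itself,
    # e.g. T.address_of(A) (spelling assumed; see the PR for details).
```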

TOPI

  • #18039 - [Relax] Support InstanceNorm and fix an InstanceNorm bug
  • #18063 - [NN][Layer_Norm] Fix layer_norm error with reduce-only axes
  • #18006 - Fix index handling in expand_like operator for axis expansion
  • #18015 - Support integer type input for log10 (see the sketch after this list)
  • #17942 - Add shape validation to prevent negative dimensions in conv operations
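
To illustrate the log10 change flagged above, a minimal sketch of calling topi.log10 on an integer tensor; the automatic dtype handling is exactly what #18015 adds, so treat the resulting dtype as an assumption.

```python
# Minimal sketch: log10 via TOPI. Per #18015, integer-typed inputs are
# accepted (previously they required a manual cast to a float dtype).
import tvm
from tvm import te, topi

n = te.var("n")
A = te.placeholder((n,), dtype="int32", name="A")
B = topi.log10(A)  # integer input; dtype promotion handled per #18015
print(B.dtype)
```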

Vulkan

  • #18005 - Add TIR unary trigonometric/hyperbolic intrinsic definitions

cuda & cutlass & tensorrt

  • #18064 - [CUTLASS] Fix CUTLASS kernel build on Hopper
  • #18033 - [CUTLASS] Add GeMM kernels for Blackwell GPUs
  • #18024 - [CUDA] Fix thrust with latest FFI refactor
  • #18118 - Bump cutlass_fpA_intB_gemm
  • #18113 - [CMake] Refine C++/CUDA standard settings in CMakeLists.txt

FFI

  • #18076 - [FFI][REFACTOR] Stabilize container ABI and implementation
  • #18091 - [FFI] Provide Field Visit bridge to enable a gradual transition
  • #18095 - [FFI][REFACTOR] Migrate attrs to use new reflection
  • #18083 - [FFI] Update typeinfo to speedup parent reflection
  • #18077 - [FFI] Optimize atomic decref in Object
  • #18065 - [FFI] Introduce FFI reflection support in python
  • #18062 - [FFI][REFACTOR] Update registry to have complete meta-data
  • #18059 - [FFI][REFACTOR] Enhance reflection
  • #18050 - [FFI] Enhance FFI Object exception safety during init
  • #18121 - Revert "[FFI] Replace Arg2Str with a more powerful for_each"
  • #18117 - [FFI] Replace Arg2Str with a more powerful for_each
  • #18116 - [FFI] Use fold expression to simplify for_each
  • #18114 - [FFI] Replace __attribute__ with C++ standard attributes
  • #18112 - [FFI] Cleanup visit_attrs attribute after refactor
  • #18111 - [FFI] Introduce GlobalDef for function registration
  • #18106 - [REFACTOR][FFI] Phase out old VisitAttrs mechanism
  • #18042 - [REFACTOR][FFI] Update symbol name for library module
  • #18023 - [FFI] More strict tuple constructor checking
  • #18022 - [REFACTOR][FFI] Cleanup PackedFunc redirections
  • #18020 - [REFACTOR][PYTHON] Phase out tvm._ffi and Limited API support (see the sketch after this list)
  • #17979 - [FFI][REFACTOR] Update to distinguish as and cast
  • #17983 - [FFI][JVM] Upgrade tvm4j to latest FFI
  • #18010 - [REFACTOR][FFI] Phase out legacy C API
  • #17943 - [FFI] Variant specialize for all ObjectRef
  • #17939 - [REFACTOR] Phase out legacy rust ffi
  • #17940 - [REFACTOR] Phase out legacy go ffi
  • #17931 - [REFACTOR][FFI][RPC] Migrate RPC to use the latest FFI ABI
  • #17929 - [REFACTOR][FFI] Cleanup container redirections
  • #17927 - [FFI][FEAT] AutoDLPack for taking external tensor objects
  • #17923 - [REFACTOR][FFI] Cleanup PackedFunc related redirection
  • #17920 - [REFACTOR] Introduce and modernize ffi system
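
The registry-facing work above culminates in the tvm._ffi to tvm.ffi transition (#18020). Below is a minimal sketch of registering and retrieving a global function from Python, assuming tvm.ffi keeps the register_func/get_global_func entry points of the old registry; verify the names against your build.

```python
# Minimal sketch: new-style FFI registry from Python. The entry-point names
# are an assumption carried over from the old tvm._ffi registry API.
import tvm


@tvm.ffi.register_func("demo.add_one")
def add_one(x: int) -> int:
    return x + 1


f = tvm.ffi.get_global_func("demo.add_one")
assert f(41) == 42
```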

web

  • #17946 - [REFACTOR][FFI] Upgrade Web Runtime to new FFI
  • #17917 - [WebGPU][CodeGen] Override PrintVecElemLoad and Store for WebGPU

Misc

  • #18104 - Add LLVM Legalization for tir.erf (see the sketch at the end of this list)
  • #18107 - Fix: guard TensorMap with CUDA version check
  • #18101 - [REFACTOR] Formalize namespace for all objects
  • #18040 - Add support for bucketize
  • #18098 - [REFACTOR] Transition VisitAttrs to new reflection mechanism
  • #18096 - [REFACTOR] Transition VisitAttrs to new reflection mechanism in tir/ir_builder/meta_schedule
  • #18093 - [NVSHMEM] Extend CUDA backend to compile and link TIR modules with NVSHMEM
  • #18088 - [Script] Enhance alloc buffer handling in nested frames
  • #18086 - [SCRIPT] Bump Python minimum version to 3.9 and update AST compatibility
  • #18075 - add support for softsign op
  • #18079 - [Script] Add support for merging block annotations
  • #18080 - [REFACTOR] Phase out LegacyReprPrinter and improve CommonSubExprElim
  • #18078 - [REFACTOR] Phase out the RelaxExpr.checked_type in favor of struct_info
  • #18073 - [NVSHMEM] Update NDArray allocation
  • #18066 - [Script] Remove deprecated attributes from Constant AST node
  • #18060 - Add Python functor support for TIR expressions and statements
  • #18054 - [Pytest] Remove obsolete test suite entries
  • #18036 - Add support for hamming_window op
  • #18049 - [Refactor] Rename relax_vm to vm
  • #18046 - [3rdparty] Phasing out FlashInfer AOT from 3rdparty
  • #18047 - [Backend] JIT compile FlashInfer kernel with FFI header
  • #18041 - [DTYPE] Fix dtype functions after dtype refactor
  • #18043 - [REFACTOR] Phase out the relax tuning_api
  • #18038 - Resolve inconsistency between attention/attention_bias
  • #18027 - [Dtype] Low-precision Blackwell Datatype Support
  • #17985 - [Codegen] Resolve issue #17965 where the same model produces different outputs on the LLVM (CPU) and CUDA (GPU) backends
  • #17978 - Fix IR generation conflict in topi.nn.simplify by separating Tensor and PrimExpr handling
  • #18026 - [Python] Fix library lookup path for pip installed packages
  • #18019 - Add op support for slice_scatter
  • #17974 - Fix FLOP estimation for EvaluateNode by implementing VisitStmt_ handler
  • #18013 - Fix RuntimeError: parallel_for_dynamic
  • #18014 - Fix division truncation in window size calculation for small dtypes in average_pool
  • #17995 - Fix zero-extent loops in PerStoreFeature to prevent crashes
  • #17969 - Add registration for the operators asinh, acosh, atanh in LLVM
  • #17972 - Fix g.costs
  • #17953 - Fix sqrt/rsqrt Compatibility with Integer Data Types
  • #17961 - Fix basic FLOP estimation for WhileNode
  • #17945 - Add registration for the operators asin and acos in LLVM
  • #17951 - [NODE] Fix structural equality for Array specialization
  • #17913 - [Triton] Support latest triton.compile interface
  • #17911 - Add new_zeros op support in Exported Program and FX graph frontends
  • #17909 - Add masked_fill_.scalar, logical_not.default in Exported Program frontend
  • #17910 - [RPC] Fix bug that modified a dict while iterating over its keys
  • #17896 - Add op support for zeros_like and fill_
  • #17900 - Fix ONNX expand op
  • #17865 - Add support for index_put_ op
  • #17839 - Add op support for roll op
  • #17844 - Fix incorrect docstring in topi softmax
  • #17831 - [3rdparty] Bump DLPack to v1.1 for float8/6/4 dtype supports
  • #17848 - Fix docstring in batch_to_space_nd and bitpack
  • #17845 - Fix incorrect docstring in upsampling.py
  • #17808 - [Install] Fix error during python/tvm installation
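
To illustrate the tir.erf legalization flagged at the top of this list (#18104), a minimal sketch of a compute that uses erf; lowering it for the "llvm" target is what the new legalization enables.

```python
# Minimal sketch: a compute using tir.erf. With #18104, lowering this for
# the "llvm" target is expected to work (erf was previously legalized
# mainly for GPU backends).
import tvm
from tvm import te, tir

A = te.placeholder((4,), dtype="float32", name="A")
B = te.compute((4,), lambda i: tir.erf(A[i]), name="B")
f = te.create_prim_func([A, B])
mod = tvm.IRModule({"main": f})
# Compiling mod for target "llvm" exercises the new legalization.
```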
