Apache TVM v0.13.0



Since the v0.12.0 release, the TVM community has delivered the following improvements. The main tags are listed below:

  • Community, RFC;
  • Frontend: TensorFlow/TFLite, PyTorch/Torch, Paddle, Keras;
  • Runtime: Adreno, OpenCL & CLML, ROCm, CUDA & CUTLASS & TensorRT, Ethosn, Vulkan, Hexagon, Metal, and other runtime changes;
  • Relay, BYOC, TOPI, Arith, TIR, TVMScript, MetaSchedule;
  • microTVM, AOT, TVMC, LLVM;
  • CI, BugFix, Docs, Docker, Misc.

Please visit the full listing of commits for a complete view: v0.12.0...v0.13.0.


Community

  • #15086 - Aleksei-grovety -> Reviewer
  • #14676 - Jiajun Jiang -> Reviewer
  • #14677 - Qiang Zhang -> Reviewer
  • #14622 - Sunghyun Park -> Reviewer
  • #14578 - Zihao Ye -> Committer
  • #14853 - Anirudh Sundar Subramaniam -> Committer
  • #14772 - Add new key for release signing



Frontend

  • #14830 - Use f-strings for string formatting, NFC
  • Keras
    • #15122 - [Relay][Keras] Fix SeparableConv2D conversion in dilation_rate attribute
    • #15107 - [Relay][Keras] Fix a wrong variable name in keras frontend
    • #15053 - [Relay][Keras] Fix the wrong implementation logic about cropping2D
    • #15082 - [Relay][Keras] Fix UpSampling2D about the wrong assertion about size
    • #15060 - [Relay][keras] Fix the bug about the attribute 'output_padding' in Deconv
    • #14707 - [Keras]fix a bug about alpha attribute in LeakyReLU which lead to passes conflict
    • #15175 - [Relay][Keras] Fix concatenate convert function in axis parsing
  • Paddle
    • #14801 - [Paddle] [PaddlePaddle Hackathon 4]add attribute support for gaussian_random/softplus/Conv3d/Conv2d
    • #14973 - [Paddle] [PaddlePaddle Hackathon 4] add convert support for tanhshrink/pool3d/set_value ops for paddle frontend
    • #14826 - [Paddle] [PaddlePaddle Hackathon 4] add convert support for p_norm/roi_align/softmax_with_cross_entropy
    • #14575 - [Paddle] [PaddlePaddle Hackathon 4]add attribute support for dropout/hard_sigmoid/pixel_shuffle
  • TFLite
    • #14667 - [TFLite]Support for quantized squared difference
    • #14819 - [TFLite]Generate name when tensor name is missing
    • #15173 - [FRONTEND][TFLITE]Fix int16 transpose conv loading
  • TensorFlow
    • #14546 - [Tensorflow] Fix conv2d_transpose for NHWC layout
  • PyTorch
    • #14747 - [PyTorch] Add aten::new_zeros
    • #14699 - [Torch] fix typo in new_full
    • #14963 - [PyTorch] Support use_input_stats in instance_norm
    • #14930 - Fix pytorch axis
  • ONNX
    • #15017 - [ONNX] Fix bug in scatter_elements
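
Several entries in this release, such as #14830 above, are marked NFC (no functional change) and only switch string formatting to f-strings. A minimal sketch of what such a conversion looks like follows; the variable names and the message are hypothetical and not taken from the TVM sources.

```python
# Hypothetical values, chosen only for illustration.
op_name = "conv2d"
ndim = 5

# Older style: printf-style formatting with the % operator.
old_message = "Operator %s does not support %d-D input" % (op_name, ndim)

# Equivalent f-string: the resulting text is identical,
# which is why such a change counts as "no functional change".
new_message = f"Operator {op_name} does not support {ndim}-D input"

assert old_message == new_message
print(new_message)
```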


Runtime

  • #15182 - Add weak symbol to builtin fp16
  • #15161 - Clean TVM stacktrace in error messages
  • #15162 - Support void as dtype in FFI
  • #14902 - Update Module and Registry to use String Container
  • #14967 - [Runtime,RPC] Use f-strings for string formatting, NFC
  • #14887 - Make systemlib unique per prefix
  • #14775 - Added str for tvm._ffi.runtime_ctypes.TVMArray
  • #14656 - Fix Can't "query_imports" Bug of VM Executable


Adreno

  • #15061 - [TOPI]Fix problem with ceil_log2
  • #14996 - [OpenCL]Fix conv2d when output channels < 4
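
#15061 above touches TOPI's ceil_log2 helper. The sketch below is not the TVM implementation and does not reproduce the bug that was fixed. It only shows, in plain Python, two ways of computing the ceiling of log2 for a positive integer, one through floating point and one with integer arithmetic only. Both function names are hypothetical.

```python
import math


def ceil_log2_float(n: int) -> int:
    """Ceiling of log2(n), computed through floating point."""
    return math.ceil(math.log2(n))


def ceil_log2_int(n: int) -> int:
    """Ceiling of log2(n) for n >= 1, using only integer arithmetic."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # (n - 1).bit_length() is the number of bits needed to index n items.
    return (n - 1).bit_length()


# For small inputs both formulations agree.
for n in (1, 2, 3, 4, 5, 1024, 1025):
    assert ceil_log2_float(n) == ceil_log2_int(n)
```

The integer form never depends on the precision of a floating-point type, which can matter on targets where wide floats are unavailable or slow.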


CMSIS-NN

  • #15059 - Update CMSIS-NN release to v4.1.0


OpenCL & CLML

  • #14972 - [OPENCL] Always use convert_T for type conversion
  • #14995 - [OpenCL] Improve diagnostic message
  • #14833 - [Codegen][OpenCL] fix ambiguous selection operator call
  • #14792 - [OpenCL] Refactor OpenCL runtime to support SPIRV binary ingestion
  • #14922 - [OpenCLML] Refactor and introduce on chip memory and memory planner
  • #14949 - [CodegenC] Updated unit test for sorted CodegenC output
  • #14767 - [OpenCLML] Transposed convolution support and other fixes

CUDA & CUTLASS & TensorRT

  • #14751 - [CUDA] Fixed the call of the min function in the schedule for cuda
  • #14798 - [CUTLASS] Add NDEBUG option to CUTLASS compile to speed up attention kernel
  • #14782 - [Bugfix][Codegen][CUDA] Wrong casting in ASM


Metal

  • #14962 - Fix int8 vectorized cast
  • #14846 - Fix vectorized select
  • #14727 - Update metal runtime to directly store kernel map
  • #14671 - Fix flaky memory issue due to racing


Vulkan

  • #15035 - [Vulkan] Allow DeclBuffer in CodeGenSPIRV
  • #14817 - [Vulkan] Add cooperative matrix support


Hexagon

  • #14997 - Remove "c" as aot_host_target tvm/contrib/hexagon/pytest_pl…
  • #14948 - Update instructions to compile hexagon runtime
  • #14965 - Add support for v73, make v68 default
  • #14720 - [TIR] Add get_vtcm_allocation_sizes with lowering
  • #14567 - [TIR] Use the "target" value in T.func_attr for VTCM limit


ROCm

  • #15106 - [TensorIR]AMD Matrix Core Support
  • #15088 - [Target]Replace rocm arch parsing from int to string


microTVM

  • #14872 - Use self.close_transport() on error


AOT

  • #15033 - Avoid Var-to-Var Let binding in AOTExecutorCodegen
  • #15032 - Remove duplication in tvm.testing.aot.compile_models
  • #14529 - Fix warning on dropping const in TVMAotExecutor_GetInputName


microNPU

  • #15159 - [microNPU][ETHOSU] Fix compiler attributes types
  • #15147 - [microNPU][ETHOSU] Add option to disable copying constants for case without cascader
  • #15069 - [microNPU][ETHOSU] Fix SoftMax legalization parameters
  • #15115 - [microNPU][ETHOSU] Upgrade to 23.05 version of Arm(R) Ethos(TM)-U NPU drivers
  • #15114 - [microNPU] Upgrade Vela to v3.8.0
  • #15104 - [microNPU][ETHOSU] Fix minimum buffer size
  • #15063 - [microNPU][ETHOSU] Fix CopyComputeReordering pass arguments
  • #14861 - [microNPU][ETHOSU] Add offloading to the NPU the nn.avg_pool2d operator with a stride > 3
  • #14765 - [microNPU][ETHOSU] Channel pad offloaded to NPU
  • #14774 - [microNPU][ETHOSU] Fix Softmax quantization parameters
  • #14629 - [microNPU][ETHOSU] Softmax int8 legalization support
  • #14353 - [microNPU] Add support for MEAN with uint8 ifm
  • #14587 - [microNPU] Fix skip tests when Vela is not present
  • #14464 - [microNPU][ETHOSU] Add restrictions to convert to NHCWB16 layout in LayoutOptimization pass


BYOC

  • #15046 - Add GEMM kernel from FasterTransformer as submodule
  • #15029 - Hide internal cutlass symbols


Relay

  • #15068 - Improve the "clip" op optimization in simplify expr pass
  • #14925 - add a dimension check to reject invalid input
  • #14858 - [simplify_expr]: Add pass to remove trivial transpose ops
  • #14838 - Use f-strings for string formatting, NFC
  • #14831 - [Relay/Op] Use f-strings for string formatting, NFC
  • #14580 - Simplify the square of a binomial
  • #14735 - Handle pad value coming from Tensor instead of scalar
  • #14601 - Enhance type infer for dynamic shape
  • #14885 - [Relay] fix broadcast in PyTorch frontend
  • #15090 - [Relay] Insertion of "device_copy" CallNode to Resolve Device Conflict on Unconstrained Nodes
  • #14845 - [Relay] Fix softplus in paddlepaddle frontend
  • #14837 - [Relay] Fix AdaptiveAvgPool2d about wrong dtype parsing
  • #14821 - [Relay] Fix softplus about the wrong calculation formula in Relay PyTorch frontend
  • #14820 - [Relay] Fix threshold calculation logic in PyTorch frontend
  • #14824 - [Relay] fix a bug about ReLu in the threshold attribute which causes a different results with keras
  • #14796 - [relay] fix wrong calculate logic about celu
  • #14773 - [Relay] fix scatter_nd type relation
  • #14742 - [relay] Fix alpha attribute with None in ELU
  • #14740 - [Relay] Fix stride in LpPool for default
  • #14556 - [Relay] fix a bug caused by IncompleteTypeNode in EinsumRel while doing MergeComposite
  • #15057 - [QNN] Implement quantized avg_pool2d
  • #14536 - [QNN] Implement 'qnn.softmax'
  • #14875 - [Quantization]: Update simulated_quantize to infer correct layout


TOPI

  • #15018 - Fix dynamic dimensions support for Dense on TOPI side
  • #14856 - Fix in interpretation of empty axis parameter in reduction fun…
  • #14483 - [Target] Add SVE specific convolution
  • #14839 - Use f-strings for string formatting, NFC
  • #14822 - Use f-strings for string formatting, NFC
  • #14519 - Vectorize depthwise conv2d output operator
  • #14549 - remove the i32 cast for output shape of pool
  • #14566 - [Topi] Output strides in pack_buffer() utility


Arith

  • #15131 - Hotfix flaky test in padded matmul
  • #15120 - NormalizeToIterSum
  • #15081 - Improve arith simplify to handle symbolic reshape pattern
  • #14532 - Implement statistics counters for RewriteSimplifier
  • #14704 - [cherry-pick][BUGFIX] Fix a bug of iter map floormod(x,2) simplify
  • #14849 - [TVMScript] Capture fails if var appears only in annotation
  • #14596 - [TensorIR] Improve CompactBufferRegion for symbolic shape
  • #15129 - [TIR] Recognize empty extents
  • #14982 - [TIR][VTA] Update host-side target, even without device func
  • #14547 - Enhance IterMapSimplify for symbolic
  • #14571 - [BUGFIX] Fix a bug of iter map floormod(x,2) simplify
  • #14582 - Fix solve inequality of unbound var ranges
  • #14538 - Enhance CanonicalSimplify to Simplify ProdDiv
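
Two entries above (#14571 and its cherry-pick #14704) concern the simplification of floormod(x, 2). The sketch below does not reproduce that bug or any TVM code. It only illustrates the floormod semantics a simplifier has to preserve, and how it differs from the truncating remainder used by C-style integer division. The helper names are hypothetical.

```python
import math


def floormod(a: int, b: int) -> int:
    """Remainder paired with floor division (quotient rounds toward -infinity)."""
    return a - b * (a // b)


def truncmod(a: int, b: int) -> int:
    """Remainder paired with truncating division (quotient rounds toward zero)."""
    return a - b * math.trunc(a / b)


for x in range(-4, 5):
    # floormod(x, 2) is always 0 or 1, including for negative x,
    # and matches Python's built-in % operator.
    assert floormod(x, 2) in (0, 1)
    assert floormod(x, 2) == x % 2

# The two definitions disagree for negative odd values.
assert floormod(-3, 2) == 1
assert truncmod(-3, 2) == -1
```

A rewrite rule that is valid for one definition can therefore be wrong for the other when x may be negative.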


MetaSchedule

  • #14781 - [MetaSchedule] RPC port needs to be an integer
  • #14673 - Introduce MMA Tensor Core Multilevel Tiling
  • #14784 - Enhance tune_tir to tune IRModule of TIR Collections
  • #14783 - Add an API to dump a pruned database
  • #14785 - Clear screen only when specified
  • #14654 - Handle output cases for InlineConstantScalars
  • #14642 - PostProc not rewriting unroll for purely spatial block
  • #14591 - Handle cases when no features found by FeatureExtractor
  • #14584 - [ARM] Beautification of the function names


TIR

  • #15153 - [TensorIR][Visitor] Visit buffer members in match_buffer's in block visitor functions
  • #15168 - [Schedule] Support padding-by-factor in PadEinsum
  • #15165 - Expose UndefinedVars to Python
  • #15163 - Fix RenewDef for symbolic input shapes
  • #15142 - [Schedule] Enhance compute-inline for fusion
  • #15150 - Fix typo in code example
  • #15144 - [TensorIR][Schedule] New schedule primitive unsafe_hide_buffer_access
  • #15146 - Block dependence analysis without schedules
  • #15119 - Avoid duplicate GlobalVar names in SplitHostDevice
  • #15037 - Handle DeclBuffer in CacheReadWrite schedule primitive
  • #15098 - [Ethos-U]Handle DeclBuffer in Ethos-U inputs
  • #15044 - [USMP] Preserve DeclBuffer in PoolAllocationToOffsetConverter
  • #15078 - Handle DeclBuffer in LowerThreadAllreduce
  • #15094 - Handle DeclBuffer in MergeDynamicSharedMemoryAllocations
  • #15093 - Handle DeclBuffer in StorageAccessInfoLower
  • #15045 - Handle DeclBuffer in InjectDoubleBuffer
  • #15096 - Handle DeclBuffer in RemoveNoOp
  • #15076 - [CodeGen] Define PackedFunc error code in MakePackedAPI
  • #15102 - Update primfunc host attachment to include host
  • #14854 - [Compute-at] Enable complex floordiv/floormod expressions in compute_at
  • #15041 - Handle DeclBuffer in LowerCustomDatatypes
  • #15038 - Handle DeclBuffer in Inline/ComputeAt/ReverseComputeAt
  • #15052 - [Analysis] Handle DeclBuffer in FlopEstimator
  • #15051 - Handle DeclBuffer in StorageRewrite
  • #15050 - [Schedule] Fix decompose_padding bug with dtypes
  • #15034 - Refactor BlockScope outside schedule
  • #15054 - Handle DeclBuffer in IRSubstitute
  • #14986 - Move SplitHostDevice to before MakePackedAPI
  • #15042 - Handle DeclBuffer in StorageFlatten's input
  • #15040 - Preserve object equality in Buffer::GetFlattenedBuffer
  • #14693 - Enhance TVMScript Buffer Slice Access
  • #14988 - Handle callees on same target, different codegen
  • #14951 - Keep trivial LetStmt in tir.Simplify when used in buffer decl
  • #14944 - Restrict tir.transform.LowerTVMBuiltin to host functions
  • #14990 - [IR,TE,TIR] Use f-strings for string formatting, NFC
  • #14993 - Fix incorrect construction of block frames
  • #14952 - Avoid re-defining var = arg_var in ArgBinder
  • #14918 - SplitHostDevice, handle subroutines
  • #14943 - Restrict tir.transform.InstallDebugSpans to host functions
  • #14942 - Preserve existing kTarget function attribute in BindTarget
  • #14945 - Restrict tir.transform.CombineContextCall to host functions
  • #14914 - Handle subroutine calls in MakeUnpackedAPI
  • #14913 - Handle subroutine calls in MakePackedAPI
  • #14892 - Expand unit tests for ConvertSSA
  • #14866 - Avoid too complex predicate in compaction
  • #14766 - [Schedule] Improve blockize to support blockizing multiple blocks
  • #14776 - Improved parameter name in DLTensor unpacking error messages
  • #14562 - [Driver] Move ShouldAnnotateEntryFunc logic into transform
  • #14741 - Keep block annotations from tensorization
  • #14021 - More flexible buffer compaction
  • #14711 - [Analysis] Calculate allocated memory at module level
  • #14492 - Flatten SeqStmt on construction
  • #14598 - Add CUDA int4 tensor core intrinsics
  • #14593 - [Schedule] Method returning the function being worked on
  • #14592 - [TensorIR] Fix ComputeAt with perfect symbolic bound
  • #14491 - Use String instead of StringImm for AttrStmtNode::node
  • #14626 - [TensorIR]reindex_cache_write do not mutate init statement
  • #14588 - [Fix][TIR] UnifyThreadBinding creating unit loop with annotation
  • #14589 - [Fix][TIR][Analysis] Reduction block checking alloc_buffers


TVMScript

  • #15083 - Avoid visiting repetition tensor in SetCommonPrefix Visitor
  • #15091 - [TIR]Convert tir.op operands to PrimExpr
  • #14919 - [TIR] Parse subroutine calls with no arguments
  • #14941 - Prevent bool to int conversion in T.Assert condition
  • #14915 - Allow"device", host="host") to specify host
  • #14900 - Round-trip DeclBuffer with undefined data pointer
  • #14889 - [TIR]Added format/parsing of subroutine calls
  • #14874 - Use default fallback for un-registered type
  • #14840 - Print Executor, Runtime, and FunctionInfo as metadata
  • #14812 - Handle AllocatedPoolInfo, ConstantPoolInfo, ConstantInfo
  • #14786 - Add __name__ attr for parsed PrimFunc and IRModule
  • #14531 - Preserve LetStmt of constants
  • #14488 - Distinguish between void* and handle


TVMC

  • #14994 - [Bugfix]Fix tvmc option for printing which operators are offloaded to the Ethos-U


LLVM

  • #15127 - Remove the "ret_void" argument of AddFunction
  • #15139 - Minor refactor to LLVMModuleNode::SaveToFile
  • #14958 - [Codegen]Allow void return type from PackedFunc
  • #14946 - Expose Host CPU Feature Detection
  • #14901 - Codegen subroutine call when CallNode::op is GlobalVar
  • #14570 - Use Var annotation in LetStmt for pointer type
  • #14843 - [RUNTIME] Enable multi systemlib with device code
  • #14564 - Validate generated LLVM module before optimization
  • #14568 - Expand tvm::Type to DWARF conversion
  • #14563 - [Codegen]Remove cast to i8* in builtin::address_of


BugFix

  • #14960 - [Bug] Add typing_extensions requirement again
  • #15015 - [Hotfix] Remove LOG(INFO) from unsupported dtype legalization pass
  • #14991 - Make ThreadAllReduce pass compatible with int64
  • #14950 - Avoid symbol conflicts in MakePackedAPI/MakeUnpackedAPI
  • #14903 - [Test Cases]Add some version check to make test cases run in all PyTorch versions
  • #14890 - [Fix] Fix typo in error message
  • #14879 - fix the undeclared identifier 'f'
  • #14857 - Fix batch_norm
  • #14787 - [FIX] fix typo in comment


CI

  • #15179 - [Testing] Utility method to run TVM on remote device
  • #15138 - [Test] Improve check for TVMError exception in test_cast
  • #15062 - Clone submodule recursively
  • #15065 - Revert "Make Graviton3 default AArch64 job runner node (#14983)"
  • #14983 - Make Graviton3 default AArch64 job runner node
  • #15056 - [Bugfix]Fix CacheControl version constraint violation
  • #14908 - Update the expected CI jobs list in the update_branch script
  • #14847 - Update CPU image to install PyTorch
  • #14808 - [Testing] Use TVMScript's "name" argument for error messages
  • #14780 - fix doc deploy issue
  • #14651 - Modify test cases to accommodate the CI upgrades
  • #14666 - sccache support while using under multi user environments
  • #14635 - Upgrade CI
  • #14713 - Add PLATFORM env var to builds
  • #14680 - Downgrade ci_cpu llvm version back to 11
  • #14653 - [tests][scripts][release] Optimize release note script about categories etc
  • #14646 - [test][script] Fix release of script about ghost users or blank PR nodes
  • #14550 - Add JAX deps in Dockerfiles
  • #14466 - Update ci_cpu image and build with llvm-15


Docker

  • #15149 - Fix environment variables
  • #15105 - Update docker images for llvm-16
  • #15092 - Update ci-cortexm docker image to contain CMSIS-NN release v…
  • #15095 - Add environment variables
  • #15067 - Migrate arm docker image to use llvm packages
  • #15031 - Update ci_cpu docker image to one containing polly package f…
  • #15003 - [ADRENO] Docker setup changes for multi user environments
  • #14912 - Add polly package
  • #14842 - Install PyTorch on cpu image
  • #14590 - Support rootless docker when using docker/


Docs

  • #15126 - [DOC] Add RPC System Setup Document
  • #15071 - Updated the copyright year from 2020 to 2023
  • #15055 - [DOC][TUTORIAL] Fix typo for the 'Making your Hardware Accelerator TVM-ready with UMA'
  • #14504 - [TensorIR][Doc] Docstring of reorder_block_iter_var
  • #14611 - [TIR] Fix unsafe_set_dtype docstring
  • #14585 - Fix typo in the Vitis AI Integration docs


Misc

  • #15267 - [release] Disable git merge to avoid conflict
  • #15187 - [RPC] Report RPC Session Timeout to Client Instead of "kShutdown"
  • #15185 - Update tvm_runtime.h
  • #15164 - [CMake] Support LLVM-16 static linking
  • #15167 - [Python] Enhance Wheel Packaging
  • #15166 - [Target] Add MetaSchedule-compatible attributes to OpenCL
  • #15154 - [Minor] Fix Compilation Warnings
  • #15132 - [NDArray] Allow creating a view from a strided array
  • #15116 - [RPC] Add Missing Option "port_end" to RPC Proxy
  • #15073 - [CodeGenC] Use PrimFuncNode::ret_type in function signature
  • #15036 - [StackVM] Updated CodeGenStackVM to handle DeclBuffer
  • #15022 - [Build] Fix missing virtual destructor in SIBuilder
  • #15016 - Fix type parse error about AdaptiveMaxPool
  • #15007 - [Minor] Fix compilation warnings
  • #15000 - [CMAKE] Introduce dummy build as an option
  • #14863 - [DataType] Initial support of fp8 (e4m3/e5m2)
  • #14975 - [CMAKE] Add a dummy target to defer libtvm dep
  • #14574 - [IR][SIBuilder]
  • #14939 - [Target] Add target to all TVM callbacks
  • #14937 - [BUILD] Enable log before throw message in windows
  • #14934 - [TestCases] fix unreachable test cases due to outside the for-loop
  • #14916 - [TypoFix] fix some typo problem in keras frontend
  • #14893 - [Contrib] Use f-strings for string formatting, NFC
  • #14884 - [AutoTVM] Use f-strings for string formatting, NFC
  • #14876 - [CONTRIB] Enable create_staticlib to take in tar files
  • #14867 - Fix f-string typo
  • #14851 - Add v0.12.0 docs
  • #14813 - [BUILD] Removed the duplicated MACROs in config.cmake
  • #14743 - [SUPPORT] Fix RingBuffer ReadWithCallback
  • #14799 - [LINT] Fix clang-format script for newest clang-format
  • #14797 - [NDArray] Allow arbitrary stride when the corresponding shape is 1
  • #14790 - More clear ref of thirdparty license
  • #14779 - fix: use arm on demand instead of spot
  • #14762 - [Target][Minor] Add A6000 Target Tag
  • #14683 - [AutoTVM] Added Droplet algorithm in TVM
  • #14694 - unify search path approach to various libs
  • #14686 - [CMAKE] Update search pattern of config
  • #14636 - Fix bug about wrong attribute name
  • #14628 - [CODEGEN] Fix metal codegen when with only single working dim
  • #14607 - fix: deploy ci
  • #14569 - [Node] Allow alternative root names in ObjectPath::Root()
  • #14522 - [Object] Implemented .as for ObjectRef param, returns Optional
  • #14477 - feat: use spot instances for ci with on demand as a backup
  • #14468 - [AutoTVM] New rank-binary loss_type for the new xgboost >= 2.0.0 behaviour
  • #14544 - Update to v0.13.dev0
  • #14539 - [Target] Add Apple M1 GPU tag with 256-thread restriction
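
#14863 above adds initial support for two 8-bit floating-point formats, e4m3 and e5m2. The sketch below decodes raw bytes in these formats using plain Python. It assumes the commonly used definitions (e5m2 with IEEE-style infinities and NaNs and exponent bias 15; e4m3 as the finite-only variant with bias 7 and a single NaN mantissa pattern per sign). Whether TVM's types match these conventions in every detail is an assumption, and the function names are hypothetical.

```python
import math


def decode_fp8(byte: int, exp_bits: int, man_bits: int, bias: int,
               ieee_specials: bool) -> float:
    """Decode one fp8 byte laid out as sign | exponent | mantissa."""
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    max_exp = (1 << exp_bits) - 1
    max_man = (1 << man_bits) - 1

    if exp == max_exp:
        if ieee_specials:
            # IEEE style: all-ones exponent encodes inf (man == 0) or NaN.
            return sign * math.inf if man == 0 else math.nan
        if man == max_man:
            # Finite-only variant: only the all-ones pattern is NaN.
            return math.nan

    if exp == 0:
        # Subnormal numbers: no implicit leading one.
        return sign * (man / (1 << man_bits)) * 2.0 ** (1 - bias)
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


def decode_e4m3(byte: int) -> float:
    return decode_fp8(byte, exp_bits=4, man_bits=3, bias=7, ieee_specials=False)


def decode_e5m2(byte: int) -> float:
    return decode_fp8(byte, exp_bits=5, man_bits=2, bias=15, ieee_specials=True)


assert decode_e4m3(0x38) == 1.0       # exponent 7, mantissa 0
assert decode_e4m3(0x7E) == 448.0     # largest finite e4m3 value
assert decode_e5m2(0x3C) == 1.0       # exponent 15, mantissa 0
assert decode_e5m2(0x7B) == 57344.0   # largest finite e5m2 value
```

Under these assumptions the trade-off is visible in the largest finite values: e4m3 spends more bits on precision, e5m2 on dynamic range.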
