Introduction
Since the v0.12.0 release, the TVM community has worked to deliver the following new and exciting improvements. The main tags are listed below (tags in bold saw the most progress):
- Community, RFC;
- Frontend: TensorFlow/TFLite, PyTorch/Torch, Paddle, Keras;
- Runtime: Adreno, OpenCL & CLML, ROCm, CUDA & CUTLASS & TensorRT, Ethos-N, Vulkan, Hexagon, Metal, and other runtime updates;
- Relay, BYOC, TOPI, Arith, TIR, TVMScript, MetaSchedule;
- microTVM, AOT, TVMC, LLVM;
- CI, BugFix, Docs, Docker, Misc;
Please visit the full listing of commits for a complete view: v0.12.0...v0.13.0.
Community
- #15086 - Aleksei-grovety -> Reviewer
- #14676 - Jiajun Jiang -> Reviewer
- #14677 - Qiang Zhang -> Reviewer
- #14622 - Sunghyun Park -> Reviewer
- #14578 - Zihao Ye -> Committer
- #14853 - Anirudh Sundar Subramaniam -> Committer
- #14772 - Add new key for release signing
RFC
Frontend
- #14830 - Use f-strings for string formatting, NFC
- Keras (a usage sketch follows this group)
- #15122 - [Relay][Keras] Fix SeparableConv2D conversion in dilation_rate attribute
- #15107 - [Relay][Keras] Fix a wrong variable name in the Keras frontend
- #15053 - [Relay][Keras] Fix wrong implementation logic in Cropping2D
- #15082 - [Relay][Keras] Fix wrong size assertion in UpSampling2D
- #15060 - [Relay][Keras] Fix a bug with the 'output_padding' attribute in Deconv
- #14707 - [Keras] Fix a bug with the alpha attribute in LeakyReLU which led to a pass conflict
- #15175 - [Relay][Keras] Fix concatenate convert function in axis parsing
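The fixes above all target the Relay Keras importer. Below is a minimal, hedged sketch of the entry point they exercise; it assumes TensorFlow/Keras is installed, and the model, layer choice, and shapes are illustrative only (not taken from the PRs):

```python
# Sketch of the Relay Keras importer that the fixes above touch.
import tvm
from tvm import relay
from tensorflow import keras

model = keras.Sequential([
    keras.layers.InputLayer(input_shape=(32, 32, 3)),
    # SeparableConv2D with dilation_rate exercises the path fixed in #15122.
    keras.layers.SeparableConv2D(8, (3, 3), dilation_rate=(2, 2)),
])
# from_keras expects NCHW shapes keyed by the model's input name.
shape_dict = {model.input_names[0]: (1, 3, 32, 32)}
mod, params = relay.frontend.from_keras(model, shape_dict)
print(mod)
```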
- Paddle
- #14801 - [Paddle][PaddlePaddle Hackathon 4] Add attribute support for gaussian_random/softplus/Conv3d/Conv2d
- #14973 - [Paddle][PaddlePaddle Hackathon 4] Add convert support for tanhshrink/pool3d/set_value ops for Paddle frontend
- #14826 - [Paddle][PaddlePaddle Hackathon 4] Add convert support for p_norm/roi_align/softmax_with_cross_entropy
- #14575 - [Paddle][PaddlePaddle Hackathon 4] Add attribute support for dropout/hard_sigmoid/pixel_shuffle
- TFLite
- TensorFlow
- #14546 - [Tensorflow] Fix conv2d_transpose for NHWC layout
- PyTorch
- ONNX
- #15017 - [ONNX] Fix bug in scatter_elements
Runtime
- #15182 - Add weak symbol to builtin fp16
- #15161 - Clean TVM stacktrace in error messages
- #15162 - Support void as dtype in FFI
- #14902 - Update Module and Registry to use String Container
- #14967 - [Runtime,RPC] Use f-strings for string formatting, NFC
- #14887 - Make systemlib unique per prefix
- #14775 - Added str for tvm._ffi.runtime_ctypes.TVMArray
- #14656 - Fix the "query_imports" bug in VM Executable
Adreno
CMSIS-NN
- #15059 - Update CMSIS-NN release to v4.1.0
OpenCL & CLML
- #14972 - [OPENCL] Always use convert_T for type conversion
- #14995 - [OpenCL] Improve diagnostic message
- #14833 - [Codegen][OpenCL] Fix ambiguous selection operator call
- #14792 - [OpenCL] Refactor OpenCL runtime to support SPIRV binary ingestion
- #14922 - [OpenCLML] Refactor and introduce on-chip memory and memory planner
- #14949 - [CodegenC] Updated unit test for sorted CodegenC output
- #14767 - [OpenCLML] Transposed convolution support and other fixes
CUDA & CUTLASS & TensorRT
- #14751 - [CUDA] Fixed the call of the min function in the schedule for cuda
- #14798 - [CUTLASS] Add NDEBUG option to CUTLASS compile to speed up attention kernel
- #14782 - [Bugfix][Codegen][CUDA] Wrong casting in ASM
Metal
- #14962 - Fix int8 vectorized cast
- #14846 - Fix vectorized select
- #14727 - Update metal runtime to directly store kernel map
- #14671 - Fix flaky memory issue due to racing
Vulkan
Hexagon
- #14997 - Remove "c" as aot_host_target tvm/contrib/hexagon/pytest_pl…
- #14948 - Update instructions to compile hexagon runtime
- #14965 - Add support for v73, make v68 default
- #14720 - [TIR] Add get_vtcm_allocation_sizes with lowering
- #14567 - [TIR] Use the "target" value in T.func_attr for VTCM limit
ROCm
- #15106 - [TensorIR] AMD Matrix Core Support
- #15088 - [Target] Replace ROCm arch parsing from int to string
microTVM
- #14872 - Use self.close_transport() on error
AOT
- #15033 - Avoid Var-to-Var Let binding in AOTExecutorCodegen
- #15032 - Remove duplication in tvm.testing.aot.compile_models
- #14529 - Fix warning on dropping const in TVMAotExecutor_GetInputName
microNPU
- #15159 - [microNPU][ETHOSU] Fix compiler attributes types
- #15147 - [microNPU][ETHOSU] Add option to disable copying constants for case without cascader
- #15069 - [microNPU][ETHOSU] Fix SoftMax legalization parameters
- #15115 - [microNPU][ETHOSU] Upgrade to 23.05 version of Arm(R) Ethos(TM)-U NPU drivers
- #15114 - [microNPU] Upgrade Vela to v3.8.0
- #15104 - [microNPU][ETHOSU] Fix minimum buffer size
- #15063 - [microNPU][ETHOSU] Fix CopyComputeReordering pass arguments
- #14861 - [microNPU][ETHOSU] Add offloading of the nn.avg_pool2d operator with a stride > 3 to the NPU
- #14765 - [microNPU][ETHOSU] Channel pad offloaded to NPU
- #14774 - [microNPU][ETHOSU] Fix Softmax quantization parameters
- #14629 - [microNPU][ETHOSU] Softmax int8 legalization support
- #14353 - [microNPU] Add support for MEAN with uint8 ifm
- #14587 - [microNPU] Fix skip tests when Vela is not present
- #14464 - [microNPU][ETHOSU] Add restrictions to convert to NHCWB16 layout in LayoutOptimization pass
BYOC
Relay
- #15068 - Improve the "clip" op optimization in simplify expr pass
- #14925 - Add a dimension check to reject invalid input
- #14858 - [simplify_expr]: Add pass to remove trivial transpose ops
- #14838 - Use f-strings for string formatting, NFC
- #14831 - [Relay/Op] Use f-strings for string formatting, NFC
- #14580 - Simplify the square of a binomial (a usage sketch follows this list)
- #14735 - Handle pad value coming from Tensor instead of scalar
- #14601 - Enhance type infer for dynamic shape
- #14885 - [Relay] Fix broadcast in PyTorch frontend
- #15090 - [Relay] Insertion of "device_copy" CallNode to Resolve Device Conflict on Unconstrained Nodes
- #14845 - [Relay] Fix softplus in paddlepaddle frontend
- #14837 - [Relay] Fix AdaptiveAvgPool2d about wrong dtype parsing
- #14821 - [Relay] Fix the wrong softplus calculation formula in the Relay PyTorch frontend
- #14820 - [Relay] Fix threshold calculation logic in PyTorch frontend
- #14824 - [Relay] Fix a bug where the ReLU threshold attribute caused results different from Keras
- #14796 - [Relay] Fix wrong calculation logic in CELU
- #14773 - [Relay] Fix `scatter_nd` type relation
- #14742 - [Relay] Fix alpha attribute with None in ELU
- #14740 - [Relay] Fix stride in LpPool for default
- #14556 - [Relay] Fix a bug caused by IncompleteTypeNode in EinsumRel during MergeComposite
- #15057 - [QNN] Implement quantized avg_pool2d
- #14536 - [QNN] Implement 'qnn.softmax'
- #14875 - [Quantization]: Update simulated_quantize to infer correct layout
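As a small, hedged illustration of the SimplifyExpr improvements above (e.g. the binomial-square rewrite in #14580), the sketch below builds a `(x + y) * (x + y)` pattern and runs the pass; the exact rewritten form depends on the pass pipeline and version:

```python
# Sketch: running SimplifyExpr over a square-of-a-binomial pattern (cf. #14580).
import tvm
from tvm import relay

x = relay.var("x", shape=(4,), dtype="float32")
y = relay.var("y", shape=(4,), dtype="float32")
expr = (x + y) * (x + y)  # square of a binomial
mod = tvm.IRModule.from_expr(relay.Function([x, y], expr))
mod = relay.transform.InferType()(mod)
# SimplifyExpr may rewrite this into the expanded x*x + 2*x*y + y*y form.
mod = relay.transform.SimplifyExpr()(mod)
print(mod)
```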
TOPI
- #15018 - Fix dynamic dimensions support for Dense on TOPI side
- #14856 - Fix in interpretation of empty axis parameter in reduction fun…
- #14483 - [Target] Add SVE specific convolution
- #14839 - Use f-strings for string formatting, NFC
- #14822 - Use f-strings for string formatting, NFC
- #14519 - Vectorize depthwise conv2d output operator
- #14549 - Remove the i32 cast for output shape of pool
- #14566 - [Topi] Output strides in pack_buffer() utility
Arith
- #15131 - Hotfix flaky test in padded matmul
- #15120 - NormalizeToIterSum
- #15081 - Improve arith simplify to handle symbolic reshape pattern
- #14532 - Implement statistics counters for RewriteSimplifier
- #14704 - [cherry-pick][BUGFIX] Fix a bug of iter map floormod(x,2) simplify
- #14849 - [TVMScript] Capture fails if var appears only in annotation
- #14596 - [TensorIR] Improve CompactBufferRegion for symbolic shape
- #15129 - [TIR] Recognize empty extents
- #14982 - [TIR][VTA] Update host-side target, even without device func
- #14547 - Enhance IterMapSimplify for symbolic
- #14571 - [BUGFIX] Fix a bug of iter map floormod(x,2) simplify
- #14582 - Fix solve inequality of unbound var ranges
- #14538 - Enhance CanonicalSimplify to Simplify ProdDiv
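A hedged sketch of the simplifier entry point these changes feed into; the patterns mirror the ProdDiv (#14538) and iter-map floormod (#14571/#14704) fixes, though exact outputs depend on the simplifier version:

```python
# Sketch: exercising the arith simplifier touched by the changes above.
import tvm
from tvm import te

ana = tvm.arith.Analyzer()
x = te.var("x")
# ProdDiv-style pattern (cf. #14538): (x * 8) / 4 should reduce to x * 2.
print(ana.simplify(tvm.tir.floordiv(x * 8, 4)))
# floormod pattern from the iter-map fixes (cf. #14571): expected 0.
print(ana.simplify(tvm.tir.floormod(x * 2, 2)))
```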
MetaSchedule
- #14781 - [MetaSchedule] RPC port needs to be an integer
- #14673 - Introduce MMA Tensor Core Multilevel Tiling
- #14784 - Enhance `tune_tir` to tune IRModule of TIR Collections (a usage sketch follows this list)
- #14783 - Add an API to dump a pruned database
- #14785 - Clear screen only when specified
- #14654 - Handle output cases for InlineConstantScalars
- #14642 - PostProc not rewriting unroll for purely spatial block
- #14591 - Handle cases when no features found by FeatureExtractor
- #14584 - [ARM] Beautification of the function names
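A hedged sketch of the `tune_tir` entry point enhanced in #14784; parameter names and the `compile_tir` helper are as found in this release line, but details may differ slightly across versions:

```python
# Sketch: tuning a small TIR matmul with MetaSchedule (cf. #14784).
import tvm
from tvm import meta_schedule as ms
from tvm.script import tir as T

@T.prim_func
def matmul(A: T.Buffer((64, 64), "float32"),
           B: T.Buffer((64, 64), "float32"),
           C: T.Buffer((64, 64), "float32")):
    for i, j, k in T.grid(64, 64, 64):
        with T.block("C"):
            vi, vj, vk = T.axis.remap("SSR", [i, j, k])
            with T.init():
                C[vi, vj] = T.float32(0)
            C[vi, vj] = C[vi, vj] + A[vi, vk] * B[vk, vj]

target = tvm.target.Target("llvm -num-cores 4")
database = ms.tune_tir(mod=matmul, target=target,
                       work_dir="./tune_tmp", max_trials_global=8)
# Retrieve the best schedule found during tuning.
sch = ms.tir_integration.compile_tir(database, matmul, target)
```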
TIR
- #15153 - [TensorIR][Visitor] Visit buffer members in `match_buffer`'s in block visitor functions
- #15168 - [Schedule] Support padding-by-factor in PadEinsum
- #15165 - Expose UndefinedVars to Python
- #15163 - Fix RenewDef for symbolic input shapes
- #15142 - [Schedule] Enhance `compute-inline` for fusion
- #15150 - Fix typo in code example
- #15144 - [TensorIR][Schedule] New schedule primitive `unsafe_hide_buffer_access`
- #15146 - Block dependence analysis without schedules
- #15119 - Avoid duplicate GlobalVar names in SplitHostDevice
- #15037 - Handle DeclBuffer in CacheReadWrite schedule primitive
- #15098 - [Ethos-U] Handle DeclBuffer in Ethos-U inputs
- #15044 - [USMP] Preserve DeclBuffer in PoolAllocationToOffsetConverter
- #15078 - Handle DeclBuffer in LowerThreadAllreduce
- #15094 - Handle DeclBuffer in MergeDynamicSharedMemoryAllocations
- #15093 - Handle DeclBuffer in StorageAccessInfoLower
- #15045 - Handle DeclBuffer in InjectDoubleBuffer
- #15096 - Handle DeclBuffer in RemoveNoOp
- #15076 - [CodeGen] Define PackedFunc error code in MakePackedAPI
- #15102 - Update primfunc host attachment to include host
- #14854 - [Compute-at] Enable complex floordiv/floormod expressions in compute_at
- #15041 - Handle DeclBuffer in LowerCustomDatatypes
- #15038 - Handle DeclBuffer in Inline/ComputeAt/ReverseComputeAt
- #15052 - [Analysis] Handle DeclBuffer in FlopEstimator
- #15051 - Handle DeclBuffer in StorageRewrite
- #15050 - [Schedule] Fix decompose_padding bug with dtypes
- #15034 - Refactor BlockScope outside schedule
- #15054 - Handle DeclBuffer in IRSubstitute
- #14986 - Move SplitHostDevice to before MakePackedAPI
- #15042 - Handle DeclBuffer in StorageFlatten's input
- #15040 - Preserve object equality in Buffer::GetFlattenedBuffer
- #14693 - Enhance TVMScript Buffer Slice Access
- #14988 - Handle callees on same target, different codegen
- #14951 - Keep trivial LetStmt in tir.Simplify when used in buffer decl
- #14944 - Restrict tir.transform.LowerTVMBuiltin to host functions
- #14990 - [IR,TE,TIR] Use f-strings for string formatting, NFC
- #14993 - Fix incorrect construction of block frames
- #14952 - Avoid re-defining `var = arg_var` in ArgBinder
- #14918 - SplitHostDevice, handle subroutines
- #14943 - Restrict tir.transform.InstallDebugSpans to host functions
- #14942 - Preserve existing kTarget function attribute in BindTarget
- #14945 - Restrict tir.transform.CombineContextCall to host functions
- #14914 - Handle subroutine calls in MakeUnpackedAPI
- #14913 - Handle subroutine calls in MakePackedAPI
- #14892 - Expand unit tests for ConvertSSA
- #14866 - Avoid too complex predicate in compaction
- #14766 - [Schedule] Improve blockize to support blockizing multiple blocks (a sketch follows this list)
- #14776 - Improved parameter name in DLTensor unpacking error messages
- #14562 - [Driver] Move ShouldAnnotateEntryFunc logic into transform
- #14741 - Keep block annotations from tensorization
- #14021 - More flexible buffer compaction
- #14711 - [Analysis] Calculate allocated memory at module level
- #14492 - Flatten SeqStmt on construction
- #14598 - Add CUDA int4 tensor core intrinsics
- #14593 - [Schedule] Method returning the function being worked on
- #14592 - [TensorIR] Fix ComputeAt with perfect symbolic bound
- #14491 - Use String instead of StringImm for AttrStmtNode::node
- #14626 - [TensorIR] `reindex_cache_write` does not mutate init statement
- #14588 - [Fix][TIR] UnifyThreadBinding creating unit loop with annotation
- #14589 - [Fix][TIR][Analysis] Reduction block checking alloc_buffers
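A minimal sketch of the blockize primitive improved in #14766. Only the long-standing single-loop form is shown here; the multi-block form is the PR's addition and is not reproduced:

```python
# Sketch of tir.Schedule blockize: wrap the subtree under a loop in a new block.
import tvm
from tvm.script import tir as T

@T.prim_func
def before(A: T.Buffer((128, 128), "float32"), B: T.Buffer((128, 128), "float32")):
    for i, j in T.grid(128, 128):
        with T.block("B"):
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = A[vi, vj] * 2.0

sch = tvm.tir.Schedule(before)
i, j = sch.get_loops(sch.get_block("B"))
outer = sch.blockize(j)  # single-loop form; #14766 extends this to multiple blocks
print(sch.mod.script())
```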
TVMScript
- #15083 - Avoid visiting repeated tensors in the SetCommonPrefix visitor
- #15091 - [TIR] Convert tir.op operands to PrimExpr
- #14919 - [TIR] Parse subroutine calls with no arguments
- #14941 - Prevent bool to int conversion in T.Assert condition
- #14915 - Allow T.target("device", host="host") to specify host (a sketch follows this list)
- #14900 - Round-trip DeclBuffer with undefined data pointer
- #14889 - [TIR] Added format/parsing of subroutine calls
- #14874 - Use default fallback for un-registered type
- #14840 - Print Executor, Runtime, and FunctionInfo as metadata
- #14812 - Handle AllocatedPoolInfo, ConstantPoolInfo, ConstantInfo
- #14786 - Add `__name__` attr for parsed PrimFunc and IRModule
- #14531 - Preserve LetStmt of constants
- #14488 - Distinguish between void* and handle
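For #14915, a hedged sketch of the equivalent plain Python Target API; the TVMScript form `T.target("cuda", host="llvm")` inside `T.func_attr` is what the PR enables and mirrors this, though it is not reproduced verbatim here:

```python
# Sketch: a device target with an explicit host target attached.
import tvm

target = tvm.target.Target("cuda", host="llvm")
print(target.kind.name)       # "cuda"
print(target.host.kind.name)  # "llvm"
```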
TVMC
- #14994 - [Bugfix] Fix tvmc option for printing which operators are offloaded to the Ethos-U
LLVM
- #15127 - Remove the "ret_void" argument of AddFunction
- #15139 - Minor refactor to LLVMModuleNode::SaveToFile
- #14958 - [Codegen] Allow void return type from PackedFunc
- #14946 - Expose Host CPU Feature Detection
- #14901 - Codegen subroutine call when CallNode::op is GlobalVar
- #14570 - Use Var annotation in LetStmt for pointer type
- #14843 - [RUNTIME] Enable multi systemlib with device code
- #14564 - Validate generated LLVM module before optimization
- #14568 - Expand tvm::Type to DWARF conversion
- #14563 - [Codegen] Remove cast to i8* in builtin::address_of
BugFix
- #14960 - [Bug] Add typing_extensions requirement again
- #15015 - [Hotfix] Remove `LOG(INFO)` from unsupported dtype legalization pass
- #14991 - Make ThreadAllReduce pass compatible with int64
- #14950 - Avoid symbol conflicts in MakePackedAPI/MakeUnpackedAPI
- #14903 - [Test Cases] Add version checks so test cases run on all PyTorch versions
- #14890 - [Fix] Fix typo in error message
- #14879 - Fix the undeclared identifier 'f'
- #14857 - Fix batch_norm
- #14787 - [FIX] Fix typo in comment
CI
- #15179 - [Testing] Utility method to run TVM on remote device
- #15138 - [Test] Improve check for TVMError exception in test_cast
- #15062 - Clone submodule recursively
- #15065 - Revert "Make Graviton3 default AArch64 job runner node (#14983)"
- #14983 - Make Graviton3 default AArch64 job runner node
- #15056 - [Bugfix] Fix CacheControl version constraint violation
- #14908 - Update the expected CI jobs list in the update_branch script
- #14847 - Update CPU image to install PyTorch
- #14808 - [Testing] Use TVMScript's "name" argument for error messages
- #14780 - Fix doc deploy issue
- #14651 - Modify test cases to accommodate the CI upgrades
- #14666 - Support sccache when using ci.py in multi-user environments
- #14635 - Upgrade CI
- #14713 - Add PLATFORM env var to builds
- #14680 - Downgrade ci_cpu llvm version back to 11
- #14653 - [tests][scripts][release] Optimize the release-note script's categories, etc.
- #14646 - [test][script] Fix the release script gather_pr.py for ghost users and blank PR nodes
- #14550 - Add JAX deps in Dockerfiles
- #14466 - Update ci_cpu image and build with llvm-15
Docker
- #15149 - Fix build.sh environment variables
- #15105 - Update docker images for llvm-16
- #15092 - Update ci-cortexm docker image to contain CMSIS-NN release v…
- #15095 - Add build.sh environment variables
- #15067 - Migrate arm docker image to use llvm packages
- #15031 - Update ci_cpu docker image to one containing polly package f…
- #15003 - [ADRENO] Docker setup changes for multi user environments
- #14912 - Add polly package
- #14842 - Install PyTorch on cpu image
- #14590 - Support rootless docker when using docker/bash.sh
Docs
- #15126 - [DOC] Add RPC System Setup Document
- #15071 - Updated the copyright year from 2020 to 2023
- #15055 - [DOC][TUTORIAL] Fix typo in 'Making your Hardware Accelerator TVM-ready with UMA'
- #14504 - [TensorIR][Doc] Docstring of `reorder_block_iter_var`
- #14611 - [TIR] Fix unsafe_set_dtype docstring
- #14585 - Fix typo in the Vitis AI Integration docs
Misc
- #15267 - [release] Disable git merge to avoid conflict
- #15187 - [RPC] Report RPC Session Timeout to Client Instead of "kShutdown"
- #15185 - Update tvm_runtime.h
- #15164 - [CMake] Support LLVM-16 static linking
- #15167 - [Python] Enhance Wheel Packaging
- #15166 - [Target] Add MetaSchedule-compatible attributes to OpenCL
- #15154 - [Minor] Fix Compilation Warnings
- #15132 - [NDArray] Allow creating a view from a strided array
- #15116 - [RPC] Add Missing Option "port_end" to RPC Proxy
- #15073 - [CodeGenC] Use PrimFuncNode::ret_type in function signature
- #15036 - [StackVM] Updated CodeGenStackVM to handle DeclBuffer
- #15022 - [Build] Fix missing virtual destructor in SIBuilder
- #15016 - Fix type parse error in AdaptiveMaxPool
- #15007 - [Minor] Fix compilation warnings
- #15000 - [CMAKE] Introduce dummy build as an option
- #14863 - [DataType] Initial support of fp8 (e4m3/e5m2)
- #14975 - [CMAKE] Add a dummy target to defer libtvm dep
- #14574 - [IR][SIBuilder]
- #14939 - [Target] Add target to all TVM callbacks
- #14937 - [BUILD] Enable log before throw message in windows
- #14934 - [TestCases] Fix unreachable test cases placed outside the for-loop
- #14916 - [TypoFix] Fix some typos in the Keras frontend
- #14893 - [Contrib] Use f-strings for string formatting, NFC
- #14884 - [AutoTVM] Use f-strings for string formatting, NFC
- #14876 - [CONTRIB] Enable create_staticlib to take in tar files
- #14867 - Fix f-string typo
- #14851 - Add v0.12.0 docs
- #14813 - [BUILD] Removed the duplicated MACROs in config.cmake
- #14743 - [SUPPORT] Fix RingBuffer ReadWithCallback
- #14799 - [LINT] Fix clang-format script for newest clang-format
- #14797 - [NDArray] Allow arbitrary stride when the corresponding shape is 1
- #14790 - More clear ref of thirdparty license
- #14779 - fix: use arm on demand instead of spot
- #14762 - [Target][Minor] Add A6000 Target Tag
- #14683 - [AutoTVM] Added Droplet algorithm in TVM
- #14694 - Unify the search path approach for various libs
- #14686 - [CMAKE] Update search pattern of config
- #14636 - Fix bug about wrong attribute name
- #14628 - [CODEGEN] Fix metal codegen when with only single working dim
- #14607 - fix: deploy ci
- #14569 - [Node] Allow alternative root names in ObjectPath::Root()
- #14522 - [Object] Implemented .as for ObjectRef param, returns Optional
- #14477 - feat: use spot instances for ci with on demand as a backup
- #14468 - [AutoTVM] New rank-binary loss_type for the new xgboost >= 2.0.0 behaviour
- #14544 - Update to v0.13.dev0
- #14539 - [Target] Add Apple M1 GPU tag with 256-thread restriction