Introduction
Since the v0.12.0 release, the TVM community has worked to deliver the following new and exciting improvements. The main tags are listed below (tags in bold saw the most progress):
- Community, RFC;
- Frontend: TensorFlow/TFLite, PyTorch/Torch, Paddle, Keras;
- Runtime: Adreno, OpenCL & CLML, ROCm, CUDA & CUTLASS & TensorRT, Ethos-N, Vulkan, Hexagon, Metal, and other runtime updates;
- Relay, BYOC, TOPI, Arith, TIR, TVMScript, MetaSchedule;
- microTVM, AOT, TVMC, LLVM;
- CI, BugFix, Docs, Docker, Misc;
Please visit the full listing of commits for a complete view: v0.12.0...v0.13.0.
Community
- #15086 - Aleksei-grovety -> Reviewer
- #14676 - Jiajun Jiang -> Reviewer
- #14677 - Qiang Zhang -> Reviewer
- #14622 - Sunghyun Park -> Reviewer
- #14578 - Zihao Ye -> Committer
- #14853 - Anirudh Sundar Subramaniam -> Committer
- #14772 - Add new key for release signing
RFC
Frontend
- #14830 - Use f-strings for string formatting, NFC
- Keras (a usage sketch follows this group)
- #15122 - [Relay][Keras] Fix SeparableConv2D conversion in dilation_rate attribute
- #15107 - [Relay][Keras] Fix a wrong variable name in the Keras frontend
- #15053 - [Relay][Keras] Fix wrong implementation logic in Cropping2D
- #15082 - [Relay][Keras] Fix wrong size assertion in UpSampling2D
- #15060 - [Relay][Keras] Fix a bug with the 'output_padding' attribute in Deconv
- #14707 - [Keras] Fix a bug with the alpha attribute in LeakyReLU which led to a pass conflict
- #15175 - [Relay][Keras] Fix concatenate convert function in axis parsing
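The fixes above all target the Relay Keras importer. Below is a minimal, hedged sketch of the entry point they exercise; it assumes TensorFlow/Keras is installed, and the model, layer choice, and shapes are illustrative only (not taken from the PRs):

```python
# Sketch of the Relay Keras importer that the fixes above touch.
import tvm
from tvm import relay
from tensorflow import keras

model = keras.Sequential([
    keras.layers.InputLayer(input_shape=(32, 32, 3)),
    # SeparableConv2D with dilation_rate exercises the path fixed in #15122.
    keras.layers.SeparableConv2D(8, (3, 3), dilation_rate=(2, 2)),
])
# from_keras expects NCHW shapes keyed by the model's input name.
shape_dict = {model.input_names[0]: (1, 3, 32, 32)}
mod, params = relay.frontend.from_keras(model, shape_dict)
print(mod)
```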
- Paddle
- #14801 - [Paddle][PaddlePaddle Hackathon 4] Add attribute support for gaussian_random/softplus/Conv3d/Conv2d
- #14973 - [Paddle][PaddlePaddle Hackathon 4] Add convert support for tanhshrink/pool3d/set_value ops for Paddle frontend
- #14826 - [Paddle][PaddlePaddle Hackathon 4] Add convert support for p_norm/roi_align/softmax_with_cross_entropy
- #14575 - [Paddle][PaddlePaddle Hackathon 4] Add attribute support for dropout/hard_sigmoid/pixel_shuffle
- TFLite
- TensorFlow
- #14546 - [Tensorflow] Fix conv2d_transpose for NHWC layout
- PyTorch
- ONNX
- #15017 - [ONNX] Fix bug in scatter_elements
Runtime
- #15182 - Add weak symbol to builtin fp16
- #15161 - Clean TVM stacktrace in error messages
- #15162 - Support void as dtype in FFI
- #14902 - Update Module and Registry to use String Container
- #14967 - [Runtime,RPC] Use f-strings for string formatting, NFC
- #14887 - Make systemlib unique per prefix
- #14775 - Added str for tvm._ffi.runtime_ctypes.TVMArray
- #14656 - Fix the "query_imports" bug in VM Executable
Adreno
CMSIS-NN
- #15059 - Update CMSIS-NN release to v4.1.0
OpenCL & CLML
- #14972 - [OPENCL] Always use convert_T for type conversion
- #14995 - [OpenCL] Improve diagnostic message
- #14833 - [Codegen][OpenCL] Fix ambiguous selection operator call
- #14792 - [OpenCL] Refactor OpenCL runtime to support SPIRV binary ingestion
- #14922 - [OpenCLML] Refactor and introduce on-chip memory and memory planner
- #14949 - [CodegenC] Updated unit test for sorted CodegenC output
- #14767 - [OpenCLML] Transposed convolution support and other fixes
CUDA & CUTLASS & TensorRT
- #14751 - [CUDA] Fixed the call of the min function in the schedule for cuda
- #14798 - [CUTLASS] Add NDEBUG option to CUTLASS compile to speed up attention kernel
- #14782 - [Bugfix][Codegen][CUDA] Wrong casting in ASM
Metal
- #14962 - Fix int8 vectorized cast
- #14846 - Fix vectorized select
- #14727 - Update metal runtime to directly store kernel map
- #14671 - Fix flaky memory issue due to racing
Vulkan
Hexagon
- #14997 - Remove "c" as aot_host_target tvm/contrib/hexagon/pytest_pl…
- #14948 - Update instructions to compile hexagon runtime
- #14965 - Add support for v73, make v68 default
- #14720 - [TIR] Add get_vtcm_allocation_sizes with lowering
- #14567 - [TIR] Use the "target" value in T.func_attr for VTCM limit
ROCm
- #15106 - [TensorIR] AMD Matrix Core Support
- #15088 - [Target] Replace ROCm arch parsing from int to string
microTVM
- #14872 - Use self.close_transport() on error
AOT
- #15033 - Avoid Var-to-Var Let binding in AOTExecutorCodegen
- #15032 - Remove duplication in tvm.testing.aot.compile_models
- #14529 - Fix warning on dropping const in TVMAotExecutor_GetInputName
microNPU
- #15159 - [microNPU][ETHOSU] Fix compiler attributes types
- #15147 - [microNPU][ETHOSU] Add option to disable copying constants for case without cascader
- #15069 - [microNPU][ETHOSU] Fix SoftMax legalization parameters
- #15115 - [microNPU][ETHOSU] Upgrade to 23.05 version of Arm(R) Ethos(TM)-U NPU drivers
- #15114 - [microNPU] Upgrade Vela to v3.8.0
- #15104 - [microNPU][ETHOSU] Fix minimum buffer size
- #15063 - [microNPU][ETHOSU] Fix CopyComputeReordering pass arguments
- #14861 - [microNPU][ETHOSU] Add offloading of the nn.avg_pool2d operator with a stride > 3 to the NPU
- #14765 - [microNPU][ETHOSU] Channel pad offloaded to NPU
- #14774 - [microNPU][ETHOSU] Fix Softmax quantization parameters
- #14629 - [microNPU][ETHOSU] Softmax int8 legalization support
- #14353 - [microNPU] Add support for MEAN with uint8 ifm
- #14587 - [microNPU] Fix skip tests when Vela is not present
- #14464 - [microNPU][ETHOSU] Add restrictions to convert to NHCWB16 layout in LayoutOptimization pass
BYOC
Relay
- #15068 - Improve the "clip" op optimization in simplify expr pass
- #14925 - Add a dimension check to reject invalid input
- #14858 - [simplify_expr]: Add pass to remove trivial transpose ops
- #14838 - Use f-strings for string formatting, NFC
- #14831 - [Relay/Op] Use f-strings for string formatting, NFC
- #14580 - Simplify the square of a binomial (a usage sketch follows this list)
- #14735 - Handle pad value coming from Tensor instead of scalar
- #14601 - Enhance type infer for dynamic shape
- #14885 - [Relay] Fix broadcast in PyTorch frontend
- #15090 - [Relay] Insertion of "device_copy" CallNode to Resolve Device Conflict on Unconstrained Nodes
- #14845 - [Relay] Fix softplus in paddlepaddle frontend
- #14837 - [Relay] Fix AdaptiveAvgPool2d about wrong dtype parsing
- #14821 - [Relay] Fix the wrong softplus calculation formula in the Relay PyTorch frontend
- #14820 - [Relay] Fix threshold calculation logic in PyTorch frontend
- #14824 - [Relay] Fix a bug where the ReLU threshold attribute caused results different from Keras
- #14796 - [Relay] Fix wrong calculation logic in CELU
- #14773 - [Relay] Fix `scatter_nd` type relation
- #14742 - [Relay] Fix alpha attribute with None in ELU
- #14740 - [Relay] Fix stride in LpPool for default
- #14556 - [Relay] Fix a bug caused by IncompleteTypeNode in EinsumRel during MergeComposite
- #15057 - [QNN] Implement quantized avg_pool2d
- #14536 - [QNN] Implement 'qnn.softmax'
- #14875 - [Quantization]: Update simulated_quantize to infer correct layout
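As a small, hedged illustration of the SimplifyExpr improvements above (e.g. the binomial-square rewrite in #14580), the sketch below builds a `(x + y) * (x + y)` pattern and runs the pass; the exact rewritten form depends on the pass pipeline and version:

```python
# Sketch: running SimplifyExpr over a square-of-a-binomial pattern (cf. #14580).
import tvm
from tvm import relay

x = relay.var("x", shape=(4,), dtype="float32")
y = relay.var("y", shape=(4,), dtype="float32")
expr = (x + y) * (x + y)  # square of a binomial
mod = tvm.IRModule.from_expr(relay.Function([x, y], expr))
mod = relay.transform.InferType()(mod)
# SimplifyExpr may rewrite this into the expanded x*x + 2*x*y + y*y form.
mod = relay.transform.SimplifyExpr()(mod)
print(mod)
```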
TOPI
- #15018 - Fix dynamic dimensions support for Dense on TOPI side
- #14856 - Fix in interpretation of empty axis parameter in reduction fun…
- #14483 - [Target] Add SVE specific convolution
- #14839 - Use f-strings for string formatting, NFC
- #14822 - Use f-strings for string formatting, NFC
- #14519 - Vectorize depthwise conv2d output operator
- #14549 - Remove the i32 cast for output shape of pool
- #14566 - [Topi] Output strides in pack_buffer() utility
Arith
- #15131 - Hotfix flaky test in padded matmul
- #15120 - NormalizeToIterSum
- #15081 - Improve arith simplify to handle symbolic reshape pattern
- #14532 - Implement statistics counters for RewriteSimplifier
- #14704 - [cherry-pick][BUGFIX] Fix a bug of iter map floormod(x,2) simplify
- #14849 - [TVMScript] Capture fails if var appears only in annotation
- #14596 - [TensorIR] Improve CompactBufferRegion for symbolic shape
- #15129 - [TIR] Recognize empty extents
- #14982 - [TIR][VTA] Update host-side target, even without device func
- #14547 - Enhance IterMapSimplify for symbolic
- #14571 - [BUGFIX] Fix a bug of iter map floormod(x,2) simplify
- #14582 - Fix solve inequality of unbound var ranges
- #14538 - Enhance CanonicalSimplify to Simplify ProdDiv
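A hedged sketch of the simplifier entry point these changes feed into; the patterns mirror the ProdDiv (#14538) and iter-map floormod (#14571/#14704) fixes, though exact outputs depend on the simplifier version:

```python
# Sketch: exercising the arith simplifier touched by the changes above.
import tvm
from tvm import te

ana = tvm.arith.Analyzer()
x = te.var("x")
# ProdDiv-style pattern (cf. #14538): (x * 8) / 4 should reduce to x * 2.
print(ana.simplify(tvm.tir.floordiv(x * 8, 4)))
# floormod pattern from the iter-map fixes (cf. #14571): expected 0.
print(ana.simplify(tvm.tir.floormod(x * 2, 2)))
```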
MetaSchedule
- #14781 - [MetaSchedule] RPC port needs to be an integer
- #14673 - Introduce MMA Tensor Core Multilevel Tiling
- #14784 - Enhance `tune_tir` to tune IRModule of TIR Collections (a usage sketch follows this list)
- #14783 - Add an API to dump a pruned database
- #14785 - Clear screen only when specified
- #14654 - Handle output cases for InlineConstantScalars
- #14642 - PostProc not rewriting unroll for purely spatial block
- #14591 - Handle cases when no features found by FeatureExtractor
- #14584 - [ARM] Beautification of the function names
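A hedged sketch of the `tune_tir` entry point enhanced in #14784; parameter names and the `compile_tir` helper are as found in this release line, but details may differ slightly across versions:

```python
# Sketch: tuning a small TIR matmul with MetaSchedule (cf. #14784).
import tvm
from tvm import meta_schedule as ms
from tvm.script import tir as T

@T.prim_func
def matmul(A: T.Buffer((64, 64), "float32"),
           B: T.Buffer((64, 64), "float32"),
           C: T.Buffer((64, 64), "float32")):
    for i, j, k in T.grid(64, 64, 64):
        with T.block("C"):
            vi, vj, vk = T.axis.remap("SSR", [i, j, k])
            with T.init():
                C[vi, vj] = T.float32(0)
            C[vi, vj] = C[vi, vj] + A[vi, vk] * B[vk, vj]

target = tvm.target.Target("llvm -num-cores 4")
database = ms.tune_tir(mod=matmul, target=target,
                       work_dir="./tune_tmp", max_trials_global=8)
# Retrieve the best schedule found during tuning.
sch = ms.tir_integration.compile_tir(database, matmul, target)
```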
TIR
- #15153 - [TensorIR][Visitor] Visit buffer members in `match_buffer`'s in block visitor functions
- #15168 - [Schedule] Support padding-by-factor in PadEinsum
- #15165 - Expose UndefinedVars to Python
- #15163 - Fix RenewDef for symbolic input shapes
- #15142 - [Schedule] Enhance `compute-inline` for fusion
- #15150 - Fix typo in code example
- #15144 - [TensorIR][Schedule] New schedule primitive `unsafe_hide_buffer_access`
- #15146 - Block dependence analysis without schedules
- #15119 - Avoid duplicate GlobalVar names in SplitHostDevice
- #15037 - Handle DeclBuffer in CacheReadWrite schedule primitive
- #15098 - [Ethos-U] Handle DeclBuffer in Ethos-U inputs
- #15044 - [USMP] Preserve DeclBuffer in PoolAllocationToOffsetConverter
- #15078 - Handle DeclBuffer in LowerThreadAllreduce
- #15094 - Handle DeclBuffer in MergeDynamicSharedMemoryAllocations
- #15093 - Handle DeclBuffer in StorageAccessInfoLower
- #15045 - Handle DeclBuffer in InjectDoubleBuffer
- #15096 - Handle DeclBuffer in RemoveNoOp
- #15076 - [CodeGen] Define PackedFunc error code in MakePackedAPI
- #15102 - Update primfunc host attachment to include host
- #14854 - [Compute-at] Enable complex floordiv/floormod expressions in compute_at
- #15041 - Handle DeclBuffer in LowerCustomDatatypes
- #15038 - Handle DeclBuffer in Inline/ComputeAt/ReverseComputeAt
- #15052 - [Analysis] Handle DeclBuffer in FlopEstimator
- #15051 - Handle DeclBuffer in StorageRewrite
- #15050 - [Schedule] Fix decompose_padding bug with dtypes
- #15034 - Refactor BlockScope outside schedule
- #15054 - Handle DeclBuffer in IRSubstitute
- #14986 - Move SplitHostDevice to before MakePackedAPI
- #15042 - Handle DeclBuffer in StorageFlatten's input
- #15040 - Preserve object equality in Buffer::GetFlattenedBuffer
- #14693 - Enhance TVMScript Buffer Slice Access
- #14988 - Handle callees on same target, different codegen
- #14951 - Keep trivial LetStmt in tir.Simplify when used in buffer decl
- #14944 - Restrict tir.transform.LowerTVMBuiltin to host functions
- #14990 - [IR,TE,TIR] Use f-strings for string formatting, NFC
- #14993 - Fix incorrect construction of block frames
- #14952 - Avoid re-defining `var = arg_var` in ArgBinder
- #14918 - SplitHostDevice, handle subroutines
- #14943 - Restrict tir.transform.InstallDebugSpans to host functions
- #14942 - Preserve existing kTarget function attribute in BindTarget
- #14945 - Restrict tir.transform.CombineContextCall to host functions
- #14914 - Handle subroutine calls in MakeUnpackedAPI
- #14913 - Handle subroutine calls in MakePackedAPI
- #14892 - Expand unit tests for ConvertSSA
- #14866 - Avoid too complex predicate in compaction
- #14766 - [Schedule] Improve blockize to support blockizing multiple blocks (a sketch follows this list)
- #14776 - Improved parameter name in DLTensor unpacking error messages
- #14562 - [Driver] Move ShouldAnnotateEntryFunc logic into transform
- #14741 - Keep block annotations from tensorization
- #14021 - More flexible buffer compaction
- #14711 - [Analysis] Calculate allocated memory at module level
- #14492 - Flatten SeqStmt on construction
- #14598 - Add CUDA int4 tensor core intrinsics
- #14593 - [Schedule] Method returning the function being worked on
- #14592 - [TensorIR] Fix ComputeAt with perfect symbolic bound
- #14491 - Use String instead of StringImm for AttrStmtNode::node
- #14626 - [TensorIR] `reindex_cache_write` does not mutate init statement
- #14588 - [Fix][TIR] UnifyThreadBinding creating unit loop with annotation
- #14589 - [Fix][TIR][Analysis] Reduction block checking alloc_buffers
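A minimal sketch of the blockize primitive improved in #14766. Only the long-standing single-loop form is shown here; the multi-block form is the PR's addition and is not reproduced:

```python
# Sketch of tir.Schedule blockize: wrap the subtree under a loop in a new block.
import tvm
from tvm.script import tir as T

@T.prim_func
def before(A: T.Buffer((128, 128), "float32"), B: T.Buffer((128, 128), "float32")):
    for i, j in T.grid(128, 128):
        with T.block("B"):
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = A[vi, vj] * 2.0

sch = tvm.tir.Schedule(before)
i, j = sch.get_loops(sch.get_block("B"))
outer = sch.blockize(j)  # single-loop form; #14766 extends this to multiple blocks
print(sch.mod.script())
```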
TVMScript
- #15083 - Avoid visiting repeated tensors in the SetCommonPrefix visitor
- #15091 - [TIR] Convert tir.op operands to PrimExpr
- #14919 - [TIR] Parse subroutine calls with no arguments
- #14941 - Prevent bool to int conversion in T.Assert condition
- #14915 - Allow T.target("device", host="host") to specify host (a sketch follows this list)
- #14900 - Round-trip DeclBuffer with undefined data pointer
- #14889 - [TIR] Added format/parsing of subroutine calls
- #14874 - Use default fallback for un-registered type
- #14840 - Print Executor, Runtime, and FunctionInfo as metadata
- #14812 - Handle AllocatedPoolInfo, ConstantPoolInfo, ConstantInfo
- #14786 - Add `__name__` attr for parsed PrimFunc and IRModule
- #14531 - Preserve LetStmt of constants
- #14488 - Distinguish between void* and handle
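For #14915, a hedged sketch of the equivalent plain Python Target API; the TVMScript form `T.target("cuda", host="llvm")` inside `T.func_attr` is what the PR enables and mirrors this, though it is not reproduced verbatim here:

```python
# Sketch: a device target with an explicit host target attached.
import tvm

target = tvm.target.Target("cuda", host="llvm")
print(target.kind.name)       # "cuda"
print(target.host.kind.name)  # "llvm"
```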
TVMC
- #14994 - [Bugfix] Fix tvmc option for printing which operators are offloaded to the Ethos-U
LLVM
- #15127 - Remove the "ret_void" argument of AddFunction
- #15139 - Minor refactor to LLVMModuleNode::SaveToFile
- #14958 - [Codegen] Allow void return type from PackedFunc
- #14946 - Expose Host CPU Feature Detection
- #14901 - Codegen subroutine call when CallNode::op is GlobalVar
- #14570 - Use Var annotation in LetStmt for pointer type
- #14843 - [RUNTIME] Enable multi systemlib with device code
- #14564 - Validate generated LLVM module before optimization
- #14568 - Expand tvm::Type to DWARF conversion
- #14563 - [Codegen] Remove cast to i8* in builtin::address_of
BugFix
- #14960 - [Bug] Add typing_extensions requirement again
- #15015 - [Hotfix] Remove `LOG(INFO)` from unsupported dtype legalization pass
- #14991 - Make ThreadAllReduce pass compatible with int64
- #14950 - Avoid symbol conflicts in MakePackedAPI/MakeUnpackedAPI
- #14903 - [Test Cases] Add version checks so test cases run on all PyTorch versions
- #14890 - [Fix] Fix typo in error message
- #14879 - Fix the undeclared identifier 'f'
- #14857 - Fix batch_norm
- #14787 - [FIX] Fix typo in comment
CI
- #15179 - [Testing] Utility method to run TVM on remote device
- #15138 - [Test] Improve check for TVMError exception in test_cast
- #15062 - Clone submodule recursively
- #15065 - Revert "Make Graviton3 default AArch64 job runner node (#14983)"
- #14983 - Make Graviton3 default AArch64 job runner node
- #15056 - [Bugfix] Fix CacheControl version constraint violation
- #14908 - Update the expected CI jobs list in the update_branch script
- #14847 - Update CPU image to install PyTorch
- #14808 - [Testing] Use TVMScript's "name" argument for error messages
- #14780 - Fix doc deploy issue
- #14651 - Modify test cases to accommodate the CI upgrades
- #14666 - Support sccache when using ci.py in multi-user environments
- #14635 - Upgrade CI
- #14713 - Add PLATFORM env var to builds
- #14680 - Downgrade ci_cpu llvm version back to 11
- #14653 - [tests][scripts][release] Optimize the release-note script's categories, etc.
- #14646 - [test][script] Fix the release script gather_pr.py for ghost users and blank PR nodes
- #14550 - Add JAX deps in Dockerfiles
- #14466 - Update ci_cpu image and build with llvm-15
Docker
- #15149 - Fix build.sh environment variables
- #15105 - Update docker images for llvm-16
- #15092 - Update ci-cortexm docker image to contain CMSIS-NN release v…
- #15095 - Add build.sh environment variables
- #15067 - Migrate arm docker image to use llvm packages
- #15031 - Update ci_cpu docker image to one containing polly package f…
- #15003 - [ADRENO] Docker setup changes for multi user environments
- #14912 - Add polly package
- #14842 - Install PyTorch on cpu image
- #14590 - Support rootless docker when using docker/bash.sh
Docs
- #15126 - [DOC] Add RPC System Setup Document
- #15071 - Updated the copyright year from 2020 to 2023
- #15055 - [DOC][TUTORIAL] Fix typo in 'Making your Hardware Accelerator TVM-ready with UMA'
- #14504 - [TensorIR][Doc] Docstring of `reorder_block_iter_var`
- #14611 - [TIR] Fix unsafe_set_dtype docstring
- #14585 - Fix typo in the Vitis AI Integration docs
Misc
- #15267 - [release] Disable git merge to avoid conflict
- #15187 - [RPC] Report RPC Session Timeout to Client Instead of "kShutdown"
- #15185 - Update tvm_runtime.h
- #15164 - [CMake] Support LLVM-16 static linking
- #15167 - [Python] Enhance Wheel Packaging
- #15166 - [Target] Add MetaSchedule-compatible attributes to OpenCL
- #15154 - [Minor] Fix Compilation Warnings
- #15132 - [NDArray] Allow creating a view from a strided array
- #15116 - [RPC] Add Missing Option "port_end" to RPC Proxy
- #15073 - [CodeGenC] Use PrimFuncNode::ret_type in function signature
- #15036 - [StackVM] Updated CodeGenStackVM to handle DeclBuffer
- #15022 - [Build] Fix missing virtual destructor in SIBuilder
- #15016 - Fix type parse error in AdaptiveMaxPool
- #15007 - [Minor] Fix compilation warnings
- #15000 - [CMAKE] Introduce dummy build as an option
- #14863 - [DataType] Initial support of fp8 (e4m3/e5m2)
- #14975 - [CMAKE] Add a dummy target to defer libtvm dep
- #14574 - [IR][SIBuilder]
- #14939 - [Target] Add target to all TVM callbacks
- #14937 - [BUILD] Enable log before throw message in windows
- #14934 - [TestCases] Fix unreachable test cases placed outside the for-loop
- #14916 - [TypoFix] Fix some typos in the Keras frontend
- #14893 - [Contrib] Use f-strings for string formatting, NFC
- #14884 - [AutoTVM] Use f-strings for string formatting, NFC
- #14876 - [CONTRIB] Enable create_staticlib to take in tar files
- #14867 - Fix f-string typo
- #14851 - Add v0.12.0 docs
- #14813 - [BUILD] Removed the duplicated MACROs in config.cmake
- #14743 - [SUPPORT] Fix RingBuffer ReadWithCallback
- #14799 - [LINT] Fix clang-format script for newest clang-format
- #14797 - [NDArray] Allow arbitrary stride when the corresponding shape is 1
- #14790 - More clear ref of thirdparty license
- #14779 - fix: use arm on demand instead of spot
- #14762 - [Target][Minor] Add A6000 Target Tag
- #14683 - [AutoTVM] Added Droplet algorithm in TVM
- #14694 - Unify the search path approach for various libs
- #14686 - [CMAKE] Update search pattern of config
- #14636 - Fix bug about wrong attribute name
- #14628 - [CODEGEN] Fix metal codegen when with only single working dim
- #14607 - fix: deploy ci
- #14569 - [Node] Allow alternative root names in ObjectPath::Root()
- #14522 - [Object] Implemented .as for ObjectRef param, returns Optional
- #14477 - feat: use spot instances for ci with on demand as a backup
- #14468 - [AutoTVM] New rank-binary loss_type for the new xgboost >= 2.0.0 behaviour
- #14544 - Update to v0.13.dev0
- #14539 - [Target] Add Apple M1 GPU tag with 256-thread restriction