Introduction

Since the v0.13.0 release, the TVM community has delivered the improvements listed below. The main tags are as follows (bold text marks tags with a lot of progress):

Community, RFC
Arith, MetaSchedule
Adreno, ArmComputeLibrary, Hexagon, Metal, OpenCL & CLML, ROCm, Vulkan, cuda & cutlass & tensorrt, microNPU, web
Runtime, TVMC, AOT, LLVM, microTVM, CMSIS-NN
Frontend, Relay, BYOC
TOPI, TIR, TVMScript
Docs, CI, Docker
Misc, BugFix

Please visit the full listing of commits for a complete view: v0.13.0...v0.14.0.

Community

#15307 - Qingchao Shen -> Reviewer
#15619 - community strategy decision process

RFC

#102 - [Process RFC] Clarify Community Strategy Decision Process

AOT

#15301 - Avoid call_extern() with incorrect argument count
#15181 - Remove workaround to help resolve test flakiness

Adreno

#15830 - Minor changes for Adreno docs and help scripts
#15671 - [VM]Fix using buffers for weights in VM
#15391 - Small fixes in Adreno schedules

Arith

#15881 - Simplify the result of non-divisible floordiv
#15665 - Fix detect non-divisible iteration form like (x % 255) // 16
#15638 - MLIR PresburgerSet compile fix mlir >= 160
#15628 - Added simplification rule for multiple equality compares
#15558 - Fix detect linear equation with uint var
#14690 - Add tvm::arith::PresburgerSetNode to work with Presburger Set in MLIR
#15555 - Fix handling of overlapping predicates
#15471 - Enhance Canonical Simplify for LE
#15228 - Enhance buffer shape bound deduction to include offset
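As a rough illustration of what "non-divisible" means for an expression like (x % 255) // 16 from #15665, the sketch below counts how many values fall into each bucket. It uses plain Python integers only and is not TVM's arith analyzer; the variable names are made up for this example.

```python
from collections import Counter

# x % 255 ranges over 0..254, and 255 is not a multiple of 16,
# so the last bucket of (x % 255) // 16 is only partially filled.
buckets = Counter((x % 255) // 16 for x in range(255))

# Buckets that hold a full 16 values.
full = sorted(b for b, n in buckets.items() if n == 16)

# Buckets that hold fewer than 16 values.
partial = {b: n for b, n in buckets.items() if n != 16}

print("full buckets:", full)
print("partial buckets:", partial)
```

Because 255 = 15 * 16 + 15, the iteration space does not split evenly, which is the kind of case the fixes above are concerned with.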

ArmComputeLibrary

#15600 - [ACL] Update Compute Library to v23.05.1
#15344 - [ACL] Update Compute Library to v23.05

BugFix

#15891 - [Relay]fix axis parsing of repeat converter in the MXNet frontend
#15873 - [Fix] Remove duplicated words from comments, NFC
#15868 - [Relay]Fix conv transpose with default strides in ONNX frontend
#15773 - [CPP] Fix cpp deploy bug
#15778 - [Hotfix] Fix Windows Pipe
#15748 - Move symbols that are relevant to the runtime from libtvm to…
#15752 - [Relay]fix the wrong calculate logic of operator flip in PyTorch frontend
#15715 - [Relay]Fix the wrong implementation about operator Threshold in oneflow
#15711 - [Strategy] Fix arm_cpu int8 conv2d strategy for dotprod and i8mm targets
#15717 - [Relay]fix the wrong implementation of Softplus in OneFlow
#15677 - [Arith] IterMapRewriter abort rewriting once failure
#15629 - [VTA] tvm.tir.Call has no name attribute
#15584 - [Relay][Strategy] Enable compile time transformation of weights matrix for arm_cpu NHWC quantized conv2d
#15542 - [Fix] Fix the typo in compile flag
#15484 - [TOPI] Fix a bug in arm_cpu int8 conv2d i8mm schedule
#15473 - [Relay] Fix some bugs of dominator pattern
#15478 - [TIR] ThreadSync with shared.dyn awareness
#15406 - [TIR]Ensure the Var's scope is correct
#15399 - [TIR] Fix multi-grouped multi-warp allreduce
#15350 - [Relay] fix a bug of printing dataflow pattern
#15385 - Work around "Internal Compiler Error" in MSVC
#15294 - [Bug][Relay] fix relay frontend pytorch op addmm bug
#15323 - [Fix][TIR] LowerThreadAllreduce with correct thread mask
#15291 - [Relay][GraphExecutor] Fix set_input_zero_copy() precision bug
#15225 - Fix function to read all file

CI

#15903 - [Target]Add LLVM functions for current system info
#15897 - [ADRENO] Few updates to Adreno docker setup
#15836 - Update ci-gpu image
#15668 - Allow Limit CPUs in Docker
#15568 - [Testing] Allow Capitalized name in CompareBeforeAfter
#15519 - [TEST] Run tests/python/relay/aot tests in ci-cortexm
#15485 - Remove cython version pin
#15421 - Bump Flax and Jaxlib versions to fix Jaxlib install error
#15226 - Add ml_dtypes dependency for all docker images
#15353 - Pin cython version to fix cython compilation
#15352 - Make Graviton3 default AArch64 job runner node
#15339 - Update test to include unique attribute
#15277 - [Testing] Return BenchmarkResult in local_run and rpc_run
#15268 - [Testing] Add tvm.testing.local_run
#15136 - [UnitTest][NVPTX] Avoid cascading failures from CUDA postproc

CMSIS-NN

#15747 - Move CMSIS_5 from SHA to release based upgrade
#15407 - Support for Softmax Int16 operator

Docker

#15799 - Add LLVM 17 to the LLVM install script
#15862 - Upgrade oneflow to v0.8.0
#15819 - Install oneflow from PyPi
#15310 - Update ci-cortexm docker image
#15293 - tensorflow_aarch64 package upgrade

Docs

#15619 - community strategy decision process
#15508 - Add v0.13.0 docs to site
#15213 - [#15157][Rust][Doc] Re-enable the Rust documentation build

Frontend

#15821 - [TFLite]Support quantized ELU
#15844 - [TFLite]Fix test failures caused by div-by-zero
#15798 - [TFLite]Support quantized Pow
#15829 - [Relay][Keras][Bugfix] fix the converters of GRU and SimpleRNN about the go_backwards attribute
#15838 - Fix unnecessary pylint errors
#15802 - [SkipCI][Hotfix][TFLite] Disable test of quantized floor mod
#15790 - [TFLite]Support quantized LESS_EQUAL
#15775 - [TFLite]Support quantized GREATER_EQUAL
#15769 - [TFLite]Support quantized NOT_EQUAL
#15768 - [TFLite]Support quantized div
#15746 - [TFLite]Support quantized LESS
#15733 - [TFLite]Support quantized floor_mod
#15724 - [TFLite]Support quantized floor_div
#15602 - [ONNX][BugFix] Support If body with free variable from graph input
#15472 - [Relay][TFLite] Fix in qnn.conv2d when parameter groups not equal to 1
#15117 - [TFLITE] Add support for TFLite's regular NMS operator
#15415 - [ONNX] add onnx Mish operator
#15422 - [Keras] Add support for swish activation
#15370 - [Relay][Pytorch] Add aten::view_as
#15335 - [Bugfix][Keras] Add a check to reject the invalid input shape
#15334 - [Bugfix][Relay][Keras] Add an assertion to reject an invalid value for attribute units in RNN layers
#15337 - [Bugfix][Keras]Fix a corner case bug in softmax converter of keras frontend
#15259 - [TFLITE][BugFix] Fix variable typo in batchmatmul converting func
#15261 - [bugfix][keras] Fix go_backwards attribute of LSTM in keras frontend

Hexagon

#15788 - Properly handle RPC server shutdown
#15599 - F2qi avgpool bug fix
#15414 - Add default vtcm capacity for targets
#15367 - Simplify Mul->Sub->Conv to Conv->Add when possible
#15258 - Propagate QNN Concat Quantization Params to Inputs

LLVM

#15921 - Fix for llvm CodeGenOpt API change

MetaSchedule

#15792 - Allow generating uint random data
#15574 - Fix metaschedule flop estimation for non-integer loop dimensions
#15532 - Enable subprocess to stdout for DEBUG level
#15437 - Fix mma default rule and disable tuning abort
#15133 - [XGBoost,MetaSchedule] Support xgb set tree method

Metal

#15756 - [Unittest]Add minimal metal functionality test to CI
#15749 - [UnitTest]Parametrize allreduce GPU tests
#15401 - [Codegen]Support metal warp-level primitive

OpenCL & CLML

#15745 - [OpenCL] Don't initialize OpenCL runtime on host
#15400 - [VM][OpenCL] Introduce textures allocation to VM memory manager

ROCm

#15777 - [Codegen]Mismatched Dtype of Workgroup/Workitem
#15464 - fma intrin
#15454 - Fix some ROCm codegen bugs

Relay

#15889 - fix the conflicted documentation description
#15648 - [TOPI] Remove input padding for arm_cpu conv2d int8 native schedule in Legalize pass
#15386 - Fix an adaptive_max_pool1d operator conversion bug
#15533 - Disable exception for ADT in mixed precision pass
#15506 - [Strategy] Use x86 pool schedules for arm_cpu
#15470 - [Strategy] Use x86 dense schedules for arm_cpu
#15392 - add redirecting operation to dataflow pattern graph
#15468 - [Strategy] Fix arm_cpu int8 conv2d schedule selection for 32-bit targets
#15461 - Stop ToMixedPrecision when constant is out of dtype range
#15362 - improve SimplifyClipAndConsecutiveCast pass
#15137 - Introduce arguments limit to FuseOps pass
#15211 - Fix bug in MergeCompilerRegions pass
#15237 - ExprMutator Return Origin Expr When All Fields Isn't Changed
#15235 - [QNN] Support Dequantize to "float16" and Quantize to "uint16"

Runtime

#15693 - Make CSourceModule and StaticLibraryModule Binary Serializable
#15658 - Make export_library parameters after file_name keyword-only
#15637 - [Backport]Fix ICE from Clang
#15244 - Serialization/Deserialization of runtime module
#15630 - Utils to Stringify Device
#15623 - Expose ModuleGetFunction as PackedFunc
#15595 - Enhance PackedFunc Metaprogramming with PackArgs
#15543 - [Minor] Suppress verbose logging in Metal device API
#15305 - Flush L2 cache in time eval
#15332 - Device API to query L2 cache size
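For readers unfamiliar with keyword-only parameters (#15658), here is a minimal sketch of the calling convention in plain Python. The function `export_library_sketch` and its parameters are hypothetical stand-ins chosen for illustration; they are not TVM's actual `export_library` signature or implementation.

```python
# Hypothetical stand-in for illustration only; not TVM's implementation.
def export_library_sketch(file_name, *, fcompile=None, workspace_dir=None):
    """Everything after file_name must be passed by keyword (note the bare *)."""
    return {
        "file_name": file_name,
        "fcompile": fcompile,
        "workspace_dir": workspace_dir,
    }

# Passing the extra options by keyword works.
result = export_library_sketch("model.so", workspace_dir="/tmp/build")

# Passing a second positional argument raises TypeError.
try:
    export_library_sketch("model.so", None)
    positional_rejected = False
except TypeError:
    positional_rejected = True

print(result)
print("positional call rejected:", positional_rejected)
```

Callers that previously passed such options positionally would need to switch to keyword arguments after a change of this kind.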

TIR

#15913 - Fix offset_factor in cuda tensor core intrins
#15906 - Fix the error example in the documentation for pad_einsum
#15816 - Revert "[TensorIR][Visitor] Visit buffer members in match_buffer's in block visitor functions (#15153)"
#15763 - Do not drop 4th argument to tir.max
#15646 - Output DeclBuffer in LowerThreadAllreduce
#15493 - Output DeclBuffer in SplitHostDevice
#15517 - Shuffle in PointerValueTypeRewrite for scalar reads
#15263 - Output DeclBuffer in MakePackedAPI
#15465 - [TIR, Schedule] Fix decompose reduction with thread binding loops
#15432 - Generalize implementation of T.macro to work with other dialects
#15413 - Fix Primitive Rfactor DType
#15404 - Allow starred expressions in TIR script
#15374 - Finer predicate handling in cross-thread reduction
#15373 - Allreduce broadcast result to each thread in multi-warp case
#15214 - [UX] Implement privacy annotations in TIR
#15241 - Return error code from kernels in SplitHostDevice
#15327 - ThreadAllreduce warp-level primitive support with multi-warp
#15260 - Implement TIR macros
#15253 - Call TVMBackendFreeWorkspace inside LetStmt
#15264 - Allow symbolic bounds in IndexMap analysis
#15243 - Output DeclBuffer in LowerTVMBuiltin
#15236 - [Schedule] Scoped CacheRead/Write producing compact region
#15242 - Preserve AllocateNode::annotations
#15247 - Allow VerifyWellFormed to accept IRModule
#15192 - Support cross-thread reduction lowering with thread-broadcasting rewrite
#15210 - [Schedule] Derive Nonnegative Bounds from Shape Var
#15207 - [Transform] Add LiftThreadBinding Pass

TOPI

#15685 - [Target]Use LLVM for x86 CPU feature lookup
#15710 - Ensure vectorization of input padding in arm_cpu int8 conv2d interleaved schedule
#15513 - check empty array of x86 injective's iters
#15371 - Revert "Add arm_cpu specific pooling schedules"
#15311 - Add arm_cpu specific pooling schedules
#15286 - Revert "Add arm_cpu specific pooling schedules"
#14855 - Add arm_cpu specific pooling schedules

TVMC

#15779 - enable dumping imported modules too
#15349 - Add tvmc flag to print compilation time per pass

TVMScript

#15824 - Preserve traceback across TVMScript parsing
#15762 - Use environment variable TVM_BLACK_FORMAT for .show()
#15706 - Disable black_format by default
#15705 - [FIX] Disable show_object_address in printing by default
#15579 - Optionally output the address as part of variable names
#15564 - Use triple-quoted python strings for metadata
#15547 - Create loop var with min_val dtype in for frame
#15492 - Allow use of Python builtins in script
#15442 - Support starred indices in for-loop
#15249 - Ensure completed root block has no read/write
#15239 - Handle parsing of PrimFunc calls with non-void return

cuda & cutlass & tensorrt

#15573 - [CUTLASS][Cherry-pick] Introduce several features of cutlass profiler
#15480 - [Bugfix][CUTLASS] CUTLASS path finding

microNPU

#15780 - [microNPU][ETHOSU] MatMul legalization support
#15428 - [microNPU][ETHOSU] Fix concatenation with reused buffers
#14909 - [ETHOSU][MicroNPU][Pass] Add a pass to replicate pads
#15186 - [microNPU][ETHOSU] Add Vela's logic to select configuration block

microTVM

#15667 - Check the output of microNPU demos in CI

web

#15218 - Increase default EMCC compilation total memory size

Misc

#15934 - [Release] [Dont Squash] Modify version number to 0.14.0 and 0.15.0.dev on main branch
#15847 - [release] Update version to 0.14.0 and 0.15.0.dev on main branch
#15867 - Bump pillow from 9.3.0 to 10.0.1 in /apps/microtvm/ethosu
#15866 - Bump pillow from 9.3.0 to 10.0.1 in /apps/microtvm/cmsisnn
#15865 - Bump pillow from 9.2.0 to 10.0.1 in /apps/microtvm
#15833 - [VM] Memory Manager moved up to runtime
#15859 - [Script] Fix miscs of make_notes.py
#15818 - [CLI TOOLS][RTVM] Improve rtvm tool with new options to measure native performance
#15761 - [Target] LLVM helper functions for any target info
#15672 - [IR] Implemented Variant<...> container
#15714 - [Target][Device] Auto detect target and create device from str in torch style
#15723 - fix _convert_simple_rnn
#15725 - Revert "[CodeGenC] Handle GlobalVar callee as internal function call"
#15684 - [Hopper TMA] Add intrinsic to create barriers for synchronization
#15683 - Fix a bug caused by PyTorch instance_norm when the input shape is [1,1,1,2]
#15596 - [FFI] Propagate Python errors across FFI boundaries
#15666 - [Module] Implement custom imported modules serialization
#15656 - [Hopper TMA] Add CUDA codegen support for bulk asynchronous copy
#15664 - [IR] Use structural equal for Range equality
#15649 - Add output_data_sec section in corstone300.ld
#15639 - Do not link LLVM libraries into cpptest binary
#15631 - [RPC] Enhance RPC Protocol to support TVM Object
#15624 - [CMake] Add RCCL to TVM and TVM Runtime
#15616 - [Hopper TMA] CUDA codegen for async copy with barrier synchronization
#15537 - [CPP_RPC] export listdir for RPC
#15605 - [CMake] Add NCCL to TVM and TVM Runtime
#15580 - Fix "to" duplicate word in python and C header file
#15581 - Remove duplicate load word inside .cc file
#15582 - Remove duplicate 'from' word inside python script
#15554 - Bump tornado from 6.1 to 6.3.3 in /apps/microtvm
#15552 - Bump tornado from 6.1 to 6.3.3 in /apps/microtvm/ethosu
#15553 - Bump tornado from 6.1 to 6.3.3 in /apps/microtvm/cmsisnn
#15536 - fixed typo [TypoFix]
#15529 - [quantize] fix bug of annotate for output of add op
#15535 - Fixed search task comment
#15530 - Remove duplicate msg word and condition inside the function doc
#15511 - Remove IRModule Dependency from Target
#15525 - Fix typo mistake and change whethe to whether
#15524 - Remove duplicate the word
#15103 - [CodeGenC] Handle GlobalVar callee as internal function call
#15419 - [VM][Textures] Enable OpenCL textures for VM
#15483 - [Script] Be more careful when generating ast.ExtSlice for Subscript
#15469 - [CYTHON] Make cython compatible with 3.0
#15423 - [Submodule] Add Flash attention v2
#15380 - [Target] Add Jetson Orin Nano tag
#15359 - [CMAKE] Conditionally link "clog" in NNPack install
#15326 - [OP] Add rms_norm into TOPI
#15312 - [skipci] Fix typo in docs/arch/index.rst
#15298 - [Release] Extend PR tags and Format PR hyper-links in release report
#15328 - [Package] Remove cutlass media/docs inside cutlass_fpA_intB_gemm
#15321 - [JVM] Fix the Maven pom.xml for OS X arm64 tvm4j build
#15265 - Fix keras version problem
#15292 - [RPC] Fix socket bind errno on corner case
#15287 - [Exec] Add a script to test GPU memory bandwidth
#15234 - [Miscs] Enhance script about make release notes
#15229 - [CMAKE] Add Vulkan header for Android
#15215 - [Android] ndk static build
#15208 - Update version to 0.14.dev0 on main branch
