Introduction
The TVM community has worked since the v0.13.0 release to deliver the following exciting new improvements! The main tags are below (bold text indicates areas with significant progress):
- Community, RFC
- Arith, MetaSchedule
- Adreno, ArmComputeLibrary, Hexagon, Metal, OpenCL & CLML, ROCm, Vulkan, cuda & cutlass & tensorrt, microNPU, web
- Runtime, TVMC, AOT, LLVM, microTVM, CMSIS-NN
- Frontend, Relay, BYOC
- TOPI, TIR, TVMScript
- Docs, CI, Docker
- Misc, BugFix
Please visit the full listing of commits for a complete view: v0.13.0...v0.14.0.
Community
RFC
AOT
- #15301 - Avoid call_extern() with incorrect argument count
- #15181 - Remove workaround to help resolve test flakiness
Adreno
- #15830 - Minor changes for Adreno docs and help scripts
- #15671 - [VM] Fix using buffers for weights in VM
- #15391 - Small fixes in Adreno schedules
Arith
- #15881 - Simplify the result of non-divisible floordiv
- #15665 - Fix detect non-divisible iteration form like (x % 255) // 16
- #15638 - MLIR PresburgerSet compile fix mlir >= 160
- #15628 - Added simplification rule for multiple equality compares
- #15558 - Fix detect linear equation with uint var
- #14690 - Add tvm::arith::PresburgerSetNode to work with Presburger Set in MLIR
- #15555 - Fix handling of overlapping predicates
- #15471 - Enhance Canonical Simplify for LE
- #15228 - Enhance buffer shape bound deduction to include offset
ArmComputeLibrary
BugFix
- #15891 - [Relay] Fix axis parsing of repeat converter in the MXNet frontend
- #15873 - [Fix] Remove duplicated words from comments, NFC
- #15868 - [Relay] Fix conv transpose with default strides in ONNX frontend
- #15773 - [CPP] Fix cpp deploy bug
- #15778 - [Hotfix] Fix Windows Pipe
- #15748 - Move symbols that are relevant to the runtime from libtvm to…
- #15752 - [Relay] Fix the wrong calculation logic of operator flip in PyTorch frontend
- #15715 - [Relay] Fix the wrong implementation of operator Threshold in OneFlow
- #15711 - [Strategy] Fix arm_cpu int8 conv2d strategy for dotprod and i8mm targets
- #15717 - [Relay] Fix the wrong implementation of Softplus in OneFlow
- #15677 - [Arith] IterMapRewriter abort rewriting once failure
- #15629 - [VTA] tvm.tir.Call has no name attribute
- #15584 - [Relay][Strategy] Enable compile time transformation of weights matrix for arm_cpu NHWC quantized conv2d
- #15542 - [Fix] Fix the typo in compile flag
- #15484 - [TOPI] Fix a bug in arm_cpu int8 conv2d i8mm schedule
- #15473 - [Relay] Fix some bugs of dominator pattern
- #15478 - [TIR] ThreadSync with shared.dyn awareness
- #15406 - [TIR]Ensure the Var's scope is correct
- #15399 - [TIR] Fix multi-grouped multi-warp allreduce
- #15350 - [Relay] fix a bug of printing dataflow pattern
- #15385 - Work around "Internal Compiler Error" in MSVC
- #15294 - [Bug][Relay] Fix Relay PyTorch frontend addmm op bug
- #15323 - [Fix][TIR] LowerThreadAllreduce with correct thread mask
- #15291 - [Relay][GraphExecutor] Fix set_input_zero_copy() precision bug
- #15225 - Fix function to read all file
CI
- #15903 - [Target] Add LLVM functions for current system info
- #15897 - [ADRENO] Few updates to Adreno docker setup
- #15836 - Update ci-gpu image
- #15668 - Allow limiting CPUs in Docker
- #15568 - [Testing] Allow Capitalized name in CompareBeforeAfter
- #15519 - [TEST] Run tests/python/relay/aot tests in ci-cortexm
- #15485 - Remove cython version pin
- #15421 - Bump Flax and Jaxlib versions to fix Jaxlib install error
- #15226 - Add ml_dtypes dependency for all docker images
- #15353 - Pin cython version to fix cython compilation
- #15352 - Make Graviton3 default AArch64 job runner node
- #15339 - Update test to include unique attribute
- #15277 - [Testing] Return BenchmarkResult in local_run and rpc_run
- #15268 - [Testing] Add tvm.testing.local_run
- #15136 - [UnitTest][NVPTX] Avoid cascading failures from CUDA postproc
CMSIS-NN
Docker
- #15799 - Add LLVM 17 to the LLVM install script
- #15862 - Upgrade oneflow to v0.8.0
- #15819 - Install oneflow from PyPi
- #15310 - Update ci-cortexm docker image
- #15293 - tensorflow_aarch64 package upgrade
Docs
- #15619 - Community strategy decision process
- #15508 - Add v0.13.0 docs to site
- #15213 - [#15157][Rust][Doc] Re-enable the Rust documentation build
Frontend
- #15821 - [TFLite] Support quantized ELU
- #15844 - [TFLite] Fix test failures caused by div-by-zero
- #15798 - [TFLite] Support quantized Pow
- #15829 - [Relay][Keras][Bugfix] Fix the GRU and SimpleRNN converters' handling of the go_backwards attribute
- #15838 - Fix unnecessary pylint errors
- #15802 - [SkipCI][Hotfix][TFLite] Disable test of quantized floor mod
- #15790 - [TFLite] Support quantized LESS_EQUAL
- #15775 - [TFLite] Support quantized GREATER_EQUAL
- #15769 - [TFLite] Support quantized NOT_EQUAL
- #15768 - [TFLite] Support quantized div
- #15746 - [TFLite] Support quantized LESS
- #15733 - [TFLite] Support quantized floor_mod
- #15724 - [TFLite] Support quantized floor_div
- #15602 - [ONNX][BugFix] Support If body with free variable from graph input
- #15472 - [Relay][TFLite] Fix in qnn.conv2d when parameter groups not equal to 1
- #15117 - [TFLITE] Add support for TFLite's regular NMS operator
- #15415 - [ONNX] add onnx Mish operator
- #15422 - [Keras] Add support for swish activation
- #15370 - [Relay][Pytorch] Add aten::view_as
- #15335 - [Bugfix][Keras] Add a check to reject the invalid input shape
- #15334 - [Bugfix][Relay][Keras] Add an assertion to reject an invalid value for attribute units in RNN layers
- #15337 - [Bugfix][Keras] Fix a corner case bug in softmax converter of keras frontend
- #15259 - [TFLITE][BugFix] Fix variable typo in batchmatmul converting func
- #15261 - [Bugfix][Keras] Fix go_backwards attribute of LSTM in keras frontend
Hexagon
- #15788 - Properly handle RPC server shutdown
- #15599 - F2qi avgpool bug fix
- #15414 - Add default vtcm capacity for targets
- #15367 - Simplify Mul->Sub->Conv to Conv->Add when possible
- #15258 - Propagate QNN Concat Quantization Params to Inputs
LLVM
- #15921 - Fix for llvm CodeGenOpt API change
MetaSchedule
- #15792 - Allow generating uint random data
- #15574 - Fix metaschedule flop estimation for non-integer loop dimensions
- #15532 - Enable subprocess to stdout for DEBUG level
- #15437 - Fix mma default rule and disable tuning abort
- #15133 - [XGBoost,MetaSchedule] Support xgb set tree method
Metal
- #15756 - [UnitTest] Add minimal metal functionality test to CI
- #15749 - [UnitTest] Parametrize allreduce GPU tests
- #15401 - [Codegen] Support metal warp-level primitive
OpenCL & CLML
- #15745 - [OpenCL] Don't initialize OpenCL runtime on host
- #15400 - [VM][OpenCL] Introduce textures allocation to VM memory manager
ROCm
- #15777 - [Codegen] Mismatched Dtype of Workgroup/Workitem
- #15464 - fma intrin
- #15454 - Fix some ROCm codegen bugs
Relay
- #15889 - Fix the conflicting documentation description
- #15648 - [TOPI] Remove input padding for arm_cpu conv2d int8 native schedule in Legalize pass
- #15386 - Fix an adaptive_max_pool1d operator conversion bug
- #15533 - Disable exception for ADT in mixed precision pass
- #15506 - [Strategy] Use x86 pool schedules for arm_cpu
- #15470 - [Strategy] Use x86 dense schedules for arm_cpu
- #15392 - add redirecting operation to dataflow pattern graph
- #15468 - [Strategy] Fix arm_cpu int8 conv2d schedule selection for 32-bit targets
- #15461 - Stop ToMixedPrecision when constant is out of dtype range
- #15362 - improve SimplifyClipAndConsecutiveCast pass
- #15137 - Introduce arguments limit to FuseOps pass
- #15211 - Fix bug in MergeCompilerRegions pass
- #15237 - ExprMutator returns the original Expr when no fields are changed
- #15235 - [QNN] Support Dequantize to "float16" and Quantize to "uint16"
Runtime
- #15693 - Make CSourceModule and StaticLibraryModule Binary Serializable
- #15658 - Make export_library parameters after file_name keyword-only
- #15637 - [Backport] Fix ICE from Clang
- #15244 - Serialization/Deserialization of runtime module
- #15630 - Utils to Stringify Device
- #15623 - Expose ModuleGetFunction as PackedFunc
- #15595 - Enhance PackedFunc Metaprogramming with PackArgs
- #15543 - [Minor] Suppress verbose logging in Metal device API
- #15305 - Flush L2 cache in time eval
- #15332 - Device API to query L2 cache size
TIR
- #15913 - Fix offset_factor in cuda tensor core intrins
- #15906 - Fix the error example in the documentation for pad_einsum
- #15816 - Revert "[TensorIR][Visitor] Visit buffer members in match_buffer's in block visitor functions (#15153)"
- #15763 - Do not drop 4th argument to tir.max
- #15646 - Output DeclBuffer in LowerThreadAllreduce
- #15493 - Output DeclBuffer in SplitHostDevice
- #15517 - Shuffle in PointerValueTypeRewrite for scalar reads
- #15263 - Output DeclBuffer in MakePackedAPI
- #15465 - [TIR, Schedule] Fix decompose reduction with thread binding loops
- #15432 - Generalize implementation of T.macro to work with other dialects
- #15413 - Fix Primitive Rfactor DType
- #15404 - Allow starred expressions in TIR script
- #15374 - Finer predicate handling in cross-thread reduction
- #15373 - Allreduce broadcast result to each thread in multi-warp case
- #15214 - [UX] Implement privacy annotations in TIR
- #15241 - Return error code from kernels in SplitHostDevice
- #15327 - ThreadAllreduce warp-level primitive support with multi-warp
- #15260 - Implement TIR macros
- #15253 - Call TVMBackendFreeWorkspace inside LetStmt
- #15264 - Allow symbolic bounds in IndexMap analysis
- #15243 - Output DeclBuffer in LowerTVMBuiltin
- #15236 - [Schedule] Scoped CacheRead/Write producing compact region
- #15242 - Preserve AllocateNode::annotations
- #15247 - Allow VerifyWellFormed to accept IRModule
- #15192 - Support cross-thread reduction lowering with thread-broadcasting rewrite
- #15210 - [Schedule] Derive Nonnegative Bounds from Shape Var
- #15207 - [Transform] Add LiftThreadBinding Pass
TOPI
- #15685 - [Target] Use LLVM for x86 CPU feature lookup
- #15710 - Ensure vectorization of input padding in arm_cpu int8 conv2d interleaved schedule
- #15513 - check empty array of x86 injective's iters
- #15371 - Revert "Add arm_cpu specific pooling schedules"
- #15311 - Add arm_cpu specific pooling schedules
- #15286 - Revert "Add arm_cpu specific pooling schedules"
- #14855 - Add arm_cpu specific pooling schedules
TVMC
- #15779 - enable dumping imported modules too
- #15349 - Add tvmc flag to print compilation time per pass
TVMScript
- #15824 - Preserve traceback across TVMScript parsing
- #15762 - Use environment variable TVM_BLACK_FORMAT for .show()
- #15706 - Disable black_format by default
- #15705 - [FIX] Disable show_object_address in printing by default
- #15579 - Optionally output the address as part of variable names
- #15564 - Use triple-quoted python strings for metadata
- #15547 - Create loop var with min_val dtype in for frame
- #15492 - Allow use of Python builtins in script
- #15442 - Support starred indices in for-loop
- #15249 - Ensure completed root block has no read/write
- #15239 - Handle parsing of PrimFunc calls with non-void return
cuda & cutlass & tensorrt
- #15573 - [CUTLASS][Cherry-pick] Introduce several features of cutlass profiler
- #15480 - [Bugfix][CUTLASS] CUTLASS path finding
microNPU
- #15780 - [microNPU][ETHOSU] MatMul legalization support
- #15428 - [microNPU][ETHOSU] Fix concatenation with reused buffers
- #14909 - [ETHOSU][MicroNPU][Pass] Add a pass to replicate pads
- #15186 - [microNPU][ETHOSU] Add Vela's logic to select configuration block
microTVM
- #15667 - Check the output of microNPU demos in CI
web
- #15218 - Increase default EMCC compilation total memory size
Misc
- #15934 - [Release] [Dont Squash] Modify version number to 0.14.0 and 0.15.0.dev on main branch
- #15847 - [release] Update version to 0.14.0 and 0.15.0.dev on main branch
- #15867 - Bump pillow from 9.3.0 to 10.0.1 in /apps/microtvm/ethosu
- #15866 - Bump pillow from 9.3.0 to 10.0.1 in /apps/microtvm/cmsisnn
- #15865 - Bump pillow from 9.2.0 to 10.0.1 in /apps/microtvm
- #15833 - [VM] Memory Manager moved up to runtime
- #15859 - [Script] Fix miscs of make_notes.py
- #15818 - [CLI TOOLS][RTVM] Improve rtvm tool with new options to measure native performance
- #15761 - [Target] LLVM helper functions for any target info
- #15672 - [IR] Implemented Variant<...> container
- #15714 - [Target][Device] Auto detect target and create device from str in torch style
- #15723 - fix _convert_simple_rnn
- #15725 - Revert "[CodeGenC] Handle GlobalVar callee as internal function call"
- #15684 - [Hopper TMA] Add intrinsic to create barriers for synchronization
- #15683 - Fix a bug caused by PyTorch instance_norm when the input shape is [1,1,1,2]
- #15596 - [FFI] Propagate Python errors across FFI boundaries
- #15666 - [Module] Implement custom imported modules serialization
- #15656 - [Hopper TMA] Add CUDA codegen support for bulk asynchronous copy
- #15664 - [IR] Use structural equal for Range equality
- #15649 - Add output_data_sec section in corstone300.ld
- #15639 - Do not link LLVM libraries into cpptest binary
- #15631 - [RPC] Enhance RPC Protocol to support TVM Object
- #15624 - [CMake] Add RCCL to TVM and TVM Runtime
- #15616 - [Hopper TMA] CUDA codegen for async copy with barrier synchronization
- #15537 - [CPP_RPC] export listdir for RPC
- #15605 - [CMake] Add NCCL to TVM and TVM Runtime
- #15580 - Fix "to" duplicate word in python and C header file
- #15581 - Remove duplicate load word inside .cc file
- #15582 - Remove duplicate 'from' word inside python script
- #15554 - Bump tornado from 6.1 to 6.3.3 in /apps/microtvm
- #15552 - Bump tornado from 6.1 to 6.3.3 in /apps/microtvm/ethosu
- #15553 - Bump tornado from 6.1 to 6.3.3 in /apps/microtvm/cmsisnn
- #15536 - fixed typo [TypoFix]
- #15529 - [quantize] fix bug of annotate for output of add op
- #15535 - Fixed search task comment
- #15530 - Remove duplicate msg word and condition inside the function doc
- #15511 - Remove IRModule Dependency from Target
- #15525 - Fix typo mistake and change whethe to whether
- #15524 - Remove duplicate the word
- #15103 - [CodeGenC] Handle GlobalVar callee as internal function call
- #15419 - [VM][Textures] Enable OpenCL textures for VM
- #15483 - [Script] Be more careful when generating ast.ExtSlice for Subscript
- #15469 - [CYTHON] Make cython compatible with 3.0
- #15423 - [Submodule] Add Flash attention v2
- #15380 - [Target] Add Jetson Orin Nano tag
- #15359 - [CMAKE] Conditionally link "clog" in NNPack install
- #15326 - [OP] Add rms_norm into TOPI
- #15312 - [skipci] Fix typo in docs/arch/index.rst
- #15298 - [Release] Extend PR tags and Format PR hyper-links in release report
- #15328 - [Package] Remove cutlass media/docs inside cutlass_fpA_intB_gemm
- #15321 - [JVM] Fix the Maven pom.xml for OS X arm64 tvm4j build
- #15265 - Fix keras version problem
- #15292 - [RPC] Fix socket bind errno on corner case
- #15287 - [Exec] Add a script to test GPU memory bandwidth
- #15234 - [Miscs] Enhance script about make release notes
- #15229 - [CMAKE] Add Vulkan header for Android
- #15215 - [Android] ndk static build
- #15208 - Update version to 0.14.dev0 on main branch