Apache TVM v0.14.0


Introduction

The TVM community has been hard at work since the v0.13.0 release to deliver the following exciting improvements! The main tags are listed below:

  • Community, RFC
  • Arith, MetaSchedule
  • Adreno, ArmComputeLibrary, Hexagon, Metal, OpenCL & CLML, ROCm, Vulkan, cuda & cutlass & tensorrt, microNPU, web
  • Runtime, TVMC, AOT, LLVM, microTVM, CMSIS-NN
  • Frontend, Relay, BYOC
  • TOPI, TIR, TVMScript
  • Docs, CI, Docker
  • Misc, BugFix

Please visit the full listing of commits for a complete view: v0.13.0...v0.14.0.

Community

  • #15307 - Qingchao Shen -> Reviewer
  • #15619 - community strategy decision process

RFC

No new RFCs landed in this release cycle.

AOT

  • #15301 - Avoid call_extern() with incorrect argument count
  • #15181 - Remove workaround to help resolve test flakiness

Adreno

  • #15830 - Minor changes for Adreno docs and help scripts
  • #15671 - [VM] Fix using buffers for weights in VM
  • #15391 - Small fixes in Adreno schedules

Arith

  • #15881 - Simplify the result of non-divisible floordiv
  • #15665 - Fix detection of non-divisible iteration forms like (x % 255) // 16
  • #15638 - Fix MLIR PresburgerSet compilation for mlir >= 160
  • #15628 - Add simplification rule for multiple equality compares
  • #15558 - Fix detection of linear equations with uint vars
  • #14690 - Add tvm::arith::PresburgerSetNode to work with Presburger Set in MLIR
  • #15555 - Fix handling of overlapping predicates
  • #15471 - Enhance Canonical Simplify for LE
  • #15228 - Enhance buffer shape bound deduction to include offset

ArmComputeLibrary

  • #15600 - [ACL] Update Compute Library to v23.05.1
  • #15344 - [ACL] Update Compute Library to v23.05

BugFix

  • #15891 - [Relay] Fix axis parsing of repeat converter in the MXNet frontend
  • #15873 - [Fix] Remove duplicated words from comments, NFC
  • #15868 - [Relay] Fix conv transpose with default strides in ONNX frontend
  • #15773 - [CPP] Fix cpp deploy bug
  • #15778 - [Hotfix] Fix Windows Pipe
  • #15748 - Move symbols that are relevant to the runtime from libtvm to…
  • #15752 - [Relay] Fix the wrong calculation logic of operator flip in the PyTorch frontend
  • #15715 - [Relay] Fix the wrong implementation of operator Threshold in OneFlow
  • #15711 - [Strategy] Fix arm_cpu int8 conv2d strategy for dotprod and i8mm targets
  • #15717 - [Relay] Fix the wrong implementation of Softplus in OneFlow
  • #15677 - [Arith] IterMapRewriter abort rewriting once failure
  • #15629 - [VTA] tvm.tir.Call has no name attribute
  • #15584 - [Relay][Strategy] Enable compile time transformation of weights matrix for arm_cpu NHWC quantized conv2d
  • #15542 - [Fix] Fix the typo in compile flag
  • #15484 - [TOPI] Fix a bug in arm_cpu int8 conv2d i8mm schedule
  • #15473 - [Relay] Fix some bugs of dominator pattern
  • #15478 - [TIR] ThreadSync with shared.dyn awareness
  • #15406 - [TIR] Ensure the Var's scope is correct
  • #15399 - [TIR] Fix multi-grouped multi-warp allreduce
  • #15350 - [Relay] fix a bug of printing dataflow pattern
  • #15385 - Work around "Internal Compiler Error" in MSVC
  • #15294 - [Bug][Relay] fix relay frontend pytorch op addmm bug
  • #15323 - [Fix][TIR] LowerThreadAllreduce with correct thread mask
  • #15291 - [Relay][GraphExecutor] Fix set_input_zero_copy() precision bug
  • #15225 - Fix function to read all file

CI

  • #15903 - [Target] Add LLVM functions for current system info
  • #15897 - [ADRENO] Few updates to Adreno docker setup
  • #15836 - Update ci-gpu image
  • #15668 - Allow limiting CPUs in Docker
  • #15568 - [Testing] Allow Capitalized name in CompareBeforeAfter
  • #15519 - [TEST] Run tests/python/relay/aot tests in ci-cortexm
  • #15485 - Remove cython version pin
  • #15421 - Bump Flax and Jaxlib versions to fix Jaxlib install error
  • #15226 - Add ml_dtypes dependency for all docker images
  • #15353 - Pin cython version to fix cython compilation
  • #15352 - Make Graviton3 default AArch64 job runner node
  • #15339 - Update test to include unique attribute
  • #15277 - [Testing] Return BenchmarkResult in local_run and rpc_run
  • #15268 - [Testing] Add tvm.testing.local_run
  • #15136 - [UnitTest][NVPTX] Avoid cascading failures from CUDA postproc

CMSIS-NN

  • #15747 - Move CMSIS_5 from SHA to release based upgrade
  • #15407 - Support for Softmax Int16 operator

Docker

  • #15799 - Add LLVM 17 to the LLVM install script
  • #15862 - Upgrade oneflow to v0.8.0
  • #15819 - Install oneflow from PyPi
  • #15310 - Update ci-cortexm docker image
  • #15293 - tensorflow_aarch64 package upgrade

Docs

  • #15619 - community strategy decision process
  • #15508 - Add v0.13.0 docs to site
  • #15213 - [#15157][Rust][Doc] Re-enable the Rust documentation build

Frontend

  • #15821 - [TFLite] Support quantized ELU
  • #15844 - [TFLite] Fix test failures caused by div-by-zero
  • #15798 - [TFLite] Support quantized Pow
  • #15829 - [Relay][Keras][Bugfix] Fix the converters of GRU and SimpleRNN regarding the go_backwards attribute
  • #15838 - Fix unnecessary pylint errors
  • #15802 - [SkipCI][Hotfix][TFLite] Disable test of quantized floor mod
  • #15790 - [TFLite] Support quantized LESS_EQUAL
  • #15775 - [TFLite] Support quantized GREATER_EQUAL
  • #15769 - [TFLite] Support quantized NOT_EQUAL
  • #15768 - [TFLite] Support quantized div
  • #15746 - [TFLite] Support quantized LESS
  • #15733 - [TFLite] Support quantized floor_mod
  • #15724 - [TFLite] Support quantized floor_div
  • #15602 - [ONNX][BugFix] Support If body with free variable from graph input
  • #15472 - [Relay][TFLite] Fix in qnn.conv2d when parameter groups not equal to 1
  • #15117 - [TFLITE] Add support for TFLite's regular NMS operator
  • #15415 - [ONNX] add onnx Mish operator
  • #15422 - [Keras] Add support for swish activation
  • #15370 - [Relay][Pytorch] Add aten::view_as
  • #15335 - [Bugfix][Keras] Add a check to reject invalid input shapes
  • #15334 - [Bugfix][Relay][Keras] Add an assertion to reject an invalid value for attribute units in RNN layers
  • #15337 - [Bugfix][Keras] Fix a corner case bug in the softmax converter of the Keras frontend
  • #15259 - [TFLITE][BugFix] Fix variable typo in batchmatmul converter function
  • #15261 - [Bugfix][Keras] Fix go_backwards attribute of LSTM in the Keras frontend

Hexagon

  • #15788 - Properly handle RPC server shutdown
  • #15599 - F2qi avgpool bug fix
  • #15414 - Add default vtcm capacity for targets
  • #15367 - Simplify Mul->Sub->Conv to Conv->Add when possible
  • #15258 - Propagate QNN Concat Quantization Params to Inputs

LLVM

  • #15921 - Fix for llvm CodeGenOpt API change

MetaSchedule

  • #15792 - Allow generating uint random data
  • #15574 - Fix metaschedule flop estimation for non-integer loop dimensions
  • #15532 - Enable subprocess to stdout for DEBUG level
  • #15437 - Fix mma default rule and disable tuning abort
  • #15133 - [XGBoost,MetaSchedule] Support xgb set tree method

Metal

  • #15756 - [Unittest] Add minimal Metal functionality test to CI
  • #15749 - [UnitTest] Parametrize allreduce GPU tests
  • #15401 - [Codegen] Support Metal warp-level primitive

OpenCL & CLML

  • #15745 - [OpenCL] Don't initialize OpenCL runtime on host
  • #15400 - [VM][OpenCL] Introduce textures allocation to VM memory manager

ROCm

  • #15777 - [Codegen] Fix mismatched dtype of workgroup/workitem
  • #15464 - Add fma intrinsic
  • #15454 - Fix some ROCm codegen bugs

Relay

  • #15889 - Fix the conflicting documentation description
  • #15648 - [TOPI] Remove input padding for arm_cpu conv2d int8 native schedule in Legalize pass
  • #15386 - Fix an adaptive_max_pool1d operator conversion bug
  • #15533 - Disable exception for ADT in mixed precision pass
  • #15506 - [Strategy] Use x86 pool schedules for arm_cpu
  • #15470 - [Strategy] Use x86 dense schedules for arm_cpu
  • #15392 - Add redirecting operation to dataflow pattern graph
  • #15468 - [Strategy] Fix arm_cpu int8 conv2d schedule selection for 32-bit targets
  • #15461 - Stop ToMixedPrecision when constant is out of dtype range
  • #15362 - Improve SimplifyClipAndConsecutiveCast pass
  • #15137 - Introduce arguments limit to FuseOps pass
  • #15211 - Fix bug in MergeCompilerRegions pass
  • #15237 - ExprMutator: return the original Expr when no fields are changed
  • #15235 - [QNN] Support Dequantize to "float16" and Quantize to "uint16"

Runtime

  • #15693 - Make CSourceModule and StaticLibraryModule Binary Serializable
  • #15658 - Make export_library parameters after file_name keyword-only
  • #15637 - [Backport] Fix ICE from Clang
  • #15244 - Serialization/Deserialization of runtime module
  • #15630 - Utils to Stringify Device
  • #15623 - Expose ModuleGetFunction as PackedFunc
  • #15595 - Enhance PackedFunc Metaprogramming with PackArgs
  • #15543 - [Minor] Suppress verbose logging in Metal device API
  • #15305 - Flush L2 cache in time eval
  • #15332 - Device API to query L2 cache size

TIR

  • #15913 - Fix offset_factor in cuda tensor core intrins
  • #15906 - Fix the error example in the documentation for pad_einsum
  • #15816 - Revert "[TensorIR][Visitor] Visit buffer members in match_buffer's in block visitor functions (#15153)"
  • #15763 - Do not drop 4th argument to tir.max
  • #15646 - Output DeclBuffer in LowerThreadAllreduce
  • #15493 - Output DeclBuffer in SplitHostDevice
  • #15517 - Shuffle in PointerValueTypeRewrite for scalar reads
  • #15263 - Output DeclBuffer in MakePackedAPI
  • #15465 - [TIR, Schedule] Fix decompose reduction with thread binding loops
  • #15432 - Generalize implementation of T.macro to work with other dialects
  • #15413 - Fix Primitive Rfactor DType
  • #15404 - Allow starred expressions in TIR script
  • #15374 - Finer predicate handling in cross-thread reduction
  • #15373 - Allreduce broadcast result to each thread in multi-warp case
  • #15214 - [UX] Implement privacy annotations in TIR
  • #15241 - Return error code from kernels in SplitHostDevice
  • #15327 - ThreadAllreduce warp-level primitive support with multi-warp
  • #15260 - Implement TIR macros
  • #15253 - Call TVMBackendFreeWorkspace inside LetStmt
  • #15264 - Allow symbolic bounds in IndexMap analysis
  • #15243 - Output DeclBuffer in LowerTVMBuiltin
  • #15236 - [Schedule] Scoped CacheRead/Write producing compact region
  • #15242 - Preserve AllocateNode::annotations
  • #15247 - Allow VerifyWellFormed to accept IRModule
  • #15192 - Support cross-thread reduction lowering with thread-broadcasting rewrite
  • #15210 - [Schedule] Derive Nonnegative Bounds from Shape Var
  • #15207 - [Transform] Add LiftThreadBinding Pass

TOPI

  • #15685 - [Target] Use LLVM for x86 CPU feature lookup
  • #15710 - Ensure vectorization of input padding in arm_cpu int8 conv2d interleaved schedule
  • #15513 - Check for an empty iterator array in x86 injective schedules
  • #15371 - Revert "Add arm_cpu specific pooling schedules"
  • #15311 - Add arm_cpu specific pooling schedules
  • #15286 - Revert "Add arm_cpu specific pooling schedules"
  • #14855 - Add arm_cpu specific pooling schedules

TVMC

  • #15779 - Enable dumping imported modules too
  • #15349 - Add tvmc flag to print compilation time per pass

TVMScript

  • #15824 - Preserve traceback across TVMScript parsing
  • #15762 - Use environment variable TVM_BLACK_FORMAT for .show()
  • #15706 - Disable black_format by default
  • #15705 - [FIX] Disable show_object_address in printing by default
  • #15579 - Optionally output the address as part of variable names
  • #15564 - Use triple-quoted python strings for metadata
  • #15547 - Create loop var with min_val dtype in for frame
  • #15492 - Allow use of Python builtins in script
  • #15442 - Support starred indices in for-loop
  • #15249 - Ensure completed root block has no read/write
  • #15239 - Handle parsing of PrimFunc calls with non-void return

cuda & cutlass & tensorrt

  • #15573 - [CUTLASS][Cherry-pick] Introduce several features of cutlass profiler
  • #15480 - [Bugfix][CUTLASS] CUTLASS path finding

microNPU

  • #15780 - [microNPU][ETHOSU] MatMul legalization support
  • #15428 - [microNPU][ETHOSU] Fix concatenation with reused buffers
  • #14909 - [ETHOSU][MicroNPU][Pass] Add a pass to replicate pads
  • #15186 - [microNPU][ETHOSU] Add Vela's logic to select configuration block

microTVM

  • #15667 - Check the output of microNPU demos in CI

web

  • #15218 - Increase default EMCC compilation total memory size

Misc

  • #15934 - [Release] [Dont Squash] Modify version number to 0.14.0 and 0.15.0.dev on main branch
  • #15847 - [release] Update version to 0.14.0 and 0.15.0.dev on main branch
  • #15867 - Bump pillow from 9.3.0 to 10.0.1 in /apps/microtvm/ethosu
  • #15866 - Bump pillow from 9.3.0 to 10.0.1 in /apps/microtvm/cmsisnn
  • #15865 - Bump pillow from 9.2.0 to 10.0.1 in /apps/microtvm
  • #15833 - [VM] Memory Manager moved up to runtime
  • #15859 - [Script] Fix miscellaneous issues in make_notes.py
  • #15818 - [CLI TOOLS][RTVM] Improve rtvm tool with new options to measure native performance
  • #15761 - [Target] LLVM helper functions for any target info
  • #15672 - [IR] Implemented Variant<...> container
  • #15714 - [Target][Device] Auto detect target and create device from str in torch style
  • #15723 - Fix _convert_simple_rnn
  • #15725 - Revert "[CodeGenC] Handle GlobalVar callee as internal function call"
  • #15684 - [Hopper TMA] Add intrinsic to create barriers for synchronization
  • #15683 - Fix a bug caused by PyTorch instance_norm when the input shape is [1,1,1,2]
  • #15596 - [FFI] Propagate Python errors across FFI boundaries
  • #15666 - [Module] Implement custom imported modules serialization
  • #15656 - [Hopper TMA] Add CUDA codegen support for bulk asynchronous copy
  • #15664 - [IR] Use structural equal for Range equality
  • #15649 - Add output_data_sec section in corstone300.ld
  • #15639 - Do not link LLVM libraries into cpptest binary
  • #15631 - [RPC] Enhance RPC Protocol to support TVM Object
  • #15624 - [CMake] Add RCCL to TVM and TVM Runtime
  • #15616 - [Hopper TMA] CUDA codegen for async copy with barrier synchronization
  • #15537 - [CPP_RPC] export listdir for RPC
  • #15605 - [CMake] Add NCCL to TVM and TVM Runtime
  • #15580 - Fix duplicated "to" word in Python and C header files
  • #15581 - Remove duplicated "load" word inside .cc file
  • #15582 - Remove duplicated "from" word inside Python script
  • #15554 - Bump tornado from 6.1 to 6.3.3 in /apps/microtvm
  • #15552 - Bump tornado from 6.1 to 6.3.3 in /apps/microtvm/ethosu
  • #15553 - Bump tornado from 6.1 to 6.3.3 in /apps/microtvm/cmsisnn
  • #15536 - [TypoFix] Fix typo
  • #15529 - [Quantize] Fix annotation bug for output of add op
  • #15535 - Fix search task comment
  • #15530 - Remove duplicate msg word and condition inside the function doc
  • #15511 - Remove IRModule Dependency from Target
  • #15525 - Fix typo: change "whethe" to "whether"
  • #15524 - Remove duplicated word "the"
  • #15103 - [CodeGenC] Handle GlobalVar callee as internal function call
  • #15419 - [VM][Textures] Enable OpenCL textures for VM
  • #15483 - [Script] Be more careful when generating ast.ExtSlice for Subscript
  • #15469 - [CYTHON] Make cython compatible with 3.0
  • #15423 - [Submodule] Add Flash attention v2
  • #15380 - [Target] Add Jetson Orin Nano tag
  • #15359 - [CMAKE] Conditionally link "clog" in NNPack install
  • #15326 - [OP] Add rms_norm into TOPI
  • #15312 - [skipci] Fix typo in docs/arch/index.rst
  • #15298 - [Release] Extend PR tags and Format PR hyper-links in release report
  • #15328 - [Package] Remove cutlass media/docs inside cutlass_fpA_intB_gemm
  • #15321 - [JVM] Fix the Maven pom.xml for OS X arm64 tvm4j build
  • #15265 - Fix keras version problem
  • #15292 - [RPC] Fix socket bind errno on corner case
  • #15287 - [Exec] Add a script to test GPU memory bandwidth
  • #15234 - [Miscs] Enhance script about make release notes
  • #15229 - [CMAKE] Add Vulkan header for Android
  • #15215 - [Android] ndk static build
  • #15208 - Update version to 0.14.dev0 on main branch
