Introduction

Since the v0.10.0 release, the TVM community has delivered the following exciting improvements!

MetaSchedule
- Tuning API improvements and anchor-block tuning
TVMScript metaprogramming
- Substantial progress on TVMScript, including the introduction of a core parser, AST, Evaluator, Source, and diagnostics

And many other general improvements to microTVM, code quality, CI, frontends, and more! Please visit the full listing of commits for a complete view: v0.10.0...v0.11.0.

RFCs

These RFCs have been merged in apache/tvm-rfcs since the last release.

CodeGenAArch64 backend with Scalable Vector Extension (SVE) #94 apache/tvm-rfcs@04b9909

What's Changed

Note that this list does not cover every PR and discussion since v0.10. Please visit the full listing of commits for a complete view: v0.10.0...v0.11.0.

Adreno

[Adreno] Add global pooling schedule (#13573)
[Adreno] Add documentation for Adreno deployment (#13393)
[Adreno] Fix mem_scope annotations for prim funcs having several heads (#13153)
[Adreno] Adapt reduction schedule for adreno (#13100)
[Adreno] Fix winograd accuracy (#13117)
[Adreno][Textures] Fix static memory planner (#13253)
[DOCKER][Adreno]Docker infra for Adreno target with CLML support (#12833)

AoT

[AOT] Add CreateExecutorMetadata analysis pass (#13250)
[AOT] Add CreateFunctionMetadata analysis pass (#13095)
[AOT] Sanitize input/output name in runtime (#13046)

Arith

[Arith] Add internal NarrowPredicateExpression utility (#13041)
[Arith] Optional rewriting and simplification into AND of ORs (#12972)

arm

[bfloat16] Fixed dtype conversion in the arm_cpu injective schedule (#13417)

AutoTVM

[AutoTVM] Introducing multi_filter into ConfigSpace autotvm (#12545)

Build

[BUILD] Re-enable ccache by default (#12839)

CI

[ci] Fix docs deploy (#13570)
[ci] Split Jenkinsfile into platform-specific jobs (#13300)
[ci] Dis-allow any non-S3 URLs in CI (#13283)
[ci] Split out C++ unittests (#13335)
[CI] Separate the ci scripts into Github and Jenkins scripts (#13368)
[ci] Assert some tests are not skipped in the CI (#12915)
[ci] Ignore JUnit upload failures (#13142)
[ci] Lint for trailing newlines and spaces (#13058)
[ci] Template build steps (#12983)
[ci][docker] Allow usage of ECR images in PRs (#13590)
[ci][docker] Read docker image tags during CI runs (#13572)
[ci][wasm] Add package-lock.json to git (#13505)

CL

[ACL] Enable int8 data type in pooling operators (#13488)

CMSIS-NN

[CMSIS-NN] Support for int16 conv2d (#12950)
[CMSIS-NN] Support for int16 in fully connected layer (#13484)

DNNL

[AMP] refine AMP and the corresponding tests for bfloat16 (#12787)

Docker

[Docker]Refactor timezone script and NRF installation (#13342)

Docs

[docs] Fix empty code blocks in tutorials (#13188)

Ethos-N

[ETHOSN] Consolidate target string usage (#13159)
[ETHOSN] Throw error message when inference fails (#13022)
[ETHOSN] Inline non-compute-intensive partitions (#13092)
[ETHOSN] Transpose fully connected weights (#12970)
[ETHOSN] Support conversion of add/mul to requantize where possible (#12887)

Frontend

[TFLite] Enable int64 biases for int16 quantized operators (#12042)

Hexagon

[Hexagon] Add HVX quant conv2d implementation (#13256)
[Hexagon] Add test to show scheduling of resnet50 with async dma pipe… (#13352)
[Hexagon] Enable Hexagon User DMA bypass mode (#13381)
[Hexagon] Lint tests part 2 (#13271)
[Hexagon] Add pylint on tests (#13233)
[Hexagon] Add E2E test demonstrating how to apply blocked layout schedule to conv2d via metaschedule (#13180)
[Hexagon] Add a test to show how to use multi input async dma pipelin… (#13110)
[Hexagon]: Add upload function to hexagon session (#13161)
[Hexagon] Add support for instrumentation based profiling for Hexagon (#12971)
[Hexagon] Add power manager (#13162)
[Hexagon] Add scripts for e2e MetaSchedule tuning demonstration (#13135)
[Hexagon] Add feature to copy logcat to --hexagon-debug and add new --sysmon-profile option to run sysmon profiler during the test (#13107)
[Hexagon] Async DMA pipelining test suite (#13005)
[Hexagon] Enable multi input Async DMA; same queue / stage (#13037)
[Hexagon] Do not use target test fixture in Hexagon tests (#12981)
[Hexagon] 3-stage pipeline; multi queue async DMA for cache read / write (#12954)
[Hexagon] vrmpy tensorization for e2e compilation of int8 models (#12911)
[Hexagon] Support template-free meta schedule tuning (#12854)
[Hexagon] depth_to_space slice op (#12669)
[Hexagon] Make allocate_hexagon_array a hexagon contrib API (#13336)
[Hexagon] Add fix for vtcm allocation searches (#13197)
[MetaSchedule][Hexagon] Add postproc for verifying VTCM usage (#13538)
[Hexagon][QNN] Add TOPI strategies for qnn ops mul/tanh/subtract (#13416)
[Logging][Hexagon] Improve logging on Hexagon (#13072)
[Hexagon] [runtime] Per-thread hardware resource management (#13181)
[Hexagon] [runtime] Create objects to manage thread hardware resources (#13111)
[QNN][Hexagon] Disable QNN canonicalization pass (#12398)
[Hexagon] [runtime] Manage RPC and runtime buffers separately (#13028)
[Hexagon] [runtime] VTCM Allocator (#12947)
[TOPI][Hexagon] Add schedule and test for maxpool uint8 layout (#12826)
[TOPI][Hexagon] Implement quantize op for hexagon (#12820)
[Meta Schedule][XGBoost] Update the custom callback function of xgboost in meta schedule (#12141)
[TIR] [Hexagon] Add vdmpy intrinsic and transform_layout for tests (#13557)
[Hexagon] [runtime] Support VTCM alignments of 128 or 2k (#12999)
[HEXAGON][QHL] Clipping the inputs of HVX version of QHL Sigmoid operation (#12919)
[Hexagon] [runtime] Add user DMA to device API resource management (#12918)

LLVM

[LLVM] Emit fp16/fp32 builtins directly into target module (#12877)
[LLVM] Switch to using New Pass Manager (NPM) with LLVM 16+ (#13515)

MetaSchedule

[MetaSchedule] Make MultiLevelTiling apply condition customizable (#13535)
[MetaSchedule] Enhance Database Validation Script (#13459)
[MetaSchedule] Fix Dynamic Loop from AutoBinding (#13421)
[MetaSchedule] Support schedules with cache read in RewriteLayout (#13384)
[MetaSchedule] Improve inlining and VerifyGPUCode for quantized model workload (#13334)
[MetaSchedule] Add JSON Database Validation Scripts (#12948)
[MetaSchedule] Fix the order of applying AutoInline in ScheduleUsingAnchorTrace (#13329)
[MetaSchedule] Refactor ScheduleRule Attributes (#13195)
[MetaSchedule] Improve the script for TorchBench model tuning & benchmarking (#13255)
[MetaSchedule] Enable anchor-block tuning (#13206)
[MetaSchedule] Introduce a variant of ModuleEquality to enable ignoring NDArray raw data (#13091)
[MetaSchedule] Consolidate module hashing and equality testing (#13050)
[MetaSchedule] Support RewriteLayout postproc on AllocateConst (#12991)
[MetaSchedule] Tuning API cleanup & ergonomics (#12895)
[MetaSchedule] Fix XGBoost Import Issue (#12936)
[MetaSchedule] Add Script for TorchBench Model Tuning & Benchmarking (#12914)
[MetaSchedule] Restore num_threads parameter in tuning API (#13561)
[MetaSchedule] TorchBench tuning script: add option to disallow operators in sub graph (#13453)
[MetaSchedule] Fix segfault in gradient based scheduler (#13399)
[MetaSchedule] Add from-target Defaults for x86 VNNI Targets (#13383)
[MetaSchedule] Fix Task Hanging in EvolutionarySearch (#13246)
[MetaSchedule] Allow skipping exact NDArray rewrite in RemoveWeightLayoutRewriteBlock (#13052)
[MetaSchedule][UX] Support Interactive Performance Table Printing in Notebook (#13006)
[MetaSchedule][UX] User Interface for Jupyter Notebook (#12866)

microNPU

[microNPU] Upgrade Vela to v3.5.0 (#13394)
[microNPU] Fixed MergeConstants pass on striped networks (#13281)

microTVM

[microTVM] Modernize Arm Cortex-M convolution schedules (#13242)
[microTVM] Improve code reuse in Corstone300 conv2d tests (#13051)
[microTVM] Add Cortex-M DSP schedules for optimal conv2d layouts (#12969)
[microTVM] Use default Project Options in template projects and add Makefile for Arduino template project (#12818)
[microTVM] Generalize depthwise_conv2d schedule (#12856)
[microTVM] add the option to open a saved micro project for debugging (#12495)
Added macro generation in MLF export (#12789)
[microTVM][Arduino]Add serial_number to project options and tests (#13518)
[microTVM][Zephyr] Add 'serial_number' option (#13377)
[microTVM][PyTorch][Tutorial]Adding a PyTorch tutorial for microTVM with CRT (#13324)

Misc

[CodegenC] Explicit forward function declarations (#13522)
[FQ2I] Support converting dense -> add to qnn.dense -> add -> requantize (#13578)
[Minor][Testing] Consolidate IRs into corresponding functions (#13339)
Add recursive on loop with marked kUnrolled (#13536)
Skip stride check if shape is 1 in IsContiguous (#13121)
[TEST] CPU feature detection for x86 and ARM dot product instructions (#12980)
[Node] Expose StructuralEqual/Hash handler implementation to header (#13001)
[Tensorize] Add logs to comparator to make debugging tensorize failures easier (#13285)
[usmp] Also remap VarNode to USMP-allocated buffer (#12880)
[Virtual Machine] Implementation of 'set_output_zero_copy' (#11358)

ONNX

[ONNX] Add converter for FastGelu from Microsoft onnxruntime contrib opset (#13119)
[QNN, ONNX] Extension of QLinearMatMul in ONNX front-end for all ranks of input tensors (#13322)

OpenCL

[OpenCL] Introduce OpenCL wrapper to TVM (#13362)
[OpenCL] Introduction of weights on buffers (#13563)
[OPENCL][TEXTURE] Test case enhancements and fixes for RPC (#13408)

Relay

[Relay] Fix CombineParallelDense slicing axis (#13597)
[Relay] Refactor constant folding over expr into a utility function (#13343)
[Relay] Enhancement for fold_scale_axis and simplify_expr (#13275)
[Relay] Add ClipAndConsecutiveCast and CastClip to SimplifyExpr (#13236)
[Relay] Rewrite division by constant to multiply (#13182)
[Relay] Extend split for blocked ConvertLayout pass (#12886)
[Relay][transform][SimplifyExpr] simplify adjacent muls and adds with constants (#13213)
[Relay][Hexagon] Add per-channel FixedPointMultiply operation (#13080)
[IRBuilder][Minor] Add intrinsics like T.int32x4 (#13361)

roofline

[ROOFLINE] Add support for different dtypes (#13003)
[Roofline] Add fma (non-tensorcore) peak flops for CUDA (#13419)

RPC

[RPC] Fix tracker connection termination (#13420)

Runtime

[RUNTIME][CLML] Add fixes to clml runtime api (#13426)
[DLPack][runtime] Update DLPack to v0.7 (#13177)

Target

[Target] Replace utility functions with target.features (#12455)
[Target] Add Target Parser for Arm(R) Cortex(R) A-Profile CPUs (#12454)
[Target] Add target_device_type attribute to override default device_type (#12509)

TIR

[TIR] Add preserve_unit_iters option to blockize/tensorize (#13579)
[TIR] Introduce ReduceBranchingThroughOvercompute (#13299)
[TIR] Unify index data type when creating prim func (#13327)
[TIR] Remove PrimFuncNode::preflattened_buffer_map (#10940)
[TIR] Make syntax of AST nodes different than ops (#13358)
[TIR] Update ReductionIterNotIndexOutputBuffer to check BlockRealizeN… (#13301)
[TIR] Check producer predicate in ReverseComputeInline (#13338)
[TIR] Add utility for anchor block extraction (#13194)
[TIR] Allow IndexMap applied to arguments with different dtypes (#13085)
[TIR] Fix handling of int64 extent in blockize and tensorize (#13069)
[TIR] Refactor NarrowDataType into DataTypeLegalizer (#13049)
[TIR] add unit-tests for upcoming primfunc-slicing (#12794)
[TIR] Fix plan buffer allocation location for loop carried dependencies (#12757)
[TIR] Fix predefined inverse map in layout transform dtype legalization (#13565)
[TIR] Preserve loop annotation after loop partitioning (#13292)
[TIR] Use IndexMap to transform NDArray (#12949)
[TIR] Preserve loop annotations in inject_software_pipeline pass (#12937)
[TIR][Schedule] Support for specific consumer block targeting in cache_write (#13510)
[TIR][Hexagon] Add vtcm memory capacity verification for Hexagon target (#13349)
[TIR][Transform] Optional data-flow analysis in RemoveNoOp (#13217)
[TIR][Analysis][Arith] Implement basic data-flow analysis (#13130)
[TIR][Bugfix] Fix AXIS_SEPARATORS in tir.Schedule.transform_layout (#13326)
[TIR][Arith] Use TryCompare to narrow inequalities if possible (#13024)
[TIR][Primitive] Support rolling_buffer schedule primitive in TensorIR (#13033)
[Arith][TIR] Check for constant offsets of known literal constraints (#13023)
[TIR][Arith] Implement kApplyConstraintsToBooleanBranches extension (#13129)
[TIR][Schedule] Add cache_index to precompute index of buffer load (#13192)
[TIR][Schedule] Add cache_inplace primitive to cache opaque buffer (#12939)
[UnitTest][TIR] Support IRModule comparisons in CompareBeforeAfter (#12920)
[TIR][Arith] Prove conditionals by transitively applying knowns (#12863)
[TIR, MetaSchedule] Preserve unit block iters for auto-tensorization (#12974)
[TIR][MetaSchedule] Add regression test for layout_rewrite extent=1 (#12916)
[TIR][Transform] Keep the allocate buffers order after update buffer allocation location (#13560)
[TIR][Schedule] Fix cache_read loc detecting and region_cover checking (#13345)
[TIR][Transform] Clear buffer_map during MakeUnpackedAPI (#12891)
[TIR][Schedule] Relax cache read/write's restriction and fix unexpected behavior (#12766)

TOPI

[TOPI] Implement Einsum with reduction axes (#12913)
[TOPI] Add layer norm operator (#12864)
[TOPI] Add handwritten matvec for dynamic cases (#13423)
[TOPI] Fix dtype legalize logic for CPU dot product instruction (#12865)
[TOPI][Hexagon] Implement quantized adaptive_avg_pool1d for hexagon (#13282)
[TOPI][Hexagon] Implement quantized depthwise conv2d (#12499)

Torch

[TVM PyTorch Integration] optimized_torch & as_torch how-to guide (#12318)
[frontend][pytorch]Support aten::Tensor_split operator (#12871)

TVMC

[TVMC] Global pass context for compile and tune (#13309)

TVMScript

[TVMScript] Improvements tvm.script.highlight (#13438)
[TVMScript] Reorganize the folder structure (#12496)
[TVMScript] TIR parser (#13190)
[TVMScript] IRModule parser (#13176)
[TVMScript] Evaluator, core parser, var table (#13088)
[TVMScript] AST, Source and diagnostics for Parser (#12978)
[TVMScript] Import TIR methods into the IRBuilder (#12900)
[TVMScript] Infer T.match_buffer parameters for region (#12890)

apache/tvm v0.11.0.rc0 Apache TVM v0.11.0 on GitHub