Introduction
Since the v0.11.1 release, the TVM community has worked to deliver the following exciting improvements. The main tags are listed below (bold text marks areas with a lot of progress):
- Community, RFC;
- Runtime: ACL(ArmComputeLibrary), Adreno, OpenCL & CLML, ROCm, CUDA & CUTLASS & TensorRT, Ethosn, CRT, Hexagon, Metal, Web & WASM, others about runtime;
- Frontend: TensorFlow/TFLite, PyTorch/Torch, Paddle, OneFlow, Keras;
- TE, Relay, BYOC, TOPI, Arith, TIR, TVMScript, MetaSchedule, Schedule;
- CI, Tests, BugFix, Docs, Docker, Build;
- Android, microTVM, Target, AutoTVM, AOT, LLVM.
Please visit the full listing of commits for a complete view: v0.11.1...v0.12.0.
Thanks to @ysh329 for the great effort on the release process as release manager.
Community
- Reviewer
- Committer
- PMC
RFC
- [RFC] Introduce PresburgerSet (#99) (e17994b)
- [RFC] Further Unify Packed and Object in TVM Runtime (#97) (d646a22)
Runtime
ArmComputeLibrary
- [ACL][TESTING] Use pytest.mark.parametrize in ACL conv2d tests
- [ACL] Prevent offloading of per-channel quantized operators
- [CL] Update Compute Library from v22.11 to v23.02.1
Adreno
- [Adreno] Extend pack_filter for HWIO layout
- [Adreno] Update interface of AnnotateMemoryScope pass
- [Adreno] Optimize reduction schedule
- [BENCHMARK][ADRENO] Adreno Benchmarks with texture
- [BENCHMARKS][CLML] Adreno benchmarks with CLML BYOC path added
- [BENCHMARKS][ADRENO] Documentation for Adreno (Texture) benchmarks
- [DOCS][ADRENO] Improved Adreno documentation
OpenCL & CLML
- OpenCL
- CLML
- [CLML][RUNTIME] Enable more ops in CLML runtime
- [CLML][RELAY] Enable Pad and Conv2d layer fusion
- [CLML][CODEGEN] CLML native codegen utility
- [CLML] Version compatibility and various test cases
- [CLML] Changes corresponding to OpenCL workspace refactorization
- [RUNTIME][CLML] OpenCLML tuning and profiling enhanced
ROCm
CMSIS-NN
- [CMSIS-NN] Global function that provides range based on dtype
- [CMSIS-NN] Add int16 add and mul operator support
- [CMSIS-NN] Add a runtime error message
- [CMSIS-NN] Reduction in code size of AOT test runner binary
- [CMSIS-NN] Remove support for the old CMSIS NN project
- [CMSIS-NN] Support CMSIS NN from new GitHub location
- [CMSIS-NN] Add Cortex-M85 support
CUDA & CUTLASS & TensorRT
- [CUDA][Schedule] Better Layout Transform Schedules
- [Profiler] Allow user to flush L2 cache in `time_evaluator` function for profiling CUDA kernels
- [Codegen][CUDA] Add error message for missing fragment info
- [CUTLASS][Ansor] Combine CUTLASS and Ansor
- [TensorRT] Fix BiasAdd with correct axis attribute
- [TRT][BYOC] allow strided_slice ops on selected dimensions (#14142)
Ethosn
- [ETHOSN] Update driver stack version to 22.11
- [ETHOSN] Support for addition with constant input
- [ETHOSN] Apply FoldConstant before NPU partitioning
- [ETHOSN] Remove support for NPU driver 22.08
- [ETHOSN] Fix for the mock inference after NPU driver update
- [ETHOSN] Remove requantize dependency on resize
- [ETHOSN] Add support for experimental compiler option
CRT
- [CRT] USE CMake for CRT standalone libraries
- [CRT][microTVM] Enable USMP by default for AoTExecutor + CRT runtime
- [CRT]Cleanup unused macros in crt_config.h.template
Hexagon
- [Hexagon][TOPI] Use IndexMap axis separator instead of TE
- [Hexagon] Add concept of DMA groups
- [Hexagon] Improve cache management strategy for HexagonBuffer
- [Hexagon] Denote DMA cache bypass as experimental feature
- [Hexagon] Adapt some intrinsics for high vector lanes
- Hexagon compilation on MacOS system
- [Hexagon] Enable depthwise conv2d NHWC with an HWIO kernel layout
- [Hexagon][QNN] Improve performance wo QNN canonicalization
- [Hexagon][Metaschedule] Add timeout_sec arg to get_hexagon_local_builder
- [Hexagon] Fix deprecated call for data layout size in bits
- [Hexagon] Allow scalar tensors to have null shape during allocation
- [Hexagon][runtime] Make HexagonThreadManager::CheckSemaphore thread safe
- [Hexagon] Float and quantized dense operators with schedules
- [Hexagon][CI] Updated sha for builder LLVM
- [Hexagon][CI] Update the docker image ID to reflect newer LLVM
- [Hexagon] Switch from default_rng to random in Hexagon tests
- [Hexagon] Add hexagon user DMA intrins for tensorization
- [hexagon] Hexagon inference fix
Metal
- [METAL][CODEGEN] testcase for ramp codegen
- [CODEGEN][METAL] Fix unaligned vector load
- [CODEGEN][METAL] Fix ramp codegen
MicroNPU
- [microNPU] Sum legalization support
- [microNPU] Add rescale parameters for binary elementwise
- [microNPU] Add hardware constraints for binary elementwise
- [microNPU] Add support for TFLite PAD
- [microNPU] Upgrade Vela to v3.7.0
- [microNPU] Merge LUT activation with binary elementwise operation
- [microNPU] Upgrade to 22.08 version of Arm(R) Ethos(TM)-U NPU drivers
- [microNPU] Add relu6 relu_n1_to_1 test cases for Ethos-U
- [microNPU] Add a legalization test for TFLite PAD
- [microNPU] Disable copying weights to SRAM for FullyConnected ops in CopyConstants scheduler
- [microNPU] Add support for ResizeNearestNeighbor with half_pixel_centers=True
Web & WASM
- [Web] Try to upgrade WebGPU API usage to the latest
- [WEB] Reduce memleak in web runtime
- [WEB] WebGPU Codegen
- [WEB] Update web runtime to support latest emcc
- [WASM][FIX] test tests/node/websock_rpc_test.py
Others about Runtime
- [FIX][RUNTIME] Convert container with function value type
- [RUNTIME] Fix the manual determination of cores in FillDataForMeasure
- [RUNTIME] Fix determination of big/little cores domains
- [Runtime] Fix Potential DeviceAPIManager Memory Bug
- [Runtime] Fix high RAM usage when saving / loading parameters of big models
- [Runtime] Runtime module property mask for Metal and Vulkan
- [Runtime] Introduce runtime module property
- [Runtime] Add missing Type2Str for TVMByteArray
Android
- [Android] Fix using system libraries in Android apps
- [TOOL][NATIVE] Android native application for deploy and run
AOT
- [AOT] Added a test for detecting output size post MLF export
- [AOT]Aot module post-test error workaround
- [AOT]Raise error when input name is not valid
- [AoT]Add get_input_name function to AoT Module
Arith
- [Arith] Simplifications for floormod(x
- [Arith] Implemented PMatchesOneOf and matches_one_of
- [Arith][UnitTest] Parametrize tests of RewriteSimplifier
- [Arith] Use ConstIntBound to remove negative numerator when lowering
- [Arith][Bugfix] Simplify "x - 1 < y" into "x <= y"
- [Arith] Add simplification rule for `x - max(x+y
- [Arith] Updated incorrect simplification rule
- [Arith] Allow const folding on fp16 involving one and zero
- [ARITH] Enhance CanProve to handle symbolic bound
- [ARITH] support floordiv in deduce bound
- [Arith] Support eq in detect_clip_bound
- [Fix][Arith] Analyzer simplification starts with canonical
AutoTVM
BugFix
- [BugFix][UMA] Protect target registration
- [BugFix][Runtime] Add missing check for `PackedFunc`
- [Bugfix][TIR] Fix version conflict with typing for Python 3.8.0
- Fix build platform environment variable
- [BugFix][TVMScript] Fix the roundtripability of intrinsic pow
- [BugFix] Pylance emits the warning 'Code is unreachable'
- [BugFix][TVMScript]fix var capturing order error
- [BugFix][TVMScript] Parser crash
- [Bugfix][TVMScript] Handle LetStmt for `var1 = var2` expressions
- [Bug][CodeGen,Cuda]fix cast fp16 to int8/uint8 in cuda
- [fix] MXNet dot for all tensor dimensions
- [Bugfix] Conv1Dtranspose default kernel layout should be IOW
- [Bugfix] Conv3Dtranspose default kernel layout should be IODHW
- [BugFix] Support rewrite_once when the number of callbacks > 1
- [Bugfix][TIR] Fix version conflict with typing for different Python versions (3.8.0-3.10.0)
- Fix out of bound enum conversion
- [bugfix] Fix the write buffer scope of `mma_store_impl`
- [BugFix][Runtime] Fix Incorrect node information
Build
- [Build] Expose missing USE_VERILATOR in cmake
- [Build] Fix find_include_path when using TVM python package
- [Build] Fix misleading error messages
- [Build][Bugfix] Use CMAKE_ prefix for _COMPILER_LAUNCHER
BYOC
CI
- [CI][microTVM] Enable USE_MICRO for mac and windows CI builds
- [CI] Pass the 'path' parameter passed to cmake_build to the task_build.py script
- [CI][EZ] Upgrade CI Lint Image
- [CI][Lint] Update black
- [CI][Flaky] Skip zephyr_qemu-x86 tests that are part of task_python_microTVM
- [CI] Fix for NNPack error due to misalignment with pthreadpool library
- [ci] Disable Windows-Static-Runtime
- [ci][docker] Make branch names valid before using them as tags
- [CI] Cross-compile libtvm_runtime to Aarch64 and run tests
- [CI] Include static builds of the runtime as part of CI
- [CI] Update rerun list for tvm-bot
- [CI] Update ci_minimal docker image to cross-compile TVM to aarch64
- [CI] Update ci_arm docker image to have LLVM 15
- [CI] Update Compute Library to v22.11
- [CI] Fix broken model link
- [CI][ETHOSN] Add ssh to the driver stack installation
- [CI] Fix android build by constraining numpy version
- [CI] NNPACK build issue workaround
- [CI] Update GPU image for CUDA 11.7
- [CI] Update CUDA to 11.7
- [CI] Update cpu and gpu image
- [CI] Enable USE_MICRO in minimal cross ISA build
- [CI][microTVM]Update ci_cortexm image
- [CI][Docker][Cortex-M]Update scripts to update ci_cortexm to Ubuntu 20.04
- [CI] Fix MLF input and output name map
- [CI] Pin sccache version to 0.3.3
- [CI] Add llvm-15 and mlir-15 to Docker setup
- [CI] Add onnx dependency to test_auto_tensorize.py::test_vnni_bert_int8
- [CI] Fix test skipping pytest attribute
- [skip ci][ci][docker] Add cross compilation libs
Tests
- [Tests] Replace pytest.main with tvm.testing.main
- [TESTING] Enable execution of test_packed_8x8x32_resnet50
- [testing] Use tuples for numpy indexing
- [testing][py_converter] Enhance py_converter to better support entire modules
- [Unittest] merge test_cp_async_in_if_then_else into test_tir_transform_inject_ptx_async_copy
- [UnitTest] Parametrized test_arith_iter_affine_map::test_padding
Docker
- [Docker] Update ci-cpu and ci-arm to tag 20230223-070143-a3b51f11b
- [docker][microTVM]Fix Zephyr 0.15.2 SDK installation and separate Zephyr python environment
- [docker][microTVM]Update zephyr version to 3.2 and Zephyr SDK to 0.15.2
- [Docker]Add dialout group by default on login
- [Docker] Add script to build llvm from source
- [DOCKER] Configurable NDK version support
- [Docker update] Update ci_cpu tag to the latest from tlcpackstaging
Docs
- [Doc] fix doc for tvm.te.const()
- Add v0.11.0 docs link to site
- [docs] Remove empty code blocks
- [docs] Add details about patch releases
- [Docs] Update listed tvmc python dependencies
- [docs] Add "Open with Colab" button to documentation
- [Docs] Add `typing-extensions` dependency guide
- [Docs] Fix MetaSchedule Docs
- [FIX] Fix Typos in Docs and Comments
- [HotFix][docs] Use correct Colab button URL
Frontend
- TensorFlow & TFLite
- PyTorch
- ONNX
- [Frontend] Add ONNX importer for QLinearSoftmax
- [ONNX] QGemm support
- [ONNX][TOPI] Add `DFT` operator
- [Frontend] [ONNX] Support sequence_lens of GRU
- [ONNX] Extend converter for Attention from Microsoft onnxruntime contrib opset
- [ONNX] Add converter for QAttention from Microsoft onnxruntime contrib opset
- [ONNX][TORCH] Replace scatter op by scatter_elements
- [ONNX] Support ScatterElements with reduction
- [ONNX] Support Bitwise operations
- [ONNX] Support Bernoulli op on ONNX front-end
- [ONNX] Extend reduction types supported by ScatterND
- [ONNX] Support SequenceEmpty op
- [ONNX] Support SequenceErase op
- [ONNX] Support SequenceLength op
- Keras
- OneFlow
- Paddle
- [PaddlePaddle Hackathon 4][Frontend][Paddle]add conv3d for paddle frontend
- [Frontend][PaddlePaddle] Fix bug in tests for upgrading paddlepaddle to 2.4.2
- [Frontend][Paddle]add take_alone_axis and topk converter for paddle frontend
- [Frontend][Paddle] Add where_index op and add vm for paddle frontend's unittest
- [Frontend][Paddle] Add norm and one_hot_v2 op
- [Frontend][PaddlePaddle] Add topk op and Fix bug
- [PaddlePaddle Hackathon 4][Frontend][Paddle]Add tile/mish/stack/unstack/silu/softshrink/where op for paddle frontend
- [Frontend][Paddle]fix eye and dist
- [PaddlePaddle Hackathon 4][Frontend][Paddle]add grid-sample/gaussian_random/flip/fill_zeros_like/unique for paddle frontend
- [PaddlePaddle Hackathon 4][Frontend][Paddle]add thresholded_relu/index_select/eye/linspace/take_alone_axis/dist for paddle frontend
microTVM
- [microTVM] Clean-up test_crt.py and add to pylint
- [microTVM] Build standalone_crt with cmake instead of makefile
- [microTVM] additional refactoring for enabling USE_MICRO in more builds
- [microTVM] Fix host-driven AOT memory workspaces
- [microTVM] Fix MacOS build with USE_MICRO=ON
- [microTVM] Use QNN schedules to give SOTA performance
- [microTVM]Fix more security issues with pyproject
- [microTVM] Update poetry to fix security issues
- [microTVM]Enable TVMC micro with AoT Executor
- [microTVM]Add test for MLPerfTiny models
- [microTVM][CRT]Move Makefile to CMake to be cross-platform compatible
- [microTVM]Refactor crt_config.h header file generation
- [microTVM] Refactor required external functions in CRT to platform-template.c
- [microTVM] Update Zephyr version and Zephyr SDK version
- [microTVM]Refactor test and add skip to current failing tests/boards
- [microTVM] Update tutorials
- [microTVM] Add tutorial on how to generate MLPerfTiny submissions
- [microTVM][Zephyr]Add project files for mlperftiny submission
- [microTVM]Add default value to unspecified project options in project API
- [microTVM]Add MLPerfTiny test harness
- [microTVM] Fix tvmc tutorial
- [microTVM][Zephyr] Remove unnecessary use of generate_c_interface_header
- [microTVM][CRT]Separate CRT template project from standalone CRT build
- [microTVM][Zephyr] Fix flash command for nrfjprog
- [microTVM][Zephyr] Fix TVMC test on hardware
- [microTVM] Custom IDE Tutorial
- [microTVM] tuning on micro targets with meta-schedule
- [microTVM] Allow multiple runners in tuning micro models with meta-schedule
- [microTVM] Replace arm_nnsupportfunctions.h with arm_acle.h
LLVM
- [LLVM] Use DataLayout::getABITypeAlign instead of getABITypeAlignment
- [LLVM] Add missing `override` to GetFormat and GetPropertyMask
- [LLVM] Add guard for #include <llvm/Transforms/IPO/PassManagerBuilder.h>
- [LLVM] Remove call to EmitDebugLocation from AddAliasInfo
- [LLVM] Use std::nullopt instead of llvm::None
- [LLVM] Fix registerCallbacks API after recent change
- [LLVM] Add support to generate llvm.assume
- [LLVM] Add support for DeclBufferNode
- [LLVM][BugFix] Fix include Triplet.h bug when LLVM version>= 17
- [TEST] Fix division by 0 in llvm codegen test
- [SVE] Adding codegen tests for SVE
MetaSchedule
- [MetaSchedule] Introducing MemHammer
- [MetaSchedule] Introduce Async Pipeline in MultiLevelTiling
- [MetaSchedule][ARM] Enable ARM CPU intrinsic for MetaSchedule
- [MetaSchedule] Use `shared.dyn` for Tensor Core Schedule Rules
- [MetaSchedule] add fp16-16-32 TensorCores rule to default settings
- [MetaSchedule][Hexagon] Improve vectorization for standalone elementwise op
- [MetaSchedule] Add "disabled_pass" option in tuning API
- [MetaSchedule] Fix anchor-block flow with empty design space generator
- [Metaschedule] get_top_k should not return not built records
- [Metaschedule] Aligning get_top_k logic in MemoryDatabase and JSONDatabase
- [MetaSchedule] preserve global_symbol attached to function after applying MS
- [MetaSchedule] Fix a typo in MemoryDatabase
- [MetaSchedule] Fix for RewriteLayout + AllocateConst when the rank of the rewritten weight doesn't change
- [MetaSchedule] Fix tensorcore winograd task extraction
- [HotFix][MetaSchedule] Turn off database shash check
- [MetaSchedule] MutateTileSize skip single-candidate SampleCategorical
- [Metaschedule] EvolutionarySearchNode::State constructor typo fix
- [Fix][MetaSchedule] Fix redundant stages in async pipeline for mlt
- [Fix][MetaSchedule] RPCRunner timeout when queueing up
- [MetaSchedule] Add pass instrument to MetaSchedule api
- [MetaSchedule] Tile and pack intermediate output for CUDA TensorCore
- [MetaSchedule] Bugfix: Add checks for nullable `run_secs`
Misc
- [UX] Make T.prim_func typecheck as staticmethod
- [VM][DMLC] Lower memory usage when loading and dumping weights
- [APP] Update android_rpc build tools version
- [apps][bundle_deploy]Fix bundle build issue
- [Diagnostic] Support constructing Diagnostic Error through ObjectRef
- [skip ci] Replace magic_wand model with micro_speech
- [IR] Enhance IRModule SEqual/SHash to support cross function calls
- [Fix]Fix function ObjectPath in IRModule SEqual
- Update to v0.12.dev0
- Enable C++17 for cmake modules
- Remove temporary VTCM workspace APIs
- [IR] Platform-independent SHash
- Fix numpy version constraint
- [Utils] Allow classmethod and staticmethod in TVMDerivedObject
- [Git] Ignore python/requirements directory
- Enhance the --help message of composite target
- Add support for named outputs in MLF archive
- Add Name Transforms for Rust style
- Refactor test to make it easier for user to understand how tensor_intrin works
- Remove tutorials CMSIS dependency when not needed
- Add DisallowAsyncStridedMemCopy post processor to rem
- Add check for non-contiguous memory access when lowering to async dma
- Relay transform for rolling a known pattern into batch_matmul
- [Typo] Fix name of iter var type 4
- Extend the USE_LIBBACKTRACE option
- [Refactor] Move `VarUseDefAnalysis` to header file
- Add header files for GraphExecutorDebug
- [pytest] Don't return values from test_* functions
- [Analysis] Improve error message in VerifyWellFormed
- Revert the changes for NNPACK build issue
- [Node] Utility methods for ObjectPathPair handling
- [Minor] Change file mode 755 -> 644; EOL CRLF -> LF
- [FIX] Minor Compilation Warning Fixes
- [Contrib][Sort] Faster Top-K Implementation
- [COLLAGE] Add more customization to support more targets
- [CONTAINER] Struct Hash/Equal and JSON support for ShapeTuple
- [VTA] Provide zero-initialization for VTAGenericInsn
- [Fix,Roofline] Fix roofline handling of multiple peak flops
- [RPC] Add fail-guard for termination time exception
- [TOPHUB] use keys as a keyword for searching of existing statistics
- [Transform] Use callable() instead of isinstance() for type checking
- [TRANSFORM] Fix virtual device annotation issue with BYOC subgraphs
Relay
- [Fix][Relay] Fix axis transformation in squeeze shape function
- [QNN][Relay][Topi] Add qnn.dense with weight layout
- [fix][relay][qnn] Bug fix for 8-bit quantized mul
- [Relay][Op] Connect existing arm_cpu schedule to relay strategy for concat
- [Relay] Convert negative axes to positive when importing ONNX Unsqueeze
- [Relay][Frontend] Span Filling PyTorch
- [Relay][Frontend] Span Filling ONNX
- [Relay][Frontend] Span Filling TensorFlow 1
- [Relay][Frontend] Span Filling TFLite
- [Relay][Frontend] Span filling common API
- [Relay][Pass] Separate out the graph partitioning code from fuse_ops.cc
- [Relay] Remove overwriting of matmul shapes when they are static
- [Relay][Frontend][Onnx] SequenceAt and SplitToSequence Operators
- [Relay] Move pad value extraction past null pointer check
- [relay][frontend][pytorch]Fix a bug in the _get_pytorch_value_type function
- [Relay] Enhance EliminateCommonSubexpr to support Tuple argument
- [Relay][TIR] Add utility to lower Relay func to TIR prim func
- [Relay] Check if the attribute "name" exists before accessing it
- [Relay][Docs] Fixed examples in relay/transform.py documentation
- [Relay][Runtime] Add `set_input/output_zero_copy` in python
- [Relay][Testing][Bugfix] `py_converter` should use correct AST for versions above 3.8 too
- [relay] preserve the order of input_info of pytorch
- [QNN] Change in Pass Context for lookup table calculation
- [QNN] Convert fake quantized take to quantized op
Schedule
- [Schedule][Bugfix] Fix decompose padding wrt the single child subtree
- [Schedule] Add an optional argument `disable_checks` for `Schedule`
Target
- [Target] Make `key=arm_cpu` --> `key=arm_cpu
- [Target] Add target tags for Apple Silicon GPU
- [Target] Fix Jetson AGX Xavier CPU core count
- [Target] Add A10G gpu cuda tag
TE
- [TE] Record primitives of Schedule for visualization
- [TE][PrimFunc] Fix create primfunc from te extern with explicit buffer load
Tensorize
- [Tensorize][runtime] Add support for AMX(Advanced Matrix Extensions) through Tensor intrinsics
- [Tensorize][TOPI] Add AMX Tensorizing for int8 batch matmul
TIR
- [TensorIR] Support for L2 prefetch async copy and pred_guard enabled async in vectorized if_then_else
- [TensorIR][Schedule] New primitive `reorder_block_itervar`
- [TensorIR] New schedule primitive `set_dtype`
- [Fix][TIR] LowerCrossThreadReduction with write-back predicate
- [TIR] Introduce Pass InjectPTXLDG32
- [Fix][TIR] Fix tvm::arith::UnionLowerBound
- [TIR][Schedule] Add unittest for read_write_at
- [TIR] Add cp.async support for tir.if_then_else
- [tir] fix buffer_decl buffer allocation
- [tir] Add line level debug info
- [TIR][FIX] check args size when creating prim_func by runtime::Registry
- [TIR] not estimating the flops when there is a default estimated flops as attr
- [TIR][Hexagon] Enhancement of NarrowDataType pass for binary ops
- [TIR] Handle nullptr returned by FindEntryFunc
- [TIR]Fix the crash of the pass RemoveNoOp
- [TIR] Update SplitHostDevice to post-process with ConvertSSA
- [TIR][Utility] More flexible tir::Substitute arguments
- [TIR][Analysis] Implement IdentifyMemCpy analysis function
- [TIR] Merged kDeviceThreadAxis and kUseDynamicSharedMemoryTag
- [TIR] Improved SeqStmt::Flatten utility
- [TIR] Use IRModuleNode::Remove to remove None in PrimFuncPass
- [TIR] Use same DataType of builtin::tvm_struct_set in C++ and Python
- [TIR] Update LowerTVMBuiltin to use Optional
- [TIR] Improved MakePackedAPI error message
- [TIR] Legalize dtype of constants in IndexMap
- [TIR] Improved error message in InjectSoftwarePipeline
- [TIR][Schedule] Allow buffer name argument to Schedule.set_scope
- [TIR] Fix dtype mismatch error due to LetStmt
- [Fix][TIR] SampleCategorical apply-to-schedule
- [TIR][Fix] IndexDataTypeNormalizer not unwrapping float casting
- [TIR][Fix] Buffer slicing using index dtype as extent
- [TIR] Create Layout with specified axis dtype
- [TIR][Schedule] Improve cache_index to cache common subexpressions
- [TIR][Arith] Add common sub expr analyzer
- [TIR] [Schedule] Add get_output_blocks primitive
- [TIR] [Analysis] Expose IsOutputBlock to python
- [TIR] [Bugfix] Pass the correct block_sref_reuse to Replace
- [TIR] Fix cache_write bug with allocate const node
- [TIR][Schedule] Fix reverse_compute_inline
- [TIR] Remove special-casing of T.address_of in the storage rewrite pass
- [TIR] Refactor BF16Legalize
- [TIR] Enhance loop unroll with unroll local access
- [TIR] Remove LoadNode and StoreNode
- [TIR] Allow TransformLayout index_map to contain RVs
- [TIR] Allow TransformLayout with non-inversible index map
- [TIR] Fix typo in doc
- [TIR] Update block flags and simplify predicate in Reverse-Compute-Inline
- [TIR][TOPI][x86][CI] Support skylake avx512
- [TIR][TOPI][CI] Fix number of arguments in calls of llvm_pure_intrin
- [TIR][Compute-at] Utilize InverseAffineIterMap for dom estimation
- [TIR] Expose bitwise ops to python
- [TIR] Add merge primitive for TIR schedule
- [TensorIR][Primitive] New schedule primitive `reindex_cache_read/write`
- [TIR] Fix Datatype in Lower TVM Builtin
- [TIR] Enable Host Func Attribute for PrimFunc
TOPI
- [FIX][TOPI] Clip with IntImm/FloatImm
- [Fix,TOPI] Consolidate generic and x86 scatter nd
- [Test][Topi] Avoid depending on f32 rounding behavior for crop_and_divide tests
- [TOPI] Expose mem_scope from generic conv2d variants to be more reusable
- [TOPI][bugfix] Fix a bug in arm_cpu int8 dotprod schedule and modernize tests
- [TOPI] Bugfix arm_cpu schedule_conv2d_spatial_pack_nhwc schedule
- [TOPI][OP] Support grouped conv2d_NCHWc
- [TOPI] Fix batch_matmul tensorcore legalize for transpose_b = False case
- [TOPI] Group normalization
- [TOPI] dynamic extension
- [TOPI] Fix tuple unpack in conv2d NCHWc int8
- [TOPI] Making test_strided_set require a GPU for testing
- [Fix][Relay][TOPI] Bug fix in relay.sum and topi.sum functions
- [TOPI][Fix] Pool must return error if layout is tiled on H
- [TOPI] Batch Norm Training Mode
- [topi] remove comment redundancy in resize.py
- [TOPI][Hexagon] Implement global_avg_pool2d for hexagon
- [TOPI] Support non-batch cases for topi.nll_loss
- [TOPI] Add instance_norm operator
- [TOPI] Support symbolic shape in einsum
- [TOPI][Relay][ONNX] Replace scatter_add by scatter_elements(reduction="add")
- [TOPI] Fix data race of batch multibox detection
- [TOPI] Fix index dtype in topi strided_slice
- [TORCH][TOPI] Support mean reduction for scatter_reduce
TVMC
- [TVMC] Fix logging in TVMC
- [TVMC] Stop printing a wall of warnings with tvmc tune
- [TVMC] Add option to dump TIR code to file
- [TVMC] Allow selecting a subset of tasks to be used in `tvmc tune`
- [TVMC] Improve --desired-layouts functionality
- [TVMC][microNPU] tvmc option for printing which operators are offloaded to Ethos-U
- [TVMC][TRANSFORMS] ToMixedPrecision transform support with custom options enabled
TVMScript
- [Fix][TVMScript]TVMScript BinOP printing refactor
- [TVMScript] Schedule error reporting with new TVMScript printer
- [TVMScript] Connect `assert_structural_equal` with new TVMScript printer
- [TVMScript] Comments and docstrings printing
- [TVMScript] `T.allocate` with `T.decl_buffer` syntax sugar for TVMScript printer
- [TVMScript] `T.match_buffer` syntax sugar in arguments for TVMScript printer
- [TVMScript] Linter-friendly function definitions
- [TVMScript][Fix] Fix `bool` printing for roundtrip
- [Fix][TVMScript] Fix `LetStmt` printing logic
- [TVMScript] More concise `T.allocate` syntax printing
- [TVMScript] Implicit root block syntax sugar for TVMScript printer
- [TVMScript] `T.axis.remap` syntax sugar for TVMScript printer
- [TVMScript] Robustify the Highlight Printer
- [TVMScript] Sugar Var Definition in TIR Buffer
- [TVMScript] Distinguish LetStmt and Let expression
- [TVMScript] Simplify TIR Var Definition
- [TVMScript][UX] Introduce decorator for deprecation
- [TVMScript] Support `show_meta`
- [TVMScript] Consolidate folder structure
- [TVMScript] Default to T.Buffer than T.buffer_decl
- [TVMScript] Introduce `PrinterConfig`
- [TVMScript] Add ObjectPath to LiteralDoc
- [TVMScript] Use TVMScript for all TIR Printing
- [TVMScript] Migrate More to TVMScript Printer
- [TVMScript] IR Fragment Printing
- [TVMScript] Refactor IRDocsifier
- [TVMScript] Remove obsolete modules
- [TVMScript] Support SizeVar Roundtripping
- [TVMScript] Sugar T.env_thread + T.launch_thread
- [TVMScript] Encourage using T.Buffer directly
- [TVMScript] Unify `T.handle` and `T.Ptr`
- [TVMScript] Enable Safe Autocasting in BufferStore
- [TVMScript] Deterministic function ordering
- [TVMScript][Fix] Print Multi-line String as Metadata
- [TVMScript] Use op attribute to control whether to print dtype in TVMScript
- [TVMScript] Upstream IRModule parser from unity
- [TVMScript] Upstream IRModule parser from unity
- [TVMScript] Upstream IRModule parser from unity
- [TVMScript] Improved error message for unexpected top frame
- [TVMScript] Use new variable frame in If/Then/Else
- [Bugfix][TVMScript] Preserve variable names in LetStmt
- [TVMScript] More accurate hints for ImportError
- [TVMScript,Fix] Fix findsource when classes are indented
- [TVMScript][Printer] Remove relax prefix for now
- [Fix][TVMScript] Fix index of metadata in printed script
- [TVMScript] Fix print round-tripable multi thread env binding
- [TVMScript][Parser] Add more warp-level builtins and `Range`