Introduction
The TVM community has worked since the last release to deliver the following exciting new improvements!
The main tags are below (bold text indicates areas with significant progress): Relax (especially the PyTorch frontend), TIR, etc.
Please visit the full listing of commits for a complete view: v0.23.dev0...v0.23.0.rc0.
Community
None.
RFCs
None.
Adreno
- #18523 - [TEXTURE] Texture based lowering
Arith
- #18542 - Revert "Fix InternalError: Check failed: (eval_vec_) is false"
- #18536 - Fix InternalError: Check failed: (eval_vec_) is false
BugFix
- #18628 - [Fix] Fix typo in file header comment
- #18589 - [OpenCL] Guard QCOM perf hint behind USE_OPENCL_EXTN_QCOM to avoid undefined symbol on non-QCOM runtimes
- #18534 - Prevent segfault when instantiating abstract SearchStrategy
CI
- #18549 - Remove hardcoded user and repo values
- #18484 - Update file patterns for specific linting hooks
- #18470 - Enhance python linting scripts to support revision-based checks
- #18498 - Use glob for `conda/build-environment.yaml` in cache key
- #18495 - Update `actions/cache` to v4 in setup action
- #18457 - Fix crash when grep finds no matches
- #18448 - Update pre-commit configuration
- #18432 - Enable username checks in PR title and body
- #18430 - [TEST][CODEGEN] Fix test scripts passing numpy a dtype name that it cannot recognise
- #18419 - [TEST] Refactor: remove the deprecated warning message check from test cases
Docs
- #18545 - Improve static shape tuning parameter configuration (follow-up to commit c71aefc)
- #18539 - Fix e2e_opt_model tutorial for GPU deployment
- #18451 - Update the merge setting
- #18436 - Remove prebuilt package references and disable Colab button in tutorials
- #18413 - Update cross-compilation and RPC tutorial with modern PyTorch deployment workflow
- #18412 - Update tutorial for exporting and loading back Relax executables
- #18404 - Add tutorial for exporting and loading back Relax executables
Frontend
- #18435 - [ONNX] Fix operator Transpose: TVMError: PermuteDims expects the number of input axes to equal the ndim of the input tensor
LLVM
- #18586 - [Codegen] Avoid segfault when `arith::GetVScaleValues` returns an empty vector
MetaSchedule
- #18547 - Fix tune_tir crash with ScheduleError in RewriteParallelVectorizeUnroll
Relax
- #18676 - Implement dynamic output trimming for NMS
- #18664 - Add FDataDependent operator attribute for LegalizeOps
- #18668 - [Onnx] Support Local Response Normalization (LRN)
- #18667 - Add native size operator
- #18675 - [LAYOUT] Support for dynamic layout specification
- #18652 - [ONNX] add support for unique optional outputs
- #18665 - Replace topi.take with relax.op.take
- #18663 - Fix wrong memory planning when only lower bound was provided
- #18666 - [Onnx][Resize] Handle non-4D input tensors
- #18658 - [Onnx][PReLU] Handle slope and axis argument with different slope shapes
- #18649 - Remove obsolete TODO comments
- #18642 - Add FRelaxInferLayout for gather_elements operator
- #18643 - Add FRelaxInferLayout for scatter_nd operator
- #18641 - [Op] Fixed incorrect output shape of Pool op when ceil_mode = true
- #18638 - Add FRelaxInferLayout for scatter_elements operator
- #18637 - Add FRelaxInferLayout for flip operator
- #18633 - Add FRelaxInferLayout and TMixedPrecisionPolicy for dynamic_strided_slice
- #18635 - [Onnx] Pass output_padding param in ConvTranspose
- #18632 - Move GetUsedVars to analysis module
- #18629 - Add FInferMixedPrecision and FRelaxInferLayout for conv transpose ops
- #18626 - [Op][PyTorch] Support Median operator
- #18576 - Correct YaRN RoPE frequency scaling formula to align with the original paper
- #18615 - Add gpu-generic fallback for unrecognized GPU targets
- #18621 - Use weight shape instead of dim in Embedding.forward
- #18613 - Remove duplicated test case: test_if_branch_var_scope
- #18616 - Replaced call_pure_packed with tensor_to_shape operator
- #18593 - feat: Implement FRelaxInferLayout for tile operator
- #18618 - Add test case for op attributes in AST printer
- #18619 - [PyTorch] Fix PyTorch Dynamo frontend for Darwin compatibility
- #18575 - [ONNX] Add edge padding mode
- #18620 - Fix flaky test_conv2d gradient numeric test
- #18609 - Fix batch normalization computation logic
- #18574 - [Torch] Fix AssertionError: Unsupported function types ['mean.default']
- #18591 - Chore: Fix the DeprecationWarning: invalid escape sequence \
- #18577 - Clean up scatter_elements unknown dtype handling
- #18579 - Add layout inference support for repeat operator
- #18583 - [Torch] Fix sum op issues when dim and keepdim are not specified
- #18554 - Enhance unique block name generation with numeric suffixes
- #18558 - Add edge padding mode
- #18559 - Add mod operator support
- #18544 - [PyTorch] Add support for Custom Ops for ExportedProgram frontend
- #18535 - [PyTorch] Add support for masked_select
- #18551 - [Frontend] Introduce ModuleDict
- #18550 - [PyTorch] Enhance scale_factor handling in interpolation
- #18553 - [PyTorch] Unify dtype used in conv2d tests
- #18548 - [PyTorch] Add NHWC layout support
- #18533 - [PyTorch] Fix index_put with broadcast indices
- #18521 - [PyTorch] Handle unknown output shapes for _sym_size_int
- #18532 - [PyTorch] Add support for bidirectional GRU
- #18530 - [PyTorch] Add boolean tensor support for max operation and corresponding test case
- #18524 - [PyTorch] Fix InternalError when converting scaled_dot_product_attention with 2D inputs
- #18527 - [PyTorch] Add support for non-persistent buffers in ExportedProgram frontend
- #18529 - [PyTorch] Add support for binary scalar operations in ExportedProgram frontend and corresponding tests
- #18522 - [PyTorch] Unify tests using shared tvm.testing.assert_allclose
- #18516 - [PyTorch] Add support for bidirectional LSTM
- #18499 - [PyTorch] Add support for sparse matrix multiplication
- #18518 - [PyTorch] Fix batch normalization training mode correctness
- #18517 - [PyTorch] Unify tests using shared verify_model
- #18506 - [PyTorch] Enhance data type handling in FX graph translator
- #18507 - [PyTorch] Support specifying decimals for _round
- #18500 - [PyTorch] Add support for antialiased bilinear upsampling
- #18489 - [PyTorch] Enhance handling of unbounded upper bound constraints
- #17599 - [PASS] Annotate Custom Scope layout pass for Adreno GPU
- #18497 - [PyTorch] Add binary operation dtype promotion following PyTorch rules in ExportedProgram frontend
- #18478 - Fix the squeeze operator to behave consistently with torch
- #18496 - [PyTorch] Add `mul` operator in ExportedProgram frontend
- #18494 - [PyTorch] Add negative slicing support in `slice_scatter` operation
- #18493 - [PyTorch] Add broadcast support for `copy` operation
- #18490 - [PyTorch] Add `as_strided` operator in ExportedProgram frontend
- #18487 - [PyTorch] Add `count_include_pad` support to `avg_pool2d` in PyTorch frontend
- #18488 - [PyTorch] Enhance index_put support for multi-dimensional indices
- #18486 - [PyTorch] Fix `batch_norm.default` args handling in ExportedProgram frontend
- #18483 - [PyTorch] Add support for grid_sample operator
- #18482 - [PyTorch] Add support for gumbel_softmax
- #18485 - [PyTorch] Add dynamic shape support to `torch.ops.aten.sym_size.int` in ExportedProgram frontend
- #18473 - [PyTorch] Add support for `torch.ops.aten.sym_size.int` in ExportedProgram frontend
- #18471 - [PyTorch] Enable run_ep_decomposition by default
- #18462 - [PyTorch] Add decomposed operator support for interpolate
- #18455 - Fix flaky test_conv2d_offload by increasing float32 tolerance
- #18463 - [PyTorch] Support advanced range constraints (multiplication)
- #18464 - [PyTorch] Enable decomposition in all tests
- #18461 - [PyTorch] Fix KeyError: dtype when converting a PyTorch model with gradient checkpointing using torch.export
- #18452 - [PyTorch] Support advanced range constraints (addition)
- #18454 - [PyTorch] Fix sqrt operation requiring float dtype but receiving int64 in attention scaling
- #18459 - [PyTorch] Fix MultiheadAttention compilation
- #18460 - [PyTorch] Add decomposed operator support for normalization
- #18458 - [PyTorch] Add decomposed operator support for Binary
- #18449 - [PyTorch] Add decomposed operator support for Pad
- #18447 - [PyTorch] Add lower bound support for range constraints
- #18446 - [PyTorch] Add decomposed operator support for MaxPool
- #18437 - [PyTorch] Add decomposed operator support for AdaptiveAvgPool
- #18433 - [PyTorch] Add decomposed operator support for Conv
- #18429 - [PyTorch] Support basic range constraints
- #18428 - [PyTorch] Add support for decomposed operators and fix IR of ops tests(8)
- #18427 - [PyTorch] Add support for decomposed operators and fix IR of ops tests(7)
- #18420 - [PyTorch] Add support for decomposed operators and fix IR of ops tests(6)
- #18417 - [PyTorch] Add support for decomposed operators and fix IR of ops tests(5)
- #18416 - [ONNX] Fix bug: Unsupported numpy or ml_dtypes dtype('O') when importing ONNX model using Relax frontend
- #18414 - [PyTorch] Add support for decomposed operators and fix IR of ops tests(4)
- #18410 - [PyTorch] Add support for decomposed operators and fix IR of ops tests(3)
- #18403 - [PyTorch] Add support for decomposed operators and fix IR of ops tests(2)
- #18402 - [PyTorch] Add support for decomposed operators and fix IR of ops tests(1)
- #18401 - [PyTorch] Enable decomposition for unary ops and refactor tests
- #18400 - [PyTorch] Add support for decomposed operators in extended unary ops tests
- #18399 - [PyTorch] Add run_ep_decomposition flag to control PyTorch decomposition
Runtime
- #18546 - [MatchShape] Fix type error: Cannot convert from type `DLTensor*` to `ffi.Shape`
TIR
- #18639 - [Schedule] Fix type checker to support subscripted generics in Python 3.14+
- #18515 - [Schedule] FuseReductionEpilogue: Add Clipping pattern support
- #18556 - [Schedule] Fix bug on bfloat16 conversion
- #18528 - [Schedule] Fix mma tensorize error
- #18514 - Fix tir.LowerIntrin check failed additional_info.size() == new_size
- #18505 - Update function signatures for decompose_reduction
- #18479 - Fix VerifyStream::Verify dereferencing an invalid pointer
- #18421 - Add step attribute to ForNode (initial implementation)
- #18418 - [Schedule] Add FuseReductionEpilogue primitive to fuse epilogue …
- #18466 - Fix Data Type Mismatch (int64 vs int32) in T.match_buffer when Working with Scalar Buffers in TIR
TVMScript
- #18504 - Add test for TIR macro block name suffix handling
- #18465 - Add block name suffix management for TIR macros
cuda & cutlass & tensorrt
- #18624 - [CUDA] Fix cuModuleUnload crash during interpreter shutdown
- #18604 - [CUDA][FFI] Extend kernel launch config to support Programmatic Dependent Launch and cuLaunchCooperativeKernel
web
- #18683 - Fix RPC argument parsing for new FFI string/bytes types
- #18686 - Fix incorrect FFI export name in runtime.ts
- #18480 - Bump web runtime version 0.23.0-dev1
- #18467 - Replace string with TVMFFIByteArray* to avoid memory issues
- #18450 - Fix progress reporting when loading from cache
- #18415 - Fix arrayDecodeStorage scope issue for q0f32 models
- #18385 - Upgrade web runtime to new FFI
Misc
- #18681 - [NVRTC] Add NVSHMEM support to NVRTC compilation path
- #18674 - fix: MSVC pragma
- #18654 - [FFI] bump to latest version
- #18656 - Put options before objects when compiling
- #18519 - [Compile] accelerate compilation speed using NVRTC
- #18582 - Fix ACOS precision issue for boundary values (x=±1.0)
- #18557 - [Attn] Fix calling FlashInfer attention plan function
- #18555 - Fix duplicate `PresburgerSetNode` registration when `USE_MLIR=ON` and MLIR >= 15.0
- #18525 - [Schedule] Fix LocalBuilder Check failed: (index_map_func.has_value()) is false
- #18511 - [Pass] Add DumpIR pass instrument to save IR snapshots
- #18512 - Remove unused TVMC configs
- #18509 - Fix compilation warnings
- #18492 - Fix BufferError when converting PyTorch models with sparse tensors
- #18469 - [Contrib] Update RandomFill to use StreamSync for CUDA synchronization
- #18453 - [DataType] Update to use explicit Bool Type Aligning with DLPack
- #18422 - Adjusted Longrope embedding function to match Huggingface Implementation
- #18426 - Support integer type input for log and log2
- #18411 - [FFI] Bump tvm-ffi to latest
- #18409 - Fix database bug
- #18390 - Support integer types in TIR expression operators
- #18398 - Fix 8-bit vector loads/stores, resolving the failure in the CUDA codegen test
- #18389 - Add VisitStmt_ method for AssertStmtNode and StringImmNode
- #18361 - [WebLLM] Replace int64s with int32s in WebGPU kernels
- #18384 - Fix crash when multiple PrimFunc objects are present in IRModule
- #18378 - [release][Dont Squash] Update version to 0.22.0 and 0.23.0.dev on main branch