Introduction
The TVM community has worked since the last release to deliver the following exciting new improvements!
The main tags are below (bold text indicates areas with significant progress): Relax (especially the PyTorch frontend), TIR, etc.
Please visit the full listing of commits for a complete view: v0.23.dev0...v0.23.0.rc0.
Community
None.
RFCs
None.
Adreno
- #18523 - [TEXTURE] Texture based lowering
Arith
- #18542 - Revert "Fix InternalError: Check failed: (eval_vec_) is false"
- #18536 - Fix InternalError: Check failed: (eval_vec_) is false
BugFix
- #18628 - [Fix] Fix typo in file header comment
- #18589 - [OpenCL] Guard QCOM perf hint behind USE_OPENCL_EXTN_QCOM to avoid undefined symbol on non-QCOM runtimes
- #18534 - Prevent segfault when instantiating abstract SearchStrategy
CI
- #18549 - Remove hardcoded user and repo values
- #18484 - Update file patterns for specific linting hooks
- #18470 - Enhance python linting scripts to support revision-based checks
- #18498 - Use glob for `conda/build-environment.yaml` in cache key
- #18495 - Update `actions/cache` to v4 in setup action
- #18457 - Fix crash when grep finds no matches
- #18448 - Update pre-commit configuration
- #18432 - Enable username checks in PR title and body
- #18430 - [TEST][CODEGEN] Fix test scripts passing numpy a dtype name that it cannot recognise
- #18419 - [TEST] Refactor: remove the deprecated warning message check from test cases
Docs
- #18545 - Improve static shape tuning parameter configuration (follow-up to commit c71aefc)
- #18539 - Fix e2e_opt_model tutorial for GPU deployment
- #18451 - Update the merge setting
- #18436 - Remove prebuilt package references and disable Colab button in tutorials
- #18413 - Update cross-compilation and RPC tutorial with modern PyTorch deployment workflow
- #18412 - Update tutorial for exporting and loading back Relax executables
- #18404 - Add tutorial for exporting and loading back Relax executables
Frontend
- #18435 - [ONNX] Fix operator Transpose: TVMError: PermuteDims expects the number of input axes to equal the ndim of the input tensor
LLVM
- #18586 - [Codegen] Avoid segfault when `arith::GetVScaleValues` returns an empty vector
MetaSchedule
- #18547 - Fix tune_tir crash with ScheduleError in RewriteParallelVectorizeUnroll
Relax
- #18676 - Implement dynamic output trimming for NMS
- #18664 - Add FDataDependent operator attribute for LegalizeOps
- #18668 - [Onnx] Support Local Response Normalization (LRN)
- #18667 - Add native size operator
- #18675 - [LAYOUT] Support for dynamic layout specification
- #18652 - [ONNX] add support for unique optional outputs
- #18665 - Replace topi.take with relax.op.take
- #18663 - Fix wrong memory planning when only lower bound was provided
- #18666 - [Onnx][Resize] Handle non-4D input tensors
- #18658 - [Onnx][PReLU] Handle slope and axis argument with different slope shapes
- #18649 - Remove obsolete TODO comments
- #18642 - Add FRelaxInferLayout for gather_elements operator
- #18643 - Add FRelaxInferLayout for scatter_nd operator
- #18641 - [Op] Fixed incorrect output shape of Pool op when ceil_mode = true
- #18638 - Add FRelaxInferLayout for scatter_elements operator
- #18637 - Add FRelaxInferLayout for flip operator
- #18633 - Add FRelaxInferLayout and TMixedPrecisionPolicy for dynamic_strided_slice
- #18635 - [Onnx] Pass output_padding param in ConvTranspose
- #18632 - Move GetUsedVars to analysis module
- #18629 - Add FInferMixedPrecision and FRelaxInferLayout for conv transpose ops
- #18626 - [Op][PyTorch] Support Median operator
- #18576 - Correct YaRN RoPE frequency scaling formula to align with the original paper
- #18615 - Add gpu-generic fallback for unrecognized GPU targets
- #18621 - Use weight shape instead of dim in Embedding.forward
- #18613 - Remove duplicated test case: test_if_branch_var_scope
- #18616 - Replaced call_pure_packed with tensor_to_shape operator
- #18593 - feat: Implement FRelaxInferLayout for tile operator
- #18618 - Add test case for op attributes in AST printer
- #18619 - [PyTorch] Fix PyTorch Dynamo frontend for Darwin compatibility
- #18575 - [ONNX] Add edge padding mode
- #18620 - Fix flaky test_conv2d gradient numeric test
- #18609 - Fix batch normalization computation logic
- #18574 - [Torch] Fix AssertionError: Unsupported function types ['mean.default']
- #18591 - Chore: Fix the DeprecationWarning: invalid escape sequence \
- #18577 - Clean up scatter_elements unknown dtype handling
- #18579 - Add layout inference support for repeat operator
- #18583 - [Torch] Fix sum op issues when dim and keepdim are not specified
- #18554 - Enhance unique block name generation with numeric suffixes
- #18558 - Add edge padding mode
- #18559 - Add mod operator support
- #18544 - [PyTorch] Add support for Custom Ops for ExportedProgram frontend
- #18535 - [PyTorch] Add support for masked_select
- #18551 - [Frontend] Introduce ModuleDict
- #18550 - [PyTorch] Enhance scale_factor handling in interpolation
- #18553 - [PyTorch] Unify dtype used in conv2d tests
- #18548 - [PyTorch] Add NHWC layout support
- #18533 - [PyTorch] Fix index_put with broadcast indices
- #18521 - [PyTorch] Handle unknown output shapes for _sym_size_int
- #18532 - [PyTorch] Add support for bidirectional GRU
- #18530 - [PyTorch] Add boolean tensor support for max operation and corresponding test case
- #18524 - [PyTorch] Fix InternalError when converting scaled_dot_product_attention with 2D inputs
- #18527 - [PyTorch] Add support for non-persistent buffers in ExportedProgram frontend
- #18529 - [PyTorch] Add support for binary scalar operations in ExportedProgram frontend and corresponding tests
- #18522 - [PyTorch] Unify tests using shared tvm.testing.assert_allclose
- #18516 - [PyTorch] Add support for bidirectional LSTM
- #18499 - [PyTorch] Add support for sparse matrix multiplication
- #18518 - [PyTorch] Fix batch normalization training mode correctness
- #18517 - [PyTorch] Unify tests using shared verify_model
- #18506 - [PyTorch] Enhance data type handling in FX graph translator
- #18507 - [PyTorch] Support specifying decimals for _round
- #18500 - [PyTorch] Add support for antialiased bilinear upsampling
- #18489 - [PyTorch] Enhance handling of unbounded upper bound constraints
- #17599 - [PASS] Annotate Custom Scope layout pass for Adreno GPU
- #18497 - [PyTorch] Add binary operation dtype promotion following PyTorch rules in ExportedProgram frontend
- #18478 - Fix the squeeze operator to behave consistently with torch
- #18496 - [PyTorch] Add `mul` operator in ExportedProgram frontend
- #18494 - [PyTorch] Add negative slicing support in `slice_scatter` operation
- #18493 - [PyTorch] Add broadcast support for `copy` operation
- #18490 - [PyTorch] Add `as_strided` operator in ExportedProgram frontend
- #18487 - [PyTorch] Add `count_include_pad` support to `avg_pool2d` in PyTorch frontend
- #18488 - [PyTorch] Enhance index_put support for multi-dimensional indices
- #18486 - [PyTorch] Fix `batch_norm.default` args handling in ExportedProgram frontend
- #18483 - [PyTorch] Add support for grid_sample operator
- #18482 - [PyTorch] Add support for gumbel_softmax
- #18485 - [PyTorch] Add dynamic shape support to `torch.ops.aten.sym_size.int` in ExportedProgram frontend
- #18473 - [PyTorch] Add support for `torch.ops.aten.sym_size.int` in ExportedProgram frontend
- #18471 - [PyTorch] Enable run_ep_decomposition by default
- #18462 - [PyTorch] Add decomposed operator support for interpolate
- #18455 - Fix flaky test_conv2d_offload by increasing float32 tolerance
- #18463 - [PyTorch] Support advanced range constraints (multiplication)
- #18464 - [PyTorch] Enable decomposition in all tests
- #18461 - [PyTorch] Fix KeyError: dtype when converting a PyTorch model with gradient checkpointing using torch.export
- #18452 - [PyTorch] Support advanced range constraints (addition)
- #18454 - [PyTorch] Fix sqrt operation requiring float dtype but receiving int64 in attention scaling
- #18459 - [PyTorch] Fix MultiheadAttention compilation
- #18460 - [PyTorch] Add decomposed operator support for normalization
- #18458 - [PyTorch] Add decomposed operator support for Binary
- #18449 - [PyTorch] Add decomposed operator support for Pad
- #18447 - [PyTorch] Add lower bound support for range constraints
- #18446 - [PyTorch] Add decomposed operator support for MaxPool
- #18437 - [PyTorch] Add decomposed operator support for AdaptiveAvgPool
- #18433 - [PyTorch] Add decomposed operator support for Conv
- #18429 - [PyTorch] Support basic range constraints
- #18428 - [PyTorch] Add support for decomposed operators and fix IR of ops tests(8)
- #18427 - [PyTorch] Add support for decomposed operators and fix IR of ops tests(7)
- #18420 - [PyTorch] Add support for decomposed operators and fix IR of ops tests(6)
- #18417 - [PyTorch] Add support for decomposed operators and fix IR of ops tests(5)
- #18416 - [ONNX] Fix bug: Unsupported numpy or ml_dtypes dtype('O') when importing ONNX model using Relax frontend
- #18414 - [PyTorch] Add support for decomposed operators and fix IR of ops tests(4)
- #18410 - [PyTorch] Add support for decomposed operators and fix IR of ops tests(3)
- #18403 - [PyTorch] Add support for decomposed operators and fix IR of ops tests(2)
- #18402 - [PyTorch] Add support for decomposed operators and fix IR of ops tests(1)
- #18401 - [PyTorch] Enable decomposition for unary ops and refactor tests
- #18400 - [PyTorch] Add support for decomposed operators in extended unary ops tests
- #18399 - [PyTorch] Add run_ep_decomposition flag to control PyTorch decomposition
Runtime
- #18546 - [MatchShape] Fix type error: Cannot convert from type `DLTensor*` to `ffi.Shape`
TIR
- #18639 - [Schedule] Fix type checker to support subscripted generics in Python 3.14+
- #18515 - [Schedule] FuseReductionEpilogue: Add Clipping pattern support
- #18556 - [Schedule] Fix bug on bfloat16 conversion
- #18528 - [Schedule] Fix mma tensorize error
- #18514 - Fix tir.LowerIntrin check failed additional_info.size() == new_size
- #18505 - Update function signatures for decompose_reduction
- #18479 - Fix VerifyStream::Verify dereferencing an invalid pointer
- #18421 - Add step attribute to ForNode (initial implementation)
- #18418 - [Schedule] Add FuseReductionEpilogue primitive to fuse epilogue …
- #18466 - Fix Data Type Mismatch (int64 vs int32) in T.match_buffer when Working with Scalar Buffers in TIR
TVMScript
- #18504 - Add test for TIR macro block name suffix handling
- #18465 - Add block name suffix management for TIR macros
cuda & cutlass & tensorrt
- #18624 - [CUDA] Fix cuModuleUnload crash during interpreter shutdown
- #18604 - [CUDA][FFI] Extend kernel launch config to support Programmatic Dependent Launch and cuLaunchCooperativeKernel
web
- #18683 - Fix RPC argument parsing for new FFI string/bytes types
- #18686 - Fix incorrect FFI export name in runtime.ts
- #18480 - Bump web runtime version 0.23.0-dev1
- #18467 - Replace string with TVMFFIByteArray* to avoid memory issues
- #18450 - Fix progress reporting when loading from cache
- #18415 - Fix arrayDecodeStorage scope issue for q0f32 models
- #18385 - Upgrade web runtime to new FFI
Misc
- #18681 - [NVRTC] Add NVSHMEM support to NVRTC compilation path
- #18674 - fix: MSVC pragma
- #18654 - [FFI] bump to latest version
- #18656 - Put options before objects when compiling
- #18519 - [Compile] accelerate compilation speed using NVRTC
- #18582 - Fix ACOS precision issue for boundary values (x=±1.0)
- #18557 - [Attn] Fix calling FlashInfer attention plan function
- #18555 - Fix duplicate `PresburgerSetNode` registration when `USE_MLIR=ON` and MLIR >= 15.0
- #18525 - [Schedule] Fix LocalBuilder Check failed: (index_map_func.has_value()) is false
- #18511 - [Pass] Add DumpIR pass instrument to save IR snapshots
- #18512 - Remove unused TVMC configs
- #18509 - Fix compilation warnings
- #18492 - Fix BufferError when converting PyTorch models with sparse tensors
- #18469 - [Contrib] Update RandomFill to use StreamSync for CUDA synchronization
- #18453 - [DataType] Update to use explicit Bool Type Aligning with DLPack
- #18422 - Adjusted Longrope embedding function to match Huggingface Implementation
- #18426 - Support integer type input for log and log2
- #18411 - [FFI] Bump tvm-ffi to latest
- #18409 - Fix database bug
- #18390 - Support integer types in TIR expression operators
- #18398 - Fix 8-bit vector loads/stores, resolving the failure in the CUDA codegen test
- #18389 - Add VisitStmt_ method for AssertStmtNode and StringImmNode
- #18361 - [WebLLM] Replace int64s with int32s in WebGPU kernels
- #18384 - Fix crash when multiple PrimFunc objects are present in IRModule
- #18378 - [release][Dont Squash] Update version to 0.22.0 and 0.23.0.dev on main branch