Apache TVM v0.23.0.rc0

Pre-release

Introduction

Since the last release, the TVM community has worked to deliver the following exciting improvements!

The main tags are listed below (bold marks areas with the most progress): Relax (especially the PyTorch frontend), TIR, etc.

Please visit the full listing of commits for a complete view: v0.23.dev0...v0.23.0.rc0.

Community

None.

RFCs

None.

Adreno

  • #18523 - [TEXTURE] Texture based lowering

Arith

  • #18542 - Revert "Fix InternalError: Check failed: (eval_vec_) is false"
  • #18536 - Fix InternalError: Check failed: (eval_vec_) is false

BugFix

  • #18628 - [Fix] Fix typo in file header comment
  • #18589 - [OpenCL] Guard QCOM perf hint behind USE_OPENCL_EXTN_QCOM to avoid undefined symbol on non-QCOM runtimes
  • #18534 - Prevent segfault when instantiating abstract SearchStrategy

CI

  • #18549 - Remove hardcoded user and repo values
  • #18484 - Update file patterns for specific linting hooks
  • #18470 - Enhance python linting scripts to support revision-based checks
  • #18498 - Use glob for conda/build-environment.yaml in cache key
  • #18495 - Update actions/cache to v4 in setup action
  • #18457 - Fix crash when grep finds no matches
  • #18448 - Update pre-commit configuration
  • #18432 - Enable username checks in PR title and body
  • #18430 - [TEST][CODEGEN] Fix test scripts that pass numpy a dtype name it cannot recognise
  • #18419 - [TEST] Refactor: remove the deprecated warning message check from test cases

Docs

  • #18545 - Improve static shape tuning parameter configuration (follow-up to commit c71aefc)
  • #18539 - Fix e2e_opt_model tutorial for GPU deployment
  • #18451 - Update the merge setting
  • #18436 - Remove prebuilt package references and disable Colab button at tutorials
  • #18413 - Update cross-compilation and RPC tutorial with modern PyTorch deployment workflow
  • #18412 - Update tutorial for exporting and loading back Relax executables
  • #18404 - Add tutorial for exporting and loading back Relax executables

Frontend

  • #18435 - [ONNX] Fix operator Transpose: TVMError: PermuteDims expects the number of input axes to equal the ndim of the input tensor

LLVM

  • #18586 - [Codegen] Avoid segfault when arith::GetVScaleValues returns empty vector

MetaSchedule

  • #18547 - Fix tune_tir crash with ScheduleError in RewriteParallelVectorizeUnroll

Relax

  • #18676 - Implement dynamic output trimming for NMS
  • #18664 - Add FDataDependent operator attribute for LegalizeOps
  • #18668 - [Onnx] Support Local Response Normalization (LRN)
  • #18667 - Add native size operator
  • #18675 - [LAYOUT] Support for dynamic layout specification
  • #18652 - [ONNX] add support for unique optional outputs
  • #18665 - Replace topi.take with relax.op.take
  • #18663 - Fix wrong memory planning when only lower bound was provided
  • #18666 - [Onnx][Resize] Handle non-4D input tensors
  • #18658 - [Onnx][PReLU] Handle slope and axis argument with different slope shapes
  • #18649 - Remove obsolete TODO comments
  • #18642 - Add FRelaxInferLayout for gather_elements operator
  • #18643 - Add FRelaxInferLayout for scatter_nd operator
  • #18641 - [Op] Fixed incorrect output shape of Pool op when ceil_mode = true
  • #18638 - Add FRelaxInferLayout for scatter_elements operator
  • #18637 - Add FRelaxInferLayout for flip operator
  • #18633 - Add FRelaxInferLayout and TMixedPrecisionPolicy for dynamic_strided_slice
  • #18635 - [Onnx] Pass output_padding param in ConvTranspose
  • #18632 - Move GetUsedVars to analysis module
  • #18629 - Add FInferMixedPrecision and FRelaxInferLayout for conv transpose ops
  • #18626 - [Op][PyTorch] Supported Median operator
  • #18576 - Correct YaRN RoPE frequency scaling formula to align with the original paper
  • #18615 - Add gpu-generic fallback for unrecognized GPU targets
  • #18621 - Use weight shape instead of dim in Embedding.forward
  • #18613 - Remove duplicated test case: test_if_branch_var_scope
  • #18616 - Replaced call_pure_packed with tensor_to_shape operator
  • #18593 - feat: Implement FRelaxInferLayout for tile operator
  • #18618 - Add test case for op attributes in AST printer
  • #18619 - [PyTorch] Fix PyTorch Dynamo frontend for Darwin compatibility
  • #18575 - [ONNX] Add edge padding mode
  • #18620 - Fix flaky test_conv2d gradient numeric test
  • #18609 - Fix batch normalization computation logic
  • #18574 - [Torch] Fix AssertionError: Unsupported function types ['mean.default']
  • #18591 - Chore: Fix the DeprecationWarning: invalid escape sequence \
  • #18577 - Clean up scatter_elements unknown dtype handling
  • #18579 - Add layout inference support for repeat operator
  • #18583 - [Torch] Fixed issues related to sum op when without dim and keep dim
  • #18554 - Enhance unique block name generation with numeric suffixes
  • #18558 - Add edge padding mode
  • #18559 - Add mod operator support
  • #18544 - [PyTorch] Add support for Custom Ops for ExportedProgram frontend
  • #18535 - [PyTorch] Add support for masked_select
  • #18551 - [Frontend] Introduce ModuleDict
  • #18550 - [PyTorch] Enhance scale_factor handling in interpolation
  • #18553 - [PyTorch] Unify dtype used in conv2d tests
  • #18548 - [PyTorch] Add NHWC layout support
  • #18533 - [PyTorch] Fix index_put with broadcast indices
  • #18521 - [PyTorch] Handle unknown output shapes for _sym_size_int
  • #18532 - [PyTorch] Add support for bidirectional GRU
  • #18530 - [PyTorch] Add boolean tensor support for max operation and corresponding test case
  • #18524 - [PyTorch] Fix InternalError when converting scaled_dot_product_attention with 2D inputs
  • #18527 - [PyTorch] Add support for non-persistent buffers in ExportedProgram frontend
  • #18529 - [PyTorch] Add support for binary scalar operations in ExportedProgram frontend and corresponding tests
  • #18522 - [PyTorch] Unify tests using shared tvm.testing.assert_allclose
  • #18516 - [PyTorch] Add support for bidirectional LSTM
  • #18499 - [PyTorch] Add support for sparse matrix multiplication
  • #18518 - [PyTorch] Fix batch normalization training mode correctness
  • #18517 - [PyTorch] Unify tests using shared verify_model
  • #18506 - [PyTorch] Enhance data type handling in FX graph translator
  • #18507 - [PyTorch] Support specifying decimals for _round
  • #18500 - [PyTorch] Add support for antialiased bilinear upsampling
  • #18489 - [PyTorch] Enhance handling of unbounded upper bound constraints
  • #17599 - [PASS] Annotate Custom Scope layout pass for Adreno GPU
  • #18497 - [PyTorch] Add binary operation dtype promotion following PyTorch rules in ExportedProgram frontend
  • #18478 - Fix the squeeze operator to behave consistently with torch
  • #18496 - [PyTorch] Add mul operator in ExportedProgram frontend
  • #18494 - [PyTorch] Add negative slicing support in slice_scatter operation
  • #18493 - [PyTorch] Add broadcast support for copy operation
  • #18490 - [PyTorch] Add as_strided operator in ExportedProgram frontend
  • #18487 - [PyTorch] Add count_include_pad support to avg_pool2d in PyTorch frontend
  • #18488 - [PyTorch] Enhance index_put support for multi-dimensional indices
  • #18486 - [PyTorch] Fix batch_norm.default args handling in ExportedProgram frontend
  • #18483 - [PyTorch] Add support for grid_sample operator
  • #18482 - [PyTorch] Add support for gumbel_softmax
  • #18485 - [PyTorch] Add dynamic shape support to torch.ops.aten.sym_size.int in ExportedProgram frontend
  • #18473 - [PyTorch] Add support for torch.ops.aten.sym_size.int in ExportedProgram frontend
  • #18471 - [PyTorch] Enable run_ep_decomposition by default
  • #18462 - [PyTorch] Add decomposed operator support for interpolate
  • #18455 - Fix flaky test_conv2d_offload by increasing float32 tolerance
  • #18463 - [PyTorch] Support advanced range constraints (multiplication)
  • #18464 - [PyTorch] Enable decomposition in all tests
  • #18461 - [PyTorch] Fix KeyError: dtype when converting PyTorch model with gradient checkpointing using torch.export
  • #18452 - [PyTorch] Support advanced range constraints (addition)
  • #18454 - [PyTorch]: Fix the sqrt operation requires float dtype but receives int64 in attention scaling
  • #18459 - [PyTorch] Fix MultiheadAttention compile
  • #18460 - [PyTorch] Add decomposed operator support for normalization
  • #18458 - [PyTorch] Add decomposed operator support for Binary
  • #18449 - [PyTorch] Add decomposed operator support for Pad
  • #18447 - [PyTorch] Add lower bound support for range constraints
  • #18446 - [PyTorch] Add decomposed operator support for MaxPool
  • #18437 - [PyTorch] Add decomposed operator support for AdaptiveAvgPool
  • #18433 - [PyTorch] Add decomposed operator support for Conv
  • #18429 - [PyTorch] Support basic range constraints
  • #18428 - [PyTorch] Add support for decomposed operators and fix IR of ops tests(8)
  • #18427 - [PyTorch] Add support for decomposed operators and fix IR of ops tests(7)
  • #18420 - [PyTorch] Add support for decomposed operators and fix IR of ops tests(6)
  • #18417 - [PyTorch] Add support for decomposed operators and fix IR of ops tests(5)
  • #18416 - [ONNX] Fix bug: Unsupported numpy or ml_dtypes dtype('O') when importing ONNX model using Relax frontend
  • #18414 - [PyTorch] Add support for decomposed operators and fix IR of ops tests(4)
  • #18410 - [PyTorch] Add support for decomposed operators and fix IR of ops tests(3)
  • #18403 - [PyTorch] Add support for decomposed operators and fix IR of ops tests(2)
  • #18402 - [PyTorch] Add support for decomposed operators and fix IR of ops tests(1)
  • #18401 - [PyTorch] Enable decomposition for unary ops and refactor tests
  • #18400 - [PyTorch] Add support for decomposed operators in extended unary ops tests
  • #18399 - [PyTorch] Add run_ep_decomposition flag to control PyTorch decomposition

Runtime

  • #18546 - [MatchShape] Fix type error: Cannot convert from type ' DLTensor* ' to ' ffi.Shape '

TIR

  • #18639 - [Schedule] Fix type checker to support subscripted generics in Python 3.14+
  • #18515 - [Schedule] FuseReductionEpilogue: Add Clipping pattern support
  • #18556 - [Schedule] Fix bug on bfloat16 conversion
  • #18528 - [Schedule] Fix mma tensorize error
  • #18514 - Fix tir.LowerIntrin check failed additional_info.size() == new_size
  • #18505 - Update function signatures for decompose_reduction
  • #18479 - Fix VerifyStream::Verify dereferencing an invalid pointer
  • #18421 - Add step attribute to ForNode (Initial codes)
  • #18418 - [Schedule] Add FuseReductionEpilogue primitive to fuse epilogue …
  • #18466 - Fix Data Type Mismatch (int64 vs int32) in T.match_buffer when Working with Scalar Buffers in TIR

TVMScript

  • #18504 - Add test for TIR macro block name suffix handling
  • #18465 - Add block name suffix management for TIR macros

cuda & cutlass & tensorrt

  • #18624 - [CUDA] Fix cuModuleUnload crash during interpreter shutdown
  • #18604 - [CUDA][FFI] Extend kernel launch config to support Programmatic Dependent Launch and cuLaunchCooperativeKernel

web

  • #18683 - Fix RPC argument parsing for new FFI string/bytes types
  • #18686 - Fix incorrect FFI export name in runtime.ts
  • #18480 - Bump web runtime version 0.23.0-dev1
  • #18467 - Replace string with TVMFFIByteArray* to avoid memory issues
  • #18450 - Fix progress reporting when loading from cache
  • #18415 - Fix arrayDecodeStorage scope issue for q0f32 models
  • #18385 - Upgrade web runtime to new FFI

Misc

  • #18681 - [NVRTC] Add NVSHMEM support to NVRTC compilation path
  • #18674 - fix: MSVC pragma
  • #18654 - [FFI] bump to latest version
  • #18656 - Put options before objects when compiling
  • #18519 - [Compile] accelerate compilation speed using NVRTC
  • #18582 - Fix ACOS precision issue for boundary values (x=±1.0)
  • #18557 - [Attn] Fix calling FlashInfer attention plan function
  • #18555 - Fix duplicate PresburgerSetNode registration when USE_MLIR=ON and MLIR >= 15.0
  • #18525 - [Schedule] Fix LocalBuilder Check failed: (index_map_func.has_value()) is false
  • #18511 - [Pass] Add DumpIR pass instrument to save IR snapshots
  • #18512 - Remove unused TVMC configs
  • #18509 - Fix compilation warnings
  • #18492 - Fix BufferError when converting PyTorch models with sparse tensors
  • #18469 - [Contrib] Update RandomFill to use StreamSync for CUDA synchronization
  • #18453 - [DataType] Update to use explicit Bool Type Aligning with DLPack
  • #18422 - Adjusted Longrope embedding function to match Huggingface Implementation
  • #18426 - Support integer type input for log and log2
  • #18411 - [FFI] Bump tvm-ffi to latest
  • #18409 - Fix database bug
  • #18390 - Support integer types in TIR expression operators
  • #18398 - Fix 8-bit vector loads/stores, resolving the failure in the CUDA codegen test
  • #18389 - Add VisitStmt_ method for AssertStmtNode and StringImmNode
  • #18361 - [WebLLM] Replace int64s with int32s in WebGPU kernels
  • #18384 - Fix crash when multiple PrimFunc objects are present in IRModule
  • #18378 - [release][Dont Squash] Update version to 0.22.0 and 0.23.0.dev on main branch
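The ACOS boundary fix (#18582) touches a classic floating-point pitfall: accumulated round-off can push an argument just outside [-1, 1], where acos is undefined. A minimal illustration of the hazard and the usual clamping guard (an illustrative sketch, not TVM's actual implementation):

```python
import math

def safe_acos(x: float) -> float:
    # Clamp to the valid domain [-1, 1] so that tiny round-off
    # excursions past the boundary do not hit acos's domain error.
    return math.acos(max(-1.0, min(1.0, x)))

# Boundary values map to the exact endpoints of acos's range.
assert safe_acos(1.0) == 0.0
assert safe_acos(-1.0) == math.pi
# The smallest float above 1.0 would make math.acos raise a
# ValueError; the clamp keeps the result well-defined.
assert safe_acos(math.nextafter(1.0, 2.0)) == 0.0
```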
