github ggml-org/llama.cpp b7331

18 hours ago

Warning

Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.

CANN: add support for partial RoPE and Vision mode (#17543)

  • cann: add support for partial RoPE and Vision mode

Add support for two important RoPE variants: partial rotation (rope_dims < ne0)
and Vision mode rotation.

  1. Support for partial RoPE (rope_dims < ne0):

    • Split tensor into head (first rope_dims dimensions) and tail portions
    • Apply rotation only to head portion using RotaryPositionEmbedding operator
    • Copy unrotated tail portion directly from source to destination
    • Handle both contiguous and non-contiguous tensor layouts
  2. Support for Vision mode (GGML_ROPE_TYPE_VISION):

    • Set rope_dims = ne0 for Vision mode to rotate entire tensor
    • Vision mode pairs dimension i with dimension i+n_dims (where n_dims = ne0/2)
    • No tail handling needed since entire tensor is rotated
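The head/tail split and the full-tensor case described above can be sketched on the CPU for a single row. This is a minimal reference under stated assumptions, not the CANN implementation: `rope_partial_ref` and its `pos` and `freq_base` parameters are hypothetical names, the pairing used is element i with element i + rope_dims/2 (which matches the Vision pairing above when rope_dims = ne0), and one position with one frequency schedule is applied to the whole row, which is a simplification of how a real backend derives its angles.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference helper (not part of llama.cpp or CANN).
// Rotates the first rope_dims elements of one row (the "head") and copies
// the remaining elements (the "tail") unchanged.
// Assumptions: rope_dims is even and rope_dims <= src.size().
std::vector<float> rope_partial_ref(const std::vector<float> & src,
                                    std::size_t rope_dims,
                                    float pos,
                                    float freq_base = 10000.0f) {
    const std::size_t ne0 = src.size();
    assert(rope_dims % 2 == 0 && rope_dims <= ne0);

    const bool        has_tail = rope_dims < ne0;  // partial RoPE when true
    const std::size_t half     = rope_dims / 2;    // pairing offset
    std::vector<float> dst(ne0);

    // Head: rotate pair (i, i + half) by an angle that shrinks with i.
    for (std::size_t i = 0; i < half; ++i) {
        const float exponent = -2.0f * static_cast<float>(i) / static_cast<float>(rope_dims);
        const float theta    = pos * std::pow(freq_base, exponent);
        const float c  = std::cos(theta);
        const float s  = std::sin(theta);
        const float x0 = src[i];
        const float x1 = src[i + half];
        dst[i]        = x0 * c - x1 * s;
        dst[i + half] = x0 * s + x1 * c;
    }

    // Tail: copy the unrotated remainder directly from source to destination.
    if (has_tail) {
        std::copy(src.begin() + rope_dims, src.end(), dst.begin() + rope_dims);
    }
    return dst;
}
```

Calling `rope_partial_ref(row, 4, pos)` on a 6-element row rotates elements 0..3 and leaves elements 4 and 5 untouched, while passing `rope_dims = row.size()` rotates the whole row with no tail copy.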

Implementation details:

  • Use a has_tail flag to select the execution path: head/tail splitting when
    rope_dims < ne0, or full tensor rotation when rope_dims == ne0
  • Support both F32 and F16 data types, with F16 handled through an intermediate F32 conversion
  • Copy non-contiguous tensors to contiguous buffers before calling
    RotaryPositionEmbedding operator for compatibility
  • Improve cache invalidation logic to include rope_dims and indep_sects
    parameters
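The cache invalidation point can be illustrated with a small key-comparison sketch. Everything here is an assumption made for illustration: the notes only say that rope_dims and indep_sects now take part in invalidation, so `RopeCacheKey`, `RopeCache`, the `freq_base` field, the `table` placeholder, and the `rebuilds` counter are hypothetical and do not mirror the backend's actual data structures.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical cache key. Only rope_dims and indep_sects are named in the
// release notes; freq_base stands in for any other parameter the cached
// data may depend on.
struct RopeCacheKey {
    std::int64_t rope_dims   = 0;
    bool         indep_sects = false;
    float        freq_base   = 0.0f;

    bool operator==(const RopeCacheKey & other) const {
        return rope_dims   == other.rope_dims &&
               indep_sects == other.indep_sects &&
               freq_base   == other.freq_base;
    }
};

// Hypothetical cache holder. The table is a placeholder for whatever is
// actually precomputed (for example sin/cos values).
struct RopeCache {
    bool               valid = false;
    RopeCacheKey       key;
    std::vector<float> table;
    int                rebuilds = 0;  // counts rebuilds, for demonstration only

    // Rebuild only when some parameter in the key has changed.
    void ensure(const RopeCacheKey & wanted) {
        if (valid && key == wanted) {
            return;  // cached data still matches the requested configuration
        }
        table.assign(static_cast<std::size_t>(wanted.rope_dims), 0.0f);  // placeholder rebuild
        key   = wanted;
        valid = true;
        ++rebuilds;
    }
};
```

The design point is that every parameter the cached data depends on belongs in the key. If rope_dims or indep_sects were left out, two calls that differ only in those values would wrongly reuse stale data.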

These enhancements enable the CANN backend to handle the RoPE configurations
used in modern vision-language models and in models with partial rotation.

  • cann: fix review comment

