github ggml-org/llama.cpp b8400


hexagon: add neg, exp, sigmoid, softplus, cont, repeat ops (#20701)

Add element-wise unary ops needed by Qwen 3.5's DeltaNet linear
attention layers. These ops follow the existing unary-ops pattern
with VTCM DMA double-buffering.

  • neg: negate via scale by -1.0
  • exp: uses the existing hvx_exp_f32 HVX routine
  • sigmoid: uses the existing hvx_sigmoid_f32_aa HVX routine
  • softplus: log(1 + exp(x)), implemented as a scalar fallback
  • CONT reuses the existing CPY infrastructure since making a tensor
    contiguous is equivalent to a same-type copy.
  • REPEAT implements tiled memory copy with multi-threaded execution via
    the worker pool, supporting f32 and f16 types. The kernel parallelizes
    across output rows and uses memcpy for each tile.

Co-authored-by: Max Krasnyansky maxk@qti.qualcomm.com
