Details
hexagon: add neg, exp, sigmoid, softplus ops, cont, repeat ops (#20701)
Add element-wise unary ops needed by Qwen 3.5's DeltaNet linear
attention layers. These ops follow the existing unary-ops pattern
with VTCM DMA double-buffering.
- neg: negate via scale by -1.0
- exp: uses existing hvx_exp_f32 HVX intrinsics
- sigmoid: uses existing hvx_sigmoid_f32_aa HVX intrinsics
- softplus: log(1 + exp(x)) scalar fallback
- CONT reuses the existing CPY infrastructure since making a tensor
contiguous is equivalent to a same-type copy. - REPEAT implements tiled memory copy with multi-threaded execution via
the worker pool, supporting f32 and f16 types. The kernel parallelizes
across output rows and uses memcpy for each tile.
Co-authored-by: Max Krasnyansky maxk@qti.qualcomm.com
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: