hexagon: refactor all Ops to use local context struct (#19819)
-
hexagon: refactor set/get/sum-rows ops to use local context
-
hexagon: refactor ROPE and Softmax Ops to use local context
Slightly improves performance by precomputing values and saving them in the context.
-
hexagon: refactor activation ops to use local context struct
-
hexagon: refactor unary ops to use local context struct and DMA/VTCM
-
hexagon: use aligned hvx_scale function
-
hexagon: remove unused fields from op_context
-
hexagon: rewrite ROPE to use DMA and VTCM scratchpad
-
hex-rope: keep N rows in scratchpad (instead of just two)
-
hex-rope: introduce rowidx cache
-
hex-rope: remove unused fields
-
hex-rope: rewrite dma prefetch logic to allow for multi-row fetch/compute
This also removes the need for fastdiv.
-
hex-rope: minor formatting
-
hex-rope: use indices and unroll the loops
-
hex-rope: more updates to cleanup rope-block handling
-
hexagon: cleanup supported type/dims checks
-
hexagon: all reduce funcs replicated across lanes
There is no need to explicitly replicate the first value.
-
snapdragon: update adb and Windows scripts to use ubatch-size 256
The updated Ops support handles larger ubatches.
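The "all reduce funcs replicated across lanes" item above can be illustrated with a minimal sketch. It is written in plain scalar C, not HVX intrinsics, and is not the Hexagon backend's code: the `LANES` width and the `allreduce_sum` name are hypothetical. It shows a butterfly all-reduce, in which each step adds a lane to its partner at a shrinking stride, so every lane ends up holding the full sum and no separate step is needed to broadcast lane 0.

```c
#include <assert.h>
#include <string.h>

#define LANES 8 /* hypothetical vector width; must be a power of two */

/* Butterfly all-reduce over a simulated vector register.
 * After log2(LANES) steps every lane holds the sum of all lanes,
 * so the result is already replicated and needs no extra splat. */
static void allreduce_sum(float v[LANES]) {
    float tmp[LANES];

    assert((LANES & (LANES - 1)) == 0); /* power-of-two width assumed */

    for (int stride = LANES / 2; stride > 0; stride /= 2) {
        for (int i = 0; i < LANES; i++) {
            /* partner lane differs from i only in the 'stride' bit */
            tmp[i] = v[i] + v[i ^ stride];
        }
        memcpy(v, tmp, sizeof(tmp));
    }
}
```

Calling `allreduce_sum` on the lanes `{1, 2, 3, 4, 5, 6, 7, 8}` leaves 36 in every lane, which is why a caller can read the result from any lane or use the whole vector directly.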
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: