ggml-org/llama.cpp b7340


Warning

Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.

metal: SSM kernel improvements (#17876)

  • feat: Add a batched version of ssm_conv

This was done with Claude Code, which found a number of optimizations in how the
threads are organized, giving a large performance improvement. A conceptual sketch
of the batched convolution follows this item.

Branch: Mamba2SSD

Signed-off-by: Gabe Goodhart ghart@us.ibm.com
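
The actual kernel lives in ggml's Metal backend; the C++ sketch below is only a rough CPU reference of what a batched ssm_conv computes: a causal, per-channel 1D convolution evaluated for a whole batch of tokens in one call instead of one token at a time. The names and shapes (`n_tok`, `d_inner`, `d_conv`) are illustrative assumptions, not the kernel's actual signature.

```cpp
// Conceptual CPU reference (not the Metal kernel): batched causal 1D convolution
// of the kind SSM_CONV performs, computed for n_tok tokens in a single call.
#include <cstddef>
#include <vector>

// x: [n_tok + d_conv - 1][d_inner]  input window, including the carried conv state
// w: [d_inner][d_conv]              per-channel convolution weights
// y: [n_tok][d_inner]               output, one row per token
static void ssm_conv_batched_ref(const std::vector<std::vector<float>> & x,
                                 const std::vector<std::vector<float>> & w,
                                 std::vector<std::vector<float>>       & y,
                                 size_t n_tok, size_t d_inner, size_t d_conv) {
    y.assign(n_tok, std::vector<float>(d_inner, 0.0f));
    for (size_t t = 0; t < n_tok; ++t) {          // batched over tokens
        for (size_t d = 0; d < d_inner; ++d) {    // independent per channel
            float acc = 0.0f;
            for (size_t k = 0; k < d_conv; ++k) { // sliding causal window
                acc += w[d][k] * x[t + k][d];
            }
            y[t][d] = acc;
        }
    }
}
```

Because every (token, channel) pair is independent, a GPU kernel is free to batch many tokens into one dispatch; the loop above only illustrates the math, not the thread layout chosen in the Metal implementation.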

  • feat: Optimized SSM_SCAN kernel for metal

This also used Claude Code and resulted in a modest performance improvement while
maintaining correctness. A sketch of the underlying scan recurrence follows this item.

Branch: Mamba2SSD

Signed-off-by: Gabe Goodhart ghart@us.ibm.com
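
Again, this is not the ggml kernel itself; the sketch below is a minimal CPU reference of the selective-scan recurrence that SSM_SCAN evaluates, written in a simplified per-channel form with a scalar decay A. All names and shapes (`n_tok`, `d_state`, `dt`, `B`, `C`) are assumptions made for illustration.

```cpp
// Conceptual CPU reference (not the Metal kernel) of a selective-scan step:
//   h[t] = exp(dt[t] * A) * h[t-1] + dt[t] * B[t] * x[t]
//   y[t] = dot(C[t], h[t])
#include <cmath>
#include <cstddef>
#include <vector>

static void ssm_scan_ref(const std::vector<float> & x,               // [n_tok] input
                         const std::vector<float> & dt,              // [n_tok] step size
                         const std::vector<std::vector<float>> & B,  // [n_tok][d_state]
                         const std::vector<std::vector<float>> & C,  // [n_tok][d_state]
                         float A,                                    // scalar decay (negative)
                         std::vector<float> & h,                     // [d_state] carried state
                         std::vector<float> & y) {                   // [n_tok] output
    const size_t n_tok   = x.size();
    const size_t d_state = h.size();
    y.assign(n_tok, 0.0f);
    for (size_t t = 0; t < n_tok; ++t) {                  // sequential over tokens
        const float decay = std::exp(dt[t] * A);
        for (size_t s = 0; s < d_state; ++s) {
            h[s] = decay * h[s] + dt[t] * B[t][s] * x[t]; // state update
            y[t] += C[t][s] * h[s];                       // readout
        }
    }
}
```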

  • test: Add test-backend-ops perf tests for SSM_CONV

Branch: SSMKernelImprovements

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

  • test: Real, representative tests for SSM_CONV

Branch: SSMKernelImprovements

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

  • refactor: Use function constant for ssm_conv batch size

Branch: SSMKernelImprovements

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

  • test: backend op tests for ssm_scan from granite4 1b-h

Branch: SSMKernelImprovements

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

  • style: remove commented out templates

Branch: SSMKernelImprovements

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

  • feat: float4 version of ssm_conv_batched

Branch: SSMKernelImprovements

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

  • fix: Add missing ggml_metal_cv_free

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

Co-authored-by: Georgi Gerganov ggerganov@gmail.com

