ggml-org/llama.cpp b7340


Warning

Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.

metal: SSM kernel improvements (#17876)

  • feat: Add a batched version of ssm_conv

This was done with Claude Code, which found a number of optimizations in how the
threads are organized, giving a large performance improvement. A conceptual sketch
of the batched convolution follows this item.

Branch: Mamba2SSD

Signed-off-by: Gabe Goodhart ghart@us.ibm.com
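
The actual kernel lives in ggml's Metal backend; the C++ sketch below is only a rough CPU reference of what a batched ssm_conv computes: a causal, per-channel 1D convolution evaluated for a whole batch of tokens in one call instead of one token at a time. The names and shapes (`n_tok`, `d_inner`, `d_conv`) are illustrative assumptions, not the kernel's actual signature.

```cpp
// Conceptual CPU reference (not the Metal kernel): batched causal 1D convolution
// of the kind SSM_CONV performs, computed for n_tok tokens in a single call.
#include <cstddef>
#include <vector>

// x: [n_tok + d_conv - 1][d_inner]  input window, including the carried conv state
// w: [d_inner][d_conv]              per-channel convolution weights
// y: [n_tok][d_inner]               output, one row per token
static void ssm_conv_batched_ref(const std::vector<std::vector<float>> & x,
                                 const std::vector<std::vector<float>> & w,
                                 std::vector<std::vector<float>>       & y,
                                 size_t n_tok, size_t d_inner, size_t d_conv) {
    y.assign(n_tok, std::vector<float>(d_inner, 0.0f));
    for (size_t t = 0; t < n_tok; ++t) {          // batched over tokens
        for (size_t d = 0; d < d_inner; ++d) {    // independent per channel
            float acc = 0.0f;
            for (size_t k = 0; k < d_conv; ++k) { // sliding causal window
                acc += w[d][k] * x[t + k][d];
            }
            y[t][d] = acc;
        }
    }
}
```

Because every (token, channel) pair is independent, a GPU kernel is free to batch many tokens into one dispatch; the loop above only illustrates the math, not the thread layout chosen in the Metal implementation.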

  • feat: Optimized SSM_SCAN kernel for metal

This also used Claude Code and resulted in a modest performance improvement while
maintaining correctness. A sketch of the underlying scan recurrence follows this item.

Branch: Mamba2SSD

Signed-off-by: Gabe Goodhart ghart@us.ibm.com
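
Again, this is not the ggml kernel itself; the sketch below is a minimal CPU reference of the selective-scan recurrence that SSM_SCAN evaluates, written in a simplified per-channel form with a scalar decay A. All names and shapes (`n_tok`, `d_state`, `dt`, `B`, `C`) are assumptions made for illustration.

```cpp
// Conceptual CPU reference (not the Metal kernel) of a selective-scan step:
//   h[t] = exp(dt[t] * A) * h[t-1] + dt[t] * B[t] * x[t]
//   y[t] = dot(C[t], h[t])
#include <cmath>
#include <cstddef>
#include <vector>

static void ssm_scan_ref(const std::vector<float> & x,               // [n_tok] input
                         const std::vector<float> & dt,              // [n_tok] step size
                         const std::vector<std::vector<float>> & B,  // [n_tok][d_state]
                         const std::vector<std::vector<float>> & C,  // [n_tok][d_state]
                         float A,                                    // scalar decay (negative)
                         std::vector<float> & h,                     // [d_state] carried state
                         std::vector<float> & y) {                   // [n_tok] output
    const size_t n_tok   = x.size();
    const size_t d_state = h.size();
    y.assign(n_tok, 0.0f);
    for (size_t t = 0; t < n_tok; ++t) {                  // sequential over tokens
        const float decay = std::exp(dt[t] * A);
        for (size_t s = 0; s < d_state; ++s) {
            h[s] = decay * h[s] + dt[t] * B[t][s] * x[t]; // state update
            y[t] += C[t][s] * h[s];                       // readout
        }
    }
}
```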

  • test: Add test-backend-ops perf tests for SSM_CONV

Branch: SSMKernelImprovements

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

  • test: Real, representative tests for SSM_CONV

Branch: SSMKernelImprovements

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

  • refactor: Use function constant for ssm_conv batch size

Branch: SSMKernelImprovements

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

  • test: backend op tests for ssm_scan from granite4 1b-h

Branch: SSMKernelImprovements

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

  • style: remove commented out templates

Branch: SSMKernelImprovements

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

  • feat: float4 version of ssm_conv_batched

Branch: SSMKernelImprovements

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

  • fix: Add missing ggml_metal_cv_free

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

Co-authored-by: Georgi Gerganov ggerganov@gmail.com

