github ggml-org/llama.cpp b9045


mtmd: add granite-speech support (ibm-granite/granite-4.0-1b-speech) (#22101)

  • mtmd: add granite-speech support (ibm-granite/granite-4.0-1b-speech)

Conformer encoder with Shaw relative position encoding,
QFormer projector, log-mel spectrogram with frame stacking.
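
As a rough illustration of Shaw-style relative position encoding, the sketch below builds the index table that maps each (query, key) pair to a row of a relative-position embedding table. The function name, the `max_distance` parameter, and the sign convention (key index minus query index) are assumptions made for illustration; they are not taken from the llama.cpp or Hugging Face code.

```python
def shaw_relative_indices(seq_len, max_distance):
    """Build a seq_len x seq_len table of embedding-row indices.

    Each entry is the clamped relative distance between key j and query i,
    shifted so it can index a table with 2 * max_distance + 1 rows.
    """
    table = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            rel = j - i  # assumed sign convention: key minus query
            rel = max(-max_distance, min(max_distance, rel))  # clamp
            row.append(rel + max_distance)  # shift into [0, 2 * max_distance]
        table.append(row)
    return table


indices = shaw_relative_indices(seq_len=4, max_distance=2)
```

Distances beyond `max_distance` share a single embedding row, which keeps the table small regardless of sequence length.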

Encoder uses GLU gating, folded batch norm, and SSM depthwise
conv. QFormer compresses encoder output via windowed
cross-attention (window=15, queries=3) into the LLM embedding
space.
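
To make the compression ratio concrete, the sketch below computes how many embeddings a windowed projector with window=15 and queries=3 would emit for a given number of encoder frames. It assumes the final partial window is padded up to a full window; the commit message does not state how the tail is handled, and the function name is hypothetical.

```python
import math


def qformer_output_length(n_frames, window=15, queries=3):
    """Number of projector outputs for n_frames encoder frames.

    Assumption: a trailing partial window is padded to a full window,
    so it still contributes `queries` outputs.
    """
    n_windows = math.ceil(n_frames / window)
    return n_windows * queries


tokens = qformer_output_length(150)  # 10 windows of 15 frames, 3 queries each
```

Under these assumptions every 15 encoder frames become 3 embeddings, a 5x reduction in sequence length before the LLM sees the audio.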

Audio preprocessing: reflect-padded STFT, 80-bin mel filterbank,
dynamic range compression, 2x frame stacking (80->160 mel).
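
The 2x frame stacking step (80->160 mel) can be sketched as concatenating each pair of consecutive 80-bin frames into one 160-value frame. This is a minimal pure-Python illustration, not the mtmd implementation: the function name is hypothetical, and dropping a leftover odd frame is an assumption. The STFT, mel filterbank, and dynamic range compression steps are not shown.

```python
def stack_frames(frames, factor=2):
    """Concatenate every `factor` consecutive frames into one wider frame.

    Assumption: trailing frames that do not fill a complete group are dropped.
    """
    usable = len(frames) - len(frames) % factor
    stacked = []
    for start in range(0, usable, factor):
        merged = []
        for frame in frames[start:start + factor]:
            merged.extend(frame)
        stacked.append(merged)
    return stacked


# Five dummy 80-bin frames; frame t is filled with the value t.
mel = [[float(t)] * 80 for t in range(5)]
stacked = stack_frames(mel)  # two frames of 160 values; the fifth frame is dropped
```

Stacking halves the number of time steps the encoder has to process while keeping all of the spectral information from the retained frames.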

GGUF converter handles batch norm folding at export time,
fused K/V split, and Conv1d weight reshaping.
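
Batch norm folding relies on the fact that inference-time batch norm is a per-channel affine transform, so it can be precomputed as a scale and a shift. The sketch below shows that identity in pure Python; the function names and the `eps` default are assumptions, and whether the converter stores the result as separate scale/shift tensors or merges it into adjacent weights is not stated in the commit message.

```python
import math


def fold_batch_norm(gamma, beta, mean, var, eps=1e-5):
    """Collapse per-channel batch norm statistics into (scale, shift).

    After folding, batch_norm(x) == x * scale + shift for each channel.
    """
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    shift = [b - m * s for b, m, s in zip(beta, mean, scale)]
    return scale, shift


def batch_norm(x, gamma, beta, mean, var, eps=1e-5):
    """Reference inference-time batch norm, used only to check the folding."""
    return [
        g * (xi - m) / math.sqrt(v + eps) + b
        for xi, g, b, m, v in zip(x, gamma, beta, mean, var)
    ]


gamma, beta = [2.0, 0.5], [0.1, -0.3]
mean, var = [1.0, -2.0], [4.0, 0.25]
scale, shift = fold_batch_norm(gamma, beta, mean, var)

x = [3.0, 1.0]
folded = [xi * s + t for xi, s, t in zip(x, scale, shift)]
reference = batch_norm(x, gamma, beta, mean, var)
```

Folding at export time means the runtime graph needs only a multiply and an add per channel, with no running statistics stored in the GGUF file.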

Tested against HF transformers reference: token-for-token match
on 30s/60s audio clips with greedy decoding.
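
A token-for-token comparison of this kind can be sketched as finding the first position where two greedy-decoded token ID sequences diverge. The helper below is hypothetical and the token IDs are made-up placeholders; it does not reproduce the actual test harness used for the PR.

```python
def first_mismatch(reference, candidate):
    """Return the index of the first differing token, or None if identical.

    A length difference counts as a mismatch at the end of the shorter list.
    """
    for i, (ref_tok, cand_tok) in enumerate(zip(reference, candidate)):
        if ref_tok != cand_tok:
            return i
    if len(reference) != len(candidate):
        return min(len(reference), len(candidate))
    return None


# Placeholder token IDs, for illustration only.
hf_tokens = [101, 7592, 2088, 102]
gguf_tokens = [101, 7592, 2088, 102]
mismatch = first_mismatch(hf_tokens, gguf_tokens)  # None means an exact match
```

Greedy decoding matters here because it removes sampling randomness, so any divergence points to a numerical or preprocessing difference rather than chance.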

  • mtmd: rename gs_ prefixed tensors to generic/architecture names

  • mtmd: use tensor_mapping.py for all granite_speech tensors

  • convert: fold GraniteSpeechTextModel into GraniteModel

  • mtmd: replace n_layer hack with explicit has_standard_layers flag

  • mtmd: replace hardcoded magic numbers with GGUF hparams for granite speech

  • mtmd: align KEY_A_ define spacing

  • convert: register GraniteModel for GraniteSpeechForConditionalGeneration

  • convert: fix ty type-check for GraniteSpeechMmprojModel registration

  • mtmd: align TN_ define spacing

  • mtmd: use generic layer loop for granite speech tensor loading

  • mtmd: merge qformer_proj_layer into clip_layer

  • mtmd: granite_speech remove redundant ggml_build_forward_expand on inputs

  • mtmd: granite_speech add comment explaining why build_attn is not used

  • mtmd: granite_speech hard-code eps in cpp, remove from GGUF metadata

  • gguf: add spacing between granite_speech tensor mapping blocks

  • mtmd: make generic audio layer_norm_eps read optional

  • mtmd: granite_speech keep encoder eps in GGUF, only hard-code projector eps

  • mtmd: align defines and struct fields in clip-impl.h and clip-model.h

  • mtmd: fix alignment and ordering issues across granite speech files

  • convert: granite_speech use filter_tensors instead of modify_tensors for skipping
