ggml-org/llama.cpp release b7600


vulkan: extend topk_moe to handle sigmoid w/exp_probs_b for nemotron (#18295)

  • vulkan: extend topk_moe to handle sigmoid w/exp_probs_b for nemotron

Also handle GGML_OP_SCALE at the end (nemotron, deepseek2). A reference sketch of the full routing computation appears after the list below.

Reduce the number of pipeline variants and spec constants by using push constants instead.

In test_topk_moe, change exp_probs_b to be 1D, matching real networks.

Update test-backend-ops and ggml-backend to allow verifying multiple outputs
in a fusion test (topk_moe has two outputs). Previously only the final node
was verified.

  • change test_topk_moe to allow results in arbitrary order (see the comparison sketch after this list)

  • disable sigmoid fusion for moltenvk
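For context, here is a minimal CPU reference sketch of the routing computation the fused Vulkan topk_moe path now covers: sigmoid gating, an additive per-expert bias (exp_probs_b) that, in the common DeepSeek-style scheme, influences only the top-k selection, normalization of the selected weights, and a trailing scale corresponding to GGML_OP_SCALE. The helper name and exact graph layout are illustrative assumptions, not the actual ggml/Vulkan implementation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <numeric>
#include <vector>

struct topk_result {
    std::vector<int32_t> ids;     // selected expert indices (first output)
    std::vector<float>   weights; // corresponding routing weights (second output)
};

// Hypothetical reference implementation, for illustration only.
topk_result topk_moe_sigmoid_ref(const std::vector<float> & logits, // [n_experts] router logits
                                 const std::vector<float> & bias,   // [n_experts] exp_probs_b, 1D as in real networks
                                 int k, float scale) {
    const int n = (int) logits.size();
    std::vector<float> probs(n), scored(n);
    for (int i = 0; i < n; ++i) {
        probs[i]  = 1.0f / (1.0f + std::exp(-logits[i])); // sigmoid gating instead of softmax
        scored[i] = probs[i] + bias[i];                   // bias affects which experts are selected
    }

    // select the k experts with the highest biased scores
    std::vector<int32_t> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::partial_sort(order.begin(), order.begin() + k, order.end(),
                      [&](int32_t a, int32_t b) { return scored[a] > scored[b]; });

    topk_result out;
    out.ids.assign(order.begin(), order.begin() + k);

    // normalize the unbiased probabilities of the selected experts, then apply
    // the trailing scale (GGML_OP_SCALE in the fused graph)
    float sum = 0.0f;
    for (int32_t id : out.ids) sum += probs[id];
    for (int32_t id : out.ids) out.weights.push_back(probs[id] / sum * scale);
    return out;
}
```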
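The test changes are about comparing both outputs of the fused op against the reference while tolerating a different expert ordering. The sketch below is illustrative C++ only, assuming made-up names; it is not the actual test-backend-ops code.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Compare two (ids, weights) result pairs order-insensitively: sort both sides
// by expert id, then check ids exactly and weights within a tolerance. This is
// why both topk_moe outputs need to be verified, not just the final node.
static bool topk_outputs_match(std::vector<int32_t> ids_a, std::vector<float> w_a,
                               std::vector<int32_t> ids_b, std::vector<float> w_b,
                               float tol = 1e-5f) {
    if (ids_a.size() != ids_b.size() || w_a.size() != w_b.size()) {
        return false;
    }

    auto sort_by_id = [](std::vector<int32_t> & ids, std::vector<float> & w) {
        std::vector<size_t> order(ids.size());
        for (size_t i = 0; i < order.size(); ++i) order[i] = i;
        std::sort(order.begin(), order.end(), [&](size_t x, size_t y) { return ids[x] < ids[y]; });
        std::vector<int32_t> ids2;
        std::vector<float>   w2;
        for (size_t i : order) { ids2.push_back(ids[i]); w2.push_back(w[i]); }
        ids = ids2;
        w   = w2;
    };
    sort_by_id(ids_a, w_a);
    sort_by_id(ids_b, w_b);

    for (size_t i = 0; i < ids_a.size(); ++i) {
        if (ids_a[i] != ids_b[i])             return false; // first output: expert ids
        if (std::fabs(w_a[i] - w_b[i]) > tol) return false; // second output: routing weights
    }
    return true;
}
```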

Prebuilt binaries are attached to the release for macOS/iOS, Linux, Windows, and openEuler.
