github ggml-org/llama.cpp b9110


docs: fix metrics endpoint description in server README (#22879)

  • docs: fix metrics endpoint description in server README

The README now describes the model query parameter that is required in router mode.
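For context, below is a minimal sketch of scraping the endpoint in router mode. It assumes the server was started with --metrics, listens on the default localhost:8080, and knows a model called "my-model"; the host, port, and model name are placeholders for illustration, not values taken from this release.

    # Minimal sketch: scrape the llama.cpp server metrics endpoint.
    # Assumptions (not from this release note): server started with --metrics,
    # listening on localhost:8080; "my-model" is a placeholder model name.
    import urllib.parse
    import urllib.request

    BASE_URL = "http://localhost:8080/metrics"
    MODEL = "my-model"  # required query parameter in router mode

    # Build /metrics?model=<name>; plain /metrics suffices outside router mode.
    url = BASE_URL + "?" + urllib.parse.urlencode({"model": MODEL})

    with urllib.request.urlopen(url, timeout=10) as resp:
        body = resp.read().decode("utf-8")

    # Print only the llamacpp:* samples, skipping # HELP / # TYPE comment lines.
    for line in body.splitlines():
        if line.startswith("llamacpp:"):
            print(line)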

Removed metrics:

  • llamacpp:kv_cache_usage_ratio
  • llamacpp:kv_cache_tokens

Added metrics:

  • llamacpp:prompt_seconds_total
  • llamacpp:tokens_predicted_seconds_total
  • llamacpp:n_decode_total
  • llamacpp:n_busy_slots_per_decode

  • server: fix metrics type for n_busy_slots_per_decode metric
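As a usage sketch, the time counters added here pair naturally with the token counters the server also exports: dividing llamacpp:tokens_predicted_total (not part of this change, assumed present) by llamacpp:tokens_predicted_seconds_total gives the average generation throughput. The snippet below parses a tiny, made-up exposition sample and computes that ratio; the values are illustrative only, and the parsing deliberately ignores labels and # HELP / # TYPE lines.

    # Sketch: derive average generation throughput from a scraped /metrics body.
    # The sample text and its values are made up for illustration;
    # llamacpp:tokens_predicted_total is assumed to be exported alongside
    # the counters listed in this release note.
    SAMPLE = """\
    llamacpp:tokens_predicted_total 1536
    llamacpp:tokens_predicted_seconds_total 48.0
    llamacpp:n_decode_total 96
    llamacpp:n_busy_slots_per_decode 1.4
    """

    def parse_metrics(text: str) -> dict:
        # Keep only simple "name value" sample lines; skip comments and blanks.
        out = {}
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            name, _, value = line.partition(" ")
            try:
                out[name] = float(value)
            except ValueError:
                pass
        return out

    m = parse_metrics(SAMPLE)
    if m.get("llamacpp:tokens_predicted_seconds_total", 0) > 0:
        rate = m["llamacpp:tokens_predicted_total"] / m["llamacpp:tokens_predicted_seconds_total"]
        print(f"average generation speed: {rate:.1f} tokens/s")  # -> 32.0 tokens/s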

Release assets are provided for macOS/iOS, Linux, Android, Windows, and openEuler.
