github mostlygeek/llama-swap v141

one month ago

This release changes how tokens/second is calculated on the activities page. The previous method was inaccurate because it divided the number of generated tokens by the total request time. Because the total request time also includes prompt processing, the resulting number was too misleading to be useful.
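To make the inaccuracy concrete, here is a small worked example. The numbers (6 seconds of prompt processing, then 200 tokens generated in 4 seconds) and both function names are hypothetical and chosen only for illustration; this is not llama-swap's code.

```go
package main

import "fmt"

// naiveTokensPerSecond divides generated tokens by the whole request time,
// which also includes prompt processing (the old, misleading approach).
func naiveTokensPerSecond(tokens int, totalSeconds float64) float64 {
	return float64(tokens) / totalSeconds
}

// generationTokensPerSecond divides generated tokens by generation time only.
func generationTokensPerSecond(tokens int, generationSeconds float64) float64 {
	return float64(tokens) / generationSeconds
}

func main() {
	// Hypothetical request: 6 s of prompt processing, then 200 tokens in 4 s.
	const tokens = 200
	promptSeconds, generationSeconds := 6.0, 4.0

	fmt.Printf("naive: %.1f tok/s\n",
		naiveTokensPerSecond(tokens, promptSeconds+generationSeconds))
	fmt.Printf("generation only: %.1f tok/s\n",
		generationTokensPerSecond(tokens, generationSeconds))
}
```

With these numbers the naive figure is 20 tokens/second while the generation-only figure is 50 tokens/second, and the gap widens as the prompt gets longer.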

This release changes the logic to:

  • use llama-server's timings record for tokens/second, if it exists
  • send -1 when timings are not available. The UI will render this as "unknown".
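The logic above can be sketched roughly as follows. This is a minimal illustration, not llama-swap's actual implementation: the type and function names are hypothetical, and the JSON field names (`timings`, `predicted_n`, `predicted_ms`, `predicted_per_second`) are assumed to match what llama-server includes in its responses. The `render` helper only mimics what the UI is described as doing.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// timings mirrors the fields of interest in llama-server's "timings" object.
// The field names are an assumption about llama-server's response format.
type timings struct {
	PredictedN         int     `json:"predicted_n"`
	PredictedMS        float64 `json:"predicted_ms"`
	PredictedPerSecond float64 `json:"predicted_per_second"`
}

// response holds only the part of the upstream body this sketch needs.
// Timings stays nil when the upstream did not include a timings record.
type response struct {
	Timings *timings `json:"timings"`
}

// unknownSpeed is the sentinel sent when no timings record is available.
const unknownSpeed = -1.0

// tokensPerSecond returns the upstream-reported generation speed,
// or unknownSpeed when the body has no usable timings record.
func tokensPerSecond(body []byte) float64 {
	var r response
	if err := json.Unmarshal(body, &r); err != nil || r.Timings == nil {
		return unknownSpeed
	}
	return r.Timings.PredictedPerSecond
}

// render shows how a client could display the value (hypothetical helper).
func render(tps float64) string {
	if tps < 0 {
		return "unknown"
	}
	return fmt.Sprintf("%.2f t/s", tps)
}

func main() {
	withTimings := []byte(`{"timings":{"predicted_n":200,"predicted_ms":4000,"predicted_per_second":50}}`)
	withoutTimings := []byte(`{"choices":[]}`)

	fmt.Println(render(tokensPerSecond(withTimings)))
	fmt.Println(render(tokensPerSecond(withoutTimings)))
}
```

Using a pointer for `Timings` lets the sketch tell a missing record apart from one whose values happen to be zero, so engines that report no timings fall through to the -1 sentinel.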

Support for timing information from other inference engines will come in future PRs.

Tokens/second and duration now match llama-server's output precisely.

