github mostlygeek/llama-swap v219
v219 (fixes v218)

21 hours ago

Notes

Including details for v218 (broken) and PR #790.

llama-swap has a new routing backend. What started as a small experiment to improve concurrency handling exploded into a full refactor of the backend. For users, the biggest change is that swapping is more efficient. Requests are collated, so requests for models that are already loaded take precedence over those that are awaiting loading.

It looks like:

new router: A B A B A B -> A A A B B B
old router: A B A B A B -> A B A B A B 
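
A minimal sketch of the reordering shown above, not llama-swap's actual router code: the function name `collate` and the use of plain model-name strings as requests are assumptions made for illustration. It moves requests for the currently loaded model to the front, then groups the remaining requests by model in order of first appearance.

```go
package main

import "fmt"

// collate is a hypothetical helper (not part of llama-swap). It reorders a
// queue of requested model names so that requests for the loaded model come
// first, followed by the other models grouped in order of first appearance.
func collate(queue []string, loaded string) []string {
	out := make([]string, 0, len(queue))

	// Pass 1: requests the loaded model can serve without a swap.
	for _, m := range queue {
		if m == loaded {
			out = append(out, m)
		}
	}

	// Pass 2: count the remaining requests per model, remembering the
	// order in which each model was first seen.
	var order []string
	counts := map[string]int{}
	for _, m := range queue {
		if m == loaded {
			continue
		}
		if counts[m] == 0 {
			order = append(order, m)
		}
		counts[m]++
	}

	// Emit each remaining model's requests as one contiguous group,
	// so each model only needs to be swapped in once.
	for _, m := range order {
		for i := 0; i < counts[m]; i++ {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	// With model A loaded, the interleaved queue collapses into two groups.
	fmt.Println(collate([]string{"A", "B", "A", "B", "A", "B"}, "A")) // prints [A A A B B B]
}
```

In this sketch the interleaved queue needs one swap (A to B) instead of five, which is the efficiency gain the diagram illustrates.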

However, just doing that wouldn't require a 12,009-line PR. There were a lot of architectural changes that make developer quality of life a bit easier. Redundant code was removed, repo organization is now centralized around the internal/ packages, new funny loading remarks were added, etc.

Also, a new concurrency tester sneaked in under cmd/concurrency-tester.

Changelog

  • 4ca9c47 Makefile,internal/server: various release tweaks
  • 146a9ea ui-svelte: update build directory (#801)
