Headline
- Lemonade can now load multiple models at the same time and supports parallel requests.
- You control how many models can be loaded with
--max-loaded-models(default: 1 LLM, 1 embedding, 1 reranking). - Demo by opening
examples\demos\multi-model-tester.htmlin your browser!
What's Changed
- Reduce log spam during long model loads on linux by @jeremyfowers in #634
- Update model docs for v9.0.5 by @jeremyfowers in #637
- Support for multiple models loaded at once by @jeremyfowers in #592
- Improve venv test style by @jeremyfowers in #638
- Add fault tolerance to system-info by @jeremyfowers in #639
- Fix up the models endpoint code and spec by @jeremyfowers in #643
- Bug Fix: Allow run command to work when server is already running by @danielholanda in #642
- Selective tray unloading by @danielholanda in #641
Full Changelog: v9.0.5...v9.0.6