Headline
EleutherAI's lm-evaluation-harness is now fully integrated as an automated Lemonade CLI tool (@ramkrishna2910)
- Run a huge variety of industry-standard LLM accuracy and evaluation tests with a single `lemonade` command.
- All results are collected into the Lemonade Cache for post-processing with `lemonade report`.
- Doc: https://github.com/lemonade-sdk/lemonade/blob/main/docs/lm-eval.md
Improvements
Tool calling in Lemonade Server (@danielholanda):
- Streaming tool calling is now supported for GGUF models.
- Added `Llama-xLAM-2-8b-fc-r-Hybrid`, a SOTA model for tool calling, to our list of suggested Hybrid models.
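As a rough illustration of what streaming tool calling means for a client: OpenAI-compatible servers typically deliver a tool call as a series of partial `delta.tool_calls` fragments that the client stitches together. The sketch below assumes Lemonade Server follows that OpenAI chat-completions chunk shape. The `accumulate_tool_calls` helper and the sample chunks are hypothetical, written for illustration, and are not taken from the Lemonade codebase.

```python
import json


def accumulate_tool_calls(chunks):
    """Merge streamed tool-call fragments into complete calls.

    Assumes each chunk is a dict shaped like an OpenAI chat.completion.chunk.
    """
    calls = {}  # fragment index -> partially assembled call
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            for frag in delta.get("tool_calls") or []:
                slot = calls.setdefault(
                    frag.get("index", 0),
                    {"id": None, "name": "", "arguments": ""},
                )
                if frag.get("id"):
                    slot["id"] = frag["id"]
                fn = frag.get("function") or {}
                # Name and arguments arrive as string pieces; concatenate them.
                slot["name"] += fn.get("name") or ""
                slot["arguments"] += fn.get("arguments") or ""
    return [
        {
            "id": c["id"],
            "name": c["name"],
            # Arguments are only valid JSON once every fragment has arrived.
            "arguments": json.loads(c["arguments"] or "{}"),
        }
        for _, c in sorted(calls.items())
    ]


# Hypothetical chunks, as a client might receive them from a streaming response.
sample_chunks = [
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "id": "call_1",
         "function": {"name": "get_weather", "arguments": ""}}]}}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": "{\"city\": "}}]}}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": "\"Paris\"}"}}]}}]},
    {"choices": [{"delta": {}, "finish_reason": "tool_calls"}]},
]

print(accumulate_tool_calls(sample_chunks))
```

The arguments string is parsed only after the stream ends, because individual fragments are not valid JSON on their own.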
Fixes
Lemonade Server fixes:
- Fixed a bug where GGUF models would fail to load if port 8081 was occupied (@jeremyfowers)
- Removed Llama 3.2 1B and 3B CPU models from the suggested models list because they didn't work in some environments (@jeremyfowers)
- Fixed access to Qwen3-8B-GGUF (@danielholanda)
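For anyone diagnosing a port conflict like the 8081 one locally, here is a minimal sketch of checking whether a TCP port is free and asking the OS for an unused one. It uses only Python's standard `socket` module. The helper names `port_is_free` and `pick_free_port` are hypothetical, and this does not describe how Lemonade Server implements its fix.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
        return True


def pick_free_port(host="127.0.0.1"):
    """Ask the OS for an unused ephemeral port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]


if __name__ == "__main__":
    print("8081 free:", port_is_free(8081))
    print("suggested free port:", pick_free_port())
```

Both helpers are inherently racy: another process can claim the port between the check and the moment a server binds to it, so treat the result as a hint rather than a guarantee.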
Full Changelog: v7.0.1...v7.0.2