Highlight: In-Memory Hot Caching
Introducing an in-memory hot cache tier for KV cache blocks. Frequently accessed blocks stay in RAM for faster access, and SSD storage is used only when the hot cache reaches its capacity limit.
Configure it via the --hot-cache-max-size CLI option or the slider under Resource Management in the admin web UI.
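The tiering described above can be pictured with a small LRU cache. This is a minimal sketch under stated assumptions, not the project's implementation. `HotCache`, `max_bytes`, and the dict standing in for the SSD tier are all hypothetical names. In write-back mode, new blocks go to RAM only and are written to SSD when the size limit forces them out. An SSD hit is promoted back into RAM.

```python
from __future__ import annotations

from collections import OrderedDict


class HotCache:
    """Hypothetical RAM tier for KV cache blocks, with write-back to an SSD tier.

    `ssd` stands in for the SSD store (assumption: any dict-like object).
    """

    def __init__(self, max_bytes: int, ssd: dict) -> None:
        self.max_bytes = max_bytes  # plays the role of the hot cache size limit
        self.ssd = ssd
        self.blocks: OrderedDict[str, bytes] = OrderedDict()  # LRU order: oldest first
        self.dirty: set[str] = set()  # blocks held only in RAM, not yet on SSD
        self.used = 0

    def put(self, key: str, data: bytes) -> None:
        # Write-back: store in RAM only; SSD is written later, on eviction.
        if key in self.blocks:
            self.used -= len(self.blocks.pop(key))
        self.blocks[key] = data
        self.used += len(data)
        self.dirty.add(key)
        self._evict()

    def get(self, key: str) -> bytes | None:
        if key in self.blocks:
            self.blocks.move_to_end(key)  # mark as most recently used
            return self.blocks[key]
        data = self.ssd.get(key)
        if data is not None:
            # Promote the SSD hit into RAM; it is clean because SSD already has it.
            self.blocks[key] = data
            self.used += len(data)
            self._evict()
        return data

    def _evict(self) -> None:
        # Spill least recently used blocks until RAM usage is within the limit.
        while self.used > self.max_bytes and self.blocks:
            key, data = self.blocks.popitem(last=False)
            self.used -= len(data)
            if key in self.dirty:
                self.ssd[key] = data  # deferred write to the SSD tier
                self.dirty.discard(key)
```

Deferring SSD writes to eviction time means blocks that stay hot never touch the disk. The trade-off is that RAM-only (dirty) blocks must be flushed before they are dropped.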
What's changed
- feat: Add in-memory hot cache with write-back mode (#58)
- fix: Merge consecutive same-role messages to prevent 500 error (#53)
- fix: Include forced_ct_kwargs in model list API response
- fix: Update outdated test assertions for SchedulerConfig defaults and CORS middleware
- ui: Improve global settings host selector and move batching to advanced
- chore: Change default top_k from 40 to 0
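The #53 fix merges consecutive messages that share a role. A minimal sketch of that kind of merge follows. It assumes chat messages are dicts with string "role" and "content" fields. The function name `merge_same_role` and the blank-line separator are hypothetical choices, not taken from the project.

```python
def merge_same_role(messages: list[dict]) -> list[dict]:
    """Collapse runs of adjacent messages with the same role into one message."""
    merged: list[dict] = []
    for msg in messages:
        if merged and merged[-1]["role"] == msg["role"]:
            # Hypothetical separator: join the contents with a blank line.
            prev = merged[-1]
            merged[-1] = {**prev, "content": prev["content"] + "\n\n" + msg["content"]}
        else:
            merged.append(dict(msg))  # copy so the caller's dicts are not mutated
    return merged
```

For example, two back-to-back user messages followed by an assistant reply become one user message and one assistant message. Roles that already alternate pass through unchanged.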
Full changelog: v0.1.12...v0.1.13
