jundot/omlx v0.1.13 on GitHub

Highlight: In-Memory Hot Caching

This release introduces an in-memory hot cache tier for KV cache blocks. Frequently accessed blocks stay in RAM for faster access, and SSD storage is used only once the hot cache reaches its capacity limit.

Configure it with the --hot-cache-max-size CLI option or the admin web UI slider under Resource Management.
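
To illustrate the general idea of a write-back hot tier, here is a minimal sketch in Python. It is not omlx's actual implementation: the class name `TwoTierCache`, its methods, the LRU eviction policy, and the plain dict standing in for SSD storage are all assumptions made for illustration.

```python
from collections import OrderedDict


class TwoTierCache:
    """Hypothetical hot (RAM) tier in front of a cold (SSD-like) store."""

    def __init__(self, hot_max_bytes, cold_store=None):
        self.hot_max_bytes = hot_max_bytes
        self.hot = OrderedDict()  # block_id -> bytes, least recently used first
        self.hot_bytes = 0
        # A plain dict stands in for SSD storage in this sketch.
        self.cold = cold_store if cold_store is not None else {}

    def put(self, block_id, data):
        # Write-back: new data lands in RAM only. The cold store is
        # written later, when the block is evicted from the hot tier.
        if block_id in self.hot:
            self.hot_bytes -= len(self.hot.pop(block_id))
        self.hot[block_id] = data
        self.hot_bytes += len(data)
        self._evict()

    def get(self, block_id):
        if block_id in self.hot:
            self.hot.move_to_end(block_id)  # mark as recently used
            return self.hot[block_id]
        data = self.cold.get(block_id)
        if data is not None:
            self.put(block_id, data)  # promote the block back into RAM
        return data

    def _evict(self):
        # Spill least recently used blocks to the cold store until the
        # hot tier fits within its byte budget again.
        while self.hot_bytes > self.hot_max_bytes and self.hot:
            old_id, old_data = self.hot.popitem(last=False)
            self.hot_bytes -= len(old_data)
            self.cold[old_id] = old_data
```

In this sketch nothing reaches the cold store until the hot tier exceeds `hot_max_bytes`, which mirrors the behavior described above. A real implementation would also need to flush the hot tier on shutdown so that blocks held only in RAM are not lost.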

What's changed

feat: Add in-memory hot cache with write-back mode (#58)
fix: Merge consecutive same-role messages to prevent 500 error (#53)
fix: Include forced_ct_kwargs in model list API response
fix: Update outdated test assertions for SchedulerConfig defaults and CORS middleware
ui: Improve global settings host selector and move batching to advanced
chore: Change default top_k from 40 to 0
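
The same-role merge in #53 can be pictured with a short Python sketch. The release notes do not describe how the fix is implemented, so the function name `merge_consecutive_roles`, the separator, and the assumption that each message is a dict with string `role` and `content` fields are all hypothetical.

```python
def merge_consecutive_roles(messages, sep="\n\n"):
    """Collapse runs of adjacent messages that share a role (illustrative only)."""
    merged = []
    for msg in messages:
        if merged and merged[-1]["role"] == msg["role"]:
            # Same role as the previous message: append to its content.
            merged[-1]["content"] += sep + msg["content"]
        else:
            # Copy the dict so the caller's input is not mutated.
            merged.append(dict(msg))
    return merged
```

For example, two back-to-back user messages become a single user message whose content joins both texts with the separator, while messages with alternating roles pass through unchanged.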

Full changelog: v0.1.12...v0.1.13