What's Changed
- Fix APC splitting on multimodal inputs by @lucasnewman in #1311
- Fix Qwen quantized KV cache prompt state by @Blaizzy in #1313
- Fix Gemma 4 unified silently dropping image/video inputs (video processor rejects standard HF config keys) by @sockeye44 in #1321
- Fix thinking enabled by default for chat templates by @lucasnewman in #1316
- Fix APC exact disk-hit entries not promoted to in-memory LRU by @mikeatlas in #1324
- Align Qwen3-VL visual masks to the current prefill / decode window by @lucasnewman in #1325
- Fix LFM2.5 VL model loading by @lucasnewman in #1328
- Fix Phi 3.5 VL eos tokens by @lucasnewman in #1326
- Fix Qwen3-VL deepstack visual embeds misaligned during chunked prefill by @sockeye44 in #1332
- Fix --system prompt ignored by one-shot generate by @sockeye44 in #1330
- Improve speculative prefill when using MTP drafters by @lucasnewman in #1334
- Respect generation_config sampling defaults by @Blaizzy in #1337
- Add server thinking defaults by @Blaizzy in #1342
- Add Gemma 4 DLM by @Blaizzy in #1347
- Fix DiffusionGemma long-context prefill by @Blaizzy in #1348
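
For readers unfamiliar with the behavior behind the disk-hit promotion fix (#1324), here is a minimal, generic sketch of a two-tier cache in which an entry found only on disk is copied back into the in-memory LRU. It is not the project's implementation: the `TwoTierCache` class, its methods, and the plain dict standing in for disk storage are all hypothetical names chosen for illustration.

```python
from collections import OrderedDict


class TwoTierCache:
    """Hypothetical two-tier cache: a small in-memory LRU in front of a 'disk' store."""

    def __init__(self, capacity, disk=None):
        self.capacity = capacity
        self.memory = OrderedDict()  # least recently used entry is first
        # Assumption: a plain dict stands in for on-disk storage.
        self.disk = disk if disk is not None else {}

    def _remember(self, key, value):
        # Insert or refresh the entry as most recently used, then evict
        # least recently used entries until we are back within capacity.
        self.memory[key] = value
        self.memory.move_to_end(key)
        while len(self.memory) > self.capacity:
            self.memory.popitem(last=False)

    def put(self, key, value):
        self.disk[key] = value
        self._remember(key, value)

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key]
        if key in self.disk:
            # Disk hit: promote the entry so the next lookup is served from memory.
            value = self.disk[key]
            self._remember(key, value)
            return value
        return None
```

Without the promotion step in `get`, every repeated lookup of a disk-only entry would go back to disk, which is the kind of behavior the fix title describes.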
New Contributors
- @sockeye44 made their first contribution in #1321
- @mikeatlas made their first contribution in #1324
Full Changelog: v0.6.2...v0.6.3