Blaizzy/mlx-vlm v0.6.3 on GitHub

What's Changed

Fix APC splitting on multimodal inputs by @lucasnewman in #1311
Fix Qwen quantized KV cache prompt state by @Blaizzy in #1313
Fix Gemma 4 unified silently dropping image/video inputs (video processor rejects standard HF config keys) by @sockeye44 in #1321
Fix thinking enabled by default for chat templates by @lucasnewman in #1316
Fix APC exact disk-hit entries not promoted to in-memory LRU by @mikeatlas in #1324
Align Qwen3-VL visual masks to the current prefill / decode window by @lucasnewman in #1325
Fix LFM2.5 VL model loading by @lucasnewman in #1328
Fix Phi 3.5 VL eos tokens by @lucasnewman in #1326
Fix Qwen3-VL deepstack visual embeds misaligned during chunked prefill by @sockeye44 in #1332
Fix --system prompt ignored by one-shot generate by @sockeye44 in #1330
Improve speculative prefill when using MTP drafters by @lucasnewman in #1334
Respect generation_config sampling defaults by @Blaizzy in #1337
Add server thinking defaults by @Blaizzy in #1342
Add Gemma 4 DLM by @Blaizzy in #1347
Fix DiffusionGemma long-context prefill by @Blaizzy in #1348

Full Changelog: v0.6.2...v0.6.3