Blaizzy/mlx-vlm v0.4.4 on GitHub

What's Changed

Fix Gemma 4 chunked prefill for KV-shared models and thinking by @Blaizzy in #901
Fix Gemma 4 vision + text degradation and missing processor config by @Blaizzy in #906
Fix Falcon-Perception 300M and move generate_perception to model by @Blaizzy in #910
Fix Gemma 4 tool parser for nested arguments by @Blaizzy in #916
Add VisionFeatureCache for multi-turn image caching by @Blaizzy in #913
Fix broken video_generate and smolvlm_video_generate CLI commands by @Blaizzy in #919
Optimize TurboQuant Metal kernels: 0.85-1.90x baseline with 89% KV savings by @Blaizzy in #909

Full Changelog: v0.4.3...v0.4.4