github Blaizzy/mlx-vlm v0.4.4

one day ago

What's Changed

  • Fix Gemma 4 chunked prefill for KV-shared models and thinking by @Blaizzy in #901
  • Fix Gemma 4 vision + text degradation and missing processor config by @Blaizzy in #906
  • Fix Falcon-Perception 300M and move generate_perception to model by @Blaizzy in #910
  • Fix Gemma 4 tool parser for nested arguments by @Blaizzy in #916
  • Add VisionFeatureCache for multi-turn image caching by @Blaizzy in #913
  • Fix broken video_generate and smolvlm_video_generate CLI commands by @Blaizzy in #919
  • Optimize TurboQuant Metal kernels: 0.85-1.90x baseline with 89% KV savings by @Blaizzy in #909

Full Changelog: v0.4.3...v0.4.4
