What's Changed
- Fix Gemma 4 chunked prefill for KV-shared models and thinking by @Blaizzy in #901
- Fix Gemma 4 vision + text degradation and missing processor config by @Blaizzy in #906
- Fix Falcon-Perception 300M and move generate_perception to model by @Blaizzy in #910
- Fix Gemma 4 tool parser for nested arguments by @Blaizzy in #916
- Add VisionFeatureCache for multi-turn image caching by @Blaizzy in #913
- Fix broken video_generate and smolvlm_video_generate CLI commands by @Blaizzy in #919
- Optimize TurboQuant Metal kernels: 0.85-1.90x baseline with 89% KV savings by @Blaizzy in #909
Full Changelog: v0.4.3...v0.4.4