Blaizzy/mlx-vlm v0.4.3 on GitHub

What's Changed

Add SAM 3.1 with Object Multiplex and optimized realtime pipeline by @Blaizzy in #880
Add Falcon-OCR model support by @Griffintaur in #879
Add RF-DETR detection and segmentation model by @Blaizzy in #884
Add Granite Vision 3.2 and 4.0 by @Blaizzy in #885
Add wired_limit to vision model inference pipelines for CUDA support by @Blaizzy in #887
Add Falcon Perception by @Blaizzy in #888
fix(server): disable uvicorn reload by default to prevent memory leaks by @futurepitcher in #883
Add Gemma 4 model support (vision, audio, MoE) by @Blaizzy in #890
Fix Gemma 4 embedding scaling by @Blaizzy in #893
Add TurboQuant by @Blaizzy in #858
Remove TurboQuant benchmark artifacts and add README docs by @Blaizzy in #894

Full Changelog: v0.4.2...v0.4.3