## What's Changed
- Add AWQ/GPTQ weight transformation utilities by @ericcurtin in #730
- Add IQuest Coder V1 Loop variant by @kernelpool in #716
- Fix sliding window batching by @awni in #738
- Fix Batch Generation: Add extract method to ArraysCache for item retrieval by @Goekdeniz-Guelmez in #740
- Make MambaCache compatible with batch generation for nemotron-h by @nikhilmitrax in #690
- Add a server benchmark for continuous batching by @awni in #728
- Fix tools parameter in apply_chat_template call by @kernelpool in #747
- Refactor tokenizer error handling to use warnings instead of exceptions by @cubist38 in #744
- Make cache list batchable by @awni in #743
- Fix batch generation for IQuestLoopCoder model by @kernelpool in #748
- Fix type hint and pydoc for batch_generate by @tibbes in #745
- Handle empty caches during batch merge by @ivanfioravanti in #755
- Update for latest mlx by @awni in #759
- Use compiled Swiglu by @awni in #753
- Add support for Nemotron Super 49B v1.5 by @lazarust in #756
- fix(falcon_h1): support tied embeddings and correct muP scaling by @solarpunkin in #764
- Fix swiglu parameter order by @kernelpool in #767
- Fix CacheList batching by @kernelpool in #769
- fix: unused batch_size parameter for mlx_lm.evaluate by @AndrewTan517 in #762
- Add gpt-oss sharding by @Evanev7 in #761
- Fix LongCat Flash extended context support by @kernelpool in #768
- Add minimax tensor sharding by @Evanev7 in #760
- Shard LongCat Flash by @kernelpool in #771
- Add glm4 moe lite model by @ivanfioravanti in #776
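Much of this release hardens the batch generation path (#740, #743, #745, #748, #755, #769). As a rough orientation, the sketch below shows how batch generation is typically driven from Python. The `batch_generate` name comes from #745, but its import path, signature, and return shape are assumptions here, not facts from these notes; check the docstrings updated in #745 before relying on them.

```python
# Sketch only: batch_generate exists in mlx_lm (see #745), but its exact
# import path, signature, and return type are assumed here, not confirmed.
from mlx_lm import load
from mlx_lm.generate import batch_generate  # assumed import location

# Any MLX-converted model works; this repo name is just an example.
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

prompts = [
    "Write a haiku about the ocean.",
    "Explain continuous batching in one sentence.",
]

# Assumed behavior: one completion per prompt, each capped at max_tokens.
results = batch_generate(model, tokenizer, prompts, max_tokens=128)
print(results)
```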
## New Contributors
- @ericcurtin made their first contribution in #730
- @nikhilmitrax made their first contribution in #690
- @tibbes made their first contribution in #745
- @solarpunkin made their first contribution in #764
- @AndrewTan517 made their first contribution in #762
- @Evanev7 made their first contribution in #761
**Full Changelog**: v0.30.2...v0.30.3