## What's Changed
- Add AWQ/GPTQ weight transformation utilities by @ericcurtin in #730
- Add IQuest Coder V1 Loop variant by @kernelpool in #716
- Fix sliding window batching by @awni in #738
- Fix Batch Generation: Add extract method to ArraysCache for item retrieval by @Goekdeniz-Guelmez in #740
- Make MambaCache compatible with batch generation for nemotron-h by @nikhilmitrax in #690
- Add a server benchmark for continuous batching by @awni in #728
- Fix tools parameter in apply_chat_template call by @kernelpool in #747
- Refactor tokenizer error handling to use warnings instead of exceptions by @cubist38 in #744
- Make cache list batchable by @awni in #743
- Fix batch generation for IQuestLoopCoder model by @kernelpool in #748
- Fix type hint and pydoc for batch_generate by @tibbes in #745
- Handle empty caches during batch merge by @ivanfioravanti in #755
- Update for latest mlx by @awni in #759
- Use compiled Swiglu by @awni in #753
- Add support for Nemotron Super 49B v1.5 by @lazarust in #756
- fix(falcon_h1): support tied embeddings and correct muP scaling by @solarpunkin in #764
- Fix swiglu parameter order by @kernelpool in #767
- Fix CacheList batching by @kernelpool in #769
- fix: unused batch_size parameter for mlx_lm.evaluate by @AndrewTan517 in #762
- Add gpt-oss sharding by @Evanev7 in #761
- Fix LongCat Flash extended context support by @kernelpool in #768
- Add minimax tensor sharding by @Evanev7 in #760
- Shard LongCat Flash by @kernelpool in #771
- Add glm4 moe lite model by @ivanfioravanti in #776
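Much of this release hardens the batch generation path (#740, #743, #745, #748, #755, #769). As a rough orientation, the sketch below shows how batch generation is typically driven from Python. The `batch_generate` name comes from #745, but its import path, signature, and return shape are assumptions here, not facts from these notes; check the docstrings updated in #745 before relying on them.

```python
# Sketch only: batch_generate exists in mlx_lm (see #745), but its exact
# import path, signature, and return type are assumed here, not confirmed.
from mlx_lm import load
from mlx_lm.generate import batch_generate  # assumed import location

# Any MLX-converted model works; this repo name is just an example.
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

prompts = [
    "Write a haiku about the ocean.",
    "Explain continuous batching in one sentence.",
]

# Assumed behavior: one completion per prompt, each capped at max_tokens.
results = batch_generate(model, tokenizer, prompts, max_tokens=128)
print(results)
```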
## New Contributors
- @ericcurtin made their first contribution in #730
- @nikhilmitrax made their first contribution in #690
- @tibbes made their first contribution in #745
- @solarpunkin made their first contribution in #764
- @AndrewTan517 made their first contribution in #762
- @Evanev7 made their first contribution in #761
**Full Changelog**: v0.30.2...v0.30.3