Highlights
- New model topology feature: ISQ and device mapping
- 🔥Faster FlashAttention support when batching
- Removed `plotly` and associated JS dependencies
- φ³ Support Phi 3.5, Phi 3.5 vision, Phi 3.5 MoE
- Improved Rust API ergonomics
- Support multiple (sharded) GGUF files
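Assume a model topology assigns ISQ (in-situ quantization) types and target devices to ranges of layers. Under that assumption, resolving the settings for a given layer could look like the minimal sketch below. `Topology`, `LayerSettings`, the first-match rule, and strings such as `"Q4K"` and `"cuda:1"` are all hypothetical. They illustrate the idea only and do not reflect this project's actual topology format or API.

```rust
use std::ops::Range;

/// Hypothetical per-layer-range settings: an ISQ type and/or a target device.
/// These names are illustrative, not this project's real types.
#[derive(Debug, Clone, PartialEq)]
struct LayerSettings {
    isq: Option<&'static str>,
    device: Option<&'static str>,
}

/// Hypothetical topology: an ordered list of (layer range, settings) pairs.
struct Topology {
    entries: Vec<(Range<usize>, LayerSettings)>,
}

impl Topology {
    /// Returns the settings for `layer` from the first range containing it.
    /// Layers not covered by any range yield `None` (i.e. use global defaults).
    fn settings_for(&self, layer: usize) -> Option<&LayerSettings> {
        self.entries
            .iter()
            .find(|(range, _)| range.contains(&layer))
            .map(|(_, settings)| settings)
    }
}
```

Returning `None` for uncovered layers keeps the topology optional: anything it does not mention falls back to whatever model-wide quantization and device choice would apply otherwise.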
MSRV
The minimum supported Rust version (MSRV) for this release is 1.79.0.
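For context on MSRV 1.79: `std::sync::LazyLock` was only stabilized in Rust 1.80, so code targeting 1.79 cannot use it. Here #724 adopts the `once_cell` crate instead. A standard-library-only alternative is `std::sync::OnceLock` (stable since 1.70), shown in the minimal sketch below. The `registry` function and its contents are hypothetical and are not taken from this project.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Counts how many times the initializer runs (for demonstration only).
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Returns a lazily built, process-wide table (hypothetical contents).
/// `OnceLock::get_or_init` runs the closure at most once; `LazyLock` would
/// express this more directly but requires Rust 1.80, above a 1.79 MSRV.
fn registry() -> &'static HashMap<&'static str, u32> {
    static REGISTRY: OnceLock<HashMap<&'static str, u32>> = OnceLock::new();
    REGISTRY.get_or_init(|| {
        INIT_CALLS.fetch_add(1, Ordering::SeqCst);
        HashMap::from([("alpha", 1), ("beta", 2)])
    })
}
```

Every call after the first returns a reference to the same table without re-running the initializer.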
What's Changed
- Fixes for auto dtype selection with RUST_BACKTRACE=1 by @EricLBuehler in #690
- Add support for multiple GGUF files by @EricLBuehler in #692
- Refactor normal and vision loaders by @EricLBuehler in #693
- Fix `split.count` GGUF duplication handling by @EricLBuehler in #695
- Batching example by @EricLBuehler in #694
- Some fixes by @EricLBuehler in #697
- Improve vision rust examples by @EricLBuehler in #698
- Add ISQ topology by @EricLBuehler in #701
- Add custom logits processor API by @EricLBuehler in #702
- Add Gemma 2 PagedAttention support by @EricLBuehler in #704
- Faster RmsNorm in Gemma/Gemma2 by @EricLBuehler in #703
- Fix bug in Metal ISQ by @EricLBuehler in #706
- Support GGUF BF16 tensors by @EricLBuehler in #691
- Better support for FlashAttention: real batching + sliding window + softcap by @EricLBuehler in #707
- Remove some usages of `pub` in models by @EricLBuehler in #708
- Support the Phi 3.5 V model by @EricLBuehler in #710
- Implement the Phi 3.5 MoE model by @EricLBuehler in #709
- Device map topology by @EricLBuehler in #717
- Implement DRY penalty by @EricLBuehler in #637
- Remove plotly and just output CSV loss file by @EricLBuehler in #700
- Using once_cell to reduce MSRV by @EricLBuehler in #724
- Fixes for Windows build by @EricLBuehler in #729
- Even more phi3.5moe fix attempts by @EricLBuehler in #731
- Add example for Phi 3.5 MoE by @EricLBuehler in #733
- Add Phi 3.5 chat template by @EricLBuehler in #734
- Patch ISQ for Mixtral by @EricLBuehler in #730
- Gracefully handle Engine Drop with termination request by @EricLBuehler in #735
- feat(vision): add support for proper file and data image URLs by @Schuwi in #727
- Add new parsing to Python API by @EricLBuehler in #737
- Remove test and add custom error type to Python API by @EricLBuehler in #738
- Update kernels for metal bf16 by @EricLBuehler in #719
- Better `Response` Result API by @EricLBuehler in #739
- More Metal quantized kernel fixes by @EricLBuehler in #740
- [Breaking] Bump version to v0.3.0 by @EricLBuehler in #736
- Final changes for v0.3.0 by @EricLBuehler in #741
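Two of the changes above, the custom logits processor API (#702) and the DRY penalty (#637), both adjust logits before sampling. The minimal sketch below illustrates that idea with a simplified DRY ("Don't Repeat Yourself") penalty plugged into a logits-processing hook. The `LogitsProcessor` trait, the `DryPenalty` struct, and their fields are hypothetical and do not reproduce this project's API. The penalty follows the commonly used DRY formulation: a token that would extend a repeated sequence of length at least `allowed_length` is penalized by `multiplier * base^(length - allowed_length)`. Sequence breakers are omitted here.

```rust
use std::collections::HashMap;

/// Hypothetical hook: adjusts raw logits in place, given the tokens so far.
trait LogitsProcessor {
    fn process(&self, context: &[u32], logits: &mut [f32]);
}

/// Simplified DRY penalty (no sequence breakers); field names are illustrative.
struct DryPenalty {
    multiplier: f32,
    base: f32,
    allowed_length: usize,
}

impl DryPenalty {
    /// For each token that earlier followed a copy of the context's current
    /// suffix, returns the longest such suffix match length.
    fn match_lengths(context: &[u32]) -> HashMap<u32, usize> {
        let n = context.len();
        let mut best: HashMap<u32, usize> = HashMap::new();
        // `context[j]` is a candidate next token; compare the suffix ending at
        // j - 1 with the suffix ending at n - 1, extending backwards.
        for j in 1..n {
            let mut k = 0;
            while k < j && context[j - 1 - k] == context[n - 1 - k] {
                k += 1;
            }
            if k > 0 {
                let entry = best.entry(context[j]).or_insert(0);
                *entry = (*entry).max(k);
            }
        }
        best
    }
}

impl LogitsProcessor for DryPenalty {
    fn process(&self, context: &[u32], logits: &mut [f32]) {
        for (token, len) in Self::match_lengths(context) {
            if len >= self.allowed_length {
                // Penalty grows exponentially with the length of the repeat.
                let exponent = (len - self.allowed_length) as i32;
                let penalty = self.multiplier * self.base.powi(exponent);
                if let Some(logit) = logits.get_mut(token as usize) {
                    *logit -= penalty;
                }
            }
        }
    }
}
```

For the context `[1, 2, 3, 4, 1, 2, 3]`, the suffix `1, 2, 3` previously preceded token `4`, so only token `4` is penalized (match length 3). This quadratic scan keeps the sketch short; a production sampler would likely bound the search window.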
New Contributors
Full Changelog: v0.2.5...v0.3.0