New features
- Support .bin, .pt, .pth extensions
- Add Starcoder 2 GGUF
- 🔥 PagedAttention support: faster than llama.cpp when running GGUF models, plus all the throughput benefits of PagedAttention 😉
- Optimized performance and memory usage
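To illustrate what accepting the new weight-file extensions can look like, here is a minimal sketch of an extension check. The constant `WEIGHT_EXTENSIONS` and the function `is_supported_weight_file` are hypothetical names chosen for this example; they are not taken from the mistral.rs codebase, and the case-insensitive comparison is an assumption.

```rust
use std::path::Path;

/// The extensions named in this release. Hypothetical constant for illustration only.
const WEIGHT_EXTENSIONS: [&str; 3] = ["bin", "pt", "pth"];

/// Returns true if `path` ends in one of the listed extensions.
/// Assumption: the comparison ignores ASCII case, so `model.PT` is accepted.
fn is_supported_weight_file(path: &Path) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| WEIGHT_EXTENSIONS.iter().any(|known| ext.eq_ignore_ascii_case(known)))
        .unwrap_or(false)
}
```

For example, `is_supported_weight_file(Path::new("model.pth"))` is true, while a file with no extension, or with an extension outside the list, is rejected.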
Rust MSRV
The minimum supported Rust version (MSRV) of mistral.rs v0.2.0 is 1.75.
What's Changed
- Fix SWA order (flip it) for Gemma 2 by @EricLBuehler in #554
- Support .bin, .pt, .pth extensions by @EricLBuehler in #557
- Update readme by @EricLBuehler in #558
- Fix Starcoder 2 ISQ by @EricLBuehler in #559
- Update deps by @EricLBuehler in #560
- Add the starcoder2 GGUF arch by @EricLBuehler in #561
- Readme update for starcoder2 gguf by @EricLBuehler in #562
- Fix PyPI release trigger by @EricLBuehler in #566
- Optimize multi-batch and inference performance with PagedAttention by @EricLBuehler in #552
- [Breaking] Version 0.2.0 by @EricLBuehler in #527
- Paged attention support for vision models by @EricLBuehler in #567
- Automatically use paged attn on cuda, get memory size by @EricLBuehler in #568
- Add docs link for vision loader by @EricLBuehler in #570
- Add matching for valid model weight names by @EricLBuehler in #571
- Remove ensure about no paged attn for vision models by @EricLBuehler in #573
- Add percentage utilization support to paged attn by @EricLBuehler in #574
- Include block engine in paged attn metadata by @EricLBuehler in #576
- Update deps and sync Candle by @EricLBuehler in #578
- Optimize CLIP model by @EricLBuehler in #579
- Use softmax_last_dim in CLIP by @EricLBuehler in #580
- Fix method of calculating paged attn with util percent by @EricLBuehler in #581
- Handle windows in paged attn build by @EricLBuehler in #577
- Warn instead of error when paged attn not supported by @EricLBuehler in #583
- Warn instead of error when paged attn for adapters not supported by @EricLBuehler in #584
- Add support for lm_head to adapter models by @EricLBuehler in #586
- Add default plotly feature by @EricLBuehler in #587
- Improve memory handling of PagedAttention with GGUF by @EricLBuehler in #590
- Fix Windows build on cuda w/ PagedAttention by @EricLBuehler in #589
- Update cuda kernels build.rs on windows by @EricLBuehler in #591
- Bump version to 0.2.0 and update docs by @EricLBuehler in #582
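Several entries above concern sizing PagedAttention's KV cache from a memory-utilization percentage. As a rough sketch of the general idea, and not the calculation mistral.rs actually performs, the function below turns a utilization fraction into a number of fixed-size KV-cache blocks. The name `num_kv_blocks` and all of its parameters are assumptions made for this example.

```rust
/// Hypothetical helper: how many fixed-size KV-cache blocks fit in a memory budget.
/// Assumptions: `utilization` is a fraction in [0, 1] of `total_mem_bytes`,
/// each block holds `block_size_tokens` tokens, and every token costs
/// `bytes_per_token` bytes of KV cache across all layers.
fn num_kv_blocks(
    total_mem_bytes: u64,
    utilization: f64,
    block_size_tokens: u64,
    bytes_per_token: u64,
) -> Option<u64> {
    // Reject out-of-range (or NaN) fractions and degenerate block shapes.
    if !(0.0..=1.0).contains(&utilization) || block_size_tokens == 0 || bytes_per_token == 0 {
        return None;
    }
    // Memory budget reserved for the cache, truncated to whole bytes.
    let budget = (total_mem_bytes as f64 * utilization) as u64;
    // Size of one block; `checked_mul` guards against overflow.
    let bytes_per_block = block_size_tokens.checked_mul(bytes_per_token)?;
    // Only whole blocks are usable, so round down.
    Some(budget / bytes_per_block)
}
```

With 8 GiB of memory, a utilization of 0.5, 32 tokens per block, and 128 KiB per token, this gives 1024 blocks. A fraction outside [0, 1] yields `None`.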
Full Changelog: v0.1.26...v0.2.0
Install mistralrs-server 0.2.0
Install prebuilt binaries via shell script
curl --proto '=https' --tlsv1.2 -LsSf https://github.com/EricLBuehler/mistral.rs/releases/download/v0.2.0/mistralrs-server-installer.sh | sh
Download mistralrs-server 0.2.0
File | Platform | Checksum |
---|---|---|
mistralrs-server-aarch64-apple-darwin.tar.xz | Apple Silicon macOS | checksum |
mistralrs-server-x86_64-apple-darwin.tar.xz | Intel macOS | checksum |
mistralrs-server-x86_64-unknown-linux-gnu.tar.xz | x64 Linux | checksum |