EricLBuehler/mistral.rs v0.2.0 on GitHub

New features

Support .bin, .pt, .pth extensions
Add Starcoder 2 GGUF
🔥 PagedAttention - beats llama.cpp when running GGUF models, and brings all the throughput benefits of PagedAttention 😉
Optimized performance and memory usage

Rust MSRV

The minimum supported Rust version (MSRV) of mistral.rs v0.2.0 is 1.75.

What's Changed

Fix SWA order (flip it) for Gemma 2 by @EricLBuehler in #554
Support .bin, .pt, .pth extensions by @EricLBuehler in #557
Update readme by @EricLBuehler in #558
Fix Starcoder 2 ISQ by @EricLBuehler in #559
Update deps by @EricLBuehler in #560
Add the starcoder2 GGUF arch by @EricLBuehler in #561
Readme update for starcoder2 gguf by @EricLBuehler in #562
Fix PyPI release trigger by @EricLBuehler in #566
Optimize multi-batch and inference performance with PagedAttention by @EricLBuehler in #552
[Breaking] Version 0.2.0 by @EricLBuehler in #527
Paged attention support for vision models by @EricLBuehler in #567
Automatically use paged attn on cuda, get memory size by @EricLBuehler in #568
Add docs link for vision loader by @EricLBuehler in #570
Add matching for valid model weight names by @EricLBuehler in #571
Remove ensure about no paged attn for vision models by @EricLBuehler in #573
Add percentage utilization support to paged attn by @EricLBuehler in #574
Include block engine in paged attn metadata by @EricLBuehler in #576
Update deps and sync Candle by @EricLBuehler in #578
Optimize CLIP model by @EricLBuehler in #579
Use softmax_last_dim in CLIP by @EricLBuehler in #580
Fix method of calculating paged attn with util percent by @EricLBuehler in #581
Handle windows in paged attn build by @EricLBuehler in #577
Warn instead of error when paged attn not supported by @EricLBuehler in #583
Warn instead of error when paged attn for adapters not supported by @EricLBuehler in #584
Add support for lm_head to adapter models by @EricLBuehler in #586
Add default plotly feature by @EricLBuehler in #587
Improve memory handling of PagedAttention with GGUF by @EricLBuehler in #590
Fix Windows build on cuda w/ PagedAttention by @EricLBuehler in #589
Update cuda kernels build.rs on windows by @EricLBuehler in #591
Bump version to 0.2.0 and update docs by @EricLBuehler in #582

Full Changelog: v0.1.26...v0.2.0

Install mistralrs-server 0.2.0

Install prebuilt binaries via shell script

curl --proto '=https' --tlsv1.2 -LsSf https://github.com/EricLBuehler/mistral.rs/releases/download/v0.2.0/mistralrs-server-installer.sh | sh

Download mistralrs-server 0.2.0

File	Platform	Checksum
mistralrs-server-aarch64-apple-darwin.tar.xz	Apple Silicon macOS	checksum
mistralrs-server-x86_64-apple-darwin.tar.xz	Intel macOS	checksum
mistralrs-server-x86_64-unknown-linux-gnu.tar.xz	x64 Linux	checksum
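
The checksums listed above can be used to verify an archive after a manual download. The following is a minimal sketch, assuming the published checksum is a SHA-256 digest (the table does not say which algorithm is used). verify_sha256 is a hypothetical helper written for illustration, not part of mistral.rs or its installer, and the expected digest has to be copied from the release page.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if FILE hashes to the EXPECTED SHA-256 digest.
# Assumption: the release checksums are SHA-256 hex digests.
verify_sha256() {
    file="$1"
    expected="$2"
    if command -v sha256sum >/dev/null 2>&1; then
        # GNU coreutils (typical on Linux)
        actual=$(sha256sum "$file" | cut -d ' ' -f 1)
    else
        # macOS ships shasum rather than sha256sum
        actual=$(shasum -a 256 "$file" | cut -d ' ' -f 1)
    fi
    # Exit status 0 on match, non-zero on mismatch or unreadable file
    [ "$actual" = "$expected" ]
}
```

For example, `verify_sha256 mistralrs-server-x86_64-unknown-linux-gnu.tar.xz "$EXPECTED"` returns success only when the digest matches, so it can gate extraction with `&& tar -xJf mistralrs-server-x86_64-unknown-linux-gnu.tar.xz`. Here `EXPECTED` is a placeholder for the digest copied from the release page.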