What's Changed
- Refactor ISQ quant parsing by @EricLBuehler in #664
- Refactor server examples to use OpenAI Python client by @EricLBuehler in #665
- Implement prompt chunking by @EricLBuehler in #623
- Python example and server example cleanup by @EricLBuehler in #668
- Implement GPTQ quantization by @EricLBuehler in #467
- Update deps by @EricLBuehler in #672
- Rework the automatic dtype selection feature by @EricLBuehler in #676
- Fix backend Candle fork Metal, flash attn, also Llama linear by @EricLBuehler in #681
- Use converted tokenizer.json in tests by @EricLBuehler in #682
- Refactor ISQ and mistralrs-quant by @EricLBuehler in #683
- Fix metal build for isq by @EricLBuehler in #686
- Add missing error case in automatic dtype selection feature by @ac3xx in #685
- Fix null in tool type response by @wseaton in #687
- Implement HQQ quantization by @EricLBuehler in #677
- Bump version to 0.2.5 by @EricLBuehler in #688
New Contributors
Full Changelog: v0.2.4...v0.2.5
Install mistralrs-server 0.2.5
Install prebuilt binaries via shell script
curl --proto '=https' --tlsv1.2 -LsSf https://github.com/EricLBuehler/mistral.rs/releases/download/v0.2.5/mistralrs-server-installer.sh | sh
Download mistralrs-server 0.2.5
| File | Platform | Checksum |
|---|---|---|
| mistralrs-server-aarch64-apple-darwin.tar.xz | Apple Silicon macOS | checksum |
| mistralrs-server-x86_64-apple-darwin.tar.xz | Intel macOS | checksum |
| mistralrs-server-x86_64-unknown-linux-gnu.tar.xz | x64 Linux | checksum |
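
To check a manually downloaded archive against the value in the Checksum column, something like the following Python sketch may help. It assumes the published checksum is a hex-encoded SHA-256 digest, which these notes do not state. The helper names `sha256_of_file` and `matches_checksum` are illustrative and not part of mistral.rs.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`.

    The file is read in chunks so large archives do not need to fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare the file's digest with a published hex digest.

    Assumption: `expected_hex` is a SHA-256 digest; surrounding whitespace
    and letter case are ignored.
    """
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_checksum("mistralrs-server-x86_64-unknown-linux-gnu.tar.xz", published_value)` returns `True` only when the local file hashes to the published value, where `published_value` is the string copied from the checksum link.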