What's Changed
- Causal Masking and model selection from `.toml` files by @EricLBuehler in #278
- Remove sliding window mask from quantized phi3 by @EricLBuehler in #280
- Fix Causal Mask by @EricLBuehler in #282
- Fix mask caching by @EricLBuehler in #283
- More intelligent scheduler by @EricLBuehler in #279
- Use `warn!` macro by @EricLBuehler in #289
- Use a public repo for tests tokenizer.json by @EricLBuehler in #290
- Implement Speculative Decoding by @EricLBuehler in #242
- Add X-LoRA support for GGUF by @EricLBuehler in #293
- Add some "senseful" fallbacks for `isq` by @LLukas22 in #272
- Implement dynamic LoRA swapping by @EricLBuehler in #262
- More verbose logging when loading locally by @EricLBuehler in #298
- Make speculative decoding faster without anything fancy by @EricLBuehler in #297
- Fix bug with mistralrs cuda by @joshpopelka20 in #299
New Contributors
- @joshpopelka20 made their first contribution in #299
New Features
- Speculative decoding introduced
- GGUF support for Phi 3
- Dynamic LoRA adapter activation support
Full Changelog: v0.1.5...v0.1.6