✨ Changes
- Add ExLlamaV3 support (#6832). This is done through a new ExLlamav3_HF loader that uses the same samplers as Transformers and ExLlamav2_HF. Wheels compiled with GitHub Actions are included for both Linux and Windows, eliminating manual installation steps. Note: these wheels require compute capability of 8 or greater, at least for now.
  - ExLlamaV3 repository: https://github.com/turboderp-org/exllamav3
  - Models: https://huggingface.co/turboderp
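As a sketch, launching the UI with the new loader could look like the following (the model directory name is a placeholder; `--model` and `--loader` are existing server options):

```shell
# Start the web UI with an EXL3 quant loaded through the new ExLlamav3_HF loader.
# "turboderp_Llama-3-8B-exl3" is a hypothetical example model folder under models/.
python server.py --model turboderp_Llama-3-8B-exl3 --loader ExLlamav3_HF
```

The loader can also be selected from the Model tab in the UI as usual.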
- Add a new chat style: Dark (#6817).
- Set context lengths to at most 8192 by default to prevent OOM errors, and show the model's maximum length in the UI (#6835).
🔧 Bug fixes
- Fix a matplotlib bug in the Google Colab notebook.
- Fix links in the ngrok extension README (#6826). Thanks @KPCOFGS.
🔄 Backend updates
- Transformers: Bump to 4.50.
- CUDA: Bump to 12.4.
- PyTorch: Bump to 2.6.0.
- FlashAttention: Bump to v2.7.4.post1.
- PEFT: Bump to 0.15. This should make Axolotl LoRAs compatible with the project.