Changes
- Llama-v2: add an instruction template, auto-detect the truncation length, and add conversion documentation
- [GGML] Support for customizable RoPE by @randoentity in #3083
- Optimize llamacpp_hf (a bit)
- Add Airoboros-v1.2 template
- Disable "Autoload the model" by default
- Disable auto-loading at startup when only one model is available by @jllllll in #3187
- Don't unset the LoRA menu when loading a model
- Bump accelerate to 0.21.0
- Bump bitsandbytes to 0.40.2 (Windows wheels provided by @jllllll in #3186)
- Bump AutoGPTQ to 0.3.0 (loading LoRAs is now supported out of the box)
- Update LLaMA-v1 documentation