What's Changed
- Add 12 new model architectures for CPU and Metal inference (#1914)
These are Baichuan, BLOOM, CodeShell, GPT-2, Orion, Persimmon, Phi and Phi-2, Plamo, Qwen, Qwen2, Refact, and StableLM.
We don't have official downloads for these yet, but TheBloke offers plenty of compatible GGUF quantizations. - Restore minimum window size of 720x480 (1b524c4)
- Use ChatML for Mistral OpenOrca to make its output formatting more reliable (#1935)
Bug Fixes
- Fix VRAM not being freed when CPU fallback occurs - this makes switching models more reliable (#1901)
- Disable offloading of Mixtral to GPU because we crash otherwise (#1931)
- Limit extensions scanned by LocalDocs to txt, pdf, md, rst - other formats were inserting useless binary data (#1912)
- Fix missing scrollbar for chat history (490404d)
- Accessibility improvements (4258bb1)
New Contributors
Full Changelog: v2.6.2...v2.7.0