What's Changed
- Fixed error importing model vocabulary files
- Experimental: new flag to set KV cache quantization to 4-bit (`q4_0`), 8-bit (`q8_0`), or 16-bit (`f16`). This reduces VRAM requirements for longer context windows.
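As a sketch of how the experimental setting might be enabled: the release note does not name the flag, so the environment variable `OLLAMA_KV_CACHE_TYPE` below is an assumption, and the server invocation is illustrative only. The accepted values are the three quantization types listed above.

```shell
# Assumed variable name -- the release note does not specify the flag.
# Valid values per the note: q4_0 (4-bit), q8_0 (8-bit), f16 (16-bit).
export OLLAMA_KV_CACHE_TYPE=q8_0
echo "KV cache quantization: $OLLAMA_KV_CACHE_TYPE"

# The server would then be started in this environment, e.g.:
#   ollama serve
```

Lower-bit quantization trades a small amount of attention precision for a smaller KV cache, which is what frees VRAM at long context lengths.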
New Contributors
- @dmayboroda made their first contribution in #7906
- @Geometrein made their first contribution in #7908
- @owboson made their first contribution in #7693
Full Changelog: v0.4.7...v0.5.0-rc1