llamafile v0.7.1


This release fixes bugs in the 0.7.0 release.

  • Fix 2 embeddings-related issues in server.cpp (#324)
  • Detect search query to start webchat (#333)
  • Use LLAMAFILE_GPU_ERROR value -2 instead of -1 (#291)
  • Fix --silent-prompt flag regression (#328)
  • Clamp out-of-range values in K quantizer (ef0307e)
  • Update to latest q5_k quantization code (a8b0b15)
  • Change the file format magic number for the bf16 file format recently introduced in 0.7.0. This is a breaking change, made necessary by a numbering conflict with the upstream project. We're still waiting on a permanent magic number assignment for bfloat16, so this could potentially change again. Follow ggerganov/llama.cpp#6412 for updates.

Mixtral 8x22b and Grok support are not included in this release, but both are available if you build llamafile from source at HEAD on the main branch. We're currently dealing with an AMD Windows GPU support regression there; once it's resolved, a 0.8 release will ship.
