llamafile v0.7.1


This release fixes bugs in the 0.7.0 release.

  • Fix 2 embeddings-related issues in server.cpp (#324)
  • Detect search query to start webchat (#333)
  • Use LLAMAFILE_GPU_ERROR value -2 instead of -1 (#291)
  • Fix --silent-prompt flag regression (#328)
  • Clamp out-of-range values in K quantizer (ef0307e)
  • Update to latest q5_k quantization code (a8b0b15)
  • Change the file format magic number for the bf16 file format recently introduced in 0.7.0. This is a breaking change, made necessary by a numbering conflict with the upstream project. We're still waiting on a permanent magic number assignment for bfloat16, so this could potentially change again. Follow ggerganov/llama.cpp#6412 for updates.

Mixtral 8x22b and Grok support are not included in this release, but both are available if you build llamafile from source at HEAD on the main branch. We're currently dealing with an AMD Windows GPU support regression there; once it's resolved, a 0.8 release will ship.
