llamafile v0.6.1

llamafile lets you distribute and run LLMs with a single file

[line drawing of a llama head in front of a slightly open manila folder filled with files]

This release fixes a crash that can happen on Apple Metal GPUs.

  • 9c85d9c Fix free() related crash in ggml-metal.m

Windows users will see better performance with tinyBLAS. Please note we
still recommend installing the CUDA SDK (NVIDIA) or the HIP/ROCm SDK (AMD)
for maximum performance and accuracy if your hardware is supported by them.

  • df0b3ff Use thread-local register file for matmul speedups (#205)
  • 4892494 Change BM/BN/BK to template parameters (#203)
  • ed05ba9 Reduce server memory use on Windows
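
As a rough illustration of what those BM/BN/BK commits refer to, here is a
minimal sketch of a blocked matmul whose tile sizes are template parameters.
This is not the actual tinyBLAS source; the function and variable names are
illustrative only.

    // Illustrative only: the tile sizes BM/BN/BK are compile-time template
    // parameters, so each tile's accumulator is a fixed-size local array the
    // compiler can unroll over and keep in registers (the "thread-local
    // register file" idea).
    #include <cstdio>
    #include <vector>

    template <int BM, int BN, int BK>
    void gemm(const float *A, const float *B, float *C, int M, int N, int K) {
        // Assumes M % BM == 0, N % BN == 0, K % BK == 0 to keep the sketch short.
        for (int i0 = 0; i0 < M; i0 += BM)
            for (int j0 = 0; j0 < N; j0 += BN) {
                float acc[BM][BN] = {};  // per-tile accumulator
                for (int k0 = 0; k0 < K; k0 += BK)
                    for (int k = k0; k < k0 + BK; ++k)
                        for (int i = 0; i < BM; ++i)
                            for (int j = 0; j < BN; ++j)
                                acc[i][j] += A[(i0 + i) * K + k] *
                                             B[k * N + (j0 + j)];
                for (int i = 0; i < BM; ++i)
                    for (int j = 0; j < BN; ++j)
                        C[(i0 + i) * N + (j0 + j)] = acc[i][j];
            }
    }

    int main() {
        constexpr int M = 4, N = 4, K = 4;
        std::vector<float> A(M * K, 1.0f), B(K * N, 1.0f), C(M * N, 0.0f);
        gemm<2, 2, 2>(A.data(), B.data(), C.data(), M, N, K);
        std::printf("C[0] = %g\n", C[0]);  // 4, the length of each dot product
    }

Because the tile sizes are compile-time constants, the compiler can fully
unroll the inner loops and register-allocate the accumulator, which is the
flavor of speedup #205 and #203 are after.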

This release also synchronizes with llama.cpp upstream (as of January 9th)
and includes other improvements.

  • 133b05e Sync with llama.cpp upstream
  • 67d97b5 Use hipcc on $PATH if it exists
  • 15e2339 Do better job reporting AMD hipBLAS errors
  • c617679 Don't crash when --image argument is invalid
  • 3e8aa78 Clarify install/gpu docs/behavior per feedback
  • eb4989a Fix typo in OpenAI API

Example llamafiles

Our llamafiles on Hugging Face are updated shortly after a release goes live.

Flagship models

Supreme models (highest-end consumer hardware)

Tiny models (small enough to run on a Raspberry Pi)

Other models

If you have a slow Internet connection and want to update your llamafiles
without redownloading the weights, see the instructions in #24 (comment).
You can also download llamafile-0.6.1 and run ./llamafile-0.6.1 -m old.llamafile
to reuse your old weights.
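
For example, a minimal upgrade flow might look like this (the download URL
pattern is an assumption; check the GitHub releases page for the actual
asset name):

    # Hypothetical upgrade flow; verify the URL against the releases page.
    curl -LO https://github.com/Mozilla-Ocho/llamafile/releases/download/0.6.1/llamafile-0.6.1
    chmod +x llamafile-0.6.1
    # Point the new runtime at the weights inside an existing llamafile:
    ./llamafile-0.6.1 -m old.llamafile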
