llamafile v0.6.2


llamafile lets you distribute and run LLMs with a single file

[llamafile logo: line drawing of a llama head in front of a slightly open manila folder filled with files]

This release synchronizes with llama.cpp upstream and polishes GPU
auto-configuration. Support for splitting a model onto multiple NVIDIA
GPUs has been restored.

  • dfd3335 Synchronize with llama.cpp 2024-01-27
  • c008e43 Synchronize with llama.cpp 2024-01-26
  • e34b35c Make GPU auto configuration more resilient
  • 79b88f8 Sanitize -ngl flag on Apple Metal
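
If GPU auto-configuration still picks the wrong setting, you can control offloading explicitly with the -ngl flag (the number of model layers to offload to the GPU), which this release now sanitizes on Apple Metal. A minimal sketch; the model filename and layer count here are placeholders:

    ./llamafile -m model.gguf -ngl 35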

There's a known issue with splitting a model onto multiple AMD GPUs: it
currently doesn't work. This is an upstream issue we're working to
solve. As a workaround, set HIP_VISIBLE_DEVICES=0 in your environment
when running llamafile, so that it only sees the first GPU.
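
For example, in a bash-like shell (the llamafile name is a placeholder):

    export HIP_VISIBLE_DEVICES=0   # expose only the first AMD GPU
    ./mistral-7b.llamafile         # run as usual; one GPU is visible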

Example llamafiles

Our llamafiles on Hugging Face are updated shortly after a release goes live.

Flagship models

Supreme models (highest-end consumer hardware)

Tiny models (small enough to use on a Raspberry Pi)

Other models

If you have a slow Internet connection and want to update your llamafiles without redownloading everything, see the instructions here: #24 (comment). You can also download llamafile-0.6.2 and run ./llamafile-0.6.2 -m old.llamafile to use your old weights.
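
For example, assuming you've downloaded the new binary alongside an older llamafile (the filenames are placeholders):

    chmod +x llamafile-0.6.2            # make the new binary executable
    ./llamafile-0.6.2 -m old.llamafile  # reuse the weights from your old llamafile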
