llamafile v0.6

llamafile lets you distribute and run LLMs with a single file

[line drawing of a llama's head in front of a slightly open manila folder filled with files]

This release features significant improvements to GPU support.

  • 4616816 Introduce support for multiple GPUs
  • 6559da6 Introduce AMD GPU support for Linux
  • 20d5f46 Make CLIP GPU acceleration work on UNIX / Windows
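
As a sketch of what this looks like on the command line, assuming the llama.cpp-style flags -ngl (GPU layer offload) and --tensor-split are passed through to the new multi-GPU backend (check ./llamafile --help on your build to confirm):

  # offload as many layers as will fit onto the GPU
  ./llamafile -m model.gguf -ngl 999

  # split the model across two GPUs at a 60/40 ratio
  # (--tensor-split is a llama.cpp flag; its availability here is an assumption)
  ./llamafile -m model.gguf -ngl 999 --tensor-split 60,40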

The llamafile server is now more reliable. Invalid JSON no longer crashes
the server, and opening a browser tab no longer prevents the server from
starting.

  • 3384234 Upgrade to cosmocc 3.2.4
  • 585c2d8 Make browser tab launching more reliable
  • 7a5ec37 Show IP addresses when binding to 0.0.0.0
  • d39ec38 Enable setting thread affinity on NUMA systems
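
For example, assuming the server is listening on its default 127.0.0.1:8080 and exposes the llama.cpp-style /completion endpoint, a malformed request should now come back as an HTTP error instead of taking the process down (a hypothetical probe, not from the release notes):

  # deliberately truncated JSON body
  curl -i -X POST http://127.0.0.1:8080/completion \
       -H 'Content-Type: application/json' \
       -d '{"prompt": "hello", "n_predict"'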

You can now say llamafile -m foo.llamafile to load a model from a
llamafile without having to execute it or extract the GGUF file.

  • bb136e1 Support opening weights from llamafiles
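
For instance, reusing the foo.llamafile name from above (the unzip line assumes llamafiles remain ordinary ZIP archives, as the project documents):

  # read the weights packed inside another llamafile, without executing it
  ./llamafile -m foo.llamafile

  # the old alternative: list and extract the GGUF by hand
  unzip -l foo.llamafile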

The documentation has been improved, though it is still a work in progress.

  • 7ad00db Add more content to manual

Example llamafiles

Our llamafiles on Hugging Face are updated shortly after a release goes live.

  • Flagship models
  • Supreme models (highest-end consumer hardware)
  • Tiny models (small enough to run on a Raspberry Pi)
  • Other models

If you have a slow Internet connection and want to update your llamafiles without redownloading the weights, see the instructions at #24 (comment). You can also download llamafile-0.6 and run ./llamafile-0.6 -m old.llamafile to use your old weights.
