llamafile lets you distribute and run LLMs with a single file
This release features significant improvements to GPU support (a usage sketch follows the commits below).
- 4616816 Introduce support for multiple GPUs
- 6559da6 Introduce AMD GPU support for Linux
- 20d5f46 Make CLIP GPU acceleration work on UNIX / Windows
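For example, here's a minimal sketch of enabling GPU offload. The model filename is a placeholder, and the flags shown (`-ngl` to offload layers, `--main-gpu` to choose a device) come from llama.cpp rather than from these notes:

```sh
# Offload up to 35 layers to the GPU (the filename is hypothetical).
./mistral-7b-instruct.llamafile -ngl 35

# With multiple GPUs, llama.cpp-style flags select the primary device.
./mistral-7b-instruct.llamafile -ngl 35 --main-gpu 1
```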
The llamafile server is now more reliable. Invalid JSON won't crash the
server, and opening a browser tab won't prevent the server from starting
(a sketch of the invalid-JSON case follows the commits below).
- 3384234 Upgrade to cosmocc 3.2.4
- 585c2d8 Make browser tab launching more reliable
- 7a5ec37 Show IP addresses when binding to 0.0.0.0
- d39ec38 Enable setting thread affinity on NUMA systems
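A quick way to see the hardening, assuming the default llama.cpp server port 8080, its `/completion` route, and a `--nobrowser` flag to skip the browser tab (all assumptions here, with a placeholder filename):

```sh
# Start the server in the background.
./mistral-7b-instruct.llamafile --nobrowser &

# POST deliberately malformed JSON (note the missing closing brace);
# the server now returns an error instead of crashing.
curl -X POST http://127.0.0.1:8080/completion \
     -H 'Content-Type: application/json' \
     -d '{"prompt": "hello"'
```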
You can now say `llamafile -m foo.llamafile` to load a model from a llamafile without having to execute it or extract the GGUF file, as sketched below.
- bb136e1 Support opening weights from llamafiles
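A minimal sketch, with placeholder filenames and llama.cpp's `-p` prompt flag assumed:

```sh
# Run weights packed inside another llamafile, without executing that
# file or extracting its GGUF payload.
./llamafile -m foo.llamafile -p 'why is the sky blue?'
```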
The documentation has been improved (but is still a work in progress).
- 7ad00db Add more content to manual
Example llamafiles
Our llamafiles on Hugging Face are updated shortly after a release goes live.
Supreme models (highest-end consumer hardware):
- https://hf.co/jartine/Mixtral-8x7B-Instruct-v0.1-llamafile
- https://hf.co/jartine/WizardCoder-Python-34B-V1.0-llamafile
Tiny models (small enough to use on a Raspberry Pi):
- https://hf.co/jartine/phi-2-llamafile
- https://hf.co/jartine/rocket-3B-llamafile
- https://hf.co/jartine/TinyLlama-1.1B-Chat-v1.0-GGUF
Other models:
- https://hf.co/jartine/wizardcoder-13b-python
- https://hf.co/jartine/Nous-Hermes-Llama2-llamafile
- https://hf.co/jartine/dolphin-2.5-mixtral-8x7b-llamafile
If you have a slow Internet connection and want to update your llamafiles without redownloading everything, see the instructions here: #24 (comment). You can also download llamafile-0.6 and simply say `./llamafile-0.6 -m old.llamafile` to run your old weights.