llamafile lets you distribute and run LLMs with a single file
This release synchronizes with llama.cpp upstream and polishes GPU
auto-configuration. Support for splitting a model onto multiple NVIDIA
GPUs has been restored.
- dfd3335 Synchronize with llama.cpp 2024-01-27
- c008e43 Synchronize with llama.cpp 2024-01-26
- e34b35c Make GPU auto configuration more resilient
- 79b88f8 Sanitize -ngl flag on Apple Metal
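With multi-GPU NVIDIA splitting restored, a run like the following should spread offloaded layers across all visible GPUs. This is a minimal sketch: the model filename and the -ngl value are placeholders, and CUDA_VISIBLE_DEVICES is the standard NVIDIA variable for limiting which GPUs are visible.

```sh
# Offload layers to the GPU(s); with more than one NVIDIA GPU visible,
# the offloaded layers are split across them. Filenames and values are examples.
./llamafile-0.6.2 -m mixtral-8x7b-instruct.llamafile -ngl 999

# Optionally restrict which GPUs llamafile can see.
CUDA_VISIBLE_DEVICES=0,1 ./llamafile-0.6.2 -m mixtral-8x7b-instruct.llamafile -ngl 999
```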
There's a known issue with splitting a model onto multiple AMD GPUs, which currently doesn't work. This is an upstream issue we're working to solve. The workaround is to `export HIP_VISIBLE_DEVICES=0` in your environment when running llamafile, so it only sees the first GPU.
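Concretely, the workaround looks like this (the model filename and -ngl value are placeholders):

```sh
# Hide all but the first AMD GPU so llamafile won't try to split the model.
export HIP_VISIBLE_DEVICES=0
./llamafile-0.6.2 -m mixtral-8x7b-instruct.llamafile -ngl 999
```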
Example llamafiles
Our llamafiles on Hugging Face are updated shortly after a release goes live.
Flagship models
Supreme models (highest-end consumer hardware)
- https://hf.co/jartine/Mixtral-8x7B-Instruct-v0.1-llamafile
- https://hf.co/jartine/WizardCoder-Python-34B-V1.0-llamafile
Tiny models (small enough to use on a Raspberry Pi)
- https://hf.co/jartine/phi-2-llamafile
- https://hf.co/jartine/rocket-3B-llamafile
- https://hf.co/jartine/TinyLlama-1.1B-Chat-v1.0-GGUF
Other models
- https://hf.co/jartine/wizardcoder-13b-python
- https://hf.co/jartine/Nous-Hermes-Llama2-llamafile
- https://hf.co/jartine/dolphin-2.5-mixtral-8x7b-llamafile
If you have a slow Internet connection and want to update your llamafiles without redownloading the weights, see the instructions at #24 (comment). You can also download llamafile-0.6.2 and simply run `./llamafile-0.6.2 -m old.llamafile` to use your old weights.
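For instance, assuming the 0.6.2 release binary is already downloaded and an older llamafile is on disk (filenames are placeholders):

```sh
# Make the new runtime executable, then point it at the previously
# downloaded weights instead of fetching them again.
chmod +x llamafile-0.6.2
./llamafile-0.6.2 -m old.llamafile
```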