llamafile lets you distribute and run LLMs with a single file
This release features significant improvements to GPU support (a usage sketch follows the commits below).
- 4616816 Introduce support for multiple GPUs
- 6559da6 Introduce AMD GPU support for Linux
- 20d5f46 Make CLIP GPU acceleration work on UNIX / Windows
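For example, here's a minimal sketch of enabling GPU offload. The model filename is a placeholder, and the flags shown (`-ngl` to offload layers, `--main-gpu` to choose a device) come from llama.cpp rather than from these notes:

```sh
# Offload up to 35 layers to the GPU (the filename is hypothetical).
./mistral-7b-instruct.llamafile -ngl 35

# With multiple GPUs, llama.cpp-style flags select the primary device.
./mistral-7b-instruct.llamafile -ngl 35 --main-gpu 1
```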
The llamafile server is now more reliable. Invalid JSON won't crash the
server, and opening a browser tab won't prevent the server from starting
(a sketch of the invalid-JSON case follows the commits below).
- 3384234 Upgrade to cosmocc 3.2.4
- 585c2d8 Make browser tab launching more reliable
- 7a5ec37 Show IP addresses when binding to 0.0.0.0
- d39ec38 Enable setting thread affinity on NUMA systems
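A quick way to see the hardening, assuming the default llama.cpp server port 8080, its `/completion` route, and a `--nobrowser` flag to skip the browser tab (all assumptions here, with a placeholder filename):

```sh
# Start the server in the background.
./mistral-7b-instruct.llamafile --nobrowser &

# POST deliberately malformed JSON (note the missing closing brace);
# the server now returns an error instead of crashing.
curl -X POST http://127.0.0.1:8080/completion \
     -H 'Content-Type: application/json' \
     -d '{"prompt": "hello"'
```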
You can now say `llamafile -m foo.llamafile` to load a model from a llamafile without having to execute it or extract the GGUF file, as sketched below.
- bb136e1 Support opening weights from llamafiles
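A minimal sketch, with placeholder filenames and llama.cpp's `-p` prompt flag assumed:

```sh
# Run weights packed inside another llamafile, without executing that
# file or extracting its GGUF payload.
./llamafile -m foo.llamafile -p 'why is the sky blue?'
```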
The documentation has been improved (but is still a work in progress).
- 7ad00db Add more content to manual
Example llamafiles
Our llamafiles on Hugging Face are updated shortly after a release goes live.
Supreme models (highest-end consumer hardware):
- https://hf.co/jartine/Mixtral-8x7B-Instruct-v0.1-llamafile
- https://hf.co/jartine/WizardCoder-Python-34B-V1.0-llamafile
Tiny models (small enough to use on a Raspberry Pi):
- https://hf.co/jartine/phi-2-llamafile
- https://hf.co/jartine/rocket-3B-llamafile
- https://hf.co/jartine/TinyLlama-1.1B-Chat-v1.0-GGUF
Other models:
- https://hf.co/jartine/wizardcoder-13b-python
- https://hf.co/jartine/Nous-Hermes-Llama2-llamafile
- https://hf.co/jartine/dolphin-2.5-mixtral-8x7b-llamafile
If you have a slow Internet connection and want to update your llamafiles without redownloading everything, see the instructions here: #24 (comment). You can also download llamafile-0.6 and simply say `./llamafile-0.6 -m old.llamafile` to run your old weights.