llamafile lets you distribute and run LLMs with a single file
The `llamafile-main` and `llamafile-llava-cli` programs have been unified into a single command named `llamafile`. Man pages now exist in PDF, troff, and PostScript format. There's much better support for shell scripting, thanks to a new `--silent-prompt` flag. It's now possible to shell script vision models like LLaVA using grammar constraints.
- d4e2388 Add --version flag
- baf216a Make ctrl-c work better
- 762ad79 Add `make install` build rule
- 7a3e557 Write man pages for all commands
- c895a44 Remove stdout logging in llava-cli
- 6cb036c Make LLaVA more shell script friendly
- 28d3160 Introduce --silent-prompt flag to main
- 1cd334f Allow --grammar to be used on --image prompts
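As a sketch of what these changes enable together, something like the following should constrain a LLaVA answer to machine-readable output. The model filenames and image path are placeholders, not part of this release, so treat this as an illustration rather than a verified recipe:

```shell
# Hypothetical example: force LLaVA to answer "yes" or "no" about an image.
# --silent-prompt suppresses echoing the prompt, so stdout holds only the
# model's answer, and --grammar constrains generation to the given grammar.
./llamafile --silent-prompt \
  -m llava-v1.5-7b-Q4_K.gguf \
  --mmproj llava-v1.5-7b-mmproj-Q4_0.gguf \
  --image photo.jpg \
  -p 'Does this image contain an animal?' \
  --grammar 'root ::= "yes" | "no"' -n 16
```

Because stdout is just the constrained answer, it can be captured directly in a shell variable, e.g. `answer=$(./llamafile ...)`.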
The OpenAI API in `llamafile-server` has been improved.
- e8c92bc Make OpenAI API `stop` field optional (#36)
- c1c8683 Avoid bind() conflicts on port 8080 w/ server
- 8cb9fd8 Recognize cache_prompt parameter in OpenAI API
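A minimal sketch of exercising these changes, assuming `llamafile-server` is already running on its default port 8080 and exposes the usual OpenAI-style chat completions path:

```shell
# Hypothetical example: the "stop" field is omitted (it's now optional),
# and cache_prompt asks the server to reuse the evaluated prompt prefix
# across requests, speeding up follow-up calls with a shared preamble.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local",
    "cache_prompt": true,
    "messages": [{"role": "user", "content": "Say hello in one word."}]
  }'
```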
Performance regressions have been fixed for Intel and AMD users.
- 73ee0b1 Add runtime dispatching for Q5 weights
- 36b103e Make Q2/Q3 weights go 2x faster on AMD64 AVX2 CPUs
- b4dea04 Slightly speed up LLaVA runtime dispatch on Intel
The `zipalign` command is now feature complete.
- 76d47c0 Put finishing touches on zipalign tool
- 7b2fbcb Add support for replacing zip files to zipalign
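The intended workflow, sketched here with placeholder filenames that are not part of this release, is to append GGUF weights (and an optional `.args` file of default flags) to a copy of the llamafile executable:

```shell
# Hypothetical example: build a self-contained llamafile.
cp llamafile mistral.llamafile
printf -- '-m\nmistral-7b-instruct.Q4_K_M.gguf\n' > .args
# -j0 stores the weights uncompressed so they can be memory-mapped in place.
zipalign -j0 mistral.llamafile mistral-7b-instruct.Q4_K_M.gguf .args
```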
Some additional improvements:
- 5f69bb9 Add SVG logo
- cd0fae0 Make memory map loader go much faster on MacOS
- c8cd8e1 Fix output path in llamafile-quantize
- dd1e0cd Support attention_bias on LLaMA architecture
- 55467d9 Fix integer overflow during quantization
- ff1b437 Have makefile download cosmocc automatically
- a7cc180 Update grammar-parser.cpp (#48)
- 61944b5 Disable pledge on systems with GPUs
- ccc377e Log cuda build command to stderr
Our .llamafiles on Hugging Face have been updated to incorporate these new release binaries. You can re-download them here:
- https://huggingface.co/jartine/llava-v1.5-7B-GGUF/tree/main
- https://huggingface.co/jartine/mistral-7b.llamafile/tree/main
- https://huggingface.co/jartine/wizardcoder-13b-python/tree/main
If you have a slower Internet connection and don't want to re-download, then you don't have to! Instructions are here: