llamafile lets you distribute and run LLMs with a single file
The llamafile-main and llamafile-llava-cli programs have been
unified into a single command named llamafile. Man pages now exist in
PDF, troff, and PostScript formats. There's much better support for shell
scripting, thanks to a new --silent-prompt flag. It's now possible to
shell script vision models like LLaVA using grammar constraints.
- d4e2388 Add --version flag
- baf216a Make ctrl-c work better
- 762ad79 Add `make install` build rule
- 7a3e557 Write man pages for all commands
- c895a44 Remove stdout logging in llava-cli
- 6cb036c Make LLaVA more shell script friendly
- 28d3160 Introduce --silent-prompt flag to main
- 1cd334f Allow --grammar to be used on --image prompts
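Taken together, --silent-prompt and --grammar make vision model output parseable from a script. The sketch below is a hypothetical example: the model filename, prompt wording, and grammar are placeholders, though the --image, --grammar, and --silent-prompt flags are the ones named above.

```shell
# Constrain the model to answer exactly "yes" or "no" via a GBNF grammar,
# so the shell script can branch on the output directly.
cat > yesno.gbnf <<'EOF'
root ::= "yes" | "no"
EOF

# --silent-prompt keeps the prompt itself out of stdout, so $answer
# contains only the model's (grammar-constrained) completion.
answer=$(./llava-v1.5-7b-q4.llamafile \
    --image photo.jpg \
    --grammar "$(cat yesno.gbnf)" \
    --silent-prompt \
    -p 'Is there a cat in this image? Answer yes or no.' \
    2>/dev/null)

if [ "$answer" = "yes" ]; then
  echo "cat detected"
fi
```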
The OpenAI API in llamafile-server has been improved.
- e8c92bc Make OpenAI API `stop` field optional (#36)
- c1c8683 Avoid bind() conflicts on port 8080 w/ server
- 8cb9fd8 Recognize cache_prompt parameter in OpenAI API
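As a hedged sketch of what these changes enable, assuming llamafile-server is running locally on its default port 8080: the `stop` field can now be omitted from requests entirely, and `cache_prompt` asks the server to reuse cached prompt state across calls that share a prefix. The endpoint path and payload shape follow the OpenAI chat completions convention.

```shell
# Build a request with no "stop" field (now optional) and cache_prompt
# enabled so repeated calls with the same prompt prefix are faster.
cat > request.json <<'EOF'
{
  "model": "gpt-3.5-turbo",
  "cache_prompt": true,
  "messages": [
    {"role": "user", "content": "Say hello in one word."}
  ]
}
EOF

# Post it to the local llamafile-server instance (if one is running).
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @request.json || echo "server not running"
```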
Performance regressions have been fixed for Intel and AMD users.
- 73ee0b1 Add runtime dispatching for Q5 weights
- 36b103e Make Q2/Q3 weights go 2x faster on AMD64 AVX2 CPUs
- b4dea04 Slightly speed up LLaVA runtime dispatch on Intel
The zipalign command is now feature complete.
- 76d47c0 Put finishing touches on zipalign tool
- 7b2fbcb Add support for replacing zip files to zipalign
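For illustration, a hypothetical zipalign invocation: the filenames are placeholders, and the `-j0` flag follows the project's documented usage (store uncompressed so the weights can be memory-mapped). With the replacement support above, naming a file that already exists in the archive overwrites that entry.

```shell
# Placeholder stand-in for a real GGUF weights file.
printf 'fake weights' > new-weights.gguf

# Add (or replace) the weights inside an existing llamafile archive,
# stored uncompressed and page-aligned for direct memory mapping.
zipalign -j0 my.llamafile new-weights.gguf || echo "zipalign not on PATH"
```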
Some additional improvements:
- 5f69bb9 Add SVG logo
- cd0fae0 Make memory map loader go much faster on MacOS
- c8cd8e1 Fix output path in llamafile-quantize
- dd1e0cd Support attention_bias on LLaMA architecture
- 55467d9 Fix integer overflow during quantization
- ff1b437 Have makefile download cosmocc automatically
- a7cc180 Update grammar-parser.cpp (#48)
- 61944b5 Disable pledge on systems with GPUs
- ccc377e Log cuda build command to stderr
Our .llamafiles on Hugging Face have been updated to incorporate these new release binaries. You can redownload here:
- https://huggingface.co/jartine/llava-v1.5-7B-GGUF/tree/main
- https://huggingface.co/jartine/mistral-7b.llamafile/tree/main
- https://huggingface.co/jartine/wizardcoder-13b-python/tree/main
If you have a slower Internet connection and don't want to re-download, then you don't have to! Instructions are here:
![[line drawing of llama animal head in front of slightly open manilla folder filled with files]](https://private-user-images.githubusercontent.com/49262/289660212-bbcb0dde-4cd9-431a-9f79-ccb5ecd912d6.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjQ1NTI3NzgsIm5iZiI6MTcyNDU1MjQ3OCwicGF0aCI6Ii80OTI2Mi8yODk2NjAyMTItYmJjYjBkZGUtNGNkOS00MzFhLTlmNzktY2NiNWVjZDkxMmQ2LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA4MjUlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwODI1VDAyMjExOFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWY2YzVhYjBhOWVlZDBhZDg3NTEwYjg2YjRjMGJjMjk4YTA2Y2Q1YmM2NTJmNWVkNzNhYTEyM2U3ZDRmZjhmNjAmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.IIVW6xsCpej7Z3fDR66aaESzvREy0nV7g7rC5Q_mizE)