github Mozilla-Ocho/llamafile 0.3
llamafile v0.3

latest releases: 0.8.16, 0.8.15, 0.8.14...
11 months ago

llamafile lets you distribute and run LLMs with a single file

[line drawing of llama animal head in front of slightly open manilla folder filled with files]

The llamafile-main and llamafile-llava-cli programs have been
unified into a single command named llamafile. Man pages now exist in
pdf, troff, and postscript format. There's much better support for shell
scripting, thanks to a new --silent-prompt flag. It's now possible to
shell script vision models like LLaVA using grammar constraints.

  • d4e2388 Add --version flag
  • baf216a Make ctrl-c work better
  • 762ad79 Add make install build rule
  • 7a3e557 Write man pages for all commands
  • c895a44 Remove stdout logging in llava-cli
  • 6cb036c Make LLaVA more shell script friendly
  • 28d3160 Introduce --silent-prompt flag to main
  • 1cd334f Allow --grammar to be used on --image prompts

The OpenAI API in llamafile-server has been improved.

  • e8c92bc Make OpenAI API stop field optional (#36)
  • c1c8683 Avoid bind() conflicts on port 8080 w/ server
  • 8cb9fd8 Recognize cache_prompt parameter in OpenAI API

Performance regressions have been fixed for Intel and AMD users.

  • 73ee0b1 Add runtime dispatching for Q5 weights
  • 36b103e Make Q2/Q3 weights go 2x faster on AMD64 AVX2 CPUs
  • b4dea04 Slightly speed up LLaVA runtime dispatch on Intel

The zipalign command is now feature complete.

  • 76d47c0 Put finishing touches on zipalign tool
  • 7b2fbcb Add support for replacing zip files to zipalign

Some additional improvements:

  • 5f69bb9 Add SVG logo
  • cd0fae0 Make memory map loader go much faster on MacOS
  • c8cd8e1 Fix output path in llamafile-quantize
  • dd1e0cd Support attention_bias on LLaMA architecture
  • 55467d9 Fix integer overflow during quantization
  • ff1b437 Have makefile download cosmocc automatically
  • a7cc180 Update grammar-parser.cpp (#48)
  • 61944b5 Disable pledge on systems with GPUs
  • ccc377e Log cuda build command to stderr

Our .llamafiles on Hugging Face have been updated to incorporate these new release binaries. You can redownload here:

If you have a slower Internet connection and don't want to re-download, then you don't have to! Instructions are here:

Don't miss a new llamafile release

NewReleases is sending notifications on new releases.