llamafile lets you distribute and run LLMs with a single file
If you had trouble generating filenames following the "bash one-liners"
blog post on the latest release, please try again.
- 0984ed8 Fix regression with --grammar flag
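For reference, the filename-generation recipe from that post looks roughly like the sketch below; the llamafile name, image path, and prompt are illustrative rather than exact.

```sh
# rough sketch of the blog post's recipe (filenames and prompt illustrative);
# --grammar constrains output to lowercase words separated by spaces
./llava-v1.5-7b-q4.llamafile \
    --image ~/Pictures/lemurs.jpg --temp 0 -n 16 \
    --grammar 'root ::= [a-z]+ (" " [a-z]+)+' \
    -p 'a short filename for this image:' --silent-prompt
```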
Crashes on older Intel/AMD systems should be fixed:
- 3490afa Fix SIGILL on older Intel/AMD CPUs w/o F16C
The OpenAI API-compatible endpoint has been improved.
- 9e4bf29 Fix OpenAI server sampling w.r.t. temp and seed
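As a quick check, a request along these lines against a running llamafile server (default address 127.0.0.1:8080; the field values here are illustrative) should now honor both settings, so a pinned seed with a fixed temperature yields reproducible samples:

```sh
# query the OpenAI-compatible chat endpoint; with "seed" pinned,
# repeated identical requests should now sample deterministically
curl http://127.0.0.1:8080/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{
        "model": "local",
        "temperature": 0.7,
        "seed": 42,
        "messages": [{"role": "user", "content": "Say hello."}]
    }'
```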
This release improves the documentation.
- 5c7ff6e Improve llamafile manual
- 658b18a Add WSL CUDA to GPU section (#105)
- 586b408 Update README.md so links and curl commands work (#136)
- a56ffd4 Update README to clarify Darwin kernel versioning
- 47d8a8f Fix README changing SSE3 to SSSE3
- 4da8e2e Fix README examples for certain UNIX shells
- faa7430 Change README to list Mixtral Q5 (instead of Q3)
- 6b0b64f Fix CLI README examples
We're making strides toward automating our testing process.
Some other improvements:
- 9e972b2 Improve README examples
- 9de5686 Support bos token in llava-cli
- 3d81e22 Set logger callback for Apple Metal
- 9579b73 Make it easier to override CPPFLAGS
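Regarding the CPPFLAGS change: when building from source, it should now be possible to pass extra preprocessor flags on the make command line, along these lines (the flag shown is a placeholder, not a real llamafile option):

```sh
# append a custom preprocessor define to the build (placeholder flag)
make -j8 CPPFLAGS=-DMY_EXTRA_FLAG
```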
Our .llamafiles on Hugging Face have been updated to incorporate these
new release binaries. You can redownload them here:
- https://huggingface.co/jartine/llava-v1.5-7B-GGUF/tree/main
- https://huggingface.co/jartine/Mistral-7B-Instruct-v0.2-llamafile/tree/main
- https://huggingface.co/jartine/wizardcoder-13b-python/tree/main
- https://huggingface.co/jartine/Mixtral-8x7B-Instruct-v0.1-llamafile
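For example, to fetch and run the updated LLaVA llamafile (the exact filename inside the repo may differ from the one shown here):

```sh
# download, mark executable, and run (filename is illustrative)
curl -LO https://huggingface.co/jartine/llava-v1.5-7B-GGUF/resolve/main/llava-v1.5-7b-q4.llamafile
chmod +x llava-v1.5-7b-q4.llamafile
./llava-v1.5-7b-q4.llamafile
```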
Known Issues
LLaVA image processing using the built-in tinyBLAS library may be slow on Windows.
Here's the workaround for using the faster NVIDIA cuBLAS library instead; a command-line sketch follows the steps below.
- Delete the .llamafile directory in your home directory
- Install CUDA
- Install MSVC
- Open the "x64 MSVC command prompt" from Start
- Run llamafile there for the first invocation
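Assuming CUDA and MSVC are already installed, the remaining steps look roughly like this inside the "x64 MSVC command prompt" (the llamafile name is illustrative, and on Windows the file typically needs a .exe extension):

```bat
:: remove the cached native module so the first run rebuilds it against cuBLAS
rmdir /s /q "%USERPROFILE%\.llamafile"
:: then run the llamafile once from this prompt; -ngl 35 requests GPU offload
llava-v1.5-7b-q4.llamafile.exe -ngl 35
```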
There's a YouTube video tutorial on doing this here: https://youtu.be/d1Fnfvat6nM?si=W6Y0miZ9zVBHySFj
*[Image: line drawing of a llama head in front of a slightly open manila folder filled with files]*