llamafile v0.8.1

  • Support for Phi-3 Mini 4k has been introduced
  • A bug causing GPU module crashes on some systems has been resolved
  • Support for Command-R Plus has now been vetted with proper 64-bit indexing
  • We now support more AMD GPU architectures thanks to better detection of offload archs (#368)
  • We now ship prebuilt NVIDIA and ROCm modules for both Windows and Linux users. These link tinyBLAS, a libre math library that depends only on the graphics driver being installed. Since tinyBLAS is slower, llamafile will automatically build a native module for your system if the CUDA or ROCm SDK is installed. You can control this behavior using --nocompile or --recompile (see the example after this list). Even with the prebuilt modules included, our LLaVA llamafile still manages to squeak under the Windows 4GB file size limit!
  • An assertion error that occurred when using llamafile-quantize to create K quants from an F32 GGUF file has been fixed (a sketch of the invocation appears after this list)
  • A new llamafile-tokenize command-line tool has been introduced. It prints each token on its own line, so you can, for example, count how many tokens are in a text file by piping through wc -l (see the pipeline below)
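
As a rough sketch of how the compile-control flags mentioned above can be used; the model filename is a placeholder, and -ngl is llamafile's usual GPU layer-offload flag:

```sh
# Run with the prebuilt tinyBLAS GPU module and skip building a
# native module, even if the CUDA or ROCm SDK is installed.
./llava-v1.5-7b-q4.llamafile --nocompile -ngl 999

# Force a rebuild of the native GPU module against the installed SDK.
./llava-v1.5-7b-q4.llamafile --recompile -ngl 999
```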
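For the quantization fix, a hedged sketch of the kind of invocation involved: llamafile-quantize is assumed here to follow llama.cpp's quantize argument order (input, output, quant type), and the filenames are placeholders:

```sh
# Assumed argument order, mirroring llama.cpp's quantize tool:
# produce a Q4_K_M K-quant from an F32 GGUF file.
./llamafile-quantize model-f32.gguf model-Q4_K_M.gguf Q4_K_M
```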
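And the token-counting pipeline from the tokenizer note, verbatim; file.txt and model.llamafile are placeholders for your own files:

```sh
# llamafile-tokenize prints one token per line,
# so wc -l yields the token count of file.txt.
cat file.txt | llamafile-tokenize -m model.llamafile | wc -l
```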
