llamafile lets you distribute and run LLMs with a single file.
This release improves the performance and accuracy of both CPU and GPU computations, and it also includes a security fix.
- tinyBLAS now gives outputs consistent with cuBLAS, thanks to Kahan summation on matvec ops. This is good news for Windows users, because llamafile releases bundle tinyBLAS DLLs for driver-only GPU support. That support will now be faster and more accurate than before, thereby reducing the need to install the CUDA / ROCm SDKs yourself.
- Prompt evaluation now goes much faster on CPU. For example, f16 weights on Raspberry Pi 5 are now 8x faster. These new optimizations mostly apply to `F16`, `BF16`, `Q8_0`, `Q4_0`, and `F32` weights. Depending on the hardware and weights being used, we've observed llamafile-0.7 going anywhere from 30% to 500% faster than llama.cpp upstream.
- Support for the bf16 data type (the Google Brain floating point format) has been introduced for CPU only.
- Support for AVX512 has been introduced. Owners of CPUs like Zen4 can expect to see 10x faster prompt eval times.
- If you want to run `llamafile-0.7 [...] --recompile --gpu amd` on Windows, this release requires version 5.7+ of the ROCm HIP SDK, which may be downloaded from AMD.
- This release includes a security fix for CVE-2024-23496 (see #294).
- This release is synced with llama.cpp 2024-03-22 upstream.
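
For readers unfamiliar with the Kahan summation mentioned in the tinyBLAS note above, here is a minimal sketch of the technique applied to a matrix-vector product. It is written in plain Python for illustration only and is not the tinyBLAS kernel code; the function names (`naive_dot`, `kahan_dot`, `kahan_matvec`) are hypothetical. Python floats are double precision, so the rounding errors shown here are smaller than what a float32 GPU kernel would see, but the mechanism is the same.

```python
def naive_dot(a, b):
    """Plain running-sum dot product; low-order bits of small terms get lost."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


def kahan_dot(a, b):
    """Dot product with Kahan (compensated) summation."""
    total = 0.0
    comp = 0.0  # running estimate of the rounding error lost so far
    for x, y in zip(a, b):
        term = x * y - comp               # re-inject the previously lost error
        new_total = total + term          # low-order bits of term may be dropped
        comp = (new_total - total) - term  # recover what was just dropped
        total = new_total
    return total


def kahan_matvec(matrix, vec):
    """Matrix-vector product (list of rows times a vector), one Kahan dot per row."""
    return [kahan_dot(row, vec) for row in matrix]


if __name__ == "__main__":
    # One large term followed by many tiny ones: the naive sum ignores the tiny terms.
    row = [1.0] + [1e-16] * 10_000
    ones = [1.0] * len(row)
    print("naive:", naive_dot(row, ones))
    print("kahan:", kahan_dot(row, ones))
```

The compensation term costs a few extra floating-point operations per element, which is why it is usually reserved for reductions where accumulated error is visible in the output.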
![[line drawing of llama animal head in front of slightly open manilla folder filled with files]](https://private-user-images.githubusercontent.com/49262/289660212-bbcb0dde-4cd9-431a-9f79-ccb5ecd912d6.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjQ1MDE4ODcsIm5iZiI6MTcyNDUwMTU4NywicGF0aCI6Ii80OTI2Mi8yODk2NjAyMTItYmJjYjBkZGUtNGNkOS00MzFhLTlmNzktY2NiNWVjZDkxMmQ2LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA4MjQlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwODI0VDEyMTMwN1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTYzYjY3Zjc2NjA3NjBjODI5MmU3NjMwMGQwNWJmNTRiNjE2Y2Y0ZjU4ZmVlOWIxNTNjZTk3YjNjNjQwOTYzZmQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.12XsIBnx-bsBDWG5v17YcWvU8NnQLlxaxBPUwAcMwzE)
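
As background on the bf16 data type mentioned in the notes above: bf16 keeps the sign bit and all 8 exponent bits of an IEEE 754 float32 but only 7 mantissa bits, so a bf16 value is the upper 16 bits of a float32. The sketch below illustrates that layout in plain Python using round-to-nearest-even. It is an illustration of the format, not llamafile's conversion code, and the function names are hypothetical.

```python
import struct


def f32_to_bf16_bits(value):
    """Return the 16-bit bf16 pattern for a float, rounding to nearest even.

    Assumes the value fits in float32 range; struct.pack raises
    OverflowError for finite values that do not.
    """
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    if (bits & 0x7FFFFFFF) > 0x7F800000:
        # NaN: truncate and force a mantissa bit so it stays a NaN.
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Adding 0x7FFF plus the lowest kept bit rounds ties to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_f32(bits16):
    """Expand a 16-bit bf16 pattern back to a float by zero-filling the low bits."""
    return struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))[0]


if __name__ == "__main__":
    for x in (1.0, 3.14159265, 1e-3, 65504.0):
        b = f32_to_bf16_bits(x)
        print(f"{x!r:>12} -> 0x{b:04X} -> {bf16_bits_to_f32(b)!r}")
```

Because the exponent field is unchanged, bf16 covers the same range as float32 and trades away precision instead, which is the main difference from f16.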