- Add Julia syntax highlighting support
- Fix possible crash on Windows due to MT bug
- Improve accuracy of chatbot context window management
- The new
llamafiler
server now supports GPU. Pass the-ngl 999
flag. - The new
llamafiler
server's/v1/chat/completions
endpoint now supports prompt caching. It may be configured using the--slots COUNT
and--ctx-size TOKENS
flags.