What's Changed (this repo branch)
- Sync to Ollama main v0.11.5
- Apply @progval's Linux kernel version detection patches
- Accommodate the new Ollama memory allocation updates for AMD APUs (may still be unstable)
What's Changed (from Ollama)
- Performance improvements for the gpt-oss models
- Improved multi-GPU scheduling and reduced VRAM allocation when using more than 2 GPUs
- Fix error when parsing bad harmony tool calls
- OLLAMA_FLASH_ATTENTION=1 will also enable flash attention for pure-CPU models
- Fixed OpenAI-compatible API not supporting reasoning_effort
- Reduced size of installation on Windows and Linux
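The flash-attention change above is driven by an environment variable. A minimal sketch of enabling it before starting the server (the `ollama serve` invocation is the standard way to run the daemon; whether you export the variable or set it in a service unit depends on your setup):

```shell
# Enable flash attention for all models; with this release the flag
# also applies to pure-CPU models, not only GPU-backed ones.
export OLLAMA_FLASH_ATTENTION=1
ollama serve
```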
- New Memory Management by @jessegross in ollama#11090
- openai: allow for content and tool calls in the same message by @drifkin in ollama#11759
- openai: when converting role=tool messages, propagate the tool name by @drifkin in ollama#11761
- openai: always provide reasoning by @drifkin in ollama#11765
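The three openai changes above can be illustrated with a single request body for the OpenAI-compatible chat endpoint. This is a hedged sketch: the model name, tool name, and argument shapes are placeholders, not values from the release; only the message structure reflects the fixes (reasoning_effort accepted, content and tool_calls in the same assistant message, tool name propagated on role=tool messages):

```python
import json

# Hypothetical request payload for Ollama's OpenAI-compatible API.
payload = {
    "model": "gpt-oss",              # placeholder model name
    "reasoning_effort": "low",       # now supported by the compat layer
    "messages": [
        {"role": "user", "content": "What's 2+2? Use the calculator tool."},
        # An assistant message may now carry both content AND tool calls.
        {
            "role": "assistant",
            "content": "Let me check with the calculator.",
            "tool_calls": [
                {
                    "id": "call_1",
                    "type": "function",
                    "function": {
                        "name": "calculator",        # placeholder tool
                        "arguments": json.dumps({"expr": "2+2"}),
                    },
                }
            ],
        },
        # role=tool messages now propagate the tool name when converted.
        {"role": "tool", "name": "calculator",
         "tool_call_id": "call_1", "content": "4"},
    ],
}

print(json.dumps(payload, indent=2))
```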
New Contributors
- @vorburger made their first contribution in ollama#11755
- @dan-and made their first contribution in ollama#10678
- @youzichuan made their first contribution in ollama#11880
- @gao-feng made their first contribution in ollama#11170
Full Changelog: v0.11.3...v0.11.5