What's Changed (this repo branch)
- Sync to v0.12.4
- AMD GTT patches discontinued: mainline Ollama now supports AMD APUs from RDNA1-class through current generations, so the patches are no longer needed. VEGA-class APUs are now discontinued.
- New main branch of the repo: we now host container images with additional AMD ROCm optimizations on top of current mainline Ollama.
- We also offer a container image with the most recent Ollama that is compatible with the old GTT patches.
What's Changed (from Ollama)
- Flash attention is now enabled by default for Qwen 3 and Qwen 3 Coder
- Fixed minor memory estimation issues when scheduling models on NVIDIA GPUs
- Fixed an issue where `keep_alive` in the API would accept different values for the `/api/chat` and `/api/generate` endpoints (see the sketch after this list)
- Fixed tool calling rendering with `qwen3-coder`
- More reliable and accurate VRAM detection
- `OLLAMA_FLASH_ATTENTION` can now be overridden to `0` for models that have flash attention enabled by default
- macOS 12 Monterey and macOS 13 Ventura are no longer supported
- Fixed a crash that occurred when templates were not correctly defined
- Fixed memory calculations on NVIDIA iGPUs
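
As a reference for the `keep_alive` fix above, here is a minimal client-side sketch (Python, standard library only) that sends the same `keep_alive` value to both `/api/chat` and `/api/generate`, which the two endpoints now interpret consistently. It assumes a local server on the default port (`localhost:11434`) and a locally pulled `qwen3-coder` model; adjust the model name to whatever you have installed.

```python
# Minimal sketch: pass the same keep_alive value to both endpoints.
# Assumes a local Ollama server on the default port and a pulled
# qwen3-coder model.
import json
import urllib.request

BASE = "http://localhost:11434"
KEEP_ALIVE = "10m"  # how long the model stays loaded after the request

def post(path, payload):
    # Send a non-streaming JSON request and decode the JSON response.
    req = urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Both endpoints should now accept keep_alive identically.
post("/api/generate", {
    "model": "qwen3-coder",
    "prompt": "Say hello.",
    "keep_alive": KEEP_ALIVE,
    "stream": False,
})
post("/api/chat", {
    "model": "qwen3-coder",
    "messages": [{"role": "user", "content": "Say hello."}],
    "keep_alive": KEEP_ALIVE,
    "stream": False,
})
```

Separately, `OLLAMA_FLASH_ATTENTION=0` is read by the server process, so it must be set in the environment where `ollama serve` (or the container) runs, not in client code.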
Full Changelog: https://github.com/rjmalagon/ollama-linux-amd-apu/commits/v0.12.4