What's Changed (this repo branch)
- Sync to v0.12.4
- AMD GTT patches discontinued: mainline Ollama now supports AMD APUs from RDNA1-class through current generations, so the patches are no longer needed. VEGA-class APUs are now discontinued.
- New main branch of the repo: we now host container images with additional AMD ROCm optimizations on top of current mainline Ollama.
- We also offer a container image with the most recent Ollama that is compatible with the old GTT patches.
What's Changed (from Ollama)
- Flash attention is now enabled by default for Qwen 3 and Qwen 3 Coder
- Fixed minor memory estimation issues when scheduling models on NVIDIA GPUs
- Fixed an issue where `keep_alive` in the API would accept different values for the `/api/chat` and `/api/generate` endpoints (see the sketch after this list)
- Fixed tool calling rendering with `qwen3-coder`
- More reliable and accurate VRAM detection
- `OLLAMA_FLASH_ATTENTION` can now be overridden to `0` for models that have flash attention enabled by default
- macOS 12 Monterey and macOS 13 Ventura are no longer supported
- Fixed a crash that occurred when templates were not correctly defined
- Fixed memory calculations on NVIDIA iGPUs
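
As a reference for the `keep_alive` fix above, here is a minimal client-side sketch (Python, standard library only) that sends the same `keep_alive` value to both `/api/chat` and `/api/generate`, which the two endpoints now interpret consistently. It assumes a local server on the default port (`localhost:11434`) and a locally pulled `qwen3-coder` model; adjust the model name to whatever you have installed.

```python
# Minimal sketch: pass the same keep_alive value to both endpoints.
# Assumes a local Ollama server on the default port and a pulled
# qwen3-coder model.
import json
import urllib.request

BASE = "http://localhost:11434"
KEEP_ALIVE = "10m"  # how long the model stays loaded after the request

def post(path, payload):
    # Send a non-streaming JSON request and decode the JSON response.
    req = urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Both endpoints should now accept keep_alive identically.
post("/api/generate", {
    "model": "qwen3-coder",
    "prompt": "Say hello.",
    "keep_alive": KEEP_ALIVE,
    "stream": False,
})
post("/api/chat", {
    "model": "qwen3-coder",
    "messages": [{"role": "user", "content": "Say hello."}],
    "keep_alive": KEEP_ALIVE,
    "stream": False,
})
```

Separately, `OLLAMA_FLASH_ATTENTION=0` is read by the server process, so it must be set in the environment where `ollama serve` (or the container) runs, not in client code.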
Full Changelog: https://github.com/rjmalagon/ollama-linux-amd-apu/commits/v0.12.4