What's Changed (this repo branch)
- Sync to v0.15.0
- ROCm 7.2 in the Dockerfile. This enables flash attention on some older APU models.
What's Changed (from Ollama)
New models
- GLM-4.7-Flash: As the strongest model in the 30B class, GLM-4.7-Flash offers a new option for lightweight deployment that balances performance and efficiency.
- LFM2.5-1.2B-Thinking: LFM2.5 is a new family of hybrid models designed for on-device deployment.
- TranslateGemma: A new collection of open translation models built on Gemma 3, helping people communicate across 55 languages.
What's Changed
- Improved the /v1/responses API to better conform to the OpenResponses specification
- Fixed issue where Ollama's macOS app would interrupt system shutdown
- Fixed ollama create and ollama show commands for experimental models
- The /api/generate API can now be used for image generation
- Fixed minor issues in Nemotron-3-Nano tool parsing
- Fixed issue where removing an image generation model would first load the model
- Fixed issue where ollama rm would stop only the first running model in the list
- New ollama launch command for Claude Code, Codex, OpenCode, and Droid
- Fixed issue where creating multi-line strings with """ would not work when using ollama run
- Ctrl+J and Shift+Enter now work for inserting newlines in ollama run
- Reduced memory usage for GLM-4.7-Flash models
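Since /api/generate can now be used for image generation, a client might call it roughly as sketched below. This is a hedged illustration, not documented behavior: the model name is hypothetical, and the response field holding the image data is an assumption (the release notes do not specify the payload shape); only the `model`, `prompt`, and `stream` request fields match the existing /api/generate API. The server call itself is simulated here.

```python
import base64

# Hypothetical request payload for POST http://localhost:11434/api/generate.
# "model"/"prompt"/"stream" are existing /api/generate fields; the model
# name is made up for illustration.
payload = {
    "model": "example/image-model",  # hypothetical model name
    "prompt": "a watercolor of a lighthouse at dusk",
    "stream": False,
}

def save_image(response_json: dict, path: str) -> int:
    """Decode a base64 image field from an assumed response shape
    and write it to disk, returning the number of bytes written."""
    data = base64.b64decode(response_json["image"])  # field name assumed
    with open(path, "wb") as f:
        f.write(data)
    return len(data)

# Simulated response for illustration only (not real server output):
fake_response = {"image": base64.b64encode(b"\x89PNG...").decode()}
n = save_image(fake_response, "/tmp/out.png")
```

In a real client you would POST `payload` to the endpoint and pass the parsed JSON to `save_image`; check the actual response fields against the Ollama API docs before relying on this shape.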
New Contributors
- @yuhongsun96 made their first contribution in ollama#13135
- @koaning made their first contribution in ollama#13326
Full Changelog: v0.14.1...v0.15.0