What's Changed (this repo branch)

What's Changed (from Ollama)

Ministral-3: The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware.
Mistral-Large-3: A general-purpose multimodal mixture-of-experts model for production-grade tasks and enterprise workloads.
Qwen3-Next: The first installment in the Qwen3-Next series with strong performance in terms of both parameter efficiency and inference speed.
Devstral-Small-2: A 24B model that excels at using tools to explore codebases, edit multiple files, and power software engineering agents.
rnj-1: Rnj-1 is a family of 8B parameter open-weight, dense models trained from scratch by Essential AI, optimized for code and STEM with capabilities on par with SOTA open-weight models.
nomic-embed-text-v2: nomic-embed-text-v2-moe is a multilingual MoE text embedding model that excels at multilingual retrieval.

nomic-embed-text will now use Ollama's engine by default
Tool calling support for cogito-v2.1
Fixed issues with CUDA VRAM discovery
Fixed link to docs in Ollama's app
Fixed issue where models would be evicted on CPU-only systems
Ollama will now better render errors instead of showing Unmarshal: errors
Fixed issue where older CUDA GPUs would fail to be detected
Added thinking and tool parsing for
Flash attention is now enabled by default for vision models such as mistral-3, gemma3, qwen3-vl and more. This improves memory utilization and performance when providing images as input.
Fixed GPU detection on multi-GPU CUDA machines
Fixed issue where deepseek-v3.1 would always think even when thinking is disabled in Ollama's app
Improved truncation logic when using /api/embed and /v1/embeddings
Extended Gemma 3 architecture to support the rnj-1 model
Fixed error that would occur when running qwen2.5vl with image input
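
For the /api/embed change noted above, a minimal sketch of a request that sets the truncate flag explicitly may help when checking behavior after upgrading. It assumes a local Ollama server at the default address and that the nomic-embed-text model has been pulled; the helper names build_embed_payload and embed are illustrative only and are not part of Ollama. The sketch does not show what the improved truncation logic does internally, only how a client opts in or out of truncation.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434"


def build_embed_payload(model, texts, truncate=True):
    """Build the JSON body for POST /api/embed (hypothetical helper).

    With truncate=True, inputs longer than the model's context length are
    truncated to fit; with truncate=False the server returns an error instead.
    """
    return {"model": model, "input": list(texts), "truncate": truncate}


def embed(model, texts, truncate=True, base_url=OLLAMA_URL):
    """Send the request and return one embedding vector per input text."""
    body = json.dumps(build_embed_payload(model, texts, truncate)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/embed",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embeddings"]


if __name__ == "__main__":
    # Requires a running server and a pulled model, so it is kept out of import.
    vectors = embed("nomic-embed-text", ["hello world", "a second document"])
    print(len(vectors), "embeddings returned")
```

Setting truncate to False is a simple way to find out whether an input exceeds the context length, since the request fails instead of being shortened silently.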

Full Changelog: v0.13.0...v0.13.3