RubyLLM 1.10: Extended Thinking, Persistent Thoughts & Streaming Fixes 🧠✨🚆
This release brings first-class extended thinking across providers, full Gemini 3 Pro/Flash thinking-signature support (chat + tools), a Rails upgrade path to persist it, and a tighter streaming pipeline. Plus official Ruby 4.0 support, safer model registry refreshes, a Vertex AI global endpoint fix, and a docs refresh.
🧠 Extended Thinking Everywhere
Tune reasoning depth and budget across providers with `with_thinking`, and get thinking output back when available:
```ruby
chat = RubyLLM.chat(model: "claude-opus-4.5")
              .with_thinking(effort: :high, budget: 8000)

response = chat.ask("Prove it with numbers.")

response.thinking&.text      # the model's reasoning, when the provider returns it
response.thinking&.signature # provider signature that keeps multi-step sessions consistent
response.thinking_tokens     # thinking token usage, when reported
```
- `response.thinking` and `chunk.thinking` expose thinking content during normal and streaming requests.
- `response.thinking_tokens` and `response.tokens.thinking` track thinking token usage when providers report it.
- Gemini 3 Pro/Flash fully support thought signatures across chat and tool calls, so multi-step sessions stay consistent (a tool-calling sketch follows this list).
- Extended thinking quirks are now normalized across providers so you can tune one API and get predictable output.
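As a rough sketch of what a thinking-enabled, tool-calling Gemini session can look like (the `Calculator` tool, its param, and the model name are hypothetical; `with_tool` and `RubyLLM::Tool` are the library's standard tool-calling entry points):

```ruby
# Hypothetical tool for illustration only.
class Calculator < RubyLLM::Tool
  description "Evaluates a basic arithmetic expression"
  param :expression, desc: "The expression to evaluate, e.g. '127 * 43'"

  def execute(expression:)
    # Naive on purpose; a real tool should not eval untrusted input.
    eval(expression).to_s
  end
end

# Assumed model name for illustration.
chat = RubyLLM.chat(model: "gemini-3-pro")
              .with_thinking(effort: :high)
              .with_tool(Calculator)

# Thought signatures are carried across the tool-call round trips,
# keeping the multi-step session consistent.
response = chat.ask("Use the calculator to compute 127 * 43.")
response.thinking&.signature
```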
Stream thinking and answer content side-by-side:
```ruby
chat = RubyLLM.chat(model: "claude-opus-4.5")
              .with_thinking(effort: :medium)

chat.ask("Solve this step by step: What is 127 * 43?") do |chunk|
  print chunk.thinking&.text
  print chunk.content
end
```
- Streaming stays backward-compatible: existing apps can keep printing `chunk.content`, while richer UIs can also render `chunk.thinking` (a minimal sketch follows).
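A minimal sketch of such a richer UI, routing thinking and answer tokens into separate buffers (the buffering logic here is illustrative, not part of the library):

```ruby
thinking_buffer = +""
answer_buffer   = +""

chat = RubyLLM.chat(model: "claude-opus-4.5")
              .with_thinking(effort: :medium)

chat.ask("Solve this step by step: What is 127 * 43?") do |chunk|
  # Thinking and answer tokens arrive interleaved in one stream;
  # split them so the UI can render reasoning in its own pane.
  thinking_buffer << chunk.thinking.text if chunk.thinking&.text
  answer_buffer   << chunk.content if chunk.content
end

puts "Reasoning:\n#{thinking_buffer}"
puts "Answer:\n#{answer_buffer}"
```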
🧰 Rails + ActiveRecord Persistence
Thinking output can now be stored alongside messages (text, signature, and token usage), with an upgrade generator for existing apps:
```bash
rails generate ruby_llm:upgrade_to_v1_10
rails db:migrate
```
- Adds `thinking_text`, `thinking_signature`, and `thinking_tokens` to message tables (a read-back sketch follows this list).
- Adds `thought_signature` to tool calls for Gemini tool calling.
- Fixes a Rails streaming issue where the first tokens could be dropped.
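After migrating, the persisted fields are ordinary columns on your message model. A minimal read-back sketch, assuming a `Message` model wired up with `acts_as_message` and a chat record with associated messages:

```ruby
message = chat.messages.last

message.thinking_text      # persisted reasoning text, if any
message.thinking_signature # persisted provider signature, if any
message.thinking_tokens    # persisted thinking token count, if reported
```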
📊 Unified Token Tracking
All token counts now live in `response.tokens` and `message.tokens`, including input, output, cached, cache creation, and thinking tokens.
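A quick sketch of reading the unified counts; `tokens.thinking` appears above, while the other accessor names here are assumptions matching the listed categories:

```ruby
tokens = response.tokens

tokens.input          # prompt tokens (assumed accessor name)
tokens.output         # completion tokens (assumed accessor name)
tokens.cached         # cache-read tokens (assumed accessor name)
tokens.cache_creation # cache-creation tokens (assumed accessor name)
tokens.thinking       # extended thinking tokens
```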
✅ Official Ruby 4.0 Support
Ruby 4.0 is now officially supported in CI and dependencies.
🧩 Model Registry Updates
- Refreshing the registry no longer deletes models from providers you haven't configured.
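For example, refreshing with only OpenAI configured now leaves other providers' models in place (a minimal sketch; `RubyLLM.models.refresh!` is the library's registry refresh call):

```ruby
RubyLLM.configure do |config|
  config.openai_api_key = ENV["OPENAI_API_KEY"]
  # No Anthropic or Gemini keys configured.
end

# Previously this could drop models from unconfigured providers;
# now it only updates the providers you've set up.
RubyLLM.models.refresh!
```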
🌍 Vertex AI Global Endpoint Fix
When `vertexai_location` is `global`, the API base now correctly resolves to:
```
https://aiplatform.googleapis.com/v1beta1
```
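A minimal configuration sketch, assuming the standard Vertex AI settings (the project id and model values are hypothetical):

```ruby
RubyLLM.configure do |config|
  config.vertexai_project_id = "my-gcp-project" # hypothetical value
  config.vertexai_location   = "global"         # now resolves to the global endpoint
end

chat = RubyLLM.chat(model: "gemini-2.5-flash", provider: :vertexai)
```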
📚 Docs Updates
- New extended thinking guide.
- Token usage docs include thinking tokens.
Installation
gem "ruby_llm", "1.10.0"Upgrading from 1.9.x
```bash
bundle update ruby_llm
rails generate ruby_llm:upgrade_to_v1_10
rails db:migrate
```
Merged PRs
- Fix Vertex AI Global Endpoint URL Construction by @NielsKSchjoedt in #553
New Contributors
- @NielsKSchjoedt made their first contribution in #553
Full Changelog: 1.9.2...1.10.0