RubyLLM 1.10: Extended Thinking, Persistent Thoughts & Streaming Fixes 🧠✨🚆
This release brings first-class extended thinking across providers, full Gemini 3 Pro/Flash thinking-signature support (chat + tools), a Rails upgrade path to persist it, and a tighter streaming pipeline. Plus official Ruby 4.0 support, safer model registry refreshes, a Vertex AI global endpoint fix, and a docs refresh.
🧠 Extended Thinking Everywhere
Tune reasoning depth and budget across providers with `with_thinking`, and get thinking output back when available:
```ruby
chat = RubyLLM.chat(model: "claude-opus-4.5")
              .with_thinking(effort: :high, budget: 8000)

response = chat.ask("Prove it with numbers.")

response.thinking&.text      # the model's reasoning, when the provider returns it
response.thinking&.signature # provider signature that keeps multi-step sessions consistent
response.thinking_tokens     # thinking token usage, when reported
```
- `response.thinking` and `chunk.thinking` expose thinking content during normal and streaming requests.
- `response.thinking_tokens` and `response.tokens.thinking` track thinking token usage when providers report it.
- Gemini 3 Pro/Flash fully support thought signatures across chat and tool calls, so multi-step sessions stay consistent (a tool-calling sketch follows this list).
- Extended thinking quirks are now normalized across providers so you can tune one API and get predictable output.
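As a rough sketch of what a thinking-enabled, tool-calling Gemini session can look like (the `Calculator` tool, its param, and the model name are hypothetical; `with_tool` and `RubyLLM::Tool` are the library's standard tool-calling entry points):

```ruby
# Hypothetical tool for illustration only.
class Calculator < RubyLLM::Tool
  description "Evaluates a basic arithmetic expression"
  param :expression, desc: "The expression to evaluate, e.g. '127 * 43'"

  def execute(expression:)
    # Naive on purpose; a real tool should not eval untrusted input.
    eval(expression).to_s
  end
end

# Assumed model name for illustration.
chat = RubyLLM.chat(model: "gemini-3-pro")
              .with_thinking(effort: :high)
              .with_tool(Calculator)

# Thought signatures are carried across the tool-call round trips,
# keeping the multi-step session consistent.
response = chat.ask("Use the calculator to compute 127 * 43.")
response.thinking&.signature
```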
Stream thinking and answer content side-by-side:
```ruby
chat = RubyLLM.chat(model: "claude-opus-4.5")
              .with_thinking(effort: :medium)

chat.ask("Solve this step by step: What is 127 * 43?") do |chunk|
  print chunk.thinking&.text
  print chunk.content
end
```
- Streaming stays backward-compatible: existing apps can keep printing `chunk.content`, while richer UIs can also render `chunk.thinking` (a minimal sketch follows).
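A minimal sketch of such a richer UI, routing thinking and answer tokens into separate buffers (the buffering logic here is illustrative, not part of the library):

```ruby
thinking_buffer = +""
answer_buffer   = +""

chat = RubyLLM.chat(model: "claude-opus-4.5")
              .with_thinking(effort: :medium)

chat.ask("Solve this step by step: What is 127 * 43?") do |chunk|
  # Thinking and answer tokens arrive interleaved in one stream;
  # split them so the UI can render reasoning in its own pane.
  thinking_buffer << chunk.thinking.text if chunk.thinking&.text
  answer_buffer   << chunk.content if chunk.content
end

puts "Reasoning:\n#{thinking_buffer}"
puts "Answer:\n#{answer_buffer}"
```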
🧰 Rails + ActiveRecord Persistence
Thinking output can now be stored alongside messages (text, signature, and token usage), with an upgrade generator for existing apps:
```bash
rails generate ruby_llm:upgrade_to_v1_10
rails db:migrate
```
- Adds `thinking_text`, `thinking_signature`, and `thinking_tokens` to message tables (a read-back sketch follows this list).
- Adds `thought_signature` to tool calls for Gemini tool calling.
- Fixes a Rails streaming issue where the first tokens could be dropped.
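After migrating, the persisted fields are ordinary columns on your message model. A minimal read-back sketch, assuming a `Message` model wired up with `acts_as_message` and a chat record with associated messages:

```ruby
message = chat.messages.last

message.thinking_text      # persisted reasoning text, if any
message.thinking_signature # persisted provider signature, if any
message.thinking_tokens    # persisted thinking token count, if reported
```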
📊 Unified Token Tracking
All token counts now live in `response.tokens` and `message.tokens`, including input, output, cached, cache creation, and thinking tokens.
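A quick sketch of reading the unified counts; `tokens.thinking` appears above, while the other accessor names here are assumptions matching the listed categories:

```ruby
tokens = response.tokens

tokens.input          # prompt tokens (assumed accessor name)
tokens.output         # completion tokens (assumed accessor name)
tokens.cached         # cache-read tokens (assumed accessor name)
tokens.cache_creation # cache-creation tokens (assumed accessor name)
tokens.thinking       # extended thinking tokens
```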
✅ Official Ruby 4.0 Support
Ruby 4.0 is now officially supported in CI and dependencies.
🧩 Model Registry Updates
- Refreshing the registry no longer deletes models from providers you haven't configured.
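For example, refreshing with only OpenAI configured now leaves other providers' models in place (a minimal sketch; `RubyLLM.models.refresh!` is the library's registry refresh call):

```ruby
RubyLLM.configure do |config|
  config.openai_api_key = ENV["OPENAI_API_KEY"]
  # No Anthropic or Gemini keys configured.
end

# Previously this could drop models from unconfigured providers;
# now it only updates the providers you've set up.
RubyLLM.models.refresh!
```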
🌍 Vertex AI Global Endpoint Fix
When `vertexai_location` is `global`, the API base now correctly resolves to:
```
https://aiplatform.googleapis.com/v1beta1
```
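A minimal configuration sketch, assuming the standard Vertex AI settings (the project id and model values are hypothetical):

```ruby
RubyLLM.configure do |config|
  config.vertexai_project_id = "my-gcp-project" # hypothetical value
  config.vertexai_location   = "global"         # now resolves to the global endpoint
end

chat = RubyLLM.chat(model: "gemini-2.5-flash", provider: :vertexai)
```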
📚 Docs Updates
- New extended thinking guide.
- Token usage docs include thinking tokens.
Installation
gem "ruby_llm", "1.10.0"Upgrading from 1.9.x
```bash
bundle update ruby_llm
rails generate ruby_llm:upgrade_to_v1_10
rails db:migrate
```
Merged PRs
- Fix Vertex AI Global Endpoint URL Construction by @NielsKSchjoedt in #553
New Contributors
- @NielsKSchjoedt made their first contribution in #553
Full Changelog: 1.9.2...1.10.0