RubyLLM 1.6.2: Thinking Tokens & Performance 🧠

Quick maintenance release fixing Gemini's thinking token counting and bringing performance improvements. Plus we're removing capability gatekeeping - trust providers to know what they can do!

🧮 Fixed: Gemini Thinking Token Counting

With Gemini 2.5 thinking mode, RubyLLM ignored thinking tokens when counting output tokens, so token counts (and any billing calculations based on them) came out too low:

# Before: Only counted candidatesTokenCount (109 tokens)
# Actual API response had:
#   candidatesTokenCount: 109
#   thoughtsTokenCount: 443
#   => Should be 552 total!

# Now: Correctly sums both token types
chat = RubyLLM.chat(model: 'gemini-2.5-flash')
response = chat.ask('What is 2+2? Think step by step.')
response.output_tokens  # => 552 (correctly summed)

This matches how providers bill thinking/reasoning tokens: as output tokens. Fixes #346.
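The fix comes down to adding two fields from Gemini's `usageMetadata`. Here is a minimal sketch of that arithmetic. It assumes the usage data arrives as a plain Ruby hash with the API's camelCase keys. `gemini_output_tokens` is a hypothetical helper, not RubyLLM's internal implementation.

```ruby
# Hypothetical helper (not RubyLLM's code): sums the output-side counts
# from a Gemini usageMetadata hash.
def gemini_output_tokens(usage)
  # Visible answer tokens.
  candidates = usage['candidatesTokenCount'].to_i
  # Thinking tokens; responses without thinking may omit this key,
  # and nil.to_i is 0, so a missing count adds nothing.
  thoughts = usage['thoughtsTokenCount'].to_i
  candidates + thoughts
end

# The counts from the example above.
usage = { 'candidatesTokenCount' => 109, 'thoughtsTokenCount' => 443 }
puts gemini_output_tokens(usage) # => 552
```

Treating a missing `thoughtsTokenCount` as zero keeps the same code path working for non-thinking responses.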

🚫 Capability Gatekeeping Removed

Previously, RubyLLM checked whether a model supported a feature before trying to use it, and those pre-emptive checks sometimes got in the way:

# Before 1.6.2: capabilities were checked locally first
chat.with_tool(MyTool)  # => UnsupportedFunctionsError (raised before any request was sent)

# Now: let the provider handle it
chat.with_tool(MyTool)  # Works if supported; otherwise the provider returns an error

Why this approach is better:

Direct feedback - Get the actual provider error, not our pre-emptive block
Immediate support - New models and features work as soon as providers ship them
Custom models - Fine-tuned and custom models aren't artificially limited
Simpler flow - One less layer of validation between you and the provider

The provider knows what it can do. If it works, great! If not, you'll get a clear error from the source.

The same philosophy applies to structured output (with_schema).
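To make the trade-off concrete, here is a self-contained sketch contrasting the two flows. Everything in it (`FakeProvider`, `LocalCapabilityError`, `ProviderError`, and the two `ask_*` methods) is hypothetical and only models the idea. It does not reproduce RubyLLM's classes or its tool-calling API.

```ruby
# All names here are hypothetical; this only models the two flows.
class LocalCapabilityError < StandardError; end # stands in for a pre-emptive block
class ProviderError < StandardError; end        # stands in for the provider's own error

# A fake provider that alone decides whether it supports tools.
class FakeProvider
  def initialize(supports_tools:)
    @supports_tools = supports_tools
  end

  def complete(tools:)
    raise ProviderError, 'this model does not support tools' if tools.any? && !@supports_tools

    'ok'
  end
end

# Old flow: consult a local capability list and refuse before any request.
def ask_with_gatekeeping(provider, known_capabilities, tools)
  unless tools.empty? || known_capabilities.include?(:tools)
    raise LocalCapabilityError, 'tools not listed for this model'
  end

  provider.complete(tools: tools)
end

# New flow: send the request and let any provider error propagate.
def ask_without_gatekeeping(provider, tools)
  provider.complete(tools: tools)
end

# A custom model that supports tools but is missing from the local list.
custom = FakeProvider.new(supports_tools: true)

begin
  ask_with_gatekeeping(custom, [], [:my_tool])
rescue LocalCapabilityError => e
  puts "old flow, blocked locally: #{e.message}"
end

puts "new flow: #{ask_without_gatekeeping(custom, [:my_tool])}"

# A model that really lacks tool support: the error now comes from the provider.
plain = FakeProvider.new(supports_tools: false)
begin
  ask_without_gatekeeping(plain, [:my_tool])
rescue ProviderError => e
  puts "new flow, provider error: #{e.message}"
end
```

In the old flow, the custom model is rejected even though the provider could have served it. In the new flow, it works, and a genuinely unsupported request still fails with the provider's own error.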

⚡ Performance Improvements

Thanks to @tagliala for introducing RuboCop Performance (#316), bringing multiple optimizations:

More efficient string operations
Better collection handling
Optimized method calls
Reduced object allocations

Every little bit helps when you're streaming thousands of tokens!
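These notes don't list the individual changes from #316. As a rough illustration only, here are three real rubocop-performance cops (Performance/Detect, Performance/StringReplacement, Performance/DeletePrefix) and the kind of rewrite each suggests. The snippets are made up and are not taken from the PR.

```ruby
# Illustrative only: made-up snippets, not the actual changes from #316.
words = %w[alpha beta gamma delta]
chunk = 'data: {"delta":"hi"}'

# Performance/Detect: `detect` stops at the first match instead of
# building an intermediate array with `select` and then taking `.first`.
slow_first = words.select { |w| w.start_with?('g') }.first
fast_first = words.detect { |w| w.start_with?('g') }

# Performance/StringReplacement: `tr` handles single-character swaps
# more cheaply than `gsub`.
slow_snake = 'ruby-llm-gem'.gsub('-', '_')
fast_snake = 'ruby-llm-gem'.tr('-', '_')

# Performance/DeletePrefix (Ruby 2.5+): `delete_prefix` strips a fixed
# prefix without running a regular expression match.
slow_payload = chunk.sub(/\Adata: /, '')
fast_payload = chunk.delete_prefix('data: ')

# Each pair produces the same result; only the cost differs.
puts slow_first == fast_first
puts slow_snake == fast_snake
puts slow_payload == fast_payload
```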

🐛 Additional Fixes

Logging cleanup: Removed the "assuming model exists" debug log, which became unnecessary once capability gatekeeping was removed
Test improvements: Added real API tests to verify token counting

Installation

gem 'ruby_llm', '1.6.2'

This release is fully backward compatible. If you're using Gemini with thinking mode, we recommend updating for accurate token counts.

Merged PRs

Introduce RuboCop Performance by @tagliala in #316

New Contributors

@tagliala made their first contribution in #316

Full Changelog: 1.6.1...1.6.2

crmne/ruby_llm 1.6.2 on GitHub