RubyLLM 1.6.2: Thinking Tokens & Performance 🧠
Quick maintenance release fixing Gemini's thinking token counting and bringing performance improvements. Plus we're removing capability gatekeeping - trust providers to know what they can do!
🧮 Fixed: Gemini Thinking Token Counting
Gemini 2.5 with thinking mode wasn't counting tokens correctly, leading to incorrect billing calculations:
# Before: Only counted candidatesTokenCount (109 tokens)
# Actual API response had:
# candidatesTokenCount: 109
# thoughtsTokenCount: 443
# => Should be 552 total!
# Now: Correctly sums both token types
chat = RubyLLM.chat(model: 'gemini-2.5-flash')
response = chat.ask('What is 2+2? Think step by step.')
response.output_tokens # => 552 (correctly summed)
This aligns with how all providers bill thinking/reasoning tokens - they're all output tokens. Fixes #346.
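The arithmetic behind the fix can be sketched in a few lines. This is a minimal illustration and not RubyLLM's actual implementation: `total_output_tokens` is a hypothetical helper, and the hash only mirrors the two usage fields quoted above.

```ruby
# Hypothetical helper, for illustration only (not RubyLLM's internal code).
# Sums the visible answer tokens and the thinking tokens from a
# Gemini-style usage hash. A missing field counts as zero (nil.to_i == 0).
def total_output_tokens(usage_metadata)
  candidates = usage_metadata['candidatesTokenCount'].to_i
  thoughts = usage_metadata['thoughtsTokenCount'].to_i
  candidates + thoughts
end

usage = { 'candidatesTokenCount' => 109, 'thoughtsTokenCount' => 443 }
puts total_output_tokens(usage) # prints 552
```

Treating a missing `thoughtsTokenCount` as zero means that responses without thinking tokens keep their previous total.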
🚫 Capability Gatekeeping Removed
We used to pre-check whether a model supports a feature before attempting to use it. Those pre-emptive checks sometimes got in the way:
# Before 1.6.2: Pre-checked capabilities before attempting
chat.with_tool(MyTool) # => UnsupportedFunctionsError (without trying)
# Now: Let the provider handle it
chat.with_tool(MyTool) # Works if supported, provider errors if not
Why this approach is better:
- Direct feedback - Get the actual provider error, not our pre-emptive block
- Immediate support - New models and features work as soon as providers ship them
- Custom models - Fine-tuned and custom models aren't artificially limited
- Simpler flow - One less layer of validation between you and the provider
The provider knows what it can do. If it works, great! If not, you'll get a clear error from the source.
Same philosophy applies to structured output (with_schema).
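With the pre-check gone, calling code that wants a fallback has to rescue the provider's error itself. The following is a rough sketch of that "try first, rescue on failure" pattern. `ProviderError` and `try_with_fallback` are hypothetical stand-ins so the example runs without the gem; they are not part of RubyLLM.

```ruby
# Hypothetical stand-in for an error surfaced from a provider.
class ProviderError < StandardError; end

# Runs the block. If the provider rejects the request, logs the
# provider's message to stderr and returns the fallback's result.
def try_with_fallback(fallback:)
  yield
rescue ProviderError => e
  warn "Provider rejected the request: #{e.message}"
  fallback.call
end

result = try_with_fallback(fallback: -> { 'answered without tools' }) do
  # Simulates a model that does not support tool calls.
  raise ProviderError, 'model does not support function calling'
end
puts result # prints "answered without tools"
```

In real use the block would wrap the chat.with_tool(MyTool) call and the request that follows it, and the rescued class would be whichever error RubyLLM raises for the provider's response.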
⚡ Performance Improvements
Thanks to @tagliala for introducing RuboCop Performance (#316), bringing multiple optimizations:
- More efficient string operations
- Better collection handling
- Optimized method calls
- Reduced object allocations
Every little bit helps when you're streaming thousands of tokens!
🐛 Additional Fixes
- Logging cleanup: Removed unnecessary "assuming model exists" debug logging after capability gatekeeping removal
- Test improvements: Real API tests for token counting verification
Installation
gem 'ruby_llm', '1.6.2'
Full backward compatibility maintained. If you're using Gemini with thinking mode, this update is recommended for accurate token counting.
Full Changelog: 1.6.1...1.6.2