RubyLLM 1.6.4: Multimodal Tools & Better Schemas 🖼️

Maintenance release bringing multimodal tool responses, improved rake tasks, and important fixes for Gemini schema conversion. Plus better documentation and developer experience!

🖼️ Tools Can Now Return Files and Images

Tools can now return rich content with attachments, not just text! Perfect for screenshot tools, document generators, and visual analyzers:

class ScreenshotTool < RubyLLM::Tool
  description "Takes a screenshot and returns it"
  param :url, desc: "URL to screenshot"
  
  def execute(url:)
    screenshot_path = capture_screenshot(url)  # Your screenshot logic
    
    # Return a Content object with text and attachments
    RubyLLM::Content.new(
      "Screenshot of #{url} captured successfully",
      [screenshot_path]  # Can be file path, StringIO, or ActiveStorage blob
    )
  end
end

# The LLM can now see and analyze the screenshot
chat = RubyLLM.chat.with_tool(ScreenshotTool)
response = chat.ask("Take a screenshot of ruby-lang.org and describe what you see")

This opens up powerful workflows:

Visual debugging: Screenshot tools that capture and analyze UI states
Document generation: Tools that create PDFs and return them for review
Data visualization: Generate charts and have the LLM interpret them
Multi-step workflows: Chain tools that produce and consume visual content

Works with all providers that support multimodal content.

🔧 Fixed: Gemini Schema Conversion

Gemini's structured output conversion was dropping some schema fields, and integer schemas were being converted to number. The conversion logic now handles both cases correctly:

# Preserve description
schema = {
  type: 'object',
  description: 'An object',
  properties: {
    example: {
      type: "string",
      description: "a brief description about the person's time at the conference"
    }
  },
  required: ['example']
}

# Define schema with both number and integer types
schema = {
  type: 'object',
  properties: {
    number1: {
      type: 'number',
    },
    number2: {
      type: 'integer',
    }
  }
}

Also added tests covering simple and complex schemas, nested objects and arrays, all constraint attributes, nullable fields, descriptions, and property ordering for objects.
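To make the two fixes concrete, here is a minimal sketch of a conversion that keeps descriptions and keeps integer distinct from number. It is not RubyLLM's actual implementation: `to_gemini_schema` and `GEMINI_TYPES` are hypothetical names, and the sketch assumes Gemini expects upper-case type names such as `INTEGER` and `NUMBER`.

```ruby
# Hypothetical mapping from JSON-Schema-style type names to the
# upper-case names assumed for Gemini. Integer keeps its own entry
# instead of being folded into NUMBER.
GEMINI_TYPES = {
  "string"  => "STRING",
  "number"  => "NUMBER",
  "integer" => "INTEGER",
  "boolean" => "BOOLEAN",
  "array"   => "ARRAY",
  "object"  => "OBJECT"
}.freeze

# Hypothetical converter (illustration only): walks the schema
# recursively and carries the description along at every level.
def to_gemini_schema(schema)
  type = schema[:type].to_s
  result = { type: GEMINI_TYPES.fetch(type) }
  result[:description] = schema[:description] if schema[:description]

  case type
  when "object"
    if schema[:properties]
      result[:properties] =
        schema[:properties].transform_values { |prop| to_gemini_schema(prop) }
    end
    result[:required] = schema[:required] if schema[:required]
  when "array"
    result[:items] = to_gemini_schema(schema[:items]) if schema[:items]
  end

  result
end

converted = to_gemini_schema(
  {
    type: "object",
    properties: {
      number1: { type: "number" },
      number2: { type: "integer" }
    }
  }
)
```

In this sketch, `converted[:properties][:number2][:type]` is `"INTEGER"` and `number1` stays `"NUMBER"`. `fetch` raises on an unknown type rather than silently picking a default, which is one way to avoid the kind of lossy conversion described above.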

Fixes #354, closes #355.

Thanks to @BrianBorge for reporting and working on the initial PR.

🛠️ Developer Experience: Improved Rake Tasks

Consolidated Model Management

All model-related tasks are now streamlined and better organized:

# Default task now runs overcommit hooks + model updates
bundle exec rake

# Update models, generate docs, and create aliases in one command
bundle exec rake models

# Individual tasks still available
bundle exec rake models:update    # Fetch latest models from providers
bundle exec rake models:docs      # Generate model documentation
bundle exec rake models:aliases   # Generate model aliases

The tasks have been refactored from 3 separate files into a single, well-organized models.rake file following Rails conventions.
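The layout described above, one aggregate task on top of three namespaced tasks, can be sketched in a few lines of Rake. This is not the project's actual models.rake: the task bodies are placeholders that only record that they ran, and `STEPS_RUN` is a hypothetical constant added for illustration.

```ruby
require "rake" # not needed inside a real Rakefile or .rake file

# Hypothetical stand-in for real work: each task just records its name.
STEPS_RUN = []

namespace :models do
  desc "Fetch latest models from providers"
  task(:update) { STEPS_RUN << :update }

  desc "Generate model documentation"
  task(:docs) { STEPS_RUN << :docs }

  desc "Generate model aliases"
  task(:aliases) { STEPS_RUN << :aliases }
end

# `rake models` runs the three steps as prerequisites, in the order listed.
desc "Update models, generate docs, and create aliases"
task models: %w[models:update models:docs models:aliases]
```

Declaring the aggregate as a task with prerequisites, rather than calling each step by hand, lets Rake skip any step that has already run in the same invocation.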

Release Preparation

New comprehensive release preparation task:

# Prepare for release: refresh cassettes, run hooks, update models
bundle exec rake release:prepare

This task:

Automatically refreshes stale VCR cassettes (>1 day old)
Runs overcommit hooks for code quality
Updates models, docs, and aliases
Ensures everything is ready for a clean release
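The staleness check in the first step comes down to comparing each cassette's modification time against a one-day threshold. A minimal sketch, assuming cassettes are YAML files under a single directory; `stale_cassettes` and the directory path in the usage note are hypothetical, not the gem's actual task code:

```ruby
ONE_DAY = 24 * 60 * 60 # seconds

# Hypothetical helper: returns cassette files whose last modification
# is more than max_age seconds before `now`, in sorted order.
def stale_cassettes(dir, max_age: ONE_DAY, now: Time.now)
  Dir.glob(File.join(dir, "**", "*.yml")).select do |path|
    now - File.mtime(path) > max_age
  end.sort
end
```

Calling `stale_cassettes("spec/fixtures/vcr_cassettes")` would list the files to re-record. The `now:` keyword is there so the check can be tested with a fixed clock.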

Cassette Management

# Verify cassettes are fresh
bundle exec rake release:verify_cassettes

# Refresh stale cassettes automatically
bundle exec rake release:refresh_stale_cassettes

📚 Documentation Updates

Redirect fix: /installation now properly redirects to /getting-started
Badge refresh: README badges updated to bust GitHub's cache
Async pattern fix: Corrected supervisor pattern example in agentic workflows guide to avoid "Cannot wait on own fiber!" errors

🧹 Additional Updates

Appraisal gemfiles updated: All Rails version test matrices refreshed
Test coverage: New specs for multimodal tool responses
Provider compatibility: Verified with latest API versions

Installation

gem 'ruby_llm', '1.6.4'

Full backward compatibility maintained. The multimodal tool support is opt-in: existing tools continue working as before.

Full Changelog: 1.6.3...1.6.4

crmne/ruby_llm 1.6.4 on GitHub