RubyLLM 1.9.0: Tool Schemas, Prompt Caching & Transcriptions ✨🎙️

Major release that makes tool definitions feel like Ruby, lets you lean on Anthropic prompt caching everywhere, and turns audio transcription into a one-liner—plus better Gemini structured output and Nano Banana image responses.

🧰 JSON Schema Tooling That Feels Native

The new RubyLLM::Schema params DSL supports full JSON Schema for tool parameter definitions, including nested objects, arrays, enums, and nullable fields.

class Scheduler < RubyLLM::Tool
  description "Books a meeting"

  params do
    object :window, description: "Time window to reserve" do
      string :start, description: "ISO8601 start"
      string :finish, description: "ISO8601 finish"
    end

    array :participants, of: :string, description: "Email invitees"

    any_of :format, description: "Optional meeting format" do
      string enum: %w[virtual in_person]
      null
    end
  end

  def execute(window:, participants:, format: nil)
    Booking.reserve(window:, participants:, format:)
  end
end
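
Once defined, the tool plugs into a chat as usual. A quick usage sketch (the prompt is illustrative):

chat = RubyLLM.chat(model: "claude-sonnet-4-5")
chat.with_tool(Scheduler).ask("Book a 30-minute sync tomorrow with alice@example.com")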
  • Powered by RubyLLM::Schema, the same awesome Ruby DSL we recommend for Structured Output's chat.with_schema.
  • Already handles Anthropic/Gemini quirks like nullable unions and enums, so no more ad-hoc translation layers.
  • Prefer raw hashes? Pass params schema: { ... } to keep your existing JSON Schema verbatim, as sketched below.
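
Migrating from a hand-written schema? A minimal sketch of the raw-hash form (LegacyScheduler and its fields are made up for illustration):

class LegacyScheduler < RubyLLM::Tool
  description "Books a meeting from a hand-written JSON Schema"

  params schema: {
    type: "object",
    properties: {
      window: { type: "string", description: "ISO8601 time window" },
      participants: { type: "array", items: { type: "string" }, description: "Email invitees" }
    },
    required: %w[window participants]
  }

  def execute(window:, participants:)
    Booking.reserve(window:, participants:)
  end
end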

🧱 Raw Content Blocks & Anthropic Prompt Caching Everywhere

When you need to handcraft message envelopes:

chat = RubyLLM.chat(model: "claude-sonnet-4-5")
raw_request = RubyLLM::Content::Raw.new([
  { type: "text", text: File.read("prompt.md"), cache_control: { type: "ephemeral" } },
  { type: "text", text: "Summarize today’s work." }
])

chat.ask(raw_request)

We also provide a helper specifically for Anthropic Prompt Caching:

system_block = RubyLLM::Providers::Anthropic::Content.new(
  "You are our release-notes assistant.",
  cache: true
)

chat.add_message(role: :system, content: system_block)
  • RubyLLM::Content::Raw lets you ship provider-native payloads for content blocks.
  • Anthropic helpers keep cache_control hints readable while still producing the right JSON structure.
  • Every RubyLLM::Message now exposes cached_tokens and cache_creation_tokens, so you can see exactly what the provider pulled from cache versus what it had to recreate.
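
To inspect those counters, reuse the chat from the snippet above (a minimal sketch):

response = chat.ask("Summarize today's work.")
puts "from cache: #{response.cached_tokens}, written to cache: #{response.cache_creation_tokens}"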

Please run rails generate ruby_llm:upgrade_to_v1_9 in your Rails app if you're upgrading from 1.8.x.

⚙️ Tool.with_params Plays Nice with Anthropic Caching

As with raw content blocks, .with_params lets you set arbitrary parameters on a tool definition. Perfect for Anthropic’s cache_control hints.

class ChangelogTool < RubyLLM::Tool
  description "Formats commits into release notes"

  params do
    array :commits, of: :string
  end

  with_params cache_control: { type: "ephemeral" }

  def execute(commits:)
    ReleaseNotes.format(commits)
  end
end

🎙️ RubyLLM.transcribe Turns Audio into Text (With Diarization)

One method call gives you transcripts, diarized segments, and consistent token tallies across providers.

transcription = RubyLLM.transcribe(
  "all-hands.m4a",
  model: "gpt-4o-transcribe-diarize",
  language: "en",
  prompt: "Focus on action items."
)

transcription.segments.each do |segment|
  puts "#{segment['speaker']}: #{segment['text']} (#{segment['start']}s – #{segment['end']}s)"
end
  • Supports OpenAI (whisper-1, gpt-4o-transcribe, diarization variants), Gemini 2.5 Flash, and Vertex AI with the same API.
  • Optional speaker references map diarized voices to real names.

🛠️ Gemini Structured Output Fixes & Nano Banana Inline Images

We went deep on Gemini’s edges so you don’t have to.

  • Nullables and anyOf now translate cleanly, and Gemini 2.5 finally respects responseJsonSchema, so complex structured output works out of the box (see the sketch at the end of this section).
  • Parallel tool calls now come back as a single message with the correct role, which should make models more reliable at calling tools and handling their results.
  • Gemini 2.5 Flash Image (“Nano Banana”) surfaces inline images as actual attachments, ready to wire straight into your UI:

chat = RubyLLM.chat(model: "gemini-2.5-flash-image")
reply = chat.ask("Sketch a Nano Banana wearing aviators.")
image = reply.content.attachments.first
File.binwrite("nano-banana.png", image.read)

(If you missed the backstory, my blog post Nano Banana with RubyLLM has the full walkthrough.)
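
For the structured-output fix, here's a minimal sketch using RubyLLM::Schema (the schema fields are illustrative):

class ReleaseNoteSchema < RubyLLM::Schema
  string :title, description: "Release headline"
  array :highlights, of: :string, description: "Bulleted changes"
end

chat = RubyLLM.chat(model: "gemini-2.5-flash")
response = chat.with_schema(ReleaseNoteSchema).ask("Summarize this release.")
response.content # => parsed Hash matching the schema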

🗂️ Configurable Model Registry File Path

Deploying to read-only filesystems? Point RubyLLM at a writable JSON registry and keep refreshing models without hacks.

RubyLLM.models.save_to_json("/var/app/models.json")

RubyLLM.configure do |config|
  config.model_registry_file = "/var/app/models.json"
end

Just remember that RubyLLM.models.refresh! only updates the in-memory registry. To persist changes to disk, call:

RubyLLM.models.refresh!
RubyLLM.models.save_to_json
  • Plays nicely with the ActiveRecord integration (which still stores models in the DB).

Installation

gem "ruby_llm", "1.9.0"

Upgrading from 1.8.x

bundle update ruby_llm
rails generate ruby_llm:upgrade_to_v1_9

Full Changelog: 1.8.2...1.9.0
