github crmne/ruby_llm 1.16.0

RubyLLM 1.16: Concurrent Tool Execution + Rails-style Instrumentation + api_base support for all providers + a deluge of fixes

RubyLLM 1.16 makes your tools run concurrently in threads or fibers, makes RubyLLM observable without monkey patching, and lets every native provider sit behind a proxy.

Concurrent Tool Execution

When a model returns multiple tool calls in one response, RubyLLM has always run them one at a time. Now it can run them concurrently, which is incredibly useful for I/O-bound tools like HTTP calls, database lookups, and other LLM requests.

Turn it on for every chat from one place:

RubyLLM.configure do |config|
  config.tool_concurrency = true # :threads, :fibers, true, or false
end

true uses Ruby threads and needs no extra dependencies. :fibers mode uses the optional async gem.
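To see why threads help with I/O-bound tools, here is a minimal plain-Ruby sketch that does not use RubyLLM at all. The three lambdas are hypothetical stand-ins for tools, and sleep simulates network latency; it only illustrates the general effect, not how RubyLLM schedules tool calls internally.

```ruby
require "benchmark"

# Hypothetical stand-ins for I/O-bound tools; sleep simulates network latency.
TOOLS = {
  weather: -> { sleep 0.2; "sunny" },
  stock_price: -> { sleep 0.2; 101.5 },
  currency: -> { sleep 0.2; "EUR" }
}.freeze

# Runs each tool one after another.
def run_sequentially(tools)
  tools.transform_values(&:call)
end

# Starts one thread per tool, then collects each thread's return value.
def run_in_threads(tools)
  threads = tools.transform_values { |tool| Thread.new { tool.call } }
  threads.transform_values(&:value)
end

sequential_time = Benchmark.realtime { run_sequentially(TOOLS) }
threaded_time = Benchmark.realtime { run_in_threads(TOOLS) }

puts format("sequential: %.2fs, threaded: %.2fs", sequential_time, threaded_time)
```

The sequential run pays for every sleep in turn, while the threaded run waits roughly as long as the slowest single tool.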

You can also override it per PORO or Rails chat record, when a particular conversation needs different behaviour:

chat.with_tools(Weather, StockPrice, Currency, concurrency: true)
chat.with_tools(Weather, StockPrice, Currency, concurrency: :threads)
chat.with_tools(Weather, StockPrice, Currency, concurrency: :fibers)
chat.with_tools(Weather, StockPrice, concurrency: false)
chat_record.with_tools(Weather, StockPrice, concurrency: :threads)

Inside Rails, each concurrent tool call runs wrapped in the Rails executor, so connection pools, CurrentAttributes, and reloading behave the way the rest of your app expects.

Streaming Results as They Finish

Concurrency doesn't make you wait for the slowest tool to start showing progress. Each tool result is added back to the conversation the moment that tool finishes, in completion order. RubyLLM still waits for every result before asking the model for its next response, but your callbacks and streaming UI see results stream in as they land instead of all at once at the end.
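Completion order can be pictured with a thread-safe queue: each worker pushes its result the moment it is done, and the consumer pops results as they arrive. This is a plain-Ruby sketch with hypothetical tool names and latencies, not RubyLLM's actual implementation.

```ruby
# Hypothetical tools mapped to simulated latencies in seconds.
tools = { slow: 0.3, fast: 0.05, medium: 0.15 }

results = Queue.new

# Each thread pushes its tool name onto the queue as soon as it finishes.
threads = tools.map do |name, latency|
  Thread.new do
    sleep latency
    results << name
  end
end

# Pop blocks until the next result is available, so results are seen
# in completion order rather than in the order the tools were started.
completion_order = []
tools.size.times { completion_order << results.pop }

# Every result is in before moving on, mirroring the wait before the next model call.
threads.each(&:join)

puts completion_order.inspect
```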

This way, simply adding

RubyLLM.configure do |config|
  config.tool_concurrency = :fibers # or :threads
end

lets independent tool calls overlap, so a turn with several I/O-bound tools finishes sooner and your users see each result as it arrives.

Rails-Style Instrumentation

RubyLLM now emits structured events for the work it does. No specific observability backend required.

In Rails, events flow through ActiveSupport::Notifications automatically. Subscribe the same way you'd subscribe to any framework event:

# config/initializers/ruby_llm_instrumentation.rb
ActiveSupport::Notifications.subscribe('chat.ruby_llm') do |_name, _start, _finish, _id, payload|
  Rails.logger.info(
    provider: payload[:provider],
    model: payload[:model],
    input_tokens: payload[:input_tokens],
    output_tokens: payload[:output_tokens]
  )
end

Outside Rails, point config.instrumenter at any object that responds to instrument(name, payload) { ... }. Wire it into OpenTelemetry, StatsD, or your own logger:

RubyLLM.configure do |config|
  config.instrumenter = AppInstrumenter.new
end
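A minimal sketch of what AppInstrumenter could look like, assuming only the instrument(name, payload) { ... } contract described above. The class body, the events list, and the choice to yield the payload (which mirrors ActiveSupport::Notifications.instrument) are illustrative assumptions rather than something RubyLLM requires.

```ruby
require "logger"

# Hypothetical instrumenter: times the block, records the event, and logs it.
class AppInstrumenter
  attr_reader :events

  def initialize(logger: Logger.new($stdout))
    @logger = logger
    @events = []
  end

  # Runs the block, returns its value, and always records the duration,
  # even when the block raises.
  def instrument(name, payload = {})
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    block_given? ? yield(payload) : nil
  ensure
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    duration_ms = (elapsed * 1000).round(2)
    @events << { name: name, duration_ms: duration_ms }
    @logger.info("#{name} took #{duration_ms}ms")
  end
end

instrumenter = AppInstrumenter.new
result = instrumenter.instrument("chat.ruby_llm", { model: "example-model" }) do |payload|
  payload[:model]
end
```

Swapping the logger call for a StatsD timer or an OpenTelemetry span would follow the same shape.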

The events RubyLLM emits:

  • request.ruby_llm: HTTP request metadata: provider, method, URL, status
  • chat.ruby_llm: chat completion metadata: model, provider, messages, response, token usage
  • tool_call.ruby_llm: tool name, arguments, result
  • embedding.ruby_llm: embedding model, input, result, token usage, vector dimensions
  • models.refresh.ruby_llm: model registry refresh metadata

Payloads carry the Ruby objects observability adapters need. Message content, tool arguments, and provider responses can be sensitive, so export or log those fields only when your application policy allows it. See the new Instrumentation guide for the full payload reference.
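One way to keep sensitive fields out of logs is an allowlist applied before anything is exported. The sketch below assumes the payload is a Hash with the symbol keys used in the subscriber example above; the :messages key and the sample values are hypothetical.

```ruby
# Keys considered safe to log; everything else is dropped.
SAFE_KEYS = %i[provider model input_tokens output_tokens].freeze

# Returns a copy of the payload containing only allowlisted keys.
def loggable(payload)
  payload.slice(*SAFE_KEYS)
end

# Hypothetical payload for illustration only.
sample_payload = {
  provider: "example-provider",
  model: "example-model",
  input_tokens: 120,
  output_tokens: 48,
  messages: ["a user message that may contain private data"]
}

safe = loggable(sample_payload)
puts safe.inspect
```

An allowlist fails closed: a field added to the payload in a later release stays out of the logs until you opt it in.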

Custom Base URLs for Every Native Provider

Need to route a native provider API through a proxy, a gateway, or a private network endpoint? Every provider now has a configurable base URL. This release fills the remaining gaps:

RubyLLM.configure do |config|
  config.bedrock_api_base     = ENV['BEDROCK_API_BASE']     # new in v1.16
  config.mistral_api_base     = ENV['MISTRAL_API_BASE']     # new in v1.16
  config.perplexity_api_base  = ENV['PERPLEXITY_API_BASE']  # new in v1.16
  config.vertexai_api_base    = ENV['VERTEXAI_API_BASE']    # new in v1.16
  config.xai_api_base         = ENV['XAI_API_BASE']         # new in v1.16
end

Together with the bases already available for OpenAI, Anthropic, Gemini, DeepSeek, OpenRouter, Azure, Ollama, and GPUStack, you can now front any provider with custom infrastructure. Each override falls back to the provider's default endpoint when unset, so existing configuration keeps working untouched.
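The fallback behaviour can be pictured with a small plain-Ruby helper. This is a sketch of the idea, not RubyLLM's code: the helper name and the default URL are hypothetical, and treating a blank string like an unset value is an assumption based on the blank-configuration fix listed under Fixes.

```ruby
# Hypothetical default endpoint; the real defaults live inside each provider.
DEFAULT_API_BASE = "https://api.example.com/v1"

# Returns the override when it has content, otherwise the default.
def effective_api_base(override, default: DEFAULT_API_BASE)
  value = override.to_s.strip
  value.empty? ? default : value
end

puts effective_api_base(nil)                          # unset: default is used
puts effective_api_base("")                           # blank env var: default is used
puts effective_api_base("https://proxy.internal/v1")  # override wins
```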

Configurable Faraday Adapter

RubyLLM builds its connections on Faraday, and now you choose the adapter:

RubyLLM.configure do |config|
  config.faraday_adapter = :async_http # or :typhoeus, :net_http, :httpx, etc.
end

Defaults to Faraday.default_adapter, which is Net::HTTP, so nothing changes unless you ask for something else. Reach for this when you want connection pooling, HTTP/2, or an adapter your app already standardizes on.

Deprecation Controls

RubyLLM's deprecation warnings are now yours to manage.

RubyLLM.configure do |config|
  config.deprecation_behavior = :warn # :warn (default), :silence, or :raise
end

:warn logs through RubyLLM.logger. :silence quiets the warnings. :raise turns them into RubyLLM::DeprecationError, perfect for test suites that want to fail the moment a deprecated path is hit, so you're ready before those paths are removed in RubyLLM 2.0.
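The three behaviours can be modelled in a few lines of plain Ruby. This is only an illustration of the semantics described above; the method name, the stand-in DeprecationError class, and the logger argument are hypothetical and are not RubyLLM's internals.

```ruby
require "logger"

# Stand-in for RubyLLM::DeprecationError, defined here so the sketch is self-contained.
class DeprecationError < StandardError; end

# Dispatches a deprecation message according to the configured behaviour.
def report_deprecation(message, behavior: :warn, logger: Logger.new($stderr))
  case behavior
  when :warn then logger.warn(message)
  when :silence then nil
  when :raise then raise DeprecationError, message
  else raise ArgumentError, "unknown behavior: #{behavior.inspect}"
  end
end

report_deprecation("old_method is deprecated", behavior: :silence) # does nothing
```

In a test suite, the :raise branch is what turns a quiet warning into a failing example.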

Transcription Words

Transcription now exposes word-level data when the provider returns it, so you can build word-by-word timing and highlighting on top of OpenAI's verbose transcription responses:

transcription = RubyLLM.transcribe("interview.mp3", model: "whisper-1")
transcription.words # => [{ word:, start:, end: }, ...]
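As a sketch of word-by-word highlighting, the helper below finds the word being spoken at a given playback time. It assumes each entry is a Hash with the symbol keys shown above and that start and end are in seconds; the sample data and the word_at helper are hypothetical.

```ruby
# Hypothetical sample shaped like the words array above (times in seconds).
words = [
  { word: "Hello", start: 0.0, end: 0.4 },
  { word: "world", start: 0.5, end: 0.9 }
]

# Returns the entry whose time span contains the given time, or nil during silence.
def word_at(words, time)
  words.find { |entry| time >= entry[:start] && time < entry[:end] }
end

current = word_at(words, 0.6)
puts current ? current[:word] : "(silence)"
```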

Fixes

1.16 also includes a broad round of provider and Rails fixes:

  • Anthropic context-length errors: "prompt is too long" responses are now classified as ContextLengthExceededError instead of a generic error, so you can rescue them properly. Thanks @yasming (#769).
  • Anthropic streaming parallel tool calls: fixed accumulation of multiple tool calls in a single streamed response.
  • Anthropic empty content: tightened the empty-content checks so valid requests aren't rejected.
  • Anthropic system message warning: removed the spurious "single system message" warning. Thanks @timherby (#768).
  • Bedrock thinking streaming: fixed streamed reasoning output from Bedrock models.
  • GPUStack / Ollama thinking: fixed a thinking-token handling bug.
  • Gemini function calls: responses now adhere to Gemini's function-call spec, and inline image responses persist correctly.
  • OpenRouter: nil chat responses are now handled gracefully.
  • Model pricing serialization: chat-created model pricing serializes as hashes. Thanks @cyphercodes (#796).
  • Blank configuration values: blank config values are normalized so empty env vars don't produce surprising behavior.
  • Rails / Active Storage: pending Active Storage uploads are handled in chat history, optional Active Storage constants are guarded against load-order issues, and text attachments stay text instead of being re-encoded.
  • Provider documents: added support for provider-specific document attachments.
  • Clearer errors: model-lookup and provider-configuration errors now explain what went wrong and how to fix it.

Model Registry + Provider Cleanup

The model registry has been refreshed with the latest models, capabilities, and pricing.

models.dev started shipping partial release dates (month-only like 2025-09, year-only like 2025). RubyLLM was appending 00:00:00 UTC to whatever it got, which turned those into invalid timestamps and broke model loading. 1.16 normalizes partial dates to real dates (2025-09 → 2025-09-01) so the registry keeps loading. Fixes #798.
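The normalization can be sketched in plain Ruby as follows. This is not the code from the fix: the helper name is hypothetical, and mapping a year-only value to January 1 is an assumption, since the notes only show the month-only case.

```ruby
require "date"

# Pads partial ISO dates so they parse as real dates.
def normalize_partial_date(value)
  case value
  when /\A\d{4}\z/ then "#{value}-01-01"      # year only (assumed to map to January 1)
  when /\A\d{4}-\d{2}\z/ then "#{value}-01"   # month only
  else value                                  # already a full date, left untouched
  end
end

%w[2025 2025-09 2025-09-15].each do |raw|
  normalized = normalize_partial_date(raw)
  # Date.iso8601 raises on invalid input, so this doubles as a validity check.
  puts "#{raw} -> #{Date.iso8601(normalized)}"
end
```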

I also cleaned up provider metadata: Anthropic now uses the adaptive thinking format newer models require (#782), DeepSeek structured output works again (#766), xAI aliases are more complete, and Bedrock model matching is more accurate.

Installation

gem "ruby_llm", "1.16"

Upgrading from 1.15.x

bundle update ruby_llm

This release is backwards compatible. Concurrent tool execution is opt-in, instrumentation is inert until you subscribe or set an instrumenter, and every new *_api_base falls back to the provider default. If you want to start catching deprecated paths early, set config.deprecation_behavior = :raise in your test environment.

Merged PRs

  • Remove spurious 'single system message' warning from Anthropic provider by @timherby in #768
  • Serialize chat-created model pricing as hashes by @cyphercodes in #796
  • [BUG] Anthropic "prompt is too long" error not classified as ContextLengthExceededError by @yasming in #769


Full Changelog: 1.15.0...1.16.0