Release Notes

v2.21.0-rc1

Updated the default Azure OpenAI model from gpt-4o-mini to gpt-4.1-mini and the default API version from 2023-05-15 to 2024-12-01-preview for both AzureOpenAIGenerator and AzureOpenAIChatGenerator.
The default OpenAI model has been changed from gpt-4o-mini to gpt-5-mini for OpenAIChatGenerator and OpenAIGenerator. If you rely on the default model and need to continue using gpt-4o-mini, explicitly specify it when initializing these components: OpenAIChatGenerator(model="gpt-4o-mini").

Three new components were added: QueryExpander, MultiQueryEmbeddingRetriever, and MultiQueryTextRetriever. When used together, they expand a query and use each expansion to retrieve a potentially different set of documents.
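
The idea behind multi-query retrieval can be sketched with a toy, standard-library-only example. This is not the Haystack API: expand_query, retrieve, multi_query_retrieve, and the DOCS data are hypothetical stand-ins. They only show how running one retrieval per expansion and merging the results can surface documents the original query misses.

```python
# Toy illustration of multi-query retrieval; none of these names are Haystack APIs.

DOCS = {
    "d1": "solar panels convert sunlight into electricity",
    "d2": "wind turbines generate power from moving air",
    "d3": "photovoltaic cells are the building blocks of pv modules",
}


def expand_query(query: str) -> list[str]:
    """Hypothetical expander: the original query plus hand-written variants."""
    variants = {"solar energy": ["photovoltaic cells", "sunlight electricity"]}
    return [query] + variants.get(query, [])


def retrieve(query: str) -> list[str]:
    """Toy keyword retriever: ids of documents sharing a word with the query."""
    terms = set(query.lower().split())
    return [doc_id for doc_id, text in DOCS.items() if terms & set(text.split())]


def multi_query_retrieve(query: str) -> list[str]:
    """Run the retriever once per expansion and merge results without duplicates."""
    merged: list[str] = []
    for expanded in expand_query(query):
        for doc_id in retrieve(expanded):
            if doc_id not in merged:
                merged.append(doc_id)
    return merged
```

In this sketch, retrieve("solar energy") finds only d1, while multi_query_retrieve("solar energy") also finds d3 through the "photovoltaic cells" expansion.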

Added a return_empty_on_no_match parameter (default True) to RegexTextExtractor.__init__(). When set to False, the component returns {"captured_text": ""} instead of {} when no regex match is found. This provides a consistent output structure for pipeline integration.
The FilterRetriever and AutoMergingRetriever components now support asynchronous execution.
Previously, when using tracing with objects like ByteStream and ImageContent, the payload sent to the tracing backend could become too large, hitting provider limits or causing performance degradation. We now replace these objects with string placeholders to avoid oversized payloads.
The default OpenAI model for OpenAIChatGenerator and OpenAIGenerator has been updated from gpt-4o-mini to gpt-5-mini.
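
The two output shapes selected by return_empty_on_no_match in RegexTextExtractor can be illustrated with a small stand-in function. This is a minimal sketch using only the standard library, not the component's actual code: the extract function is hypothetical, and it assumes the captured text comes from the first capture group of the pattern.

```python
import re


def extract(text: str, pattern: str, return_empty_on_no_match: bool = True) -> dict:
    """Hypothetical stand-in for the behavior described in the note."""
    match = re.search(pattern, text)
    if match:
        # Assumption: the first capture group holds the text of interest.
        return {"captured_text": match.group(1)}
    # No match: an empty dict by default, or a dict with a stable key.
    return {} if return_empty_on_no_match else {"captured_text": ""}
```

With the flag set to False, downstream code can always read the "captured_text" key without first checking whether it exists.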

Request header keys in link_content are now made unique to prevent 400 Bad Request errors.
Some image providers return a 400 Bad Request when using ImageContent.from_url() because the User-Agent header appears multiple times with different casing (e.g., user-agent, User-Agent). This update normalizes header keys in a case-insensitive way, removes duplicates, and preserves only the last occurrence.
Fixed a bug where components explicitly listed in include_outputs_from would not appear in the pipeline results if they returned an empty dictionary. Now, any component specified in include_outputs_from will be included in the results regardless of whether its output is empty.
Fixed the serialization and deserialization of pipeline_outputs in pipeline_snapshot so that it uses the same schema as the rest of the pipeline state when running pipelines with breakpoints. Deserialization of the older pipeline_outputs format, which has no serialization schema, is supported until Haystack 2.23.0.
Fixed ToolInvoker missing tools after warmup for lazy-initialized toolsets. The invoker now refreshes its tool registry post-warmup, ensuring replaced placeholders (e.g., MCPToolset with eager_connect=False) resolve to the actual tool names at invocation time.
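
The header fix for ImageContent.from_url() described above comes down to case-insensitive deduplication where the last occurrence wins. A minimal sketch of that idea follows. The function name dedupe_headers is hypothetical, and keeping the casing of the last occurrence is an assumption, since the note does not say which casing survives.

```python
def dedupe_headers(headers: dict[str, str]) -> dict[str, str]:
    """Keep one entry per case-insensitive header name; the last occurrence wins."""
    last_seen: dict[str, tuple[str, str]] = {}
    for key, value in headers.items():
        # A later entry overwrites an earlier one that differs only in casing.
        last_seen[key.lower()] = (key, value)
    # Assumption: the surviving key keeps the casing of its last occurrence.
    return {key: value for key, value in last_seen.values()}
```

For example, passing {"user-agent": "a", "Accept": "*/*", "User-Agent": "b"} leaves a single User-Agent entry with the value "b" alongside the Accept header.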