Release Notes
v2.21.0-rc1
Upgrade Notes
- Updated the default Azure OpenAI model from gpt-4o-mini to gpt-4.1-mini and the default API version from 2023-05-15 to 2024-12-01-preview for both AzureOpenAIGenerator and AzureOpenAIChatGenerator.
- The default OpenAI model has been changed from gpt-4o-mini to gpt-5-mini for OpenAIChatGenerator and OpenAIGenerator. If you rely on the default model and need to continue using gpt-4o-mini, explicitly specify it when initializing these components: OpenAIChatGenerator(model="gpt-4o-mini").
New Features
- Added three new components: QueryExpander, MultiQueryEmbeddingRetriever, and MultiQueryTextRetriever. When used together, they expand a query into several variations and use each variation to retrieve a potentially different set of documents.
Enhancement Notes
- Added a return_empty_on_no_match parameter (default True) to RegexTextExtractor.__init__(). When set to False, the component returns {"captured_text": ""} instead of {} when no regex match is found, which provides a consistent output structure for pipeline integration.
- The FilterRetriever and AutoMergingRetriever components now support asynchronous execution.
- Previously, when using tracing with objects like ByteStream and ImageContent, the payload sent to the tracing backend could become too large, hitting provider limits or causing performance degradation. We now replace these objects with string placeholders to avoid oversized payloads.
- The default OpenAI model for OpenAIChatGenerator and OpenAIGenerator has been updated from gpt-4o-mini to gpt-5-mini.
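The two output shapes controlled by return_empty_on_no_match can be illustrated with a minimal standalone sketch. This is not the RegexTextExtractor implementation: extract_text is a hypothetical helper, and the choice to return the first capture group when the pattern defines one is an assumption made for the example.

```python
import re


def extract_text(pattern: str, text: str, return_empty_on_no_match: bool = True) -> dict:
    """Hypothetical stand-in that mirrors the two no-match output shapes."""
    match = re.search(pattern, text)
    if match is None:
        # True (the default): empty dict. False: stable key with an empty string.
        return {} if return_empty_on_no_match else {"captured_text": ""}
    # Assumption: use the first capture group if the pattern has one,
    # otherwise fall back to the whole match.
    captured = match.group(1) if match.re.groups else match.group(0)
    return {"captured_text": captured}


no_match_default = extract_text(r"id=(\d+)", "no identifier here")
no_match_stable = extract_text(r"id=(\d+)", "no identifier here", return_empty_on_no_match=False)
matched = extract_text(r"id=(\d+)", "request id=42 accepted")
```

With return_empty_on_no_match=False, a downstream component can always read the captured_text key without first checking whether it exists.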
Bug Fixes
- Ensure request header keys are unique in link_content to prevent 400 Bad Request errors.
  Some image providers return a 400 Bad Request when using ImageContent.from_url() because the User-Agent header appears multiple times with different casing (e.g., user-agent, User-Agent). This update normalizes header keys in a case-insensitive way, removes duplicates, and preserves only the last occurrence.
- Fixed a bug where components explicitly listed in include_outputs_from would not appear in the pipeline results if they returned an empty dictionary. Now, any component specified in include_outputs_from will be included in the results regardless of whether its output is empty.
- Fixed the serialization and deserialization of pipeline_outputs in pipeline_snapshot so that it uses the same schema as the rest of the pipeline state when running pipelines with breakpoints. Deserialization of the older pipeline_outputs format, which has no serialization schema, is supported until Haystack 2.23.0.
- Fixed ToolInvoker missing tools after warmup for lazy-initialized toolsets. The invoker now refreshes its tool registry post-warmup, ensuring replaced placeholders (e.g., MCPToolset with eager_connect=False) resolve to the actual tool names at invocation time.
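The case-insensitive header deduplication described in the first fix above can be sketched in plain Python. This is an illustration of the technique only, not the code used in link_content: dedupe_headers is a hypothetical name, and keeping the casing of the last occurrence is an assumption.

```python
def dedupe_headers(pairs: list[tuple[str, str]]) -> dict[str, str]:
    """Collapse headers that differ only by case, keeping the last occurrence."""
    latest: dict[str, tuple[str, str]] = {}
    for name, value in pairs:
        # Header names are compared case-insensitively, so "user-agent"
        # and "User-Agent" share one slot; later entries overwrite earlier ones.
        latest[name.lower()] = (name, value)
    # Assumption: emit each header with the casing of its last occurrence.
    return {name: value for name, value in latest.values()}


headers = dedupe_headers(
    [
        ("user-agent", "default-client"),
        ("Accept", "image/*"),
        ("User-Agent", "custom-client"),
    ]
)
```

Here headers ends up with a single User-Agent entry carrying the value from the last occurrence, alongside the untouched Accept header.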
💙 Big thank you to everyone who contributed to this release!
@Amnah199, @anakin87, @davidsbatista, @dfokina, @mrchtr, @OscarPindaro, @schwartzadev, @sjrl, @TaMaN2031A, @vblagoje, @YassineGabsi, @ZeJ0hn