⭐️ Highlights
- Added two new retriever components: `MultiRetriever` and `TextEmbeddingRetriever`.

  `MultiRetriever` is marked as experimental and may change or be removed in future releases without prior deprecation notice. An `ExperimentalWarning` is printed when initializing this component.

  `MultiRetriever` combines multiple text retrievers into a single component. All text retrievers are queried in parallel and their results are deduplicated before being returned. Use the `active_retrievers` parameter to enable or disable specific retrievers at runtime.

  `TextEmbeddingRetriever` wraps an embedding-based retriever together with a text embedder into a single component that implements the `TextRetriever` protocol, making it compatible with `MultiRetriever`.
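The query-in-parallel-then-deduplicate behavior described above can be pictured with a small standalone sketch. This is not the Haystack implementation and does not use the Haystack API: it assumes each retriever is a plain callable returning dictionaries with an `id` key, and the names `query_all`, `keyword_retriever`, and `embedding_retriever` are hypothetical. Only the `active_retrievers` name is taken from the note above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional

# Hypothetical stand-ins: a "document" is a dict with an "id" key,
# and a "retriever" is any callable mapping a query to a list of documents.
Document = Dict[str, str]
Retriever = Callable[[str], List[Document]]


def query_all(
    retrievers: Dict[str, Retriever],
    query: str,
    active_retrievers: Optional[List[str]] = None,
) -> List[Document]:
    """Query the selected retrievers in parallel and deduplicate results by id."""
    names = list(retrievers) if active_retrievers is None else active_retrievers
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        # map() yields results in the order of `names`, so the merge is deterministic
        result_lists = list(pool.map(lambda name: retrievers[name](query), names))

    seen = set()
    merged: List[Document] = []
    for docs in result_lists:
        for doc in docs:
            if doc["id"] not in seen:  # keep only the first occurrence of each id
                seen.add(doc["id"])
                merged.append(doc)
    return merged


def keyword_retriever(query: str) -> List[Document]:
    return [{"id": "a", "content": "alpha"}, {"id": "b", "content": "beta"}]


def embedding_retriever(query: str) -> List[Document]:
    return [{"id": "b", "content": "beta"}, {"id": "c", "content": "gamma"}]


retrievers = {"keyword": keyword_retriever, "embedding": embedding_retriever}
all_ids = [d["id"] for d in query_all(retrievers, "example query")]
only_embedding = [
    d["id"] for d in query_all(retrievers, "example query", active_retrievers=["embedding"])
]
```

In this sketch the document `b` is returned by both retrievers but appears once in the merged list, and passing `active_retrievers=["embedding"]` skips the keyword retriever entirely.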
🚀 New Features
- Add `run_async` to `CacheChecker`, enabling it to be used in `AsyncPipeline` without blocking the event loop.
⚡️ Enhancement Notes
- Document the input ordering behavior of auto-promoted lazy variadic sockets in `Pipeline.connect()`. When multiple senders are connected to the same list-typed receiver socket, ordering depends on the pipeline class. With `Pipeline`, items are ordered alphabetically by sender component name (because `Pipeline.run()` schedules components in alphabetical order for deterministic execution), not by the order of `connect()` calls. With `AsyncPipeline`, no ordering is guaranteed, since components in different branches may run in parallel. The docstrings now point users to a dedicated joiner component when they need explicit ordering.
- Add `join_mode` parameter to the experimental `MultiRetriever` component, supporting `"reciprocal_rank_fusion"` (default) and `"concatenate"`. Reciprocal Rank Fusion merges the ranked result lists from all retrievers into a single deduplicated list ordered by RRF score. The underlying RRF logic is extracted into a shared utility `_reciprocal_rank_fusion` in `haystack.utils.misc`, which is now also used by `DocumentJoiner`.
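For readers unfamiliar with Reciprocal Rank Fusion, the idea can be sketched in a few lines. This is a textbook version, not the `_reciprocal_rank_fusion` utility itself: the function name, the use of plain document ids, and the constant `k = 60` are assumptions made for illustration, and Haystack's own constant and score scaling may differ. Each document receives a score of `1 / (k + rank)` from every list it appears in, and the scores are summed.

```python
from collections import defaultdict
from typing import Dict, List


def rrf_merge(ranked_lists: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of document ids into one deduplicated list ordered by RRF score.

    Assumes each input list contains a given id at most once.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by id so the output is deterministic
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


merged = rrf_merge([["a", "b", "c"], ["b", "c", "d"]])
print(merged)  # ['b', 'c', 'a', 'd']
```

Documents `b` and `c` appear in both lists, so their summed scores place them ahead of `a` and `d`, which each appear in only one list.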
🐛 Bug Fixes
- Fixed a bug in `NamedEntityExtractor` where the spaCy/Thinc device state was not correctly restored after execution, potentially affecting the device configuration of other spaCy components in the same process.
- Preserve resumable snapshots when some inputs or outputs are non-serializable. Haystack now omits only the failing top-level fields (for example non-serializable callbacks or runtime objects) instead of replacing the whole payload with an empty dictionary. This applies both to agent sub-component inputs (`chat_generator` and `tool_invoker`) and to pipeline-level `inputs`, `original_input_data`, and `pipeline_outputs` captured by `_create_pipeline_snapshot`. When every field fails to serialize, the snapshot still stores a structurally valid empty payload (`{"serialization_schema": {"type": "object", "properties": {}}, "serialized_data": {}}`) so that resuming the snapshot does not raise `DeserializationError`, for example when resuming from a `ToolBreakpoint` where the sub-component's inputs are not strictly required.
- Fixed `tools_strict=True` in `OpenAIChatGenerator` to recursively apply `additionalProperties: false` and `required` to all nested objects in tool parameter schemas. Previously only the top-level object was transformed, causing OpenAI's strict mode to reject tools with nested parameters.
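The recursive schema transformation mentioned in the last fix can be illustrated with a standalone sketch. This is not the code used by `OpenAIChatGenerator`: the helper names `make_strict` and `_apply_strict` are hypothetical, and the sketch assumes that strict mode expects every declared property to be listed in `required`. It walks `properties`, `items`, `anyOf`/`oneOf`/`allOf`, and `$defs`/`definitions`, and leaves other keywords untouched.

```python
import copy
from typing import Any, Dict


def make_strict(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a JSON schema with strict settings applied to every nested object."""
    strict_schema = copy.deepcopy(schema)  # never mutate the caller's schema
    _apply_strict(strict_schema)
    return strict_schema


def _apply_strict(node: Any) -> None:
    if not isinstance(node, dict):
        return
    props = node.get("properties")
    if node.get("type") == "object" and isinstance(props, dict):
        node["additionalProperties"] = False
        node["required"] = list(props)  # assumption: all properties become required
    if isinstance(props, dict):
        for sub_schema in props.values():
            _apply_strict(sub_schema)
    _apply_strict(node.get("items"))  # array element schema, if present
    for key in ("anyOf", "oneOf", "allOf"):
        for sub_schema in node.get(key, []):
            _apply_strict(sub_schema)
    for key in ("$defs", "definitions"):
        for sub_schema in node.get(key, {}).values():
            _apply_strict(sub_schema)


tool_parameters = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "address": {
            "type": "object",
            "properties": {"city": {"type": "string"}, "zip": {"type": "string"}},
        },
        "tags": {
            "type": "array",
            "items": {"type": "object", "properties": {"label": {"type": "string"}}},
        },
    },
}
strict_parameters = make_strict(tool_parameters)
```

A transformation that stops at the top level would leave `address` and the `tags` items without `additionalProperties: false`, which is the situation the fix describes.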
💙 Big thank you to everyone who contributed to this release!
@Aftabbs, @albertodiazdurana, @anakin87, @ArkaD171717, @bilgeyucel, @bogdankostic, @davidsbatista, @FuturMix, @julian-risch, @kacperlukawski, @ritikraj2425, @saivedant169, @shaun0927, @sjrl, @SyedShahmeerAli12