github deepset-ai/haystack v2.29.0-rc1

pre-release2 hours ago

⭐️ Highlights

  • Added two new retriever components: MultiRetriever and TextEmbeddingRetriever.

    MultiRetriever is marked as experimental and may change or be removed in future releases without prior deprecation notice. An ExperimentalWarning is printed when initializing this component.

    MultiRetriever combines multiple text retrievers into a single component. All text retrievers are queried in parallel and their results are deduplicated before being returned. Use the active_retrievers parameter to enable or disable specific retrievers at runtime.

    TextEmbeddingRetriever wraps an embedding-based retriever together with a text embedder into a single component that implements the TextRetriever protocol, making it compatible with MultiRetriever.

🚀 New Features

  • Add run_async to CacheChecker, enabling it to be used in AsyncPipeline without blocking the event loop.

⚡️ Enhancement Notes

  • Document the input ordering behavior of auto-promoted lazy variadic sockets in Pipeline.connect(). When multiple senders are connected to the same list-typed receiver socket, ordering depends on the pipeline class. With Pipeline, items are ordered alphabetically by sender component name (because Pipeline.run() schedules components in alphabetical order for deterministic execution), not by the order of connect() calls. With AsyncPipeline, no ordering is guaranteed, since components in different branches may run in parallel. The docstrings now point users to a dedicated joiner component when they need explicit ordering.
  • Add join_mode parameter to the experimental MultiRetriever component, supporting "reciprocal_rank_fusion" (default) and "concatenate". Reciprocal Rank Fusion merges the ranked result lists from all retrievers into a single deduplicated list ordered by RRF score. The underlying RRF logic is extracted into a shared utility _reciprocal_rank_fusion in haystack.utils.misc, which is now also used by DocumentJoiner.

🐛 Bug Fixes

  • Fixed a bug in NamedEntityExtractor where the spaCy/Thinc device state was not correctly restored after execution, potentially affecting the device configuration of other spaCy components in the same process.
  • Preserve resumable snapshots when some inputs or outputs are non-serializable. Haystack now omits only the failing top-level fields (for example non-serializable callbacks or runtime objects) instead of replacing the whole payload with an empty dictionary. This applies both to agent sub-component inputs (chat_generator and tool_invoker) and to pipeline-level inputs, original_input_data, and pipeline_outputs captured by _create_pipeline_snapshot. When every field fails to serialize, the snapshot still stores a structurally valid empty payload ({"serialization_schema": {"type": "object", "properties": {}}, "serialized_data": {}}) so that resuming the snapshot does not raise DeserializationError — for example when resuming from a ToolBreakpoint where the sub-component's inputs are not strictly required.
  • Fixed tools_strict=True in OpenAIChatGenerator to recursively apply additionalProperties: false and required to all nested objects in tool parameter schemas. Previously only the top-level object was transformed, causing OpenAI's strict mode to reject tools with nested parameters.

💙 Big thank you to everyone who contributed to this release!

@Aftabbs, @albertodiazdurana, @anakin87, @ArkaD171717, @bilgeyucel, @bogdankostic, @davidsbatista, @FuturMix, @julian-risch, @kacperlukawski, @ritikraj2425, @saivedant169, @shaun0927, @sjrl, @SyedShahmeerAli12

Don't miss a new haystack release

NewReleases is sending notifications on new releases.