github deepset-ai/haystack v2.7.0-rc1

pre-release11 hours ago

Release Notes

v2.7.0-rc1

New Features

  • Added component StringJoiner to join strings from different components to a list of strings.

v2.8.0-rc0

Highlights

With the new Logging Tracer, users can inspect in the logs everything that is happening in their Pipelines in real time. This feature aims to improve the user experience during experimentation and prototyping.

Pipeline.run() internal logic has been heavily reworked to be more robust and reliable than before. This new implementation makes it easier to run `Pipeline`s that have cycles in their graph. It also fixes some corner cases in `Pipeline`s that don't have any cycle.

Upgrade Notes

  • Removed Pipeline init argument debug_path. We do not support this anymore.

  • Removed Pipeline init argument max_loops_allowed. Use max_runs_per_component instead.

  • Removed PipelineMaxLoops exception. Use PipelineMaxComponentRuns instead.

  • The deprecated default converter class haystack.components.converters.pypdf.DefaultConverter used by PyPDFToDocument has been removed.

    Pipeline YAMLs from haystack<2.7.0 that use the default converter must be updated in the following manner: `yaml # Old components: Comp1: init_parameters: converter: type: haystack.components.converters.pypdf.DefaultConverter type: haystack.components.converters.pypdf.PyPDFToDocument # New components: Comp1: init_parameters: converter: null type: haystack.components.converters.pdf.PDFToTextConverter`

    Pipeline YAMLs from haystack<2.7.0 that use custom converter classes can be upgraded by simply loading them with haystack==2.6.x and saving them to YAML again.

  • Pipeline.connect() will now raise a PipelineConnectError if sender and receiver are the same Component. We do not support this use case anymore.

New Features

  • Added a new component DocumentNDCGEvaluator, which is similar to DocumentMRREvaluator and useful for retrieval evaluation. It calculates the normalized discounted cumulative gain, an evaluation metric useful when there are multiple ground truth relevant documents and the order in which they are retrieved is important.

  • Improved serialization/deserialization errors to provide extra context about the delinquent components when possible.

  • Enhanced DOCX converter to support table extraction in addition to paragraph content. The converter supports both CSV and Markdown table formats, providing flexible options for representing tabular data extracted from DOCX documents.

  • Added a new parameter additional_mimetypes to the FileTypeRouter component.

    This allows users to specify additional MIME type mappings, ensuring correct

    file classification across different runtime environments and Python versions.

  • Introduce a Logging Tracer, that sends all traces to the logs.

    It can enabled as follows: `python import logging from haystack import tracing from haystack.tracing.logging_tracer import LoggingTracer logging.basicConfig(format="%(levelname)s - %(name)s - %(message)s", level=logging.WARNING) logging.getLogger("haystack").setLevel(logging.DEBUG) tracing.tracer.is_content_tracing_enabled = True # to enable tracing/logging content (inputs/outputs) tracing.enable_tracing(LoggingTracer())`

  • Fundamentally rework the internal logic of Pipeline.run(). The rework makes it more reliable and covers more use cases. We fixed some issues that made `Pipeline`s with cycles unpredictable and with unclear Components execution order.

  • Each tracing span of a component run is now attached with the pipeline run span object. This allows users to trace the execution of multiple pipeline runs concurrently.

Enhancement Notes

  • Add streaming_callback run parameter to HuggingFaceAPIGenerator and HuggingFaceLocalGenerator to allow users to pass a callback function that will be called after each chunk of the response is generated.
  • The SentenceWindowRetriever now supports the window_size parameter at run time, overwriting the value set in the constructor.
  • Add output type validation in ConditionalRouter. Setting validate_output_type to True will enable a check to verify if the actual output of a route returns the declared type. If it doesn't match a ValueError is raised.
  • Reduced numpy usage to speed up imports.
  • Improved file type detection in FileTypeRouter, particularly for Microsoft Office file formats like .docx and .pptx. This enhancement ensures more consistent behavior across different environments, including AWS Lambda functions and systems without pre-installed office suites.
  • The FiletypeRouter now supports passing metadata (meta) in the run method. When metadata is provided, the sources are internally converted to ByteStream objects and the metadata is added. This new parameter simplifies working with preprocessing/indexing pipelines.
  • SentenceTransformersDocumentEmbedder now supports config_kwargs for additional parameters when loading the model configuration
  • SentenceTransformersTextEmbedder now supports config_kwargs for additional parameters when loading the model configuration
  • Previously, numpy was pinned to <2.0 to avoid compatibility issues in several core integrations. This pin has been removed, and haystack can work with both numpy 1.x and 2.x. If necessary, we will pin numpy version in specific core integrations that require it.
  • Upgrade Hatch to 1.13.0 and adopt uv as installer, to speed up the CI.

Deprecation Notes

  • The DefaultConverter class used by the PyPDFToDocument component has been deprecated. Its functionality will be merged into the component in 2.7.0.

Bug Fixes

  • Adjusted a test on HuggingFaceAPIGenerator to ensure compatibility with the huggingface_hub==0.26.0.
  • Serialized data of components are now explicitly enforced to be one of the following basic Python datatypes: str, int, float, bool, list, dict, set, tuple or None.
  • Addressed an issue where certain file types (e.g., .docx, .pptx) were incorrectly classified as 'unclassified' in environments with limited MIME type definitions, such as AWS Lambda functions.
  • Fixes logs containing JSON data getting lost due to string interpolation.
  • Use forward references for Hugging Face Hub types in the HuggingFaceAPIGenerator component to prevent import errors.
  • Add pip to test dependencies: mypy needs it to install missing stub packages.
  • Fix the serialization of PyPDFToDocument component to prevent the default converter from being serialized unnecessarily.
  • Revert change to PyPDFConverter that broke the deserialization of pre 2.6.0 YAMLs.

Don't miss a new haystack release

NewReleases is sending notifications on new releases.