github deepset-ai/haystack v2.0.0-beta.5

pre-release, 5 months ago

Release Notes

v2.0.0-beta.5

⬆️ Upgrade Notes

  • Implement framework-agnostic device representations. The main impetus behind this change is to move away from stringified representations of devices that are not portable between different frameworks. It also enables support for multi-device inference in a generic manner.

    Going forward, components can expose a single, optional device parameter in their constructor (Optional[ComponentDevice]):

    from typing import Optional

    from haystack.utils import ComponentDevice, Device, DeviceMap
    from transformers import AutoModel

    class MyComponent(Component):
        def __init__(self, device: Optional[ComponentDevice] = None):
            # If device is None, automatically select a device.
            self.device = ComponentDevice.resolve_device(device)

        def warm_up(self):
            # Call the framework-specific conversion method.
            self.model = AutoModel.from_pretrained("deepset/bert-base-cased-squad2", device=self.device.to_hf())

    # Automatically selects a device.
    c = MyComponent(device=None)
    # Uses the first GPU available.
    c = MyComponent(device=ComponentDevice.from_str("cuda:0"))
    # Uses the CPU.
    c = MyComponent(device=ComponentDevice.from_single(Device.cpu()))
    # Allow the component to use multiple devices using a device map.
    c = MyComponent(device=ComponentDevice.from_multiple(
        DeviceMap({
            "layer1": Device.cpu(),
            "layer2": Device.gpu(1),
            "layer3": Device.disk(),
        })
    ))

  • Change any occurrence of:
    from haystack.components.routers.document_joiner import DocumentJoiner

    to:
    from haystack.components.joiners.document_joiner import DocumentJoiner

  • Change the imports for in_memory document store and retrievers from:

    from haystack.document_stores import InMemoryDocumentStore
    from haystack.components.retrievers import InMemoryEmbeddingRetriever

    to:

    from haystack.document_stores.in_memory import InMemoryDocumentStore
    from haystack.components.retrievers.in_memory import InMemoryBM25Retriever

  • Rename the transcriber parameters model_name and model_name_or_path to model. This change affects both the LocalWhisperTranscriber and RemoteWhisperTranscriber classes.

  • Rename the embedder parameters model_name and model_name_or_path to model. This change affects all Embedder classes.

  • Rename model_name_or_path to model in NamedEntityExtractor.

  • Rename model_name_or_path to model in TransformersSimilarityRanker.

  • Rename model_name_or_path to model in ExtractiveReader.

  • Rename the generator parameters model_name and model_name_or_path to model. This change affects all Generator classes.
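
    The parameter renames above can be applied mechanically to stored constructor arguments. The following is a minimal sketch, assuming the arguments are held in a plain dictionary; migrate_model_kwarg and OLD_MODEL_KEYS are hypothetical names used for illustration and are not part of Haystack.

```python
from typing import Any, Dict

# Old keyword names that this release replaces with `model`.
OLD_MODEL_KEYS = ("model_name", "model_name_or_path")


def migrate_model_kwarg(init_params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of init_params with the old model keywords renamed to `model`.

    Hypothetical helper: it only rewrites dictionary keys and does not
    touch any Haystack object.
    """
    migrated = dict(init_params)
    for old_key in OLD_MODEL_KEYS:
        if old_key in migrated:
            value = migrated.pop(old_key)
            # Refuse to silently overwrite a conflicting value.
            if "model" in migrated and migrated["model"] != value:
                raise ValueError(f"Conflicting values for 'model' and '{old_key}'")
            migrated["model"] = value
    return migrated


old_params = {"model_name_or_path": "deepset/bert-base-cased-squad2", "top_k": 3}
new_params = migrate_model_kwarg(old_params)
```

    The helper returns a new dictionary and leaves its input untouched, so it can be run over existing configurations without side effects.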

🚀 New Features

  • Adds calculate_metrics() function to EvaluationResult for computation of evaluation metrics. Adds Metric class to store list of available metrics. Adds MetricsResult class to store the metric values computed during the evaluation.

  • Add a new extractor component, NamedEntityExtractor. This component accepts a list of Documents as input. The extractor annotates the raw text of each document and stores the annotations in the document's meta dictionary (under the key named_entities).
    The component is designed to support multiple NER backends. Two are currently implemented: Hugging Face and spaCy. Each backend supports any Hugging Face or spaCy model that performs token classification or NER, respectively.

  • Add the component.set_input_type() function to set a component's input name, type, and default value.

  • Adds support for single metadata dictionary input in MarkdownToDocument.

  • Adds support for single metadata dictionary input in TikaDocumentConverter.
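
    The single-dictionary metadata input mentioned for the converters above can be pictured as follows. This is a minimal sketch of the idea, not the converters' actual code; expand_meta is a hypothetical name, and the rule that a list must match the number of sources is an assumption of this sketch.

```python
from typing import Any, Dict, List, Optional, Union

MetaInput = Optional[Union[Dict[str, Any], List[Dict[str, Any]]]]


def expand_meta(meta: MetaInput, sources_count: int) -> List[Dict[str, Any]]:
    """Turn the accepted metadata shapes into one dictionary per source."""
    if meta is None:
        # No metadata given: every source gets an empty dictionary.
        return [{} for _ in range(sources_count)]
    if isinstance(meta, dict):
        # A single dictionary is copied once for every source.
        return [dict(meta) for _ in range(sources_count)]
    if len(meta) != sources_count:
        raise ValueError("The metadata list must have one entry per source.")
    return [dict(entry) for entry in meta]


shared = expand_meta({"language": "en"}, sources_count=3)
per_source = expand_meta([{"page": 1}, {"page": 2}], sources_count=2)
```

    Copying the dictionary for each source keeps a later change to one document's metadata from leaking into the others.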

⚡️ Enhancement Notes

  • Add a field called default_value to the InputSocket dataclass. Derive the is_mandatory value from the presence of default_value.
  • Add split_by "page" to DocumentSplitter, which splits the document at "\f".
  • Modify the output type of CacheChecker from List[Any] to List to make it possible to connect it in a Pipeline.
  • Highlight optional connections in the Pipeline.draw() output.
  • Improve the URLCacheChecker so that it can work with any type of data in the DocumentStore, not just URL caching. Rename the component to CacheChecker.
  • Prevent the MetaFieldRanker from throwing an error if one or more of the documents don't contain the specified metadata field. Those documents are now ignored for ranking purposes and placed at the end of the ranked list, so they are not discarded. Add a sort_order parameter that accepts descending or ascending. Add more runtime parameters.
  • Create a new package called joiners and move DocumentJoiner there for clarity.
  • Stop exposing in_memory package symbols in the haystack.document_stores and haystack.components.retrievers root namespaces.
  • Add example script about how to use Multiplexer to route meta to file converters.
  • Adds support for single metadata dictionary input in AzureOCRDocumentConverter. In this way, additional metadata can be added to all files processed by this component even when the length of the list of sources is unknown.
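
    The MetaFieldRanker behaviour described above (documents without the metadata field go to the end, and sort_order controls the direction) can be sketched as follows. This sketch uses plain dictionaries rather than Haystack Documents; rank_by_meta is a hypothetical name, and the "descending" default is an assumption of this sketch.

```python
from typing import Any, Dict, List


def rank_by_meta(
    docs: List[Dict[str, Any]], meta_field: str, sort_order: str = "descending"
) -> List[Dict[str, Any]]:
    """Sort docs by a metadata field and append docs that lack the field."""
    if sort_order not in ("ascending", "descending"):
        raise ValueError("sort_order must be 'ascending' or 'descending'")
    with_field = [d for d in docs if meta_field in d.get("meta", {})]
    without_field = [d for d in docs if meta_field not in d.get("meta", {})]
    ranked = sorted(
        with_field,
        key=lambda d: d["meta"][meta_field],
        reverse=(sort_order == "descending"),
    )
    # Documents without the field keep their original relative order at the end.
    return ranked + without_field


docs = [
    {"id": "a", "meta": {"rating": 2}},
    {"id": "b", "meta": {}},
    {"id": "c", "meta": {"rating": 5}},
]
ranked_ids = [d["id"] for d in rank_by_meta(docs, "rating")]
ascending_ids = [d["id"] for d in rank_by_meta(docs, "rating", sort_order="ascending")]
```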

🐛 Bug Fixes

  • Fix ComponentMeta ignoring keyword-only parameters in the run method. ComponentMeta.__call__ handles the creation of InputSockets for the component's inputs when the latter has not explicitly called _Component.set_input_types(). This logic was not correctly handling keyword-only parameters.
  • Fix the error "descriptor '__dict__' for 'ComponentClassX' objects doesn't apply to a 'ComponentClassX' object" raised when calling dir() on a component instance. This fix should allow auto-completion in code editors.
  • Prevent InMemoryBM25Retriever from returning documents with a score of 0.0.
  • Fix pytest breaking in VSCode due to a name collision in the RAG pipeline tests.
  • Correctly handle the serialization and deserialization of torch.dtype. This concerns the following components: ExtractiveReader, HuggingFaceLocalGenerator, and TransformersSimilarityRanker.
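
    To illustrate the keyword-only parameter fix above: when inputs are derived from the signature of a run method, keyword-only parameters have to be collected alongside positional ones. The following is a minimal sketch using only the standard library inspect module; derive_inputs and Example are hypothetical names, and this is not the ComponentMeta implementation.

```python
import inspect
from typing import Any, Callable, Dict


def derive_inputs(run_method: Callable[..., Any]) -> Dict[str, Dict[str, Any]]:
    """Collect name, type, and default for each named parameter of a run method."""
    skipped_kinds = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    inputs: Dict[str, Dict[str, Any]] = {}
    for name, param in inspect.signature(run_method).parameters.items():
        # Skip `self`, *args, and **kwargs; keep keyword-only parameters.
        if name == "self" or param.kind in skipped_kinds:
            continue
        has_default = param.default is not inspect.Parameter.empty
        inputs[name] = {
            "type": param.annotation,
            "is_mandatory": not has_default,
            "default_value": param.default if has_default else None,
        }
    return inputs


class Example:
    # `top_k` is keyword-only because it follows the bare `*`.
    def run(self, text: str, *, top_k: int = 5):
        return {"text": text, "top_k": top_k}


inputs = derive_inputs(Example.run)
```

    In this sketch a parameter counts as mandatory exactly when it has no default, which mirrors the relationship between default_value and is_mandatory noted in the enhancement notes.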
