github deepset-ai/haystack v1.3.0


⭐ Highlights

Pipeline YAML Syntax Validation

The syntax of pipeline configurations defined in YAML files can now be validated. If validation fails, the erroneous components and parameters are identified so they are easy to fix. Here is a code snippet to validate a file manually:

from pathlib import Path
from haystack.pipelines.config import validate_yaml
validate_yaml(Path("rest_api/pipeline/pipelines.haystack-pipeline.yml"))
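Conceptually, the validator checks each entry in the configuration against a schema and reports what is wrong. Here is a toy sketch of that idea in plain Python (this is not Haystack's actual implementation, which validates the YAML against a published JSON schema):

```python
# Toy illustration of schema-style validation -- not Haystack's real validator,
# which checks the configuration against a published JSON schema.
def find_config_errors(config: dict) -> list:
    """Return a list of human-readable problems found in a pipeline config dict."""
    errors = []
    if "version" not in config:
        errors.append("missing top-level 'version' key")
    for i, component in enumerate(config.get("components", [])):
        for required_key in ("name", "type"):
            if required_key not in component:
                errors.append(f"component #{i} is missing '{required_key}'")
    return errors
```

A config missing its `version` and a component `type` would yield two errors, each pointing at the offending entry, which is the kind of feedback the real validator gives you.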

Your IDE can also validate pipeline YAML files as you edit them. The suffix *.haystack-pipeline.yml tells your IDE that the YAML contains a Haystack pipeline configuration, enabling checks and autocompletion features if the IDE is configured accordingly (YAML plugin for VSCode, Configuration Guide for PyCharm). The schema used for validation can be found in SchemaStore, which points to the schema files for the different Haystack versions. Note that updating Haystack might sometimes require small changes to your pipeline YAML files. You can set version: 'unstable' in the pipeline YAML to circumvent the validation, or set it to the latest Haystack version if the components and parameters you use are compatible with it. #2226
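For reference, a minimal pipeline YAML might look like the sketch below. The component names and parameters here are illustrative examples, not a definitive configuration; consult the schema for your Haystack version:

```yaml
# Illustrative sketch of a pipeline YAML -- components and params are examples.
version: 'unstable'   # skips version validation; pin a concrete version when compatible

components:
  - name: DocumentStore
    type: ElasticsearchDocumentStore
  - name: Retriever
    type: ElasticsearchRetriever
    params:
      document_store: DocumentStore
      top_k: 10

pipelines:
  - name: query
    nodes:
      - name: Retriever
        inputs: [Query]
```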

Pinecone DocumentStore

We added another DocumentStore to Haystack: PineconeDocumentStore! 🎉 Pinecone is a fully managed service for large-scale dense retrieval: embeddings and metadata are stored in a hosted Pinecone vector database, while the document content is stored in a local SQL database. This separation simplifies infrastructure setup and maintenance. To use the new document store, all you need is an API key, which you can obtain by creating an account on the Pinecone website. #2254

import os
from haystack.document_stores import PineconeDocumentStore
document_store = PineconeDocumentStore(api_key=os.environ["PINECONE_API_KEY"])

BEIR Integration

Fresh from the 🍻 cellar, Haystack now has an integration with our favorite BEnchmarking Information Retrieval tool, BEIR. It contains preprocessed datasets for zero-shot evaluation of retrieval models in 17 different languages, which you can use to benchmark your pipelines. For example, you can evaluate a DocumentSearchPipeline by calling Pipeline.eval_beir() after installing Haystack with the BEIR dependency via pip install farm-haystack[beir]. Cheers! #2333

from haystack.pipelines import DocumentSearchPipeline, Pipeline
from haystack.nodes import TextConverter, ElasticsearchRetriever
from haystack.document_stores.elasticsearch import ElasticsearchDocumentStore

# Indexing: convert raw text files and write them to Elasticsearch
text_converter = TextConverter()
document_store = ElasticsearchDocumentStore(search_fields=["content", "name"], index="scifact_beir")
retriever = ElasticsearchRetriever(document_store=document_store, top_k=1000)

index_pipeline = Pipeline()
index_pipeline.add_node(text_converter, name="TextConverter", inputs=["File"])
index_pipeline.add_node(document_store, name="DocumentStore", inputs=["TextConverter"])

# Querying: retrieve candidate documents for each BEIR query
query_pipeline = DocumentSearchPipeline(retriever=retriever)

# Download the BEIR dataset, run both pipelines, and compute retrieval metrics
ndcg, _map, recall, precision = Pipeline.eval_beir(
    index_pipeline=index_pipeline, query_pipeline=query_pipeline, dataset="scifact"
)
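Pipeline.eval_beir() reports the standard retrieval metrics NDCG, MAP, recall, and precision. As a reminder of what the first of those measures, NDCG discounts the relevance of each retrieved document by the logarithm of its rank, then normalizes by the best achievable ordering. A minimal stdlib sketch (illustrative, not the BEIR implementation):

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: each graded relevance is discounted
    # by the log of its 1-based rank in the result list.
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    # Normalize by the DCG of the ideal (descending-relevance) ordering.
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

A perfectly ordered result list scores 1.0; pushing relevant documents down the ranking lowers the score.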

Breaking Changes

  • Make Milvus2DocumentStore compatible with pymilvus>=2.0.0 by @MichelBartels in #2126
  • Set provider parameter when instantiating onnxruntime.InferenceSession and make device a torch.device in internal methods by @cjb06776 in #1976

Models

  • Update LFQA with the latest LFQA seq2seq and retriever models by @vblagoje in #2210

❤️ Big thanks to all contributors and the whole community!
