github deepset-ai/haystack v1.10.0

23 months ago

⭐ Highlights

Expanding Haystack's LLM support with the new OpenAIEmbeddingEncoder (#3356)

You can now create document and query embeddings with OpenAI's large language models. If you have an OpenAI account, set the name of one of the supported models (ada, babbage, davinci, or curie) and add your API key to the EmbeddingRetriever component in your pipelines (see docs).
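As a rough illustration of the setup described above, the following sketch builds an EmbeddingRetriever from a model name and an API key. It assumes the retriever accepts `embedding_model` and `api_key` keyword arguments. The helper names `openai_retriever_kwargs` and `build_openai_retriever` and the `OPENAI_API_KEY` environment variable are hypothetical and not part of Haystack.

```python
import os

# Model names listed in the release notes as supported by the OpenAI encoder.
SUPPORTED_OPENAI_MODELS = ("ada", "babbage", "davinci", "curie")


def openai_retriever_kwargs(model_name, api_key):
    """Hypothetical helper: validate inputs and build EmbeddingRetriever kwargs."""
    if model_name not in SUPPORTED_OPENAI_MODELS:
        raise ValueError(f"Unsupported OpenAI model: {model_name!r}")
    if not api_key:
        raise ValueError("An OpenAI API key is required")
    # Assumption: EmbeddingRetriever takes these two keyword arguments.
    return {"embedding_model": model_name, "api_key": api_key}


def build_openai_retriever(document_store, model_name="ada"):
    """Create an EmbeddingRetriever backed by an OpenAI embedding model."""
    # Imported lazily so the helper above can be used without Haystack installed.
    from haystack.nodes import EmbeddingRetriever

    # Assumption: the key is kept in the OPENAI_API_KEY environment variable.
    kwargs = openai_retriever_kwargs(model_name, os.environ.get("OPENAI_API_KEY"))
    return EmbeddingRetriever(document_store=document_store, **kwargs)
```

The document store passed in should be created with an embedding dimension that matches the chosen model's output size. Otherwise, storing the embeddings is likely to fail.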

Multimodal retrieval is here! (#2891)

Multimodality with Haystack just made a big leap forward with the addition of MultiModalRetriever: a Retriever that can handle different modalities for query and documents independently. Take it for a spin and experiment with new Document formats, like images. You can now use the same Retriever for text-to-image, text-to-table, and text-to-text retrieval but also image similarity, table similarity, and more! Feed your favorite multimodal model to MultiModalRetriever and see it in action.

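from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes.retriever.multimodal import MultiModalRetriever

# CLIP embeds text queries and image documents into the same 512-dimensional space.
retriever = MultiModalRetriever(
    document_store=InMemoryDocumentStore(embedding_dim=512),
    query_embedding_model="sentence-transformers/clip-ViT-B-32",
    query_type="text",
    document_embedding_models={"image": "sentence-transformers/clip-ViT-B-32"},
)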

Multi-platform Docker images

Starting with 1.10, we're making the deepset/haystack images available for linux/amd64 and linux/arm64.

⚠️ Breaking change in embed_queries method (#3252)

We've renamed the text argument of the embed_queries method to queries in both DensePassageRetriever and EmbeddingRetriever. Code that passes this argument by keyword must be updated.
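For code that has to run against versions both before and after this change, one option is a small wrapper that inspects the method signature. The sketch below is an assumption-laden illustration: `embed_queries_compat` is a hypothetical helper, and the two stand-in classes only mimic the old and new signatures so the example runs without Haystack.

```python
import inspect


def embed_queries_compat(retriever, queries):
    """Hypothetical helper: call embed_queries regardless of the keyword name."""
    params = inspect.signature(retriever.embed_queries).parameters
    if "queries" in params:
        # Signature after the rename.
        return retriever.embed_queries(queries=queries)
    # Older signature: a positional call avoids depending on the old keyword.
    return retriever.embed_queries(queries)


class NewStyleRetriever:
    """Stand-in with the renamed argument (not a Haystack class)."""

    def embed_queries(self, queries):
        return [len(q) for q in queries]


class OldStyleRetriever:
    """Stand-in with the pre-rename argument (not a Haystack class)."""

    def embed_queries(self, text):
        return [len(t) for t in text]


new_result = embed_queries_compat(NewStyleRetriever(), ["hello", "hi"])
old_result = embed_queries_compat(OldStyleRetriever(), ["hello", "hi"])
```

If supporting older versions is not a concern, replacing the old keyword with `queries=` at each call site is the simpler fix.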

What's Changed

Breaking Changes

Pipeline

  • fix: ONNX FARMReader model conversion is broken by @vblagoje in #3211
  • bug: JoinDocuments nodes produce incorrect results if preceded by another JoinDocuments node by @JeffRisberg in #3170
  • fix: eval() with add_isolated_node_eval=True breaks if no node supports it by @tstadel in #3347
  • feat: extract label aggregation by @tstadel in #3363
  • feat: Add OpenAIEmbeddingEncoder to EmbeddingRetriever by @vblagoje in #3356
  • fix: stable YAML schema generation by @ZanSara in #3388
  • fix: Update how schema is ordered by @sjrl in #3399
  • feat: MultiModalRetriever by @ZanSara in #2891

DocumentStores

  • feat: FAISS in OpenSearch: Support HNSW for cosine by @tstadel in #3217
  • feat: add support for Elasticsearch 7.16.2 by @masci in #3318
  • refactor: remove dead code from FAISSDocumentStore by @anakin87 in #3372
  • fix: allow same vector_id in different indexes for SQL-based Document stores by @anakin87 in #3383

UI / Demo

  • fix: demo won't start through Docker compose on Apple M1 by @masci in #3337

Documentation

Other Changes

  • refactor: make TransformersDocumentClassifier output consistent between different types of classification by @anakin87 in #3224
  • Classify pipeline's type based on its components by @vblagoje in #3132
  • docs: sync Haystack API with Readme by @brandenchan in #3223
  • fix: MostSimilarDocumentsPipeline doesn't have pipeline property by @vblagoje in #3265
  • bug: make ElasticSearchDocumentStore use batch_size in get_documents_by_id by @anakin87 in #3166
  • refactor: better tests for TransformersDocumentClassifier by @anakin87 in #3270
  • fix: AttributeError in TranslationWrapperPipeline by @nickchomey in #3290
  • refactor: remove Inferencer multiprocessing by @vblagoje in #3283
  • fix: opensearch script score with filters by @tstadel in #3321
  • feat: Adding filters param to MostSimilarDocumentsPipeline run and run_batch by @JacdDev in #3301
  • feat: add multi-platform Docker images by @masci in #3354
  • fix: Added checks for DataParallel and WrappedDataParallel by @sjrl in #3366
  • fix: QuestionGenerator generates wrong document questions for non-default num_queries_per_doc parameter by @vblagoje in #3381
  • bug: Adds better way of checking query in BaseRetriever and Pipeline.run() by @ugm2 in #3304
  • feat: Updated EntityExtractor to handle long texts and added better postprocessing by @sjrl in #3154
  • docs: Add comment about the generation of no-answer samples in FARMReader training by @brandenchan in #3404
  • feat: Speed up integration tests (nodes) by @sjrl in #3408
  • fix: Fix the error of wrong page numbers when documents contain empty pages. by @brunnurs in #3330
  • bug: change type of split_by to Literal including None by @julian-risch in #3389
  • feat: Add exponential backoff decorator; apply it to OpenAI requests by @vblagoje in #3398
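The last entry above mentions an exponential backoff decorator for OpenAI requests. The general technique can be sketched as follows. This is a generic illustration, not Haystack's implementation: the name `retry_with_backoff` and all of its parameters are hypothetical.

```python
import functools
import random
import time


def retry_with_backoff(max_retries=5, base_delay=1.0, errors=(Exception,), sleep=time.sleep):
    """Retry the wrapped function, doubling the wait after each failure."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except errors:
                    if attempt == max_retries:
                        # Out of retries: let the last error propagate.
                        raise
                    # The delay doubles each attempt; jitter spreads out concurrent retries.
                    delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
                    sleep(delay)

        return wrapper

    return decorator


# Usage: a flaky call that fails twice before succeeding.
recorded_delays = []
call_count = {"n": 0}


@retry_with_backoff(max_retries=3, base_delay=0.01, errors=(ConnectionError,),
                    sleep=recorded_delays.append)
def flaky_request():
    call_count["n"] += 1
    if call_count["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = flaky_request()
```

Passing the `sleep` function as a parameter keeps the sketch testable without real waiting. In production code the default `time.sleep` would be used.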

New Contributors

Full Changelog: v1.9.1...v1.10.0rc1
