github deepset-ai/haystack v1.11.0


⭐ Highlights

Expanding Haystack’s LLM support further with the new CohereEmbeddingEncoder (#3356)

Now you can easily create document and query embeddings using Cohere’s large language models: if you have a Cohere account, all you have to do is set the name of one of the supported models (small, medium, or large) and add your API key to the EmbeddingRetriever component in your pipelines (see docs).
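The setup above comes down to two values: a supported Cohere model name and an API key. The sketch below is a minimal illustration that checks both before they reach the retriever. The helper `cohere_retriever_kwargs` is hypothetical. The keyword names `embedding_model` and `api_key` are an assumption about `EmbeddingRetriever` in this release, so check the EmbeddingRetriever docs for your version.

```python
from typing import Dict

# Model names taken from the release note above ("small", "medium", or "large").
COHERE_MODELS = {"small", "medium", "large"}


def cohere_retriever_kwargs(model_name: str, api_key: str) -> Dict[str, str]:
    """Hypothetical helper: build keyword arguments for a Cohere-backed EmbeddingRetriever."""
    if model_name not in COHERE_MODELS:
        raise ValueError(
            f"Unsupported Cohere model {model_name!r}; choose one of {sorted(COHERE_MODELS)}"
        )
    if not api_key:
        raise ValueError("A Cohere API key is required")
    # Assumption: EmbeddingRetriever takes the model via `embedding_model`
    # and the key via `api_key`; verify against the docs for your version.
    return {"embedding_model": model_name, "api_key": api_key}
```

You would then pass the result along with your document store, for example `EmbeddingRetriever(document_store=document_store, **cohere_retriever_kwargs("small", "<YOUR_COHERE_API_KEY>"))`.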

Extracting headlines from Markdown and PDF files (#3445 #3488)

Using the MarkdownConverter or the ParsrConverter, you can set the parameter extract_headlines to True. This extracts the headlines from your files, together with each headline's start position in the document's content and its level. Headlines are stored as a list of dictionaries in the Document's meta field "headlines" and are structured as follows:

{
    "headline": <THE HEADLINE STRING>,
    "start_idx": <IDX OF HEADLINE START IN document.content >,
    "level": <LEVEL OF THE HEADLINE>
}
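To make the structure above concrete, here is a minimal sketch that builds the same list of dictionaries from raw Markdown. It is not Haystack's implementation, and it rests on a few assumptions:

- It only recognizes ATX-style headings (`#` to `######` followed by a space). Setext headings and code fences are not handled.
- It computes `start_idx` against the raw Markdown string. Haystack's converters strip Markdown syntax, so their indices refer to the converted text.
- It counts `level` as the number of `#` characters. The converters may start counting at a different number.

```python
import re
from typing import Dict, List, Union

# ATX-style headings: 1-6 '#' characters, whitespace, then the headline text.
# Optional closing '#' sequences are not stripped in this sketch.
_ATX_HEADING = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def extract_headlines(content: str) -> List[Dict[str, Union[str, int]]]:
    """Return headlines as dicts with 'headline', 'start_idx', and 'level' keys."""
    headlines: List[Dict[str, Union[str, int]]] = []
    for match in _ATX_HEADING.finditer(content):
        headlines.append(
            {
                "headline": match.group(2),
                # Offset of the headline text itself, so that
                # content[start_idx:] starts with the headline string.
                "start_idx": match.start(2),
                # Assumed convention: number of '#' characters (1 for '#').
                "level": len(match.group(1)),
            }
        )
    return headlines
```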

Introducing the proposals design process (#3333)

We've introduced the proposal design process for substantial changes. A proposal is a single Markdown file that explains why a change is needed and how it would be implemented. You can find a detailed explanation of the process and a proposal template in the proposals directory.

⚠️ Breaking change: removing Milvus1DocumentStore

From this version onwards, Haystack no longer supports version 1 of Milvus. We still support Milvus version 2. We removed Milvus1DocumentStore and renamed Milvus2DocumentStore to MilvusDocumentStore.

What's Changed

Breaking Changes

  • bug: removed duplicated meta "name" field addition to content before embedding in update_embeddings workflow by @mayankjobanputra in #3368
  • BREAKING CHANGE: remove Milvus1DocumentStore along with support for Milvus < 2.x by @masci in #3552

Pipeline

  • fix: Fix the error of wrong page numbers when documents contain empty pages. by @brunnurs in #3330
  • bug: change type of split_by to Literal including None by @julian-risch in #3389
  • Fix: update pyworld pin by @anakin87 in #3435
  • feat: send event if number of queries exceeds threshold by @vblagoje in #3419
  • Feat: allow decreasing size of datasets loaded from BEIR by @ugm2 in #3392
  • feat: add __contains__ to Span by @ZanSara in #3446
  • Bug: Fix prompt length computation by @Timoeller in #3448
  • Add indexing pipeline type by @vblagoje in #3461
  • fix: warning if doc store similarity function is incompatible with Sentence Transformers model by @anakin87 in #3455
  • feat: Add CohereEmbeddingEncoder to EmbeddingRetriever by @vblagoje in #3453
  • feat: Extraction of headlines in markdown files by @bogdankostic in #3445
  • bug: replace decorator with counter attribute for pipeline event by @julian-risch in #3462
  • feat: add document_store to all BaseRetriever.retrieve() and BaseRetriever.retrieve_batch() implementations by @ZanSara in #3379
  • refactor: TableReader by @sjrl in #3456
  • fix: do not reference package directory in PDFToTextOCRConverter.convert() by @ZanSara in #3478
  • feat: Create the TextIndexingPipeline by @brandenchan in #3473
  • refactor: remove YAML save/load methods for subclasses of BaseStandardPipeline by @ZanSara in #3443
  • fix: strip whitespaces safely from FARMReader's answers by @ZanSara in #3526
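PR #3446 in the list above adds `__contains__` to `Span`, so Python's `in` operator can test whether a position falls inside a span. The sketch below shows the idea. It uses a standalone dataclass named after Haystack's `Span` rather than importing it. The half-open semantics (start included, end excluded) and the conversion of incomparable values into a `ValueError` are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical stand-in for a Haystack-style Span covering [start, end)."""

    start: float
    end: float

    def __contains__(self, value: float) -> bool:
        # Half-open interval: the left edge is included, the right edge is not.
        try:
            return self.start <= value < self.end
        except TypeError as exc:
            # Comparing with a non-numeric value (e.g. a str) fails; report it clearly.
            raise ValueError(f"Cannot check whether {value!r} is inside {self}") from exc
```

For example, `10 in Span(5, 15)` is true, while `15 in Span(5, 15)` is false because the right edge is excluded.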

DocumentStores

  • Document Store test refactoring by @masci in #3449
  • fix: support long texts for labels in ElasticsearchDocumentStore by @anakin87 in #3346
  • feat: add SQLDocumentStore tests by @masci in #3517
  • refactor: Refactor Weaviate tests by @masci in #3541
  • refactor: Pinecone tests by @masci in #3555
  • fix: write metadata to SQL Document Store when duplicate_documents!="overwrite" by @anakin87 in #3548
  • fix: Elasticsearch / OpenSearch brownfield function does not incorporate meta by @tstadel in #3572
  • fix: discard metadata fields if not set in Weaviate by @masci in #3578

UI / Demo

Documentation

Other Changes

New Contributors

Full Changelog: v1.10.0...v1.11.0rc1
