github deepset-ai/haystack v1.16.0


⭐️ Highlights

Using GPT-4 through PromptNode and Agent

Haystack now supports GPT-4 through PromptNode and Agent. This means you can use the latest advancements in large language modeling to make your NLP applications more accurate and efficient.

To get started, create a PromptModel for GPT-4 and plug it into your PromptNode. Just like with ChatGPT, you can use GPT-4 in a chat scenario and ask follow-up questions, as shown in this example:

from haystack.nodes import PromptModel, PromptNode

# api_key is your OpenAI API key
prompt_model = PromptModel("gpt-4", api_key=api_key)
prompt_node = PromptNode(prompt_model)
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
    {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
    {"role": "user", "content": "Where was it played?"},
]
result = prompt_node(messages)
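
You can also use GPT-4 as the reasoning engine of an Agent. The following is a minimal sketch, assuming you already have a search_pipeline (for example, an ExtractiveQAPipeline over your DocumentStore) to register as a Tool; the tool name and description are only illustrative:

from haystack.agents import Agent, Tool
from haystack.nodes import PromptNode

# The Agent uses the PromptNode to decide which Tool to call at each step
agent_prompt_node = PromptNode("gpt-4", api_key=api_key, stop_words=["Observation:"])
agent = Agent(prompt_node=agent_prompt_node)

# search_pipeline is assumed to be an existing query pipeline, e.g. an ExtractiveQAPipeline
agent.add_tool(
    Tool(
        name="DocumentSearch",
        pipeline_or_node=search_pipeline,
        description="useful for answering questions about the contents of your documents",
    )
)

result = agent.run("Who won the World Series in 2020, and where was it played?")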

More flexible routing of Documents with RouteDocuments

This release includes an enhancement to the RouteDocuments node, which makes Document routing even more flexible.

The RouteDocuments node now not only returns the Documents matched by the split_by or metadata_values parameter, but also creates an extra route for unmatched Documents. This means you won't accidentally lose Documents just because a metadata field is missing. Additionally, metadata_values now accepts a List[List[str]], so multiple metadata values can be grouped into a single output route, as in the sketch below.
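
Here is a minimal sketch of the new grouping behavior. The metadata field "category" and its values are made up for illustration, and the shape of the run() output follows our reading of the 1.x node API:

from haystack import Document
from haystack.nodes import RouteDocuments

# Each inner list of metadata_values becomes one output route
router = RouteDocuments(
    split_by="category",
    metadata_values=[["news", "blog"], ["paper"]],
)

docs = [
    Document(content="A news article", meta={"category": "news"}),
    Document(content="A blog post", meta={"category": "blog"}),
    Document(content="A research paper", meta={"category": "paper"}),
    Document(content="A Document without a 'category' field"),  # no longer silently dropped
]

routed, _ = router.run(documents=docs)
# routed maps the output edges ("output_1", "output_2", ...) to lists of Documents;
# the final edge collects Documents whose metadata did not match any group.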

Deprecating RAGenerator and Seq2SeqGenerator

RAGenerator and Seq2SeqGenerator are deprecated and will be removed in version 1.18. We advise using the more powerful PromptNode instead, which can use RAG and Seq2Seq models as well. The following example shows how to use PromptNode as a replacement for Seq2SeqGenerator:

from haystack import Document
from haystack.nodes import PromptNode, PromptTemplate

# Load the LFQA (long-form question answering) Seq2Seq model through PromptNode
p = PromptNode("vblagoje/bart_lfqa")

# Start by defining a question/query
query = "Why does water heated to room temperature feel colder than the air around it?"

# Given the question above, suppose the documents below were found in some document store
documents = [
    "when the skin is completely wet. The body continuously loses water by...",
    "at greater pressures. There is an ambiguity, however, as to the meaning of the terms 'heating' and 'cooling'...",
    "are not in a relation of thermal equilibrium, heat will flow from the hotter to the colder, by whatever pathway...",
    "air condition and moving along a line of constant enthalpy toward a state of higher humidity. A simple example ...",
    "Thermal contact conductance. In physics, thermal contact conductance is the study of heat conduction between solid ...",
]


# Manually concatenate the question and support documents into BART input
# conditioned_doc = "<P> " + " <P> ".join([d for d in documents])
# query_and_docs = "question: {} context: {}".format(query, conditioned_doc)

# Or use the PromptTemplate as shown here
pt = PromptTemplate("lfqa", "question: {query} context: {join(documents, delimiter='<P>')}")

res = p.prompt(prompt_template=pt, query=query, documents=[Document(d) for d in documents])

⚠️ Breaking Changes

Refactoring of our dependency management

Haystack now defines the following optional extras: stats, metrics, preprocessing, file-conversion, and elasticsearch. To keep using certain components, you need to install farm-haystack with the corresponding extra:

Component                             Installation extra
------------------------------------  ------------------------------
PreProcessor                          farm-haystack[preprocessing]
DocxToTextConverter                   farm-haystack[file-conversion]
TikaConverter                         farm-haystack[file-conversion]
LangdetectDocumentLanguageClassifier  farm-haystack[file-conversion]
ElasticsearchDocumentStore            farm-haystack[elasticsearch]
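
For example, if your pipelines use the PreProcessor and the ElasticsearchDocumentStore, you can install both extras in a single command:

pip install 'farm-haystack[preprocessing,elasticsearch]'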

Dropping support for Python 3.7

Since Python 3.7 will reach end of life in June 2023, we will no longer support it as of Haystack version 1.16.

Smaller Breaking Changes

  • Using TableCell instead of Span to indicate the coordinates of a table cell (#4616)
  • Default save_dir for FARMReader's train method changed to f"./saved_models/{self.inferencer.model.language_model.name}" (#4553)
  • Using PreProcessor with split_respect_sentence_boundary set to True might return a different set of Documents than in v1.15 (#4470)

What's Changed

Breaking Changes

  • feat: Deduplicate duplicate Answers resulting from overlapping Documents in FARMReader by @bogdankostic in #4470
  • feat: Change default save_dir for FARMReader.train by @GitIgnoreMaybe in #4553
  • feat!: drop Python3.7 support by @ZanSara in #4421
  • refactor!: extract evaluation and statistical dependencies by @ZanSara in #4457
  • refactor!: extract preprocessing and file conversion deps by @ZanSara in #4605
  • feat: Implementation of Table Cell Proposal by @sjrl in #4616

Pipeline

DocumentStores

  • fix: Check for date fields in weaviate meta update by @joekitsmith in #4371
  • chore: skip Milvus tests by @ZanSara in #4654
  • docs: Add deprecation information to doc string of MilvusDocumentStore by @bogdankostic in #4658
  • Ignore cross-reference properties when loading documents by @masci in #4664
  • fix: PineconeDocumentStore error when delete_documents right after initialization by @Namoush in #4609
  • fix: remove warnings from the more recent Elasticsearch client by @masci in #4602
  • fix: Fixing the Weaviate BM25 query builder bug by @zoltan-fedor in #4703

Documentation

Other Changes

New Contributors

Full Changelog: v1.15.1...v1.16.0
