github deepset-ai/haystack v1.15.0

18 months ago

⭐ Highlights

Build Agents Yourself with Open Source

Exciting news! Say hello to LLM-based Agents, the new decision makers for your NLP applications! These agents have the power to answer complex questions by creating a dynamic action plan and using a variety of Tools in a loop. Picture this: your Agent decides to tackle a multi-hop question by retrieving pieces of information through a web search engine again and again. That's just one of the many feats these Agents can accomplish. Excited about the recent ChatGPT plugins? Agents allow you to build similar experiences in an open source way: your own environment, full control and transparency.
But how do you get started? First, wrap your Haystack Pipeline in a Tool and give your Agent a description of what that Tool can do. Then, initialize your Agent with a list of Tools and a PromptNode that decides when to use each Tool.

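from haystack.agents import Agent, Tool
from haystack.pipelines import WebQAPipeline

# web_retriever, web_qa_pn, agent_pn and prompt_template are assumed to be defined beforehand.
web_qa_tool = Tool(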
    name="Search",
    pipeline_or_node=WebQAPipeline(retriever=web_retriever, prompt_node=web_qa_pn),
    description="useful for when you need to Google questions.",
    output_variable="results",
)

agent = Agent(
    prompt_node=agent_pn,
    prompt_template=prompt_template,
    tools=[web_qa_tool],
    final_answer_pattern=r"Final Answer\s*:\s*(.*)",
)
agent.run(query="<Your question here!>")
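
To make the "Tools in a loop" idea more concrete, here is a minimal sketch in plain Python. It is not Haystack's implementation: the `run_agent` and `scripted_llm` helpers, the `Tool:` / `Tool Input:` reply format, and the canned search results are all assumptions made for illustration. Only the final-answer regular expression is taken from the snippet above.

```python
import re
from typing import Callable, Dict, List

# Same pattern as final_answer_pattern in the Agent snippet above.
FINAL_ANSWER = re.compile(r"Final Answer\s*:\s*(.*)")
# Hypothetical reply format for tool calls; the real prompt format may differ.
TOOL_CALL = re.compile(r"Tool\s*:\s*(\w+)\s*\nTool Input\s*:\s*(.*)")


def run_agent(
    query: str,
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Ask the LLM what to do next until it emits a final answer."""
    transcript = f"Question: {query}\n"
    for _ in range(max_steps):
        reply = llm(transcript)
        transcript += reply + "\n"
        final = FINAL_ANSWER.search(reply)
        if final:
            return final.group(1).strip()
        call = TOOL_CALL.search(reply)
        if call is None:
            break  # the reply is neither a tool call nor a final answer
        tool_name, tool_input = call.group(1), call.group(2).strip()
        # Feed the tool result back so the next LLM call can build on it.
        transcript += f"Observation: {tools[tool_name](tool_input)}\n"
    raise RuntimeError("No final answer produced")


def scripted_llm(replies: List[str]) -> Callable[[str], str]:
    """Stand-in for a real model: returns canned replies in order."""
    remaining = iter(replies)
    return lambda transcript: next(remaining)


# Canned search results standing in for a real web search.
CANNED_RESULTS = {
    "Who directed Inception?": "Christopher Nolan",
    "Where was Christopher Nolan born?": "London",
}

demo_llm = scripted_llm([
    "Tool: Search\nTool Input: Who directed Inception?",
    "Tool: Search\nTool Input: Where was Christopher Nolan born?",
    "Final Answer: London",
])
print(run_agent(
    "Where was the director of Inception born?",
    demo_llm,
    {"Search": CANNED_RESULTS.__getitem__},
))
```

The multi-hop behaviour comes from the transcript: each observation is appended to it, so the second search can depend on the result of the first.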

Check out the full example, a stand-alone WebQAPipeline, our new tutorials and the documentation!

Flexible PromptTemplates

Get ready to take your Pipelines to the next level with the revamped PromptNode. Now you have more flexibility when it comes to shaping the PromptNode outputs and inputs to work seamlessly with other nodes. But wait, there's more! You can now apply functions right within prompt_text. Want to concatenate the content of input documents? No problem! It's all possible with the PromptNode. And that's not all! The output_parser converts output into Haystack Document, Answer, or Label formats. Check out the AnswerParser in action, fully loaded and ready to use:

PromptTemplate(
    name="question-answering",
    prompt_text="Given the context please answer the question.\n"
    "Context: {join(documents)}\n"
    "Question: {query}\n"
    "Answer: ",
    output_parser=AnswerParser(),
)

More details here.
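
As a rough illustration of what `{join(documents)}` does inside prompt_text, the sketch below fills the same template by hand in plain Python. The `Doc` class, the `join` helper with its single-space delimiter, and `fill_prompt` are hypothetical stand-ins; Haystack's own function may take other parameters and use different defaults.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Doc:
    """Hypothetical stand-in for a Haystack Document; only content is modeled."""
    content: str


def join(documents: List[Doc], delimiter: str = " ") -> str:
    """Concatenate the content of all documents into one string."""
    return delimiter.join(doc.content for doc in documents)


def fill_prompt(documents: List[Doc], query: str) -> str:
    """Build the question-answering prompt from documents and a query."""
    return (
        "Given the context please answer the question.\n"
        f"Context: {join(documents)}\n"
        f"Question: {query}\n"
        "Answer: "
    )


docs = [
    Doc("Berlin is the capital of Germany."),
    Doc("Paris is the capital of France."),
]
print(fill_prompt(docs, "What is the capital of Germany?"))
```

The point is that the function call is evaluated before the prompt reaches the model, so the model sees one context string rather than a list of documents.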

Using ChatGPT through PromptModel

A few lines of code are all you need to start chatting with ChatGPT through Haystack! The simple message format distinguishes instructions, user questions, and assistant responses. And with the chat functionality you can ask follow-up questions as in this example:

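from haystack.nodes import PromptModel, PromptNode

# api_key is assumed to hold your OpenAI API key.
prompt_model = PromptModel("gpt-3.5-turbo", api_key=api_key)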
prompt_node = PromptNode(prompt_model)
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
    {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
    {"role": "user", "content": "Where was it played?"},
]
result = prompt_node(messages)
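
A follow-up question is just a longer message list: append the assistant's previous reply, then the new user message, and send the whole history again. The helper below is a minimal sketch of that bookkeeping in plain Python. `add_message` is a hypothetical name, and the reply text is a placeholder rather than real model output.

```python
from typing import Dict, List

Message = Dict[str, str]
# The three roles used in the message list above.
ALLOWED_ROLES = {"system", "user", "assistant"}


def add_message(messages: List[Message], role: str, content: str) -> List[Message]:
    """Return a new history with one more message; the input list is not modified."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"Unknown role: {role!r}")
    return messages + [{"role": role, "content": content}]


history: List[Message] = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
]
# Placeholder for whatever text the model returned for the first question.
history = add_message(history, "assistant", "<model reply to the first question>")
history = add_message(history, "user", "Where was it played?")
print(len(history))
```

The resulting list has the same shape as `messages` above, so under that assumption it can be passed to the PromptNode in the same way.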

Haystack Extras

We now have a separate repository, haystack-extras, for additional Haystack components such as the audio nodes AnswerToSpeech and DocumentToSpeech. These two nodes, for example, can be installed via:

pip install farm-haystack-text2speech

What's Changed

Breaking Changes

  • feat!: Increase Crawler standardization regarding Pipelines by @danielbichuetti in #4122
  • feat: Enable PDFToTextConverter multiprocessing, increase general performance and simplify installation by @danielbichuetti in #4226
  • build: Use uvicorn instead of gunicorn as server in REST API's Dockerfile by @bogdankostic in #4304
  • chore!: remove deprecated OpenDistroElasticsearchDocumentStore by @masci in #4361
  • refactor: Remove AnswerToSpeech and DocumentToSpeech nodes by @silvanocerza in #4391
  • fix: Fix debug on PromptNode by @recrudesce in #4483
  • feat: PromptTemplate extensions by @tstadel in #4378

Pipeline

  • feat: Add JsonConverter node by @bglearning in #4130
  • fix: Shaper store all outputs from function by @sjrl in #4223
  • refactor: Isolate PDF OCR converter from PDF text converter by @danielbichuetti in #4193
  • fix: add option to not override results by Shaper by @tstadel in #4231
  • feat: reduce and focus telemetry by @ZanSara in #4087
  • refactor: Remove deprecated nodes EvalDocuments and EvalAnswers by @anakin87 in #4194
  • refact: mark unit tests under the test/nodes/** path by @masci in #4235
  • fix: FARMReader produces Answers with negative start and end position by @julian-risch in #4248
  • test: replace ElasticsearchDS with InMemoryDS when it makes sense; support scale_score in InMemoryDS by @anakin87 in #4283
  • test: mock all Translator tests and move one to e2e by @ZanSara in #4290
  • fix: Prevent going past token limit in OpenAI calls in PromptNode by @sjrl in #4179
  • feat: Add Azure OpenAI embeddings support by @danielbichuetti in #4332
  • test: move tests on standard pipelines in e2e/ by @ZanSara in #4309
  • fix: EvalResult load migration by @tstadel in #4289
  • feat: Report execution time for pipeline components in _debug by @zoltan-fedor in #4197
  • refactor: Use TableQuestionAnsweringPipeline from transformers by @sjrl in #4303
  • fix: hf-tiny-roberta model loading from disk and mypy errors by @mayankjobanputra in #4363
  • docs: TransformersImageToText- inform about supported models, better exception handling by @anakin87 in #4310
  • fix: check that answer is not None before accessing it in table.py by @culms in #4376
  • feat: add automatic OCR detection mechanism and improve performance by @danielbichuetti in #4329
  • Add Whisper node by @vblagoje in #4335
  • tests: Mark Crawler tests correctly by @silvanocerza in #4435
  • test: Skip flaky test_multimodal_retriever_query by @silvanocerza in #4444
  • fix: issue evaluation check for content type by @ju-gu in #4181
  • feat: break retry loop for 401 unauthorized errors in promptnode by @FHardow in #4389
  • refactor: Remove retry_with_exponential_backoff in favor of tenacity by @silvanocerza in #4460
  • refactor: Remove ElasticsearchRetriever and ElasticsearchFilterOnlyRetriever by @silvanocerza in #4499
  • refactor: Deprecate BaseKnowledgeGraph, GraphDBKnowledgeGraph, InMemoryKnowledgeGraph and Text2SparqlRetriever by @silvanocerza in #4500
  • refactor: remove telemetry v1 by @ZanSara in #4496
  • feat: expose prompts to Answer and EvaluationResult by @tstadel in #4341
  • feat: Add agent tools by @vblagoje in #4437
  • refactor: reduce telemetry events count by @ZanSara in #4501

DocumentStores

  • fix: OpenSearchDocumentStore.delete_index doesn't raise by @tstadel in #4295
  • fix: increase MetaDocumentORM value length in SQLDocumentStore by @anakin87 in #4333
  • fix: when using IVF* indexing, ensure the index is trained first by @kaixuanliu in #4311
  • refactor: Mark MilvusDocumentStore as deprecated by @silvanocerza in #4498

Documentation

Other Changes

New Contributors

Full Changelog: v1.14.0...v1.15.0
