⭐ Highlights
Build Agents Yourself with Open Source
Exciting news! Say hello to LLM-based Agents, the new decision makers for your NLP applications! Agents can answer complex questions by creating a dynamic action plan and using a variety of Tools in a loop. Picture this: your Agent decides to tackle a multi-hop question by retrieving pieces of information through a web search engine again and again. That's just one of the many feats these Agents can accomplish. Excited about the recent ChatGPT plugins? Agents let you build similar experiences in an open source way: your own environment, full control, and transparency.
But how do you get started? First, wrap your Haystack Pipeline in a Tool and give your Agent a description of what that Tool can do. Then, initialize your Agent with a list of Tools and a PromptNode that decides when to use each Tool.
```python
web_qa_tool = Tool(
    name="Search",
    pipeline_or_node=WebQAPipeline(retriever=web_retriever, prompt_node=web_qa_pn),
    description="useful for when you need to Google questions.",
    output_variable="results",
)
agent = Agent(
    prompt_node=agent_pn,
    prompt_template=prompt_template,
    tools=[web_qa_tool],
    final_answer_pattern=r"Final Answer\s*:\s*(.*)",
)
agent.run(query="<Your question here!>")
```
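The `final_answer_pattern` is a plain regular expression whose first capture group extracts the answer from the Agent's transcript. A minimal, self-contained sketch of how that extraction works (the transcript text here is made up for illustration):

```python
import re

# The pattern from the Agent example above: group 1 captures everything
# after "Final Answer:" on that line.
final_answer_pattern = r"Final Answer\s*:\s*(.*)"

# A hypothetical end of an Agent transcript:
transcript = (
    "Thought: I now know the answer.\n"
    "Final Answer: The Rhine is about 1,230 km long."
)

match = re.search(final_answer_pattern, transcript)
print(match.group(1))  # The Rhine is about 1,230 km long.
```

If the model emits extra whitespace around the colon, the `\s*` parts still match, which is why the pattern is written that way.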
Check out the full example, a stand-alone WebQAPipeline, our new tutorials and the documentation!
Flexible PromptTemplates
Get ready to take your Pipelines to the next level with the revamped PromptNode. You now have more flexibility in shaping PromptNode inputs and outputs so they work seamlessly with other nodes. You can even apply functions right within prompt_text: want to concatenate the content of the input documents? No problem! On top of that, the new output_parser converts output into Haystack Document, Answer, or Label formats. Check out the AnswerParser in action, fully loaded and ready to use:
```python
PromptTemplate(
    name="question-answering",
    prompt_text="Given the context please answer the question.\n"
                "Context: {join(documents)}\n"
                "Question: {query}\n"
                "Answer: ",
    output_parser=AnswerParser(),
)
```
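To get a feel for what such a template renders to, here's a plain-Python sketch that mimics the substitution with `str.format` (the space delimiter for `join(documents)` and the document texts are assumptions for illustration, not Haystack's actual implementation):

```python
# Stand-in for the template's prompt_text, with join(documents) replaced
# by a plain placeholder we fill ourselves.
prompt_text = (
    "Given the context please answer the question.\n"
    "Context: {documents}\n"
    "Question: {query}\n"
    "Answer: "
)

# Hypothetical document contents:
documents = [
    "Berlin is the capital of Germany.",
    "Berlin has about 3.7 million inhabitants.",
]

# Concatenate the documents (assumed space delimiter) and fill the template.
rendered = prompt_text.format(
    documents=" ".join(documents),
    query="What is the capital of Germany?",
)
print(rendered)
```

The rendered string ending in "Answer: " is what the underlying language model receives, and the AnswerParser then wraps the model's completion in a Haystack Answer object.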
More details here.
Using ChatGPT through PromptModel
A few lines of code are all you need to start chatting with ChatGPT through Haystack! The simple message format distinguishes between instructions, user questions, and assistant responses, and the chat functionality lets you ask follow-up questions, as in this example:
```python
prompt_model = PromptModel("gpt-3.5-turbo", api_key=api_key)
prompt_node = PromptNode(prompt_model)
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
    {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
    {"role": "user", "content": "Where was it played?"},
]
result = prompt_node(messages)
```
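To keep the conversation going, you maintain the message list yourself: append the assistant's reply and your next question, then pass the whole history back in. A minimal sketch of that bookkeeping (how the reply is read out of `result` is an assumption about its format, so that line is left commented):

```python
# Start of a chat history in the message format shown above.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
]

# Suppose the model replied with this text on the first call:
assistant_reply = "The Los Angeles Dodgers won the World Series in 2020."

# Record the reply, then add the follow-up question.
messages.append({"role": "assistant", "content": assistant_reply})
messages.append({"role": "user", "content": "Where was it played?"})

# result = prompt_node(messages)  # the model now sees the whole conversation
print(messages[-1]["content"])
```

Because the full history travels with every call, the model can resolve "it" in the follow-up to the 2020 World Series.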
Haystack Extras
There is now a second repository, haystack-extras, with additional Haystack components such as the audio nodes AnswerToSpeech and DocumentToSpeech. These two, for example, can be installed via:
```shell
pip install farm-haystack-text2speech
```
What's Changed
Breaking Changes
- feat!: Increase Crawler standardization regarding Pipelines by @danielbichuetti in #4122
- feat: Enable PDFToTextConverter multiprocessing, increase general performance and simplify installation by @danielbichuetti in #4226
- build: Use `uvicorn` instead of `gunicorn` as server in REST API's Dockerfile by @bogdankostic in #4304
- chore!: remove deprecated OpenDistroElasticsearchDocumentStore by @masci in #4361
- refactor: Remove AnswerToSpeech and DocumentToSpeech nodes by @silvanocerza in #4391
- fix: Fix debug on PromptNode by @recrudesce in #4483
- feat: PromptTemplate extensions by @tstadel in #4378
Pipeline
- feat: Add JsonConverter node by @bglearning in #4130
- fix: Shaper store all outputs from function by @sjrl in #4223
- refactor: Isolate PDF OCR converter from PDF text converter by @danielbichuetti in #4193
- fix: add option to not override results by `Shaper` by @tstadel in #4231
- feat: reduce and focus telemetry by @ZanSara in #4087
- refactor: Remove deprecated nodes `EvalDocuments` and `EvalAnswers` by @anakin87 in #4194
- refact: mark unit tests under the `test/nodes/**` path by @masci in #4235
- fix: FARMReader produces Answers with negative start and end position by @julian-risch in #4248
- test: replace `ElasticsearchDS` with `InMemoryDS` when it makes sense; support `scale_score` in `InMemoryDS` by @anakin87 in #4283
- test: mock all `Translator` tests and move one to `e2e` by @ZanSara in #4290
- fix: Prevent going past token limit in OpenAI calls in PromptNode by @sjrl in #4179
- feat: Add Azure OpenAI embeddings support by @danielbichuetti in #4332
- test: move tests on standard pipelines in `e2e/` by @ZanSara in #4309
- fix: EvalResult load migration by @tstadel in #4289
- feat: Report execution time for pipeline components in `_debug` by @zoltan-fedor in #4197
- refactor: Use TableQuestionAnsweringPipeline from transformers by @sjrl in #4303
- fix: hf-tiny-roberta model loading from disk and mypy errors by @mayankjobanputra in #4363
- docs: `TransformersImageToText` - inform about supported models, better exception handling by @anakin87 in #4310
- fix: check that `answer` is not `None` before accessing it in `table.py` by @culms in #4376
- feat: add automatic OCR detection mechanism and improve performance by @danielbichuetti in #4329
- Add Whisper node by @vblagoje in #4335
- tests: Mark Crawler tests correctly by @silvanocerza in #4435
- test: Skip flaky test_multimodal_retriever_query by @silvanocerza in #4444
- fix: issue evaluation check for content type by @ju-gu in #4181
- feat: break retry loop for 401 unauthorized errors in promptnode by @FHardow in #4389
- refactor: Remove retry_with_exponential_backoff in favor of tenacity by @silvanocerza in #4460
- refactor: Remove ElasticsearchRetriever and ElasticsearchFilterOnlyRetriever by @silvanocerza in #4499
- refactor: Deprecate BaseKnowledgeGraph, GraphDBKnowledgeGraph, InMemoryKnowledgeGraph and Text2SparqlRetriever by @silvanocerza in #4500
- refactor: remove telemetry v1 by @ZanSara in #4496
- feat: expose prompts to Answer and EvaluationResult by @tstadel in #4341
- feat: Add agent tools by @vblagoje in #4437
- refactor: reduce telemetry events count by @ZanSara in #4501
DocumentStores
- fix: `OpenSearchDocumentStore.delete_index` doesn't raise by @tstadel in #4295
- fix: increase `MetaDocumentORM` value length in `SQLDocumentStore` by @anakin87 in #4333
- fix: when using IVF* indexing, ensure the index is trained first by @kaixuanliu in #4311
- refactor: Mark MilvusDocumentStore as deprecated by @silvanocerza in #4498
Documentation
- feat: add `top_k` to `PromptNode` by @tstadel in #4159
- feat: Add Agent by @julian-risch in #4148
- ci: Automate OpenAPI specs upload to Readme.io by @silvanocerza in #4228
- ci: Refactor docs config and generation by @silvanocerza in #4280
- feat: Add Azure as OpenAI endpoint by @vblagoje in #4170
- refactor: Allow flexible document id generation by @danielbichuetti in #4326
Other Changes
- ci: Move xpdf build into separate container by @silvanocerza in #4199
- refactor: Remove `id_hash_keys` parameter in `from_dict` method by @bogdankostic in #4207
- bug: Check cuda availability before calling by @abwiersma in #4174
- ci: Fix Dockerfile.base failing cause of missing git by @silvanocerza in #4210
- fix: allowing file-upload api to write files to disk by @mayankjobanputra in #4221
- fix: Fix bug in prompt template check of OpenAIAnswerGenerator by @sjrl in #4220
- ci: Fix Dockerfile.base failing cause of missing dependencies by @silvanocerza in #4215
- fix: Better error messages for OCR requirement (#3767) by @in-balamurugan in #3900
- Docs: Update top_k description for PromptNode by @agnieszka-m in #4224
- bug: fix typo in `google.colab` package detection by @ZanSara in #4238
- proposal: Implement Agent demo by @vblagoje in #4085
- ci: Remove unnecessary operations in minor_version_release.yml by @silvanocerza in #4267
- Fix: Issue of failure to initialize input_converter in Seq2SeqGenerator when model_file_path is given as folder path on local disk after manual model download by @Kshitijpawar in #4213
- test: Fix deprecation fixture by @silvanocerza in #4219
- Fix: Allow `torch_dtype="auto"` in PromptNode by @sjrl in #4166
- ci: Parallelize Docker build job by @silvanocerza in #4268
- test: Added integration test for using EntityExtractor in query pipeline by @sjrl in #4117
- refactor: Make extraction of "Tool" and "Tool input" for Agent more robust and user-friendly by @tholor in #4269
- ci: Change docker_release.yml workflow to run after successful PyPi release by @silvanocerza in #4293
- docs: Fix search path for Shaper API docs by @bogdankostic in #4306
- test: mock all Summarizer tests and move a few into e2e by @ZanSara in #4299
- ci: Fix docstring-labeler.yml workflow by @silvanocerza in #4307
- build: Remove xpdf dependencies by @bogdankostic in #4314
- test: Pin requests-cache test dependency to <1.0.0 by @silvanocerza in #4325
- chore: Add Intelijus as using Haystack by @danielbichuetti in #4330
- refactor: Separate PromptModelInvocationLayers in providers.py by @vblagoje in #4327
- ci: Add workflow to push CI metrics to Datadog by @silvanocerza in #4336
- Update README.md by @TuanaCelik in #4340
- proposal: Shapers in Prompt Templates by @tstadel in #4172
- refactor: simplify registration of `PromptModelInvocationLayer` by @ZanSara in #4339
- fix: Fix `print_answers` for output of query `run_batch` by @vbernardes in #4273
- refactor: Simplify agent and tool interaction by @vblagoje in #4362
- proposal: drop `BaseComponent` and re-implement `Pipeline` by @ZanSara in #4284
- feat: `LanguageClassifier` by @ZanSara in #2994
- docs: add `DocumentLanguageClassifier` API by @anakin87 in #4401
- chore: make the docs generator runnable without an API key by @masci in #4405
- feat: new Pipeline by @ZanSara in #4368
- ci: Use bigger runner for integration-tests-linux by @silvanocerza in #4422
- test: Fix audio tests failing by @silvanocerza in #4418
- feat: improve is_containerized() by @masci in #4412
- Docs: Update Agent docstrings + add api docs by @agnieszka-m in #4296
- refactor: rename `v2` package to `preview` by @ZanSara in #4409
by @ZanSara in #4409 - feat: add PromptNode OpenAI token streaming by @vblagoje in #4397
- feat: Isolate integration PromptNode tests into a separate test unit by @vblagoje in #4420
- Docs: Fix order and category of agent by @agnieszka-m in #4440
- test: Remove unnecessary imports in conftest.py by @silvanocerza in #4434
- Docs: Fix agent module by @agnieszka-m in #4441
- test: stop running the CI on macos by @masci in #4443
- ci: Run readme_sync.yml in PRs by @silvanocerza in #4442
- feat: Add the New Tokenizer of `gpt-3.5-turbo` by @AI-Ahmed in #4331
- Docs: Update language classifier docstrings by @agnieszka-m in #4413
- ci: remove python_cache internal action by @silvanocerza in #4429
- feat: Add ChatGPT PromptNode layer by @vblagoje in #4357
- chore: Make version semver compliant by @silvanocerza in #4456
- refactor: Add AgentStep by @vblagoje in #4431
- feat: Enable PromptNode to use text-generation models by @vblagoje in #4349
- feat: add additional params to file upload endpoint by @josepablofm78 in #4445
- fix: stop loading FAISS and InMem doc Store for indexing pipelines by @mayankjobanputra in #4396
- feat: Add agent event callbacks by @vblagoje in #4491
- bug: Exclude rdflib 6.3.2 because of license issues by @julian-risch in #4495
- ci: remove telemetry env var by @ZanSara in #4497
- feat: prompt at query time by @tstadel in #4454
- chore: wire up `AnswerParser` by @tstadel in #4505
- Fix pipeline config and agent tools hashing for telemetry by @silvanocerza in #4508
New Contributors
- @abwiersma made their first contribution in #4174
- @in-balamurugan made their first contribution in #3900
- @Kshitijpawar made their first contribution in #4213
- @vbernardes made their first contribution in #4273
- @culms made their first contribution in #4376
- @kaixuanliu made their first contribution in #4311
- @josepablofm78 made their first contribution in #4445
- @recrudesce made their first contribution in #4483
Full Changelog: v1.14.0...v1.15.0