⭐️ Highlights

🔎 Elasticsearch 8 support

We are thrilled to share that Haystack now supports the latest version of Elasticsearch, Elasticsearch 8, as a Document Store backend. To use Haystack with Elasticsearch 8, install the new elasticsearch8 extra:

pip install farm-haystack[elasticsearch8]

Importing ElasticsearchDocumentStore from haystack.document_stores will automatically choose the correct Document Store based on the version of the installed Elasticsearch client.
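
To make the version-based selection more concrete, here is a minimal sketch of how such a dispatch could work. It is not Haystack's actual implementation: the helper names `installed_client_major` and `pick_backend`, and the `"es7"` / `"es8"` labels, are hypothetical and exist only for illustration. The only real API used is `importlib.metadata.version` from the Python standard library.

```python
from importlib import metadata


def installed_client_major(package="elasticsearch"):
    """Return the major version of the installed client package, or None if it is missing."""
    try:
        return int(metadata.version(package).split(".")[0])
    except metadata.PackageNotFoundError:
        return None


def pick_backend(major_version):
    """Map a client major version to a hypothetical backend label."""
    if major_version is None:
        raise ImportError("No Elasticsearch client found; install an Elasticsearch extra first.")
    # Hypothetical labels: a client at version 8 or newer selects the
    # Elasticsearch 8 store, anything older selects the Elasticsearch 7 store.
    return "es8" if major_version >= 8 else "es7"
```

Calling `pick_backend(installed_client_major())` in your environment shows which branch a dispatch like this would take for the client you have installed.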

🗂️ RecentnessRanker

We're excited to introduce a new feature to Haystack – a document recentness ranking component! We recognized the importance of ranking documents based on their recentness, especially in scenarios where timely information is critical. For instance, when searching through technical documentation for software releases or news articles, it's essential to prioritize the most up-to-date information. 👇

from haystack.nodes import RecentnessRanker

ranker = RecentnessRanker(
    date_meta_field="date",  # Key pointing to the date field in the metadata.
    ranking_mode="score",
    weight=0.5,  # A 0.5 weight means content relevance and age are averaged.
)

For more details, check out the documentation.
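
To illustrate what "averaging" relevance and age can mean, here is a small standalone sketch. It assumes a simple linear blend in which recency is scored by rank position (newest document gets 1.0, oldest gets 1/n); the exact formula RecentnessRanker uses is not stated in these notes, so treat this as one plausible reading. The function name `rank_by_relevance_and_recency` and the sample documents are made up for the example.

```python
from datetime import date


def rank_by_relevance_and_recency(docs, weight=0.5):
    """Blend relevance with a rank-based recency score.

    docs: list of (doc_id, relevance_score, publication_date) tuples.
    Returns (doc_id, blended_score) pairs, best first.
    """
    n = len(docs)
    # Order documents from newest to oldest and turn the position into a score.
    newest_first = sorted(docs, key=lambda d: d[2], reverse=True)
    recency = {doc_id: (n - rank) / n for rank, (doc_id, _, _) in enumerate(newest_first)}
    # weight=0 keeps pure relevance, weight=1 keeps pure recency.
    scored = [
        (doc_id, (1 - weight) * relevance + weight * recency[doc_id])
        for doc_id, relevance, _ in docs
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


docs = [
    ("old", 0.9, date(2020, 1, 1)),
    ("mid", 0.7, date(2022, 6, 1)),
    ("new", 0.6, date(2023, 7, 1)),
]
print([doc_id for doc_id, _ in rank_by_relevance_and_recency(docs, weight=0.5)])
# prints ['new', 'mid', 'old']
```

With `weight=0.5`, the newest document overtakes the most relevant one; with `weight=0.0`, the order falls back to pure relevance.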

🧠 Improved support for Anthropic Claude

We're thrilled to announce an important update to Haystack's Anthropic Claude support! This update follows the latest improvements in Anthropic Claude models, notably support for Claude 2 and its much larger context window.

Moreover, we've integrated Claude models into our example scripts, making it easier for users to test these cutting-edge models. For instance, check out the updated examples/link_content_blog_post_summary.py script for a demo of Claude summarizing blog posts directly from hyperlinks.

We still support the older models (such as claude-v1) alongside the new Claude models. For more details, see the Anthropic Claude documentation.

🚀 Support for Llama 2 on AWS SageMaker

We are excited to share that Haystack now supports models of the Llama 2 family deployed to AWS SageMaker! Once you’ve deployed your Llama 2 models (including the chat variant) in AWS SageMaker, use them with PromptNode by providing the inference endpoint name, your aws_profile_name, and aws_custom_attributes 👇

from haystack.nodes import PromptNode

prompt_node = PromptNode(
    model_name_or_path="sagemaker-llama-2-endpoint-name",
    model_kwargs={
        "aws_profile_name": "my_aws_profile_name",
        "aws_custom_attributes": {"accept_eula": True},
    },
)
result = prompt_node("Berlin is the capital of")
print(result)

# or the Llama 2 chat model
prompt_node = PromptNode(
    model_name_or_path="sagemaker-llama-2-chat-endpoint-name",
    model_kwargs={
        "aws_profile_name": "my_aws_profile_name",
        "aws_custom_attributes": {"accept_eula": True},
    },
)
chat_conversation = [[
    {"role": "user", "content": "what is the recipe of mayonnaise?"},
]]
result = prompt_node(chat_conversation)
print(result)

For more details on model deployment, check out the documentation.

🎉 Now using transformers 4.31.0

With this release, Haystack depends on transformers 4.31.0, the version of the library that adds support for Llama 2.

🚫 SklearnQueryClassifier deprecation

As of version 1.19, SklearnQueryClassifier is deprecated, and it will be removed from Haystack in version 1.21. We recommend using the more powerful TransformersQueryClassifier instead. See the announcement for more details.

What's Changed

Pipeline

feat: globally disable progress bars by @ZanSara in #5207
Add cpu-remote-inference Docker image by @vblagoje in #5225
fix: Support isolated node eval in run_batch in Generators by @bogdankostic in #5291
feat: support OpenAI-Organization for authentication by @anakin87 in #5292
docs: Small documentation updates to dense.py by @sjrl in #5305
test: Refactor some retriever tests into unit tests by @sjrl in #5306
feat: Add support for meta fields that are lists when using embed_meta_fields by @sjrl in #5307
refactor: Extract link retrieval from WebRetriever, introduce LinkContentFetcher by @vblagoje in #5227
fix: update WebRetriever docstrings and default mode by @dfokina in #5352
added hybrid search example by @nickprock in #5376

DocumentStores

fix: Allow filtering on list fields in InMemoryDocumentStore with all operators by @bogdankostic in #5208
Fix: FAISSDocumentStore - make write_documents properly work in combination w update_embeddings by @anakin87 in #5221
bug: fix for pinecone not working for per document updates by @vblagoje in #5110
fix: avoid conflicts with opensearch / elasticsearch magic attributes during bulk requests by @tstadel in #5113
ci: Add unit test for Elasticsearch8 by @bogdankostic in #5300
feat: Check version of Elasticsearch server and add support for Elasticsearch <= 7.5 by @bogdankostic in #5320

Documentation

feat: BM25 retrieval for MemoryDocumentStore by @vblagoje in #5151
fix: install inference in REST API tests by @ZanSara in #5252
fix: import_utils fetch_archive_from_http - improve url parsing for fetching archive from http by @malte-aws in #5199
fix: Improve robustness of get_task HF pipeline invocations by @MichelBartels in #5284
feat: introduce Store protocol (v2) by @ZanSara in #5259
fix: num_return_sequences should be less than num_beams, not top_k by @faaany in #5280
Revert "fix: num_return_sequences should be less than num_beams, not top_k" by @julian-risch in #5434
chore: deprecate SklearnQueryClassifier by @anakin87 in #5324
fix: Run HFLocalInvocationLayer.supports even if inference packages are not installed by @MichelBartels in #5308
fix: a small bug in StopWordsCriteria by @faaany in #5316
chore: fix typo in base.py by @eltociear in #5356
feat: extend pipeline.add_component to support stores by @ZanSara in #5261
proposal: Add RecentnessRanker component by @elundaeva in #5289
feat: Add embed_meta_fields to Ranker nodes by @sjrl in #5361
feat: Recentness Ranker by @elundaeva in #5301
feat: Update Anthropic Claude support with the latest models, new streaming API, context window sizes by @vblagoje in #5406
feat: Enable Support for Meta LLama-2 Models in Amazon Sagemaker by @vblagoje in #5437

Other Changes

fix: MultiLabel to_json works with Table Labels by @sjrl in #5257
chore: Remove deprecated return_table_cell from conftest.py by @sjrl in #5264
test: Update test/others/test_utils.py by @sjrl in #5270
test: Adapt batch size in retriever-reader benchmarks by @bogdankostic in #5281
fix: Add dependecies to build lxml successfully in base Docker image by @vblagoje in #5288
Remove requests_cache in tests by @silvanocerza in #5285
refactor: Simplify selection of Azure vs OpenAI invocation layers by @vblagoje in #5271
feat: batch mode for MemoryRetriever (v2) by @ZanSara in #5287
chore: Add support for hierarchical docs by @silvanocerza in #5278
build: Add elasticsearch7 and elasticsearch8 extra by @bogdankostic in #5296
chore: Adapt import message for Elasticsearch7 by @bogdankostic in #5295
ci: Add job for ES8 integration tests by @bogdankostic in #5297
ci: Update labeler.yml to account for Elasticsearch changes by @bogdankostic in #5318
create invocation-layers API reference page by @dfokina in #5262
chore: pin scikit-learn>=1.3.0 by @anakin87 in #5322
feat: upgrade canals in preview by @ZanSara in #5344
tests: Improve token limit tests for OpenAI PromptNode layer by @vblagoje in #5351
style: max_new_tokens are set twice with the same value by @faaany in #5368
test: Re-activate end-to-end tests workflow by @julian-risch in #5343
docs: Pin PyYAML to 5.3.1 by @bogdankostic in #5400
build: upgrade transformers to v4.31.0 by @anakin87 in #5391
fix: Error message about weight param in RecentnessRanker by @julian-risch in #5409
docs: Add Elasticsearch to API config by @bogdankostic in #5422
fix: Improve log warnings in REST API /health endpoint by @vblagoje in #5381
build: Unpin mlflow, constraint dulwich and botocore by @silvanocerza in #5441

New Contributors

@malte-aws made their first contribution in #5199
@eltociear made their first contribution in #5356
@elundaeva made their first contribution in #5289
@nickprock made their first contribution in #5376

Full Changelog: v1.18.1...v1.19.0

deepset-ai/haystack v1.19.0 on GitHub