⭐️ Highlights
🔎 Elasticsearch 8 support
We are thrilled to share that Haystack now supports the latest version of Elasticsearch, Elasticsearch 8, as a Document Store backend. To use Haystack with Elasticsearch 8, just install the new elasticsearch8 extra:
pip install farm-haystack[elasticsearch8]
Importing ElasticsearchDocumentStore from haystack.document_stores will automatically choose the correct Document Store based on the version of the installed Elasticsearch client.
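As a rough illustration of version-based selection, the sketch below maps the major version of the installed Elasticsearch client package to a store flavour. It is not Haystack's actual implementation: the helper names `installed_client_version` and `pick_store_flavour` and the returned labels are hypothetical, and only the Python standard library is used.

```python
from importlib import metadata
from typing import Optional


def installed_client_version(package: str = "elasticsearch") -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def pick_store_flavour(client_version: Optional[str]) -> str:
    """Map a client version such as '8.9.0' to a hypothetical flavour label."""
    if client_version is None:
        raise ImportError("No Elasticsearch client is installed.")
    major = int(client_version.split(".")[0])
    # Assumption for this sketch: anything below major version 8 uses the v7 code path.
    return "elasticsearch8" if major >= 8 else "elasticsearch7"
```

Calling `pick_store_flavour(installed_client_version())` in your environment shows which branch your installed client would fall into under these assumptions.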
🗂️ RecentnessRanker
We're excited to introduce a new feature to Haystack – a document recentness ranking component! We recognized the importance of ranking documents based on their recentness, especially in scenarios where timely information is critical. For instance, when searching through technical documentation for software releases or news articles, it's essential to prioritize the most up-to-date information. 👇
from haystack.nodes import RecentnessRanker
ranker = RecentnessRanker(
    date_meta_field="date",  # Key pointing to the date field in the metadata.
    ranking_mode="score",
    weight=0.5,  # A 0.5 weight means content relevance and age are averaged.
)
For more details, check out the documentation.
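To make the weight comment concrete, here is a minimal sketch of one way to blend a relevance score with a recency score. The linear blend, the rank-based recency score, and the helper names `recency_scores` and `blend` are assumptions made for illustration; Haystack's actual scoring may differ.

```python
from datetime import date
from typing import Dict, List


def recency_scores(dates: Dict[str, date]) -> Dict[str, float]:
    """Score documents in [0, 1] by rank: the newest gets 1.0, the oldest 0.0."""
    ordered = sorted(dates, key=lambda doc_id: dates[doc_id])  # oldest first
    if len(ordered) == 1:
        return {ordered[0]: 1.0}
    return {doc_id: i / (len(ordered) - 1) for i, doc_id in enumerate(ordered)}


def blend(relevance: Dict[str, float], dates: Dict[str, date], weight: float = 0.5) -> List[str]:
    """Rank document ids by (1 - weight) * relevance + weight * recency."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    recency = recency_scores(dates)
    combined = {d: (1 - weight) * relevance[d] + weight * recency[d] for d in relevance}
    return sorted(combined, key=lambda d: combined[d], reverse=True)


# Hypothetical documents: relevance scores and publication dates are made up.
relevance = {"old_but_relevant": 0.9, "new_but_vague": 0.4, "middle": 0.6}
dates = {
    "old_but_relevant": date(2021, 1, 1),
    "new_but_vague": date(2023, 7, 1),
    "middle": date(2022, 6, 1),
}
ranking = blend(relevance, dates, weight=0.5)
```

With `weight=0.5` the newest document moves to the top even though it has the lowest relevance; with `weight=0.0` the order falls back to pure relevance.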
🧠 Improved support for Anthropic Claude
We're thrilled to announce an important update to Haystack's Anthropic Claude support! This update follows the latest improvements in Anthropic Claude models, notably support for Claude 2 and the much larger context window sizes of the new models.
Moreover, we've integrated Claude models into our example scripts, making it easier for users to test these cutting-edge models. For instance, check out the updated examples/link_content_blog_post_summary.py script for a demo of Claude summarizing blog posts directly from hyperlinks.
The older models (e.g., claude-v1) remain supported alongside the new Claude models. For more details, see the Anthropic Claude documentation.
🚀 Support for Llama 2 on AWS SageMaker
We are excited to share that Haystack now supports models of the Llama 2 family deployed to AWS SageMaker! Once you’ve deployed your Llama 2 models (including the chat variant) in AWS SageMaker, use them with PromptNode by simply providing the inference endpoint name, your aws_profile_name, and aws_custom_attributes 👇
from haystack.nodes import PromptNode
prompt_node = PromptNode(
    model_name_or_path="sagemaker-llama-2-endpoint-name",
    model_kwargs={"aws_profile_name": "my_aws_profile_name",
                  "aws_custom_attributes": {"accept_eula": True}}
)
result = prompt_node("Berlin is the capital of")
print(result)
# or the Llama 2 chat model
prompt_node = PromptNode(
    model_name_or_path="sagemaker-llama-2-chat-endpoint-name",
    model_kwargs={"aws_profile_name": "my_aws_profile_name",
                  "aws_custom_attributes": {"accept_eula": True}}
)
chat_conversation = [[
    {"role": "user", "content": "what is the recipe of mayonnaise?"},
]]
result = prompt_node(chat_conversation)
print(result)
For more details on model deployment, check out the documentation.
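The nested-list shape of chat_conversation shown above can be built with a small helper. This is a sketch under assumptions: the helper name `make_dialog` and the accepted role set (`system`, `user`, `assistant`) are illustrative and are not part of Haystack's API.

```python
from typing import Dict, List, Tuple

ALLOWED_ROLES = {"system", "user", "assistant"}  # assumed role names


def make_dialog(turns: List[Tuple[str, str]]) -> List[List[Dict[str, str]]]:
    """Wrap (role, content) pairs into the list-of-dialogs shape used above."""
    dialog = []
    for role, content in turns:
        if role not in ALLOWED_ROLES:
            raise ValueError(f"unexpected role: {role!r}")
        dialog.append({"role": role, "content": content})
    return [dialog]  # a batch containing a single dialog


chat_conversation = make_dialog([("user", "what is the recipe of mayonnaise?")])
```

The result has the same structure as the literal in the example, so it could be passed to the prompt node in the same way.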
🎉 Now using transformers 4.31.0
With this release, Haystack depends on the latest version of the transformers library, allowing support for Llama 2.
🚫 SklearnQueryClassifier deprecation
SklearnQueryClassifier is deprecated as of version 1.19 and will be removed from Haystack in version 1.21. We recommend using the more powerful TransformersQueryClassifier instead. See the announcement for more details.
What's Changed
Pipeline
- feat: globally disable progress bars by @ZanSara in #5207
- Add cpu-remote-inference Docker image by @vblagoje in #5225
- fix: Support isolated node eval in run_batch in Generators by @bogdankostic in #5291
- feat: support OpenAI-Organization for authentication by @anakin87 in #5292
- docs: Small documentation updates to dense.py by @sjrl in #5305
- test: Refactor some retriever tests into unit tests by @sjrl in #5306
- feat: Add support for meta fields that are lists when using embed_meta_fields by @sjrl in #5307
- refactor: Extract link retrieval from WebRetriever, introduce LinkContentFetcher by @vblagoje in #5227
- fix: update WebRetriever docstrings and default mode by @dfokina in #5352
- added hybrid search example by @nickprock in #5376
DocumentStores
- fix: Allow filtering on list fields in InMemoryDocumentStore with all operators by @bogdankostic in #5208
- Fix: FAISSDocumentStore - make write_documents properly work in combination with update_embeddings by @anakin87 in #5221
- bug: fix for pinecone not working for per document updates by @vblagoje in #5110
- fix: avoid conflicts with opensearch / elasticsearch magic attributes during bulk requests by @tstadel in #5113
- ci: Add unit test for Elasticsearch8 by @bogdankostic in #5300
- feat: Check version of Elasticsearch server and add support for Elasticsearch <= 7.5 by @bogdankostic in #5320
Documentation
- feat: BM25 retrieval for MemoryDocumentStore by @vblagoje in #5151
- fix: install inference in REST API tests by @ZanSara in #5252
- fix: import_utils fetch_archive_from_http - improve url parsing for fetching archive from http by @malte-aws in #5199
- fix: Improve robustness of get_task HF pipeline invocations by @MichelBartels in #5284
- feat: introduce Store protocol (v2) by @ZanSara in #5259
- fix: num_return_sequences should be less than num_beams, not top_k by @faaany in #5280
- Revert "fix: num_return_sequences should be less than num_beams, not top_k" by @julian-risch in #5434
- chore: deprecate SklearnQueryClassifier by @anakin87 in #5324
- fix: Run HFLocalInvocationLayer.supports even if inference packages are not installed by @MichelBartels in #5308
- fix: a small bug in StopWordsCriteria by @faaany in #5316
- chore: fix typo in base.py by @eltociear in #5356
- feat: extend pipeline.add_component to support stores by @ZanSara in #5261
- proposal: Add RecentnessRanker component by @elundaeva in #5289
- feat: Add embed_meta_fields to Ranker nodes by @sjrl in #5361
- feat: Recentness Ranker by @elundaeva in #5301
- feat: Update Anthropic Claude support with the latest models, new streaming API, context window sizes by @vblagoje in #5406
- feat: Enable Support for Meta LLama-2 Models in Amazon Sagemaker by @vblagoje in #5437
Other Changes
- fix: MultiLabel to_json works with Table Labels by @sjrl in #5257
- chore: Remove deprecated return_table_cell from conftest.py by @sjrl in #5264
- test: Update test/others/test_utils.py by @sjrl in #5270
- test: Adapt batch size in retriever-reader benchmarks by @bogdankostic in #5281
- fix: Add dependencies to build lxml successfully in base Docker image by @vblagoje in #5288
- Remove requests_cache in tests by @silvanocerza in #5285
- refactor: Simplify selection of Azure vs OpenAI invocation layers by @vblagoje in #5271
- feat: batch mode for MemoryRetriever (v2) by @ZanSara in #5287
- chore: Add support for hierarchical docs by @silvanocerza in #5278
- build: Add elasticsearch7 and elasticsearch8 extra by @bogdankostic in #5296
- chore: Adapt import message for Elasticsearch7 by @bogdankostic in #5295
- ci: Add job for ES8 integration tests by @bogdankostic in #5297
- ci: Update labeler.yml to account for Elasticsearch changes by @bogdankostic in #5318
- create invocation-layers API reference page by @dfokina in #5262
- chore: pin scikit-learn>=1.3.0 by @anakin87 in #5322
- feat: upgrade canals in preview by @ZanSara in #5344
- tests: Improve token limit tests for OpenAI PromptNode layer by @vblagoje in #5351
- style: max_new_tokens are set twice with the same value by @faaany in #5368
- test: Re-activate end-to-end tests workflow by @julian-risch in #5343
- docs: Pin PyYAML to 5.3.1 by @bogdankostic in #5400
- build: upgrade transformers to v4.31.0 by @anakin87 in #5391
- fix: Error message about weight param in RecentnessRanker by @julian-risch in #5409
- docs: Add Elasticsearch to API config by @bogdankostic in #5422
- fix: Improve log warnings in REST API /health endpoint by @vblagoje in #5381
- build: Unpin mlflow, constraint dulwich and botocore by @silvanocerza in #5441
New Contributors
- @malte-aws made their first contribution in #5199
- @eltociear made their first contribution in #5356
- @elundaeva made their first contribution in #5289
- @nickprock made their first contribution in #5376
Full Changelog: v1.18.1...v1.19.0