Release Notes
Highlights
🪨 Amazon Bedrock supports new embedding models (#6406)
You can now use Titan and Cohere embedding models in your pipelines via the Amazon Bedrock integration.
    from haystack.nodes import EmbeddingRetriever

    # document_store is assumed to be an already-initialized Haystack document store.
    retriever = EmbeddingRetriever(
        embedding_model="amazon.titan-embed-text-v1",
        document_store=document_store,
        aws_config={
            "aws_access_key_id": "ACCESS_KEY",
            "aws_secret_access_key": "SECRET_KEY",
            "aws_session_token": "SESSION_TOKEN",
        },
    )
🕸️ Use any WebDriver you want in Crawler (#5465)
The WebDriver that powers Haystack's crawler is no longer limited to Chrome. Now you can configure it to use whatever WebDriver you'd like.
See our Crawler docs for more info.
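As a rough illustration of the new option, the sketch below builds a Crawler on top of headless Firefox. It assumes Selenium 4, a local Firefox with geckodriver, and the `webdriver` parameter described in these notes; the helper name `build_firefox_crawler` and the default `output_dir` value are made up for the example.

```python
from typing import List


def build_firefox_crawler(urls: List[str], output_dir: str = "crawled_files"):
    """Return a Crawler driven by headless Firefox instead of the default Chrome."""
    # Imported lazily so this module loads even without Selenium or Haystack installed.
    from haystack.nodes import Crawler
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")  # run without opening a browser window
    # Assumes Firefox and geckodriver are available on this machine.
    driver = webdriver.Firefox(options=options)

    # Pass the pre-configured driver through the new optional `webdriver` parameter.
    return Crawler(urls=urls, crawler_depth=0, output_dir=output_dir, webdriver=driver)
```

Calling `build_firefox_crawler(["https://example.com"]).crawl()` should then fetch the page through Firefox, provided the browser and driver are set up.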
v1.24.0
🚀 New Features
- Add a Bedrock embeddings encoder that can be used with the retriever.
- Add an optional webdriver parameter to Crawler. This allows using a pre-configured custom webdriver instead of creating the default Chrome webdriver.
⚡️ Enhancement Notes
- Add model_kwargs to FARMReader to allow loading the model in fp16 at inference time.
- Make JoinDocuments respect the weights parameter when join_mode is reciprocal rank fusion, and add score normalization for that join mode.
- Optimize documents upsert in PineconeDocumentStore (write_documents) by enabling asynchronous requests.
- Add a model_kwargs argument to SentenceTransformersRanker so that Hugging Face Transformers loading options can be passed through.
- Use batching in the predict method since multiple documents are usually passed at inference time. Allow the model to be loaded in torch.float16 by adding pipeline_kwargs to the init method.
- Correctly calculate the max token limit for gpt-3.5-turbo-1106.
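To make the JoinDocuments note above more concrete, here is a minimal sketch of weighted reciprocal rank fusion with score normalization. It is not Haystack's actual implementation: the function name `weighted_rrf`, the constant `K = 61`, and the choice to normalize by the best achievable score are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Optional, Sequence

K = 61  # smoothing constant; an assumed value, not taken from these notes


def weighted_rrf(
    rankings: Sequence[Sequence[str]],
    weights: Optional[Sequence[float]] = None,
) -> Dict[str, float]:
    """Fuse ranked lists of document IDs; a document ranked first everywhere scores 1.0."""
    n = len(rankings)
    if n == 0:
        return {}
    if weights is None:
        weights = [1.0 / n] * n  # default: every ranking counts equally
    if len(weights) != n:
        raise ValueError("need exactly one weight per ranking")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    scores: Dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking):  # rank 0 is the top hit
            scores[doc_id] += weight / (K + rank)

    # The best achievable raw score is total / K (rank 0 in every list),
    # so dividing by it maps the fused scores into (0, 1].
    return {doc_id: score * K / total for doc_id, score in scores.items()}


# Two hypothetical retriever outputs; the second is trusted more.
sparse_hits = ["a", "b", "c"]
dense_hits = ["b", "a", "d"]
fused = weighted_rrf([sparse_hits, dense_hits], weights=[0.3, 0.7])
print(sorted(fused, key=fused.get, reverse=True))
```

With equal weights, "a" and "b" would tie in this example; the heavier weight on the second list is what pushes "b" to the top.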
🐛 Bug Fixes
- Correctly calculate the answer page number for Extractive Answers.
- Fixed a bug that caused the EmbeddingRetriever to return no documents when used with a MongoDBAtlasDocumentStore. MongoDBAtlasDocumentStore now accepts a vector_search_index parameter. The index it names must be created beforehand in the MongoDB Atlas web UI, following the MongoDB Atlas documentation.