⭐ Highlights

Expanding Haystack's LLM support with the new `OpenAIEmbeddingEncoder` (#3356)

Now you can easily create document and query embeddings using large language models. If you have an OpenAI account, set the name of one of the supported models (ada, babbage, davinci, or curie) and add your API key to the EmbeddingRetriever component in your pipelines.
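As a rough illustration of the setup described above, here is a minimal sketch that wires an OpenAI model into an EmbeddingRetriever. It assumes that EmbeddingRetriever accepts `embedding_model` and `api_key` arguments as introduced in #3356, and that the ada model returns 1024-dimensional vectors. `build_openai_retriever` is a hypothetical helper name, and reading the key from an `OPENAI_API_KEY` environment variable is a convention chosen here, not something Haystack requires.

```python
import os


def build_openai_retriever(api_key=None, model_name="ada", embedding_dim=1024):
    """Return an EmbeddingRetriever backed by an OpenAI embedding model.

    Hypothetical helper for illustration; the argument names passed to
    Haystack below are assumptions based on the release note for #3356.
    """
    # Fall back to an environment variable (a convention assumed here).
    api_key = api_key or os.environ.get("OPENAI_API_KEY")
    if not api_key:
        raise ValueError("An OpenAI API key is required.")

    # Imported lazily so the key check above runs even without Haystack installed.
    from haystack.document_stores import InMemoryDocumentStore
    from haystack.nodes import EmbeddingRetriever

    # Assumption: 1024 matches the vector size of the ada embedding model;
    # other models need a different embedding_dim.
    document_store = InMemoryDocumentStore(embedding_dim=embedding_dim)
    return EmbeddingRetriever(
        document_store=document_store,
        embedding_model=model_name,
        api_key=api_key,
    )
```

Calling `build_openai_retriever()` with a valid key should return a retriever that can be dropped into a pipeline like any other EmbeddingRetriever. Embedding requests are sent to OpenAI, so they need network access and may incur costs.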

Multimodal retrieval is here! (#2891)

Multimodality with Haystack just made a big leap forward with the addition of MultiModalRetriever: a Retriever that can handle different modalities for query and documents independently. Take it for a spin and experiment with new Document formats, like images. You can now use the same Retriever for text-to-image, text-to-table, and text-to-text retrieval, as well as for image similarity, table similarity, and more! Feed your favorite multimodal model to MultiModalRetriever and see it in action.

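from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes.retriever.multimodal import MultiModalRetriever

# The document store's embedding_dim (512) must match the CLIP model's output size.
retriever = MultiModalRetriever(
    document_store=InMemoryDocumentStore(embedding_dim=512),
    query_embedding_model="sentence-transformers/clip-ViT-B-32",
    query_type="text",
    document_embedding_models={"image": "sentence-transformers/clip-ViT-B-32"},
)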

Multi-platform Docker images

Starting with 1.10, we're making the deepset/haystack images available for linux/amd64 and linux/arm64.

⚠️ Breaking change in `embed_queries` method (#3252)

We've renamed the text argument of the embed_queries method in DensePassageRetriever and EmbeddingRetriever to queries. Update any calls that pass this argument by keyword.
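The effect on calling code can be sketched as follows. `StandInRetriever` is a hypothetical stand-in written for this example, not a Haystack class. It only mimics the new `embed_queries` signature so the snippet runs without Haystack or any model downloads, and the old keyword name is taken from the note above.

```python
from typing import List


class StandInRetriever:
    """Hypothetical stand-in that only mimics the renamed keyword argument."""

    def embed_queries(self, queries: List[str]) -> List[List[float]]:
        # Dummy one-dimensional "embedding": the length of each query string.
        return [[float(len(q))] for q in queries]


retriever = StandInRetriever()

# New-style call: pass the list under the `queries` keyword.
embeddings = retriever.embed_queries(queries=["What is Haystack?"])

# Old-style call: the previous keyword is rejected with a TypeError.
try:
    retriever.embed_queries(text=["What is Haystack?"])
    old_keyword_accepted = True
except TypeError:
    old_keyword_accepted = False
```

Calls that pass the list positionally, such as `retriever.embed_queries(["What is Haystack?"])`, are not affected by a keyword rename.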

What's Changed

Breaking Changes

chore: add DenseRetriever abstraction by @tstadel in #3252

Pipeline

fix: ONNX FARMReader model conversion is broken by @vblagoje in #3211
bug: JoinDocuments nodes produce incorrect results if preceded by another JoinDocuments node by @JeffRisberg in #3170
fix: eval() with add_isolated_node_eval=True breaks if no node supports it by @tstadel in #3347
feat: extract label aggregation by @tstadel in #3363
feat: Add OpenAIEmbeddingEncoder to EmbeddingRetriever by @vblagoje in #3356
fix: stable YAML schema generation by @ZanSara in #3388
fix: Update how schema is ordered by @sjrl in #3399
feat: MultiModalRetriever by @ZanSara in #2891

DocumentStores

feat: FAISS in OpenSearch: Support HNSW for cosine by @tstadel in #3217
feat: add support for Elasticsearch 7.16.2 by @masci in #3318
refactor: remove dead code from FAISSDocumentStore by @anakin87 in #3372
fix: allow same vector_id in different indexes for SQL-based Document stores by @anakin87 in #3383

UI / Demo

fix: demo won't start through Docker compose on Apple M1 by @masci in #3337

Documentation

docs: Fix a docstring in ray.py by @tanertopal in #3282

Other Changes

refactor: make TransformersDocumentClassifier output consistent between different types of classification by @anakin87 in #3224
Classify pipeline's type based on its components by @vblagoje in #3132
docs: sync Haystack API with Readme by @brandenchan in #3223
fix: MostSimilarDocumentsPipeline doesn't have pipeline property by @vblagoje in #3265
bug: make ElasticSearchDocumentStore use batch_size in get_documents_by_id by @anakin87 in #3166
refactor: better tests for TransformersDocumentClassifier by @anakin87 in #3270
fix: AttributeError in TranslationWrapperPipeline by @nickchomey in #3290
refactor: remove Inferencer multiprocessing by @vblagoje in #3283
fix: opensearch script score with filters by @tstadel in #3321
feat: Adding filters param to MostSimilarDocumentsPipeline run and run_batch by @JacdDev in #3301
feat: add multi-platform Docker images by @masci in #3354
fix: Added checks for DataParallel and WrappedDataParallel by @sjrl in #3366
fix: QuestionGenerator generates wrong document questions for non-default num_queries_per_doc parameter by @vblagoje in #3381
bug: Adds better way of checking query in BaseRetriever and Pipeline.run() by @ugm2 in #3304
feat: Updated EntityExtractor to handle long texts and added better postprocessing by @sjrl in #3154
docs: Add comment about the generation of no-answer samples in FARMReader training by @brandenchan in #3404
feat: Speed up integration tests (nodes) by @sjrl in #3408
fix: Fix the error of wrong page numbers when documents contain empty pages. by @brunnurs in #3330
bug: change type of split_by to Literal including None by @julian-risch in #3389
feat: Add exponential backoff decorator; apply it to OpenAI requests by @vblagoje in #3398
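
For readers unfamiliar with the exponential backoff pattern mentioned in #3398, here is a generic sketch of such a decorator. It is not Haystack's implementation. The name `retry_with_exponential_backoff` and all of its parameters are assumptions made for illustration.

```python
import functools
import random
import time


def retry_with_exponential_backoff(max_retries=5, initial_delay=1.0, factor=2.0,
                                   jitter=True, errors=(Exception,),
                                   sleep=time.sleep):
    """Retry the wrapped function, multiplying the wait by `factor` each time."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = initial_delay
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except errors:
                    # Out of retries: re-raise the last error to the caller.
                    if attempt == max_retries:
                        raise
                    # Optional jitter stretches the pause by a random amount
                    # so that many clients do not retry in lockstep.
                    pause = delay * (1 + random.random()) if jitter else delay
                    sleep(pause)
                    delay *= factor

        return wrapper

    return decorator
```

Wrapping a function that calls a rate-limited API with `@retry_with_exponential_backoff(errors=(SomeRateLimitError,))`, where `SomeRateLimitError` is a placeholder for whatever exception the client raises, retries the call after pauses of roughly 1, 2, 4, and more seconds instead of failing on the first error.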

New Contributors

@tanertopal made their first contribution in #3282
@JeffRisberg made their first contribution in #3170
@JacdDev made their first contribution in #3301
@hsm207 made their first contribution in #3351
@ugm2 made their first contribution in #3304
@brunnurs made their first contribution in #3330

Full Changelog: v1.9.1...v1.10.0rc1

deepset-ai/haystack v1.10.0rc1 on GitHub