⭐ Highlights
Expanding Haystack's LLM support with the new OpenAIEmbeddingEncoder (#3356)
Now you can easily create document and query embeddings using large language models: if you have an OpenAI account, all you have to do is set the name of one of the supported models (ada, babbage, davinci, or curie) and add your API key to the EmbeddingRetriever component in your pipelines (see docs).
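As a rough illustration of the setup described above, the sketch below builds the arguments for an OpenAI-backed EmbeddingRetriever. The helper names (`openai_retriever_kwargs`, `build_openai_retriever`) are hypothetical. The keyword names `embedding_model` and `api_key`, and the `OPENAI_API_KEY` environment variable, are assumptions to check against the docs before use.

```python
import os

# Model names listed in these release notes.
SUPPORTED_OPENAI_MODELS = ("ada", "babbage", "davinci", "curie")


def openai_retriever_kwargs(model_name: str, api_key: str) -> dict:
    """Build keyword arguments for an OpenAI-backed EmbeddingRetriever.

    The keyword names `embedding_model` and `api_key` are assumptions;
    confirm them in the EmbeddingRetriever documentation.
    """
    if model_name not in SUPPORTED_OPENAI_MODELS:
        raise ValueError(f"Unsupported OpenAI model: {model_name!r}")
    if not api_key:
        raise ValueError("An OpenAI API key is required")
    return {"embedding_model": model_name, "api_key": api_key}


def build_openai_retriever(document_store, model_name: str = "ada"):
    """Create the retriever; needs Haystack installed and OPENAI_API_KEY set."""
    # Imported lazily so the helper above works without Haystack installed.
    from haystack.nodes import EmbeddingRetriever

    kwargs = openai_retriever_kwargs(model_name, os.environ["OPENAI_API_KEY"])
    return EmbeddingRetriever(document_store=document_store, **kwargs)
```

Reading the key from the environment keeps it out of source files and pipeline YAML; passing it some other way works just as well if that suits your deployment.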
Multimodal retrieval is here! (#2891)
Multimodality with Haystack just made a big leap forward with the addition of MultiModalRetriever: a Retriever that can handle different modalities for query and documents independently. Take it for a spin and experiment with new Document formats, like images. You can now use the same Retriever for text-to-image, text-to-table, and text-to-text retrieval, but also image similarity, table similarity, and more! Feed your favorite multimodal model to MultiModalRetriever and see it in action.
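from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes.retriever.multimodal import MultiModalRetriever

# Text queries against image documents, both embedded with the same CLIP model
retriever = MultiModalRetriever(
    document_store=InMemoryDocumentStore(embedding_dim=512),
    query_embedding_model="sentence-transformers/clip-ViT-B-32",
    query_type="text",
    document_embedding_models={"image": "sentence-transformers/clip-ViT-B-32"},
)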
Multi-platform Docker images
Starting with 1.10, we're making the deepset/haystack images available for linux/amd64 and linux/arm64.
⚠️ Breaking change in embed_queries method (#3252)
We've changed the text argument in the embed_queries method for DensePassageRetriever and EmbeddingRetriever to queries.
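To show what the rename means for calling code, here is a minimal sketch that uses a hypothetical stand-in class (`StubRetriever`) instead of a real Haystack retriever. It mirrors only the new keyword name, and its "embeddings" are placeholder numbers. The real methods may accept other arguments not shown here.

```python
from typing import List


class StubRetriever:
    """Hypothetical stand-in that mirrors only the renamed keyword."""

    def embed_queries(self, queries: List[str]) -> List[List[float]]:
        # Placeholder "embedding": a single number per query (its length).
        return [[float(len(q))] for q in queries]


retriever = StubRetriever()

# New style: pass the list of query strings under the `queries` keyword.
embeddings = retriever.embed_queries(queries=["What is Haystack?", "Who wrote it?"])

# Old style: the `text` keyword named in these notes is no longer accepted,
# so keyword-based callers fail with a TypeError until they are updated.
try:
    retriever.embed_queries(text=["What is Haystack?"])
    old_keyword_accepted = True
except TypeError:
    old_keyword_accepted = False
```

Callers that pass the list positionally are unaffected by a keyword rename; only keyword-based calls need updating.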
What's Changed
Breaking Changes
Pipeline
- fix: ONNX FARMReader model conversion is broken by @vblagoje in #3211
- bug: JoinDocuments nodes produce incorrect results if preceded by another JoinDocuments node by @JeffRisberg in #3170
- fix: eval() with add_isolated_node_eval=True breaks if no node supports it by @tstadel in #3347
- feat: extract label aggregation by @tstadel in #3363
- feat: Add OpenAIEmbeddingEncoder to EmbeddingRetriever by @vblagoje in #3356
- fix: stable YAML schema generation by @ZanSara in #3388
- fix: Update how schema is ordered by @sjrl in #3399
- feat: MultiModalRetriever by @ZanSara in #2891
DocumentStores
- feat: FAISS in OpenSearch: Support HNSW for cosine by @tstadel in #3217
- feat: add support for Elasticsearch 7.16.2 by @masci in #3318
- refactor: remove dead code from FAISSDocumentStore by @anakin87 in #3372
- fix: allow same vector_id in different indexes for SQL-based Document stores by @anakin87 in #3383
UI / Demo
Documentation
- docs: Fix a docstring in ray.py by @tanertopal in #3282
Other Changes
- refactor: make TransformersDocumentClassifier output consistent between different types of classification by @anakin87 in #3224
- Classify pipeline's type based on its components by @vblagoje in #3132
- docs: sync Haystack API with Readme by @brandenchan in #3223
- fix: MostSimilarDocumentsPipeline doesn't have pipeline property by @vblagoje in #3265
- bug: make ElasticSearchDocumentStore use batch_size in get_documents_by_id by @anakin87 in #3166
- refactor: better tests for TransformersDocumentClassifier by @anakin87 in #3270
- fix: AttributeError in TranslationWrapperPipeline by @nickchomey in #3290
- refactor: remove Inferencer multiprocessing by @vblagoje in #3283
- fix: opensearch script score with filters by @tstadel in #3321
- feat: Adding filters param to MostSimilarDocumentsPipeline run and run_batch by @JacdDev in #3301
- feat: add multi-platform Docker images by @masci in #3354
- fix: Added checks for DataParallel and WrappedDataParallel by @sjrl in #3366
- fix: QuestionGenerator generates wrong document questions for non-default num_queries_per_doc parameter by @vblagoje in #3381
- bug: Adds better way of checking query in BaseRetriever and Pipeline.run() by @ugm2 in #3304
- feat: Updated EntityExtractor to handle long texts and added better postprocessing by @sjrl in #3154
- docs: Add comment about the generation of no-answer samples in FARMReader training by @brandenchan in #3404
- feat: Speed up integration tests (nodes) by @sjrl in #3408
- fix: Fix the error of wrong page numbers when documents contain empty pages. by @brunnurs in #3330
- bug: change type of split_by to Literal including None by @julian-risch in #3389
- feat: Add exponential backoff decorator; apply it to OpenAI requests by @vblagoje in #3398
New Contributors
- @tanertopal made their first contribution in #3282
- @JeffRisberg made their first contribution in #3170
- @JacdDev made their first contribution in #3301
- @hsm207 made their first contribution in #3351
- @ugm2 made their first contribution in #3304
- @brunnurs made their first contribution in #3330
Full Changelog: v1.9.1...v1.10.0rc1