⭐ Highlights

PromptNode enhancements

PromptNode now supports prompt logging (in pipeline debug mode), run_batch, and model_kwargs. More updates to PromptNode and PromptTemplate are coming soon!
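To picture what run_batch and model_kwargs add, here is a minimal, library-free sketch. It is not Haystack's implementation: the helpers `render_prompts` and `build_call_kwargs` and the default values are hypothetical. The sketch fills a `$`-placeholder template (Python's `string.Template`) once per query, which is roughly what running a prompt over a batch of inputs means, and lets caller-supplied model_kwargs override default generation settings.

```python
from string import Template
from typing import Any, Dict, List, Optional

# Hypothetical defaults for illustration; real PromptNode defaults depend on the model.
DEFAULT_MODEL_KWARGS: Dict[str, Any] = {"max_length": 100, "temperature": 1.0}


def render_prompts(template_text: str, queries: List[str], documents: List[List[str]]) -> List[str]:
    """Fill the template once per (query, documents) pair, as a batched call would."""
    if len(queries) != len(documents):
        raise ValueError("queries and documents must have the same length")
    template = Template(template_text)
    return [
        template.substitute(query=query, documents=" ".join(docs))
        for query, docs in zip(queries, documents)
    ]


def build_call_kwargs(model_kwargs: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Merge user-provided model_kwargs over the defaults without mutating the defaults."""
    merged = dict(DEFAULT_MODEL_KWARGS)
    merged.update(model_kwargs or {})
    return merged


prompts = render_prompts(
    "Answer using the context. Context: $documents; Question: $query; Answer:",
    queries=["Who wrote Hamlet?", "What is Haystack?"],
    documents=[["Hamlet is a play by Shakespeare."], ["Haystack is an NLP framework."]],
)
for prompt in prompts:
    print(prompt)
print(build_call_kwargs({"temperature": 0.2}))
```

Copying the defaults before merging keeps one call's overrides from leaking into the next.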

Shaper

We're introducing Shaper, PromptNode's helper. Shaper reshapes the values passed between pipeline nodes (such as the query and documents) so that PromptNode integrates seamlessly with Haystack pipelines. Shaper isn't limited to PromptNode, though; you can also use it on its own in any pipeline, which opens up many new possibilities.
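The core Shaper idea, reading named values from a shared context, applying a function, and writing the result back under new names, can be sketched in plain Python. This is a hypothetical stand-in, not Haystack's Shaper: the `MiniShaper` class, its `inputs`/`outputs` mapping, and the `join_texts` helper are assumptions made for illustration, and the real Shaper's parameters and built-in functions may differ.

```python
from typing import Any, Callable, Dict, List


class MiniShaper:
    """Hypothetical stand-in: applies `func` to context values and stores the result(s)."""

    def __init__(self, func: Callable[..., Any], inputs: Dict[str, str], outputs: List[str]):
        self.func = func
        self.inputs = inputs    # maps func parameter name -> context key
        self.outputs = outputs  # context keys that receive the result(s)

    def run(self, context: Dict[str, Any]) -> Dict[str, Any]:
        kwargs = {param: context[key] for param, key in self.inputs.items()}
        result = self.func(**kwargs)
        # A single output receives the whole result; several outputs expect a tuple.
        results = (result,) if len(self.outputs) == 1 else tuple(result)
        if len(results) != len(self.outputs):
            raise ValueError("number of results does not match number of outputs")
        updated = dict(context)  # leave the caller's context untouched
        updated.update(zip(self.outputs, results))
        return updated


def join_texts(texts: List[str], delimiter: str = " ") -> str:
    """Hypothetical helper: collapse a list of passages into one prompt-ready string."""
    return delimiter.join(texts)


shaper = MiniShaper(func=join_texts, inputs={"texts": "documents"}, outputs=["context_text"])
ctx = shaper.run(
    {"query": "What is Haystack?", "documents": ["Haystack is a framework.", "It builds NLP pipelines."]}
)
print(ctx["context_text"])
```

Because the function only sees the values named in `inputs`, the same node can reshape data for PromptNode or for any other node downstream.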

IVF and Product Quantization support for OpenSearchDocumentStore

We've added support for IVF and IVF with Product Quantization to OpenSearchDocumentStore. You can train the IVF index either by calling the train_index method (as in FAISSDocumentStore) or by setting ivf_train_size when initializing OpenSearchDocumentStore.
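To show why an IVF index needs training, here is a minimal, library-free sketch of an inverted file index in plain Python. It is not how OpenSearch or Haystack implement it: the function and class names are hypothetical, plain k-means stands in for the backend's clustering, and product quantization (which additionally compresses the vectors stored in each list) is left out. Training learns coarse centroids from a sample of vectors (as we understand it, ivf_train_size sets how many embeddings feed this step); each document is then filed under its nearest centroid, and a query scans only the `n_probe` lists whose centroids are closest.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec: Vector, centroids: List[List[float]]) -> int:
    return min(range(len(centroids)), key=lambda i: sq_dist(vec, centroids[i]))


def train_centroids(train: List[Vector], n_lists: int, iters: int = 10, seed: int = 0) -> List[List[float]]:
    """Plain k-means: the 'training' step that learns the coarse clusters."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(train, n_lists)]
    for _ in range(iters):
        buckets: Dict[int, List[Vector]] = {i: [] for i in range(n_lists)}
        for v in train:
            buckets[nearest(v, centroids)].append(v)
        for i, members in buckets.items():
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


class IVFIndex:
    def __init__(self, centroids: List[List[float]]):
        self.centroids = centroids
        self.lists: Dict[int, List[Tuple[str, Vector]]] = {i: [] for i in range(len(centroids))}

    def add(self, doc_id: str, vec: Vector) -> None:
        # File each vector in the inverted list of its nearest centroid.
        self.lists[nearest(vec, self.centroids)].append((doc_id, vec))

    def search(self, query: Vector, top_k: int = 1, n_probe: int = 1) -> List[str]:
        # Probe only the n_probe closest lists instead of scanning every vector.
        probe = sorted(range(len(self.centroids)), key=lambda i: sq_dist(query, self.centroids[i]))[:n_probe]
        candidates = [item for i in probe for item in self.lists[i]]
        candidates.sort(key=lambda item: sq_dist(query, item[1]))
        return [doc_id for doc_id, _ in candidates[:top_k]]


docs = {"a": [0.0, 0.1], "b": [0.2, 0.0], "c": [10.0, 10.1], "d": [9.8, 10.0]}
index = IVFIndex(train_centroids(list(docs.values()), n_lists=2))
for doc_id, vec in docs.items():
    index.add(doc_id, vec)
print(index.search([10.0, 10.0], top_k=1, n_probe=1))
```

Raising `n_probe` trades speed for recall: more lists are scanned, so fewer true neighbors are missed when they sit just across a cluster boundary.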

What's Changed

Breaking Changes

refactor: Updated rest_api schema for tables to be consistent with Document.to_dict by @sjrl in #3872
feat: Support multiple document_ids in Answer object (for generative QA) by @tstadel in #4062
feat: Update OpenAIAnswerGenerator defaults and with learnings from PromptNode by @sjrl in #4038
build: cache nltk models into the docker image by @mayankjobanputra in #4118
feat: Add IVF and Product Quantization support for OpenSearchDocumentStore by @bogdankostic in #3850

Pipeline

feat: add frontmatter to meta in MarkdownConverter by @TuanaCelik in #3953
fix: removing code block in MarkdownConverter by @TuanaCelik in #3960
feat: Add page range support to PDF converters. by @danielbichuetti in #3965
fix: Update telemetry to not serialize Pipeline if disabled. by @sjrl in #4000
feat: add Shaper by @ZanSara in #3880
fix: Event sending for RayPipeline crashing Haystack by @zoltan-fedor in #3971
fix: document retrieval metrics for non-document_id document_relevance_criteria by @tstadel in #3885
fix: make the crawler more robust on Windows by @anakin87 in #4049
fix: use correct count of outgoing edges in RayPipeline by @zoltan-fedor in #4066
feat: Allow all training options for training a SentenceTransformers EmbeddingRetriever by @sjrl in #4026
refactor: replace mutable default arguments by @julian-risch in #4070
feat: Support multiple RayPipelines by @zoltan-fedor in #4078
Remove double batching in retrieve_batch by @sjrl in #4014
style: Update black by @silvanocerza in #4101
fix: Fix TableTextRetriever for input consisting of tables only by @jackapbutler in #4048
fix: Deduplicate same Documents in isolated evaluation of Reader by @bogdankostic in #4114
Docs: Fix code block formatting by @agnieszka-m in #4162
refactor: Remove the pin from the espnet module and fix the audio node tests. by @danielbichuetti in #4128
fix: change tiktoken fallback mechanism to support Windows amd64 by @danielbichuetti in #4175
feat: Add OpenAIError to retry mechanism by @sjrl in #4178

DocumentStores

refactor: use weaviate client to build BM25 query by @hsm207 in #3939
fix: fixed InMemoryDocumentStore.get_embedding_count to return correct number by @sjrl in #3980
fix: Add inner query for mysql compatibility by @julian-risch in #4068
feat: add support for custom headers by @hsm207 in #4040
feat: Add BM25 support for tables in InMemoryDocumentStore by @bogdankostic in #4090
refactor: InMemoryDocumentStore - manage documents without embedding & fix mypy errors by @anakin87 in #4113
refactor: complete the document stores test refactoring by @masci in #4125
feat: include testing facilities into haystack package by @masci in #4182

Documentation

Align with the docs install guide + correct lg by @agnieszka-m in #3950
docs: Update Crawler docstring for correct usage in Google colab by @silvanocerza in #3979
Docs: Update docstrings by @agnieszka-m in #4119
docs: Update Annotation Tool README.md by @bogdankostic in #4123
feat: Add model_kwargs option to PromptNode by @sjrl in #4151
fix: Remove logging statement of setting ID manually in Document by @bogdankostic in #4129
chore: Fixing PromptNode .prompt() docstring to include the PromptTemplate object as an option by @TuanaCelik in #4135
chore: de-couple the telemetry events for each tutorial from the dataset on AWS that is used by @TuanaCelik in #4155
feat: Implement run_batch for PromptNode by @sjrl in #4072

Other Changes

fix: add option to not override results by Shaper #4231
fix: Shaper store all outputs from function #4223
fix: allowing file-upload api to write files to disk #4221
fix: Fix bug in prompt template check of OpenAIAnswerGenerator #4220
feat: add top_k to PromptNode #4159
feat: Add JsonConverter node #4130
feat: adding secure loading of models by default for haystack by @mayankjobanputra in #3901
fix: add tiktoken fallback mechanism. by @danielbichuetti in #3929
fix: change model in distillation test by @ZanSara in #3944
feat: Expose output_variable in PromptNode result, adjust unit tests by @vblagoje in #3892
fix: Fix type in FARMReader's save_to_remote by @bogdankostic in #3952
refactor: Remove PromptNode hash and equality functions by @vblagoje in #3923
ci: Remove mypy deps install step in python_cache action by @silvanocerza in #3956
fix: overwrite params with environment variables even if there are no params in the pipeline definition; make mypy ignore REST API tests by @anakin87 in #3930
Docs: Update ImageToText docstrings by @agnieszka-m in #3963
Docs: Add TransformersImageToText API doc by @agnieszka-m in #3966
ci: Add Docker images testing by @silvanocerza in #3943
feat: Allow users to set a timeout for remote APIs by @danielbichuetti in #3949
ci: Fix docker image testing on release by @silvanocerza in #3976
Fix: Fix quotation marks by @agnieszka-m in #3973
fix: PromptNode doesn't have run_batch support (yet) by @vblagoje in #3972
chore: increased timeout for loading pipelines through API by @mayankjobanputra in #3977
Missing import for TransformersImageToText by @ZanSara in #3984
test: CI on py3.8 by @ZanSara in #3926
Simplifies and fix docker images tests on release by @silvanocerza in #3982
feat: Add use_prefiltering parameter to DeepsetCloudDocumentStore by @bogdankostic in #3969
ci: Delete Docker images after testing to prevent workflow failure by @silvanocerza in #4004
fix: Add a verbose option to PromptNode to let users understand the prompts being used #2 by @zoltan-fedor in #3898
fix: prevent posthog from sending errors to stderr by @julian-risch in #4008
fix: extend schema for prompt node results by @tstadel in #3891
proposal: TableCell by @sjrl in #3875
refactor: In PromptNode reuse tokenizer instead of loading new one for stop words by @sjrl in #4016
ci: Automate release on PyPi by @silvanocerza in #4015
ci: Fix PyPi release workflow by @silvanocerza in #4029
ci: Bump act10ns/slack from v1 to v2 by @silvanocerza in #4031
ci: latest version of pylint is failing, ignore new errors by @masci in #4035
ci: Add linting of workflow and related pre-commit hook by @silvanocerza in #4032
ci: Fix pylint version to prevent crash by @silvanocerza in #4043
ci: Add missing env var in PyPi release slack notification by @silvanocerza in #4052
fix: allow Biadaptive & Triadaptive to work with EarlyStopping by @jackapbutler in #4033
proposal: Add Agents for extended LLM support by @julian-risch in #3925
ci: Fix pylint workflow check running on tests files by @silvanocerza in #4076
fix: Add PromptTemplate repr method by @vblagoje in #4058
ci: Change actionlint pre-commit hook to use Dockerized tool by @silvanocerza in #4060
ci: Make tests run conditionally in CI by @silvanocerza in #4086
feat: OpenAI - warn users if max_tokens is too short by @anakin87 in #4094
Docs: Add shaper to api docs by @agnieszka-m in #4083
feat: Update allowed models to be used with Prompt Node by @sjrl in #4018
ci: Add missing env vars in rest_api CI tests by @silvanocerza in #4098
ci: Fix pylint CI check running with no files by @silvanocerza in #4097
Proposal: Add a JsonConverter node by @bglearning in #3959
fix: query filters in REST API by @oryx1729 in #4105
fix: fix torchaudio version by @mayankjobanputra in #4102
ci: Exclude .github folder from triggering tests by @silvanocerza in #4120
ci: Add workflow to label PRs that edit docstrings by @silvanocerza in #4115
Update PromptTemplate unit tests by @vblagoje in #4131
ci: Add load arg to docker/bake-action before testing Docker images by @silvanocerza in #4124
Revert changes introduced in PR #4124 by @silvanocerza in #4137
ci: Fix Docker images test on release by @silvanocerza in #4153
ci: Update docstring-labeler.yml workflow to safely run in PRs from forks by @silvanocerza in #4146
Docs: Add filter to hide entity post processor by @agnieszka-m in #4160
ci: Use larger runner for Docker release workflow by @silvanocerza in #4185
fix: make all OpenAI API params in PromptNode and PromptModel controllable via model_kwargs by @tstadel in #4183

New Contributors

@jackapbutler made their first contribution in #4033

Full Changelog: v1.13.2...v1.14.0

deepset-ai/haystack v1.14.0 on GitHub