⭐ Highlights
PromptNode enhancements
PromptNode just rolled out prompt logging (pipeline debug), run_batch, and model_kwargs support. More updates to PromptNode and PromptTemplates coming soon!
Shaper
We're introducing the Shaper, PromptNode's helper. Shaper unlocks the full potential of PromptNode and ensures its seamless integration with Haystack. But Shaper's scope and functionality are not limited to PromptNode; you can also use it independently, opening up a whole new world of possibilities.
IVF and Product Quantization support for OpenSearchDocumentStore
We've added support for IVF and IVF with Product Quantization to OpenSearchDocumentStore. You can train the IVF index by calling the train_index method (same as in FAISSDocumentStore) or by setting ivf_train_size when initializing OpenSearchDocumentStore.
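To illustrate why an IVF index needs a training step at all, here is a minimal, standard-library-only sketch of the general idea: centroids are learned from a sample of vectors, every vector is filed under its nearest centroid, and a query scans only the closest lists. This is a conceptual toy, not the OpenSearch or Haystack implementation; all function names (train_centroids, build_inverted_lists, search) and parameters (n_lists, n_probe) are hypothetical, and the deterministic centroid initialization is a simplification chosen to keep the example reproducible.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, vector):
    """Index of the centroid closest to `vector`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(centroids[i], vector))


def train_centroids(sample, n_lists, n_iter=10):
    """Toy k-means: learn `n_lists` centroids from a training sample.

    Simplification: initial centroids are evenly spaced picks from the
    sample instead of random ones, so the result is deterministic.
    """
    step = len(sample) // n_lists
    centroids = [list(sample[i * step]) for i in range(n_lists)]
    for _ in range(n_iter):
        buckets = [[] for _ in range(n_lists)]
        for v in sample:
            buckets[nearest(centroids, v)].append(v)
        for i, bucket in enumerate(buckets):
            if bucket:  # keep the previous centroid if a cluster is empty
                centroids[i] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids


def build_inverted_lists(centroids, vectors):
    """File every vector id under its nearest centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for doc_id, v in enumerate(vectors):
        lists[nearest(centroids, v)].append(doc_id)
    return lists


def search(query, centroids, lists, vectors, n_probe=1, top_k=3):
    """Scan only the `n_probe` lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)),
                   key=lambda i: squared_distance(centroids[i], query))
    candidates = [doc_id for i in order[:n_probe] for doc_id in lists[i]]
    candidates.sort(key=lambda d: squared_distance(vectors[d], query))
    return candidates[:top_k]


# Two well-separated groups of 2-D vectors: ids 0-49 near (0, 0), ids 50-99 near (5, 5).
rng = random.Random(42)
vectors = [[cx + rng.uniform(-0.5, 0.5), cy + rng.uniform(-0.5, 0.5)]
           for cx, cy in [(0.0, 0.0), (5.0, 5.0)] for _ in range(50)]

centroids = train_centroids(vectors, n_lists=2)      # the "training" step
lists = build_inverted_lists(centroids, vectors)     # the "indexing" step
hits = search([5.0, 5.0], centroids, lists, vectors, n_probe=1, top_k=3)
print(hits)
```

With n_probe=1 the query compares against only one of the two lists, which is where the speed-up comes from. Raising n_probe trades speed for recall, and probing every list degenerates to an exhaustive scan.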
What's Changed
Breaking Changes
- refactor: Updated rest_api schema for tables to be consistent with Document.to_dict by @sjrl in #3872
- feat: Support multiple document_ids in Answer object (for generative QA) by @tstadel in #4062
- feat: Update OpenAIAnswerGenerator defaults and with learnings from PromptNode by @sjrl in #4038
- build: cache nltk models into the docker image by @mayankjobanputra in #4118
- feat: Add IVF and Product Quantization support for OpenSearchDocumentStore by @bogdankostic in #3850
Pipeline
- feat: add frontmatter to meta in MarkdownConverter by @TuanaCelik in #3953
- fix: removing code block in MarkdownConverter by @TuanaCelik in #3960
- feat: Add page range support to PDF converters. by @danielbichuetti in #3965
- fix: Update telemetry to not serialize Pipeline if disabled. by @sjrl in #4000
- feat: add Shaper by @ZanSara in #3880
- fix: Event sending for RayPipeline crashing Haystack by @zoltan-fedor in #3971
- fix: document retrieval metrics for non-document_id document_relevance_criteria by @tstadel in #3885
- fix: make the crawler more robust on Windows by @anakin87 in #4049
- fix: use correct count of outgoing edges in RayPipeline by @zoltan-fedor in #4066
- feat: Allow all training options for training a SentenceTransformers EmbeddingRetriever by @sjrl in #4026
- refactor: replace mutable default arguments by @julian-risch in #4070
- feat: Support multiple RayPipelines by @zoltan-fedor in #4078
- Remove double batching in retrieve_batch by @sjrl in #4014
- style: Update black by @silvanocerza in #4101
- fix: Fix TableTextRetriever for input consisting of tables only by @jackapbutler in #4048
- fix: Deduplicate same Documents in isolated evaluation of Reader by @bogdankostic in #4114
- Docs: Fix code block formatting by @agnieszka-m in #4162
- refactor: Remove the pin from the espnet module and fix the audio node tests. by @danielbichuetti in #4128
- fix: change tiktoken fallback mechanism to support Windows amd64 by @danielbichuetti in #4175
- feat: Add OpenAIError to retry mechanism by @sjrl in #4178
DocumentStores
- refactor: use weaviate client to build BM25 query by @hsm207 in #3939
- fix: fixed InMemoryDocumentStore.get_embedding_count to return correct number by @sjrl in #3980
- fix: Add inner query for mysql compatibility by @julian-risch in #4068
- feat: add support for custom headers by @hsm207 in #4040
- feat: Add BM25 support for tables in InMemoryDocumentStore by @bogdankostic in #4090
- refactor: InMemoryDocumentStore - manage documents without embedding & fix mypy errors by @anakin87 in #4113
- refactor: complete the document stores test refactoring by @masci in #4125
- feat: include testing facilities into haystack package by @masci in #4182
Documentation
- Align with the docs install guide + correct lg by @agnieszka-m in #3950
- docs: Update Crawler docstring for correct usage in Google colab by @silvanocerza in #3979
- Docs: Update docstrings by @agnieszka-m in #4119
- docs: Update Annotation Tool README.md by @bogdankostic in #4123
- feat: Add model_kwargs option to PromptNode by @sjrl in #4151
- fix: Remove logging statement of setting ID manually in Document by @bogdankostic in #4129
- chore: Fixing PromptNode .prompt() docstring to include the PromptTemplate object as an option by @TuanaCelik in #4135
- chore: de-couple the telemetry events for each tutorial from the dataset on AWS that is used by @TuanaCelik in #4155
- feat: Implement run_batch for PromptNode by @sjrl in #4072
Other Changes
- fix: add option to not override results by Shaper #4231
- fix: Shaper store all outputs from function #4223
- fix: allowing file-upload api to write files to disk #4221
- fix: Fix bug in prompt template check of OpenAIAnswerGenerator #4220
- feat: add top_k to PromptNode #4159
- feat: Add JsonConverter node #4130
- feat: adding secure loading of models by default for haystack by @mayankjobanputra in #3901
- fix: add tiktoken fallback mechanism. by @danielbichuetti in #3929
- fix: change model in distillation test by @ZanSara in #3944
- feat: Expose output_variable in PromptNode result, adjust unit tests by @vblagoje in #3892
- fix: Fix type in FARMReader's save_to_remote by @bogdankostic in #3952
- refactor: Remove PromptNode hash and equality functions by @vblagoje in #3923
- ci: Remove mypy deps install step in python_cache action by @silvanocerza in #3956
- fix: overwrite params with environment variables even if there are no params in the pipeline definition; make mypy ignore REST API tests by @anakin87 in #3930
- Docs: Update ImageToText docstrings by @agnieszka-m in #3963
- Docs: Add TransformersImageToText API doc by @agnieszka-m in #3966
- ci: Add Docker images testing by @silvanocerza in #3943
- feat: Allow users to set a timeout for remote APIs by @danielbichuetti in #3949
- ci: Fix docker image testing on release by @silvanocerza in #3976
- Fix: Fix quotation marks by @agnieszka-m in #3973
- fix: PromptNode doesn't have run_batch support (yet) by @vblagoje in #3972
- chore: increased timeout for loading pipelines through API by @mayankjobanputra in #3977
- Missing import for TransformersImageToText by @ZanSara in #3984
- test: CI on py3.8 by @ZanSara in #3926
- Simplifies and fix docker images tests on release by @silvanocerza in #3982
- feat: Add use_prefiltering parameter to DeepsetCloudDocumentStore by @bogdankostic in #3969
- ci: Delete Docker images after testing to prevent workflow failure by @silvanocerza in #4004
- fix: Add a verbose option to PromptNode to let users understand the prompts being used #2 by @zoltan-fedor in #3898
- fix: prevent posthog from sending errors to stderr by @julian-risch in #4008
- fix: extend schema for prompt node results by @tstadel in #3891
- proposal: TableCell by @sjrl in #3875
- refactor: In PromptNode reuse tokenizer instead of loading new one for stop words by @sjrl in #4016
- ci: Automate release on PyPi by @silvanocerza in #4015
- ci: Fix PyPi release workflow by @silvanocerza in #4029
- ci: Bump act10ns/slack from v1 to v2 by @silvanocerza in #4031
- ci: latest version of pylint is failing, ignore new errors by @masci in #4035
- ci: Add linting of workflow and related pre-commit hook by @silvanocerza in #4032
- ci: Fix pylint version to prevent crash by @silvanocerza in #4043
- ci: Add missing env var in PyPi release slack notification by @silvanocerza in #4052
- fix: allow Biadaptive & Triadaptive to work with EarlyStopping by @jackapbutler in #4033
- proposal: Add Agents for extended LLM support by @julian-risch in #3925
- ci: Fix pylint workflow check running on tests files by @silvanocerza in #4076
- fix: Add PromptTemplate repr method by @vblagoje in #4058
- ci: Change actionlint pre-commit hook to use Dockerized tool by @silvanocerza in #4060
- ci: Make tests run conditionally in CI by @silvanocerza in #4086
- feat: OpenAI - warn users if max_tokens is too short by @anakin87 in #4094
- Docs: Add shaper to api docs by @agnieszka-m in #4083
- feat: Update allowed models to be used with Prompt Node by @sjrl in #4018
- ci: Add missing env vars in rest_api CI tests by @silvanocerza in #4098
- ci: Fix pylint CI check running with no files by @silvanocerza in #4097
- Proposal: Add a JsonConverter node by @bglearning in #3959
- fix: query filters in REST API by @oryx1729 in #4105
- fix: fix torchaudio version by @mayankjobanputra in #4102
- ci: Exclude .github folder from triggering tests by @silvanocerza in #4120
- ci: Add workflow to label PRs that edit docstrings by @silvanocerza in #4115
- Update PromptTemplate unit tests by @vblagoje in #4131
- ci: Add load arg to docker/bake-action before testing Docker images by @silvanocerza in #4124
- Revert changes introduced in PR #4124 by @silvanocerza in #4137
- ci: Fix Docker images test on release by @silvanocerza in #4153
- ci: Update docstring-labeler.yml workflow to safely run in PRs from forks by @silvanocerza in #4146
- Docs: Add filter to hide entity post processor by @agnieszka-m in #4160
- ci: Use larger runner for Docker release workflow by @silvanocerza in #4185
- fix: make all OpenAI API params in PromptNode and PromptModel controllable via model_kwargs by @tstadel in #4183
New Contributors
- @jackapbutler made their first contribution in #4033
Full Changelog: v1.13.2...v1.14.0