github deepset-ai/haystack v1.13.0

20 months ago

⭐ Highlights

Stop words for PromptNode

Stop words are now implemented at the PromptNode level and work with all models. Pass a list of strings as the stop_words parameter of PromptNode, and LLM text generation stops as soon as any of the stop words is encountered. The stop words themselves are not included in the response.
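To make the behavior concrete, here is a minimal sketch of the truncation semantics described above: the text is cut at the earliest stop word, and the stop word itself is dropped. This is an illustration only, not Haystack's implementation; the helper name truncate_at_stop_words and the sample strings are hypothetical.

```python
def truncate_at_stop_words(text, stop_words):
    """Cut text at the earliest occurrence of any stop word (stop word excluded)."""
    cut = len(text)
    for word in stop_words:
        if not word:
            # Skip empty strings, which would match at index 0.
            continue
        index = text.find(word)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


# Hypothetical model output and stop words, for illustration only.
generated = "Paris is the capital of France. Observation: the user asked about Europe."
answer = truncate_at_stop_words(generated, ["Observation:", "\n\n"])
print(repr(answer))
```

In Haystack itself you would not call a helper like this; you would pass the list through the stop_words parameter of PromptNode and let the node handle it.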

A dedicated GitHub repository for Haystack demos

The source code for Haystack's Explore the World demo has been moved to a dedicated repository: https://github.com/deepset-ai/haystack-demos. Use this repository to check out the code, run it locally, fork, customize, and contribute!

New nodes: ImageToText and CsvTextConverter

This release sees two new nodes, both contributed by community members!

The first one is ImageToText (courtesy of our well-known @anakin87): an image captioning node that can generate descriptions of image files and create Haystack documents from them.

The second one is CsvTextConverter, from @Benvii: a small utility node that can load a CSV of FAQ question-answer pairs and correctly send them to your DocumentStore, making it super handy for FAQ matching pipelines.

Check out the docs to learn more about them and try them out!
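As a rough picture of what a FAQ CSV conversion involves, the sketch below reads question-answer rows with the standard library and shapes them into plain dictionaries. It is not the CsvTextConverter API: the column names question and answer, the dictionary layout, and the helper name faq_csv_to_documents are all assumptions made for illustration.

```python
import csv
import io


def faq_csv_to_documents(csv_text):
    """Turn FAQ rows into plain dicts shaped like simple documents."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row in reader:
        # Assumed layout: the question is the searchable content,
        # and the answer travels along as metadata.
        documents.append(
            {"content": row["question"], "meta": {"answer": row["answer"]}}
        )
    return documents


# Hypothetical FAQ data with assumed column names.
sample = "question,answer\nWhat is Haystack?,An open source NLP framework.\n"
docs = faq_csv_to_documents(sample)
print(docs)
```

For the real node, refer to the Haystack docs for the expected CSV layout and the Document fields it produces.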

Faster tokenization for GPT models with tiktoken

Haystack now supports OpenAI's tiktoken library, which can dramatically speed up tokenization for GPT models. On unsupported setups (Python < 3.8, arm64 and macOS), fallbacks are in place and the regular Hugging Face tokenizers are used. Thanks to @danielbichuetti for yet another amazing contribution!
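The fallback logic can be pictured as a small decision function. The sketch below is an assumption-laden illustration, not Haystack's code: the function name choose_tokenizer_backend is hypothetical, and it reads the list above literally, treating Python < 3.8, arm64, and macOS each as unsupported on its own. The actual check in Haystack may combine these conditions differently.

```python
import importlib.util
import platform
import sys


def choose_tokenizer_backend(
    python_version=None, machine=None, system=None, tiktoken_installed=None
):
    """Return "tiktoken" when the fast path looks usable, else "huggingface"."""
    # Fill in defaults from the running interpreter when not given explicitly.
    if python_version is None:
        python_version = sys.version_info[:2]
    if machine is None:
        machine = platform.machine()
    if system is None:
        system = platform.system()
    if tiktoken_installed is None:
        tiktoken_installed = importlib.util.find_spec("tiktoken") is not None

    # Assumption: each item in the release note's list is unsupported by itself.
    unsupported = (
        tuple(python_version) < (3, 8)
        or machine.lower() == "arm64"
        or system == "Darwin"  # platform.system() reports macOS as "Darwin"
    )
    if unsupported or not tiktoken_installed:
        return "huggingface"
    return "tiktoken"


print(choose_tokenizer_backend())
```

Passing the environment in as arguments keeps the decision easy to test; called without arguments, the function inspects the current interpreter and platform.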

What's Changed

Breaking Changes

  • Migrating to use native PyTorch AMP by @sjrl in #2827
  • bug: consistent batch_size parameter names in distillation by @julian-risch in #3811
  • refactor: Move invocation_context from meta to own pipeline variable by @vblagoje in #3888

Pipeline

  • feat: Update cohere embedding models by @vblagoje in #3704
  • feat: add index parameter to TfidfRetriever by @anakin87 in #3666
  • feat: Use torch.inference_mode() for TableQA by @sjrl in #3731
  • feat: Enable text-embedding-ada-002 for EmbeddingRetriever by @vblagoje in #3721
  • refactor: improve monkey patch for SklearnQueryClassifier by @anakin87 in #3732
  • refactor: remove unused code in TfidfRetriever by @anakin87 in #3733
  • refactor: Remove duplicate code in TableReader by @sjrl in #3708
  • fix: Make InferenceProcessor thread safe by @bogdankostic in #3709
  • chore: adding template for prompt node by @TuanaCelik in #3738
  • fix: Fixed local reader model loading by @mayankjobanputra in #3663
  • fix: Fix predict_batch in TransformersReader for single nested Document list by @bogdankostic in #3748
  • feat: change PipelineConfigError to DocumentStoreError with more details by @julian-risch in #3783
  • bug: skip empty documents in reader by @julian-risch in #3773
  • fix: linefeeds in custom_query by @tstadel in #3813
  • fix: Convert table cells to strings for compatibility with TableReader by @sjrl in #3762
  • fix: Ensure eval mode for TableReader model for predictions by @sjrl in #3743
  • fix: gracefully handle FileExistsError during Preprocessor resource download by @wochinge in #3816
  • fix: make the crawler runnable and testable on Windows by @anakin87 in #3830
  • fix: ignore non-serializable params when hashing pipeline objects by @masci in #3842
  • feat: preprocessor raises warning when doc length exceeds threshold by @ZanSara in #3837
  • fix: remove string validation in YAML by @ZanSara in #3854
  • feat: Use truncate option for Cohere.embed by @sjrl in #3865
  • feat: ImageToText (caption generator) by @anakin87 in #3859
  • fix: Remove double super class init from ParsrConverter init by @silvanocerza in #3896
  • feat: store id_hash_keys in Document objects to make documents clonable by @ZanSara in #3697
  • feat: adding the ability to use Ray Serve async functionality by @zoltan-fedor in #3769
  • feat: support cl100k_base tokenization and increase performance for GPT2 by @danielbichuetti in #3897
  • fix: Fix number of concurrent requests in RequestLimiter by @bogdankostic in #3705
  • feat: Run commands inside docker container as a non root user by @vblagoje in #3702
  • fix: Removed overlooked torch scatter references by @sjrl in #3719
  • build: upgrade torch and let transformers pick the version by @julian-risch in #3727
  • feat: Expand LLM support with PromptModel, PromptNode, and PromptTemplate by @vblagoje in #3667
  • refactor: remove deprecated parameters from Summarizer by @anakin87 in #3740
  • refactor: Using with open() to read files by @sjrl in #3787
  • feat: Bump python to 3.10 for gpu docker image, use nvidia/cuda by @vblagoje in #3701
  • fix: pin protobuf version by @masci in #3789
  • fix(docker): Use IMAGE_NAME in api image by @FabianHertwig in #3786
  • bug: Fix launch_milvus() by cd'ing to milvus_dir by @t0r0id in #3795
  • refactor: Change PromptNode registered templates from per class to per instance by @vblagoje in #3810
  • bug: The PromptNode handles all parameters as lists without checking if they are in fact lists by @zoltan-fedor in #3820
  • feat: update the docker image for haystack-api service by @bilgeyucel in #3835
  • refactor: Simplify PromptTemplate substitution in PromptNode by @vblagoje in #3876
  • feat: PromptNode - implement stop words by @vblagoje in #3884
  • feat: Add retry with exponential back-off to PromptNode's OpenAI models by @vblagoje in #3886
  • chore: Add timeouts to external requests calls by @silvanocerza in #3895
  • feat: Add CsvTextConverter by @Benvii in #3587
  • refactor: Improve stop_words handling, add unit test cases by @vblagoje in #3918
  • refactor: Updated rest_api schema for tables to be consistent with Document.to_dict #3872

Models

DocumentStores

UI / Demo

  • refactor: remove haystack demo along with deprecated Dockerfiles by @masci in #3829

Documentation

Other Changes

  • proposal: Create a dedicated GitHub repository for Haystack demos by @masci in #3695
  • fix: build pdftotext from sources by @masci in #3746
  • fix: Trigger pipeline schema update on tagged releases by @askainet in #3752
  • ci: Add newline when generating OpenAPI specs by @bogdankostic in #3782
  • test: Improve robustness of PromptNode unit tests by @vblagoje in #3747
  • feat: utility function to explicitly invoke JSON schema generation by @masci in #3798
  • fix: prevent posthog from sending errors to stderr by @julian-risch in #4008

New Contributors

Full Changelog: v1.12.2...v1.13.0rc1
