github deepset-ai/haystack v1.9.0

2 years ago

⭐ Highlights

Haystack 1.9 comes with nice performance improvements and two important pieces of news about its ecosystem. Let's see it in more detail!

Logging speed set to ludicrous (#3212)

This feature alone makes Haystack 1.9 worth testing out, just sayin'... We switched from f-strings to the string formatting operator when composing log messages and observed an astonishing speedup of up to 120% in some pipelines.
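To see why this kind of change can help, here is a minimal sketch using only the standard `logging` module. It assumes the gain comes from `logging`'s deferred %-style formatting, where the arguments are only rendered if the record is actually emitted. The `Expensive` class and the logger name are hypothetical and exist only to count how often a value is rendered; this is not code taken from Haystack.

```python
import logging

# Hypothetical logger for this sketch; DEBUG records are filtered out.
logger = logging.getLogger("haystack_logging_sketch")
logger.setLevel(logging.WARNING)
logger.propagate = False
logger.addHandler(logging.NullHandler())


class Expensive:
    """Hypothetical object that counts how often it is rendered to a string."""

    def __init__(self):
        self.str_calls = 0

    def __str__(self):
        self.str_calls += 1
        return "expensive representation"


eager = Expensive()
lazy = Expensive()

# f-string: the message is built before logger.debug() is even called,
# so __str__ runs although the record is then discarded.
logger.debug(f"Processing {eager}")

# %-style arguments: rendering is deferred until the record is emitted,
# so __str__ never runs for a level that is filtered out.
logger.debug("Processing %s", lazy)

print(eager.str_calls, lazy.str_calls)
```

The more log calls sit on a hot path below the active log level, the more rendering work the second form avoids.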

Tutorials moved out! (#3244)

They grow up so fast! Tutorials now have their own git repository, CI, and release cycle, making it easier than ever to contribute ideas, fixes, and bug reports. Have a look at the tutorials repo, star it, and open an issue if you have an idea for a new tutorial!

Docker pull deepset/haystack (#3162)

A new Docker image shipping Haystack 1.9 is ready to be pulled. It comes in different flavors and versions that you can select with the appropriate Docker tag; have a look at the README.
On this occasion, we also revamped the build process, which now uses bake, and deprecated the older images (see below).

⚠️ Deprecation notice

With the release of the new Docker image deepset/haystack, the following images are now deprecated and won't be updated any more starting with Haystack 1.10:

New Documentation Site and Haystack Website Revamp

The Haystack website is getting a makeover to become a developer portal that covers Haystack and NLP topics beyond pure documentation. As part of that, we've published our new documentation site. From now on, pure developer documentation will live under Haystack Documentation, while the Haystack website becomes a place for the community, with tutorials, learning material, and soon a place where community members can share their own content too.

What's Changed

Pipeline

  • feat: standardize devices parameter and device initialization by @vblagoje in #3062
  • fix: Reduce GPU to CPU copies at inference by @sjrl in #3127
  • test: lower low boundary for accuracy in test_calculate_context_similarity_on_non_matching_contexts by @ZanSara in #3199
  • bug: fix pdftotext installation verification by @banjocustard in #3233
  • chore: remove f-strings from logs for performance reasons by @ZanSara in #3212
  • bug: reactivate benchmarks with quick fixes by @tholor in #2766

Models

  • fix: Replace multiprocessing tokenization with batched fast tokenization by @vblagoje in #3089

DocumentStores

  • bug: OpensearchDocumentStore.custom_mapping should accept JSON strings at validation by @ZanSara in #3065
  • feat: Add warnings to PineconeDocumentStore about indexing metadata if filters return no documents by @Namoush in #3086
  • bug: validate custom_mapping as an object by @ZanSara in #3189

Tutorials

  • docs: Fix the word length splitting; should be set to 100 not 1,000 by @stevenhaley in #3133
  • chore: remove tutorials from the repo by @masci in #3244

Other Changes

  • chore: Upgrade and pin transformers to 4.21.2 by @vblagoje in #3098
  • bug: adapt UI random question for streamlit 1.12 and pin to streamlit>=1.9.0 by @anakin87 in #3121
  • build: pin pydantic to 1.9.2 by @masci in #3126
  • fix: document FARMReader.train() evaluation report log level by @brandenchan in #3129
  • feat: add a security policy for Haystack by @masci in #3130
  • refactor: update dependencies and remove pins by @danielbichuetti in #3147
  • refactor: update package strategy in rest_api by @masci in #3148
  • fix: give default index for torch.device('cuda') in initialize_device_settings by @sjrl in #3161
  • fix: add type hints to all component init constructor parameters by @vblagoje in #3152
  • fix: Add 15 min timeout for downloading cached HF models by @vblagoje in #3179
  • fix: replace torch.device("cuda") with torch.device("cuda:0") in devices initialization by @vblagoje in #3184
  • feat: add health check endpoint to rest api by @danielbichuetti in #3168
  • refactor: improve support for dataclasses by @danielbichuetti in #3142
  • feat: Updates docs and types for language param in PreProcessor by @sjrl in #3186
  • feat: Add option to use MultipleNegativesRankingLoss for EmbeddingRetriever training with sentence-transformers by @bglearning in #3164
  • refactoring: reimplement Docker strategy by @masci in #3162
  • refactor: remove pre haystack-1.0 import paths support by @ZanSara in #3204
  • feat: exponential backoff with exp decreasing batch size for opensearch and elasticsearch client by @ArzelaAscoIi in #3194
  • feat: add public layout-base extraction support on PDFToTextConverter by @danielbichuetti in #3137
  • bug: fix embedding_dim mismatch in DocumentStore by @kalki7 in #3183
  • fix: update rest_api Docker Compose yamls for recent refactoring of rest_api by @nickchomey in #3197
  • chore: fix Windows CI by @masci in #3222
  • fix: type of temperature param and adjust defaults for OpenAIAnswerGenerator by @tholor in #3073
  • fix: handle Documents containing dataframes in Multilabel constructor by @masci in #3237
  • fix: make pydoc-markdown hook correctly resolve paths relative to repo root by @masci in #3238
  • fix: proper retrieval of answers for batch eval by @vblagoje in #3245
  • chore: updating colab links in older docs versions by @TuanaCelik in #3250
  • docs: establish API docs sync between v1.9.x and Readme by @brandenchan in #3266
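
The "exponential backoff with exp decreasing batch size" idea from #3194 can be illustrated with a small sketch. This is not Haystack's implementation: `index_with_backoff`, `send_batch`, and every parameter below are hypothetical names, and `ConnectionError` merely stands in for whatever "too many requests" error a real client would raise. On each failure the sketch waits twice as long as before and halves the batch size before retrying.

```python
import time
from typing import Callable, List, Sequence


def index_with_backoff(
    docs: Sequence[str],
    send_batch: Callable[[Sequence[str]], None],
    batch_size: int = 8,
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Send docs in batches; on failure wait longer and halve the batch size.

    Returns the sizes of the batches that were sent successfully.
    """
    sent_sizes: List[int] = []
    position = 0
    failures = 0  # consecutive failures for the current batch
    while position < len(docs):
        batch = docs[position : position + batch_size]
        try:
            send_batch(batch)
        except ConnectionError:
            failures += 1
            if failures > max_retries:
                raise
            # Delay grows as 1x, 2x, 4x, ... of base_delay.
            sleep(base_delay * 2 ** (failures - 1))
            # Batch size shrinks as 8 -> 4 -> 2 -> 1 (never below 1).
            batch_size = max(1, batch_size // 2)
            continue
        sent_sizes.append(len(batch))
        position += len(batch)
        failures = 0
    return sent_sizes
```

Passing a custom `sleep` callable (for example `delays.append` on a list) makes the retry schedule easy to inspect without actually waiting.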

New Contributors

Full Changelog: v1.8.0...v1.9.0
