github deepset-ai/haystack 0.2.1

4 years ago

In our release notes, we will always highlight a few important changes first and list the detailed PRs below.

🎉 First release

We are happy to announce our first proper release, which includes many of the features we consider crucial for a QA system. While we still have many exciting features on our roadmap, we are confident that this version already significantly accelerates the development of QA systems, and it has been tested successfully in the first production deployments.
From now on, we will switch to a more regular release cycle.

📜 ElasticsearchDocumentStore

We recommend the new ElasticsearchDocumentStore for all production deployments. We will keep the more lightweight options (SQL, in-memory) for easy prototyping, but new features will be implemented for Elasticsearch first.

🚀 New Retrievers

Besides plain TF-IDF (in memory), we introduced the ElasticsearchRetriever, which supports Elasticsearch's native scoring (BM25) as well as custom queries (e.g. using boosting).
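
To illustrate what a boosted, filtered full-text query can look like, here is a minimal sketch that builds an Elasticsearch request body as a plain Python dict. The helper name `build_bm25_query`, its parameters, and the field names (`text`, `name`, `category`) are assumptions made for this example and are not taken from Haystack's code. The query relies only on standard Elasticsearch Query DSL (`bool`, `multi_match`, `terms`), and full-text matches are scored with BM25 by default in recent Elasticsearch versions.

```python
from typing import Any, Dict, List, Optional


def build_bm25_query(
    question: str,
    search_fields: List[str],
    filters: Optional[Dict[str, List[str]]] = None,
    top_k: int = 10,
) -> Dict[str, Any]:
    """Build an Elasticsearch request body for a full-text search.

    Hypothetical helper for illustration only. A field can be boosted
    with the `field^weight` syntax, e.g. "name^2".
    """
    body: Dict[str, Any] = {
        "size": top_k,
        "query": {
            "bool": {
                # `must` contributes to the relevance score.
                "must": [
                    {"multi_match": {"query": question, "fields": search_fields}}
                ]
            }
        },
    }
    if filters:
        # `filter` clauses restrict the result set without affecting the score.
        body["query"]["bool"]["filter"] = [
            {"terms": {field: values}} for field, values in filters.items()
        ]
    return body


body = build_bm25_query(
    "Who is the father of Arya?",
    search_fields=["text", "name^2"],  # matches in `name` count twice as much
    filters={"category": ["got"]},
    top_k=5,
)
```

The resulting dict could be passed as the request body of a search call in an Elasticsearch client. Keeping filters in the `filter` clause means they narrow the candidates but leave the BM25 ranking untouched.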

As a further option, we also added the EmbeddingRetriever, which encodes texts into embeddings (e.g. via Sentence-BERT) and retrieves via cosine similarity. Embedding-based retrieval is especially promising, and you will likely see more features in this direction.
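
The retrieval step behind embedding-based search can be sketched in a few lines. This is an illustration under stated assumptions and not the EmbeddingRetriever implementation: the function names are hypothetical, and the three-dimensional vectors are hand-written stand-ins for what an encoder such as Sentence-BERT would produce.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query_embedding: Sequence[float],
    doc_embeddings: Dict[str, Sequence[float]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the top_k (doc_id, score) pairs, most similar first."""
    scored = [
        (doc_id, cosine_similarity(query_embedding, emb))
        for doc_id, emb in doc_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (assumed values, not real encoder output).
doc_embeddings = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.7, 0.7, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
results = retrieve([1.0, 0.1, 0.0], doc_embeddings, top_k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

With these toy vectors, `doc_a` ranks first and `doc_b` second, while `doc_c` is orthogonal to the query and is dropped by `top_k=2`. A brute-force scan like this is fine for small collections; larger ones usually need an index that supports vector search.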

⁉️ FAQ-style QA

Besides extractive QA, you can now also index existing question-answer pairs (e.g. from FAQs) and find answers by matching the incoming user question against the indexed questions and returning the answer from the best-matching pair. This can be an interesting alternative or addition to extractive QA if you already have large collections of FAQs and/or need a solution that works with low computational resources.
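
The matching idea can be sketched as follows. To keep the example dependency-free, it compares questions with a bag-of-words cosine similarity instead of learned sentence embeddings, so it is a simplified stand-in and not how Haystack implements FAQ-style QA. The function names, the `min_score` threshold, and the sample FAQ entries are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from typing import List, Optional, Tuple


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def answer_from_faq(
    user_question: str,
    faq_pairs: List[Tuple[str, str]],
    min_score: float = 0.3,  # assumed threshold below which no answer is returned
) -> Optional[str]:
    """Return the answer of the best-matching indexed question, or None."""
    query = bag_of_words(user_question)
    best_answer, best_score = None, 0.0
    for indexed_question, answer in faq_pairs:
        score = similarity(query, bag_of_words(indexed_question))
        if score > best_score:
            best_answer, best_score = answer, score
    return best_answer if best_score >= min_score else None


faq = [
    ("How do I reset my password?", "Use the 'Forgot password' link on the login page."),
    ("What are your opening hours?", "We are open 9am to 5pm on weekdays."),
]
print(answer_from_faq("I need to reset my password", faq))
```

Because only the questions are compared, no reader model has to run at query time, which is where the low resource requirement comes from. The threshold lets the system decline to answer when no indexed question is close enough.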

🔁 Modular API based on FastAPI

We changed the basic REST API from Flask to FastAPI and modularized it.
You can now:

  • search answers in texts (extractive QA)
  • search answers by comparing the user question to existing questions (FAQ-style QA)
  • collect & export user feedback on answers to gain domain-specific training data (feedback)
  • do basic monitoring of requests (currently via APM in Kibana)

Detailed changes:

Document Stores

  • Add Elasticsearch Datastore #13
  • Refactor database layer #10
  • Add test for Elasticsearch document store #88
  • Make filters optional for Elasticsearch query #80
  • Inmemory store #76
  • Fix get_all_documents() in ElasticsearchDocumentStore #77
  • Fix get_all_documents query for Elasticsearch #21

Retrievers

  • Add FAQ-style QA #44
  • added option for custom elasticsearch queries and filters #52
  • More flexible es config & support for filters #29
  • Add more ES connection params #35
  • Simplify Retriever query #73
  • Refactor ElasticsearchRetriever into separate class #72
  • Add params to create_embeddings in retriever #45
  • fix scaling of pseudo probs for es scores. fix filtering of embedding retrieval #46
  • Fixing doc_name for TFIDF Retriever #33

Readers

  • Refactor pipeline for better generalizability & Add TransformersReader #1
  • Add method to train a reader on custom data #5
  • Add no answer handling #26
  • Add no_answer option to results #24
  • Fix offsets in reader #4
  • FARMReader.train() now takes default values from FARMReader #47
  • Update inferencer args (num_processes, chunksize) to latest FARM version #54
  • update readme & rename arg in TransformersReader for consistency #86
  • Fixing typo in transformer. use_gpu provides ordinal of the gpu, not … #83
  • Add document_id with Transformers Reader #60
  • Make eval during reader.train() more verbose #28
  • Removed "document_name" from farm.py #31
  • Add a document_name field in answers #30

REST API / Deployment

  • Move API from flask to fastAPI #3
  • Modularize API components #55
  • Return more meta data & restructure response format #66
  • Log API responses in APM #70
  • Make Elastic-APM optional #65
  • Update Python version in Dockerfile-GPU #71
  • Update Dockerfiles to use Gunicorn for deployment #69
  • Add limit on concurrent requests for doc-qa #64
  • Add Docker Images for running Haystack #85
  • Fix cyclic import of Elasticsearch client #59
  • Add Feedback export API #56
  • Add gpu dockerfile, improve logging, fix minor bug with filtering #36
  • Improve deployment of REST API (Configs, logging, minor bugs) #40

Others

  • Standardize Finder, Readers, and Retriever interfaces #62
  • pin haystack version in tutorials until release #87
  • Update tutorials to use Elasticsearch, new Retrievers #79
  • Adding coverage reports and a few more tests #78
  • Added Jupyter notebooks of Tutorials #43
  • Add minimal tutorial for ES #19
  • Update tutorials #12

Thanks to all contributors for your great work 👏
@tanaysoni, @Timoeller, @brandenchan, @bogdankostic, @skirdey, @stedomedo, @karthik19967829, @aadil-srivastava01, @tholor
