Unsupervised Sentence Embedding Learning
This release integrates methods that allows to learn sentence embeddings without having labeled data:
- TSDAE: TSDAE is using a denoising auto-encoder to learn sentence embeddings. The method has been presented in our recent paper and achieves state-of-the-art performance for several tasks.
- GenQ: GenQ uses a pre-trained T5 system to generate queries for a given passage. It was presented in our recent BEIR paper and works well for domain adaptation for (semantic search)[https://www.sbert.net/examples/applications/semantic-search/README.html]
New Models - SentenceTransformer
- MSMARCO Dot-Product Models: We trained models using the dot-product instead of cosine similarity as similarity function. As shown in our recent BEIR paper, models with cosine-similarity prefer the retrieval of short documents, while models with dot-product prefer retrieval of longer documents. Now you can choose what is most suitable for your task.
- MSMARCO MiniLM Models: We uploaded some models based on MiniLM: It uses just 384 dimensions, is faster than previous models and achieves nearly the same performance
New Models - CrossEncoder
- MSMARCO Re-ranking-Models v2: We trained new significantly faster and significantly better CrossEncoder re-ranking models on the MSMARCO dataset. It outperforms BERT-large models in terms of accuracy while being 18 times faster. Trainingcode is available
New Features
- You can now pass to the CrossEncoder class a
default_activation_function
, that is applied on-top of the output logits generated by the class. - You can now pre-process images for the CLIP Model. Soon I will release a tutorial how to fine-tune the CLIP Model with your data.