github UKPLab/sentence-transformers v0.3.8
v0.3.8 - CrossEncoder, Data Augmentation, new Models

  • Added support for training and using CrossEncoder models
  • Added the AugSBERT data augmentation method
  • New models trained on large-scale paraphrase data: distilroberta-base-paraphrase-v1 and xlm-r-distilroberta-base-paraphrase-v1. They perform much better on our internal benchmarks than the previous models
  • New model for information retrieval trained on MS MARCO: distilroberta-base-msmarco-v1
  • Improved MultipleNegativesRankingLoss: the similarity function is now configurable and defaults to cosine similarity (previously dot product), and similarity scores can be multiplied by a scaling factor. This allows the loss to be used as NTXentLoss / InfoNCE loss
  • New MegaBatchMarginLoss, inspired by the ParaNMT paper
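The improved MultipleNegativesRankingLoss can be sketched in plain Python: for each anchor, its paired positive is the target class and the other in-batch positives act as negatives; with cosine similarity and a scale factor this reduces to the NT-Xent / InfoNCE objective. This is an illustrative sketch (the `scale=20.0` default and the toy list-based vectors are assumptions, not the library's tensor implementation):

```python
import math

def cosine(u, v):
    # cosine similarity of two plain-Python vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def mnr_loss(anchors, positives, scale=20.0):
    """Sketch of MultipleNegativesRankingLoss with cosine similarity:
    for anchor i, positives[i] is the correct match and all other
    positives are in-batch negatives. Scaled logits + cross-entropy
    makes this the NT-Xent / InfoNCE loss."""
    losses = []
    for i, a in enumerate(anchors):
        logits = [scale * cosine(a, p) for p in positives]
        # cross-entropy with target class i (negative log-softmax at i)
        log_z = math.log(sum(math.exp(l) for l in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)
```

With well-separated pairs (e.g. orthogonal anchors matching their own positives) the loss is close to zero; shrinking the scale factor softens the logits and raises the loss.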

Smaller changes:

  • Updated InformationRetrievalEvaluator so that it works with large corpora (millions of entries); removed the query_chunk_size parameter from the evaluator
  • The SentenceTransformer.encode method now detaches tensors from the compute graph
  • SentenceTransformer.fit(): the output_path_ignore_not_empty parameter is deprecated; the target folder is no longer required to be empty
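The key idea behind making the evaluator scale to millions of corpus entries is to score the corpus in fixed-size chunks and only keep the global top-k hits per query, so memory stays bounded. A minimal sketch of that pattern (the `chunk_size` value and the toy dot-product scorer are illustrative, not the evaluator's actual internals):

```python
import heapq

def top_k_hits(query, corpus, k=3, chunk_size=1000):
    """Chunked retrieval sketch: score the corpus chunk by chunk and
    maintain a min-heap of the k best (score, index) pairs, so only
    one chunk of scores is held in memory at a time."""
    def score(q, d):
        # toy dot-product similarity between plain-Python vectors
        return sum(a * b for a, b in zip(q, d))

    heap = []  # min-heap of (score, corpus_index)
    for start in range(0, len(corpus), chunk_size):
        for idx, doc in enumerate(corpus[start:start + chunk_size], start):
            s = score(query, doc)
            if len(heap) < k:
                heapq.heappush(heap, (s, idx))
            elif s > heap[0][0]:
                heapq.heapreplace(heap, (s, idx))
    # return hits sorted best-first
    return sorted(heap, key=lambda t: -t[0])
```

The heap keeps memory at O(k) per query regardless of corpus size, which is what lets the evaluator handle corpora with millions of entries.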
