This is a smaller release with some new features.
MarginMSELoss
MarginMSELoss is a great method for training an embedding model with the help of a cross-encoder model. The details are explained here: MSMARCO - MarginMSE Training
You pass your training data in the format:
InputExample(texts=[query, positive, negative], label=cross_encoder.predict([query, positive]) - cross_encoder.predict([query, negative]))
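To make the label concrete, here is a minimal sketch in plain Python of what the loss compares: the margin predicted by the embedding model (dot-product score of the positive minus that of the negative) against the cross-encoder margin passed as the label. The scores and the 2-dimensional embeddings are made-up illustration values, and the helper names `dot` and `margin_mse` are hypothetical, not part of the library.

```python
def dot(u, v):
    """Dot-product similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def margin_mse(query_emb, pos_emb, neg_emb, teacher_margin):
    """Squared error between the student margin and the teacher margin."""
    student_margin = dot(query_emb, pos_emb) - dot(query_emb, neg_emb)
    return (student_margin - teacher_margin) ** 2


# Hypothetical cross-encoder scores for (query, positive) and (query, negative).
teacher_pos_score = 9.0
teacher_neg_score = 2.0
label = teacher_pos_score - teacher_neg_score  # the value passed as `label`

# Toy embeddings standing in for the output of the model being trained.
query_emb = [1.0, 2.0]
pos_emb = [3.0, 2.0]
neg_emb = [1.0, 0.5]

loss = margin_mse(query_emb, pos_emb, neg_emb, label)
print(label, loss)
```

Over a batch, the squared errors are averaged. Because only the difference between the two scores is matched, the embedding model does not have to reproduce the absolute score range of the cross-encoder.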
MultipleNegativesSymmetricRankingLoss
MultipleNegativesRankingLoss computes the loss in only one direction: find the correct answer for a given question.
MultipleNegativesSymmetricRankingLoss also computes the loss in the other direction: find the correct question for a given answer.
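The two directions can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library code: it assumes a batch of (question, answer) pairs without extra hard negatives, scaled cosine similarity as the score, and a scale of 20; the function names and the toy embeddings are hypothetical.

```python
import math


def cos_sim(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_entropy_diagonal(scores):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(scores):
        log_sum_exp = math.log(sum(math.exp(s) for s in row))
        total += log_sum_exp - row[i]
    return total / len(scores)


def symmetric_ranking_loss(questions, answers, scale=20.0):
    """Return (forward, backward, symmetric) losses for paired embeddings."""
    # scores[i][j] = similarity between question i and answer j
    scores = [[scale * cos_sim(q, a) for a in answers] for q in questions]
    # Forward: for each question, pick the correct answer among all answers.
    forward = cross_entropy_diagonal(scores)
    # Backward: for each answer, pick the correct question among all questions.
    transposed = [list(col) for col in zip(*scores)]
    backward = cross_entropy_diagonal(transposed)
    return forward, backward, (forward + backward) / 2


# Toy embeddings: question i is closest to answer i.
questions = [[1.0, 0.0], [0.0, 1.0]]
answers = [[1.0, 0.1], [0.2, 1.0]]

forward, backward, symmetric = symmetric_ranking_loss(questions, answers)
print(forward, backward, symmetric)
```

The one-directional loss corresponds to the forward term alone; the symmetric variant averages it with the backward term computed on the transposed score matrix.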
Breaking Change: CLIPModel
The CLIPModel is now based on the transformers model.
You can still load it like this:
model = SentenceTransformer('clip-ViT-B-32')
Older SentenceTransformers versions are no longer able to load and use the 'clip-ViT-B-32' model.
Added files on the hub are automatically downloaded
PR #1116 checks whether all model files are present in your local cache or whether files have been added on the hub. Any files that are missing locally are downloaded automatically.
SentenceTransformers.encode() can return all values
When you set output_value=None for the encode method, all values (token_ids, token_embeddings, sentence_embedding) will be returned.
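A small sketch of how the returned values might be inspected. It assumes that encode returns one dictionary per input sentence when given a list of sentences; the exact keys depend on the model, so the helper only lists them. The helper name `list_output_names` is hypothetical.

```python
def list_output_names(model, sentences):
    """Encode sentences with output_value=None and list the returned keys.

    `model` is expected to be a SentenceTransformer instance (or any object
    with a compatible `encode` method). Assumes one dict per input sentence.
    """
    rows = model.encode(sentences, output_value=None)
    return [sorted(row.keys()) for row in rows]
```

With a real model this could be called as list_output_names(SentenceTransformer('all-MiniLM-L6-v2'), ['A first sentence', 'A second one']); the model name here is only an example.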