UKPLab/sentence-transformers v5.1.1
v5.1.1 - Explicit errors on incorrect arguments; fixes for multi-GPU, evaluators, and hard negatives mining

This patch makes Sentence Transformers more explicit about incorrect arguments and introduces fixes for multi-GPU processing, evaluators, and hard negatives mining.

Install this version with

# Training + Inference
pip install sentence-transformers[train]==5.1.1

# Inference only, use one of:
pip install sentence-transformers==5.1.1
pip install sentence-transformers[onnx-gpu]==5.1.1
pip install sentence-transformers[onnx]==5.1.1
pip install sentence-transformers[openvino]==5.1.1

Error if unused kwargs are passed & get_model_kwargs (#3500)

Some SentenceTransformer or SparseEncoder models support custom model-specific keyword arguments, such as jinaai/jina-embeddings-v4. As of this release, calling model.encode with keyword arguments that aren't used by the model will result in an error.

>>> from sentence_transformers import SentenceTransformer
>>> model = SentenceTransformer("all-MiniLM-L6-v2")
>>> model.encode("Who is Amelia Earhart?", normalize=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "[sic]/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "[sic]/SentenceTransformer.py", line 983, in encode
    raise ValueError(
ValueError: SentenceTransformer.encode() has been called with additional keyword arguments that this model does not use: ['normalize']. As per SentenceTransformer.get_model_kwargs(), this model does not accept any additional keyword arguments.

This is quite useful when you, for example, forget that the parameter for normalized embeddings is normalize_embeddings. Prior to this version, the misspelled parameter would simply be ignored silently.
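
For reference, the intended call uses normalize_embeddings, which encode does accept:

>>> model.encode("Who is Amelia Earhart?", normalize_embeddings=True)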

To check which custom keyword arguments your model accepts, you can call the new get_model_kwargs method:

>>> from sentence_transformers import SentenceTransformer, SparseEncoder
>>> SentenceTransformer("all-MiniLM-L6-v2").get_model_kwargs()
[]
>>> SentenceTransformer("jinaai/jina-embeddings-v4", trust_remote_code=True).get_model_kwargs()
['task', 'truncate_dim']
>>> SparseEncoder("opensearch-project/opensearch-neural-sparse-encoding-doc-v3-distill").get_model_kwargs()
['task']

Note: you can always pass the task parameter; it's the only model-specific parameter that is quietly ignored when a model doesn't use it. This means you can always call model.encode(..., task="query") and model.encode(..., task="document").
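
As a quick illustration of that guarantee, the model below doesn't use task, so the parameter is simply ignored rather than raising an error:

>>> from sentence_transformers import SentenceTransformer
>>> model = SentenceTransformer("all-MiniLM-L6-v2")
>>> query_embedding = model.encode("Who is Amelia Earhart?", task="query")
>>> document_embedding = model.encode("Amelia Earhart was an American aviation pioneer.", task="document")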

Minor Fixes

  • Fix batch_size being ignored in CrossEncoderRerankingEvaluator (#3497)
  • Fix multi-GPU processing with encode: embeddings are now moved from their respective devices to the CPU before being stacked into one tensor (#3488; see the first sketch after this list)
  • Use encode_query and encode_document in mine_hard_negatives, automatically applying any defined "query" and "document" prompts (#3502; see the second sketch after this list)
  • Fix "Path does not exist" errors when calling an evaluator with an output_path that doesn't exist yet (#3516)
  • Fix the reported number of missing negatives in mine_hard_negatives (#3504)
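
To illustrate the multi-GPU fix, here is a minimal sketch of multi-device encoding; the device list is an assumption, so adjust it to your hardware:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = ["Who is Amelia Earhart?"] * 1024

# encode splits the batches across the listed devices; as of this release the
# per-device embeddings are moved to the CPU before being stacked into one tensor.
embeddings = model.encode(sentences, device=["cuda:0", "cuda:1"], batch_size=64)
print(embeddings.shape)

And a minimal mine_hard_negatives sketch; the dataset and parameter values here are illustrative, not prescriptive:

from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import mine_hard_negatives

model = SentenceTransformer("all-MiniLM-L6-v2")
# Any dataset with (anchor, positive) pairs works; gooaq is one public example
dataset = load_dataset("sentence-transformers/gooaq", split="train[:1000]")

# Queries are now embedded via encode_query and the corpus via encode_document,
# so "query" and "document" prompts defined on the model apply automatically.
dataset = mine_hard_negatives(
    dataset=dataset,
    model=model,
    num_negatives=5,
    sampling_strategy="top",
    batch_size=128,
)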

All Changes

  • Docs Patch for AnglE and CoSENT Losses by @johneckberg in #3496
  • [fix] add batch size parameter to model prediction in CrossEncoderRerankingEvaluator by @emapco in #3497
  • Add FLOPS calculation and update metrics in SparseEvaluators by @arthurbr11 in #3456
  • [fix] Ensure multi-process embeddings are moved to CPU for concatenation by @tomaarsen in #3488
  • [model_card] Don't override manually provided languages in model card by @tomaarsen in #3501
  • [tests] Add hard negatives test showing multiple positives are correctly handled by @tomaarsen in #3503
  • [feat] Use encode_document and encode_query in mine_hard_negatives by @tomaarsen in #3502
  • Add Support for Knowledgeable Passage Retriever (KPR) by @ikuyamada in #3495
  • Update rasyosef/splade-mini MSMARCO and BEIR-13 benchmark scores in pretrained_models.md by @rasyosef in #3508
  • always pass input_ids, attention_mask, token_type_ids, inputs_embeds to forward by @Samoed in #3509
  • [feat] add get_model_kwargs method; throw error if unused kwarg is passed by @tomaarsen in #3500
  • Fix: Import SentenceTransformer class explicitly in losses module by @altescy in #3521
  • fix: add makedirs to informationretrievalevaluator by @stephantul in #3516
  • [fix] Fix the number of missing negatives in mine_hard_negatives by @tomaarsen in #3504

Full Changelog: v5.1.0...v5.1.1
