UKPLab/sentence-transformers v5.1.1
v5.1.1 - Explicit errors on incorrect arguments; fixes for multi-GPU, evaluators, and hard negatives mining

This patch makes Sentence Transformers more explicit about incorrect arguments and introduces fixes for multi-GPU processing, evaluators, and hard negatives mining.

Install this version with

# Training + Inference
pip install sentence-transformers[train]==5.1.1

# Inference only, use one of:
pip install sentence-transformers==5.1.1
pip install sentence-transformers[onnx-gpu]==5.1.1
pip install sentence-transformers[onnx]==5.1.1
pip install sentence-transformers[openvino]==5.1.1

Error if unused kwargs are passed & get_model_kwargs (#3500)

Some SentenceTransformer or SparseEncoder models support custom model-specific keyword arguments, such as jinaai/jina-embeddings-v4. As of this release, calling model.encode with keyword arguments that aren't used by the model will result in an error.

>>> from sentence_transformers import SentenceTransformer
>>> model = SentenceTransformer("all-MiniLM-L6-v2")
>>> model.encode("Who is Amelia Earhart?", normalize=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "[sic]/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "[sic]/SentenceTransformer.py", line 983, in encode
    raise ValueError(
ValueError: SentenceTransformer.encode() has been called with additional keyword arguments that this model does not use: ['normalize']. As per SentenceTransformer.get_model_kwargs(), this model does not accept any additional keyword arguments.

This is quite useful when you, for example, forget that the parameter for normalized embeddings is normalize_embeddings. Prior to this version, the misspelled parameter would simply be ignored silently.
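
For reference, the intended call uses normalize_embeddings, which encode does accept:

>>> model.encode("Who is Amelia Earhart?", normalize_embeddings=True)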

To check which custom keyword arguments your model accepts, you can call the new get_model_kwargs method:

>>> from sentence_transformers import SentenceTransformer, SparseEncoder
>>> SentenceTransformer("all-MiniLM-L6-v2").get_model_kwargs()
[]
>>> SentenceTransformer("jinaai/jina-embeddings-v4", trust_remote_code=True).get_model_kwargs()
['task', 'truncate_dim']
>>> SparseEncoder("opensearch-project/opensearch-neural-sparse-encoding-doc-v3-distill").get_model_kwargs()
['task']

Note: you can always pass the task parameter; it's the only model-specific parameter that is quietly ignored when a model doesn't use it. This means you can always call model.encode(..., task="query") and model.encode(..., task="document").
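
As a quick illustration of that guarantee, the model below doesn't use task, so the parameter is simply ignored rather than raising an error:

>>> from sentence_transformers import SentenceTransformer
>>> model = SentenceTransformer("all-MiniLM-L6-v2")
>>> query_embedding = model.encode("Who is Amelia Earhart?", task="query")
>>> document_embedding = model.encode("Amelia Earhart was an American aviation pioneer.", task="document")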

Minor Fixes

  • Fix batch_size being ignored in CrossEncoderRerankingEvaluator (#3497)
  • Fix multi-GPU processing with encode: embeddings are now moved from their respective devices to the CPU before being stacked into one tensor (#3488; see the first sketch after this list)
  • Use encode_query and encode_document in mine_hard_negatives, automatically applying any defined "query" and "document" prompts (#3502; see the second sketch after this list)
  • Fix "Path does not exist" errors when calling an evaluator with an output_path that doesn't exist yet (#3516)
  • Fix the reported number of missing negatives in mine_hard_negatives (#3504)
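
To illustrate the multi-GPU fix, here is a minimal sketch of multi-device encoding; the device list is an assumption, so adjust it to your hardware:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = ["Who is Amelia Earhart?"] * 1024

# encode splits the batches across the listed devices; as of this release the
# per-device embeddings are moved to the CPU before being stacked into one tensor.
embeddings = model.encode(sentences, device=["cuda:0", "cuda:1"], batch_size=64)
print(embeddings.shape)

And a minimal mine_hard_negatives sketch; the dataset and parameter values here are illustrative, not prescriptive:

from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import mine_hard_negatives

model = SentenceTransformer("all-MiniLM-L6-v2")
# Any dataset with (anchor, positive) pairs works; gooaq is one public example
dataset = load_dataset("sentence-transformers/gooaq", split="train[:1000]")

# Queries are now embedded via encode_query and the corpus via encode_document,
# so "query" and "document" prompts defined on the model apply automatically.
dataset = mine_hard_negatives(
    dataset=dataset,
    model=model,
    num_negatives=5,
    sampling_strategy="top",
    batch_size=128,
)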

All Changes

  • Docs Patch for AnglE and CoSENT Losses by @johneckberg in #3496
  • [fix] add batch size parameter to model prediction in CrossEncoderRerankingEvaluator by @emapco in #3497
  • Add FLOPS calculation and update metrics in SparseEvaluators by @arthurbr11 in #3456
  • [fix] Ensure multi-process embeddings are moved to CPU for concatenation by @tomaarsen in #3488
  • [model_card] Don't override manually provided languages in model card by @tomaarsen in #3501
  • [tests] Add hard negatives test showing multiple positives are correctly handled by @tomaarsen in #3503
  • [feat] Use encode_document and encode_query in mine_hard_negatives by @tomaarsen in #3502
  • Add Support for Knowledgeable Passage Retriever (KPR) by @ikuyamada in #3495
  • Update rasyosef/splade-mini MSMARCO and BEIR-13 benchmark scores in pretrained_models.md by @rasyosef in #3508
  • always pass input_ids, attention_mask, token_type_ids, inputs_embeds to forward by @Samoed in #3509
  • [feat] add get_model_kwargs method; throw error if unused kwarg is passed by @tomaarsen in #3500
  • Fix: Import SentenceTransformer class explicitly in losses module by @altescy in #3521
  • fix: add makedirs to informationretrievalevaluator by @stephantul in #3516
  • [fix] Fix the number of missing negatives in mine_hard_negatives by @tomaarsen in #3504

Full Changelog: v5.1.0...v5.1.1
