v4.0.2 - Safer reranker max sequence length logic, typing issues, FSDP & device placement

This patch release makes the maximum sequence length logic for CrossEncoder models safer, resolves typing issues, improves FSDP training compatibility, and fixes device placement in distributed training.

Install this version with

# Training + Inference
pip install sentence-transformers[train]==4.0.2

# Inference only, use one of:
pip install sentence-transformers==4.0.2
pip install sentence-transformers[onnx-gpu]==4.0.2
pip install sentence-transformers[onnx]==4.0.2
pip install sentence-transformers[openvino]==4.0.2

Safer CrossEncoder (reranker) maximum sequence length

When loading CrossEncoder models, the maximum sequence length is now set to the minimum of the tokenizer model_max_length and the config max_position_embeddings (whichever of the two exist), rather than relying only on the latter. Previously, this resulted in the maximum sequence length of BAAI/bge-reranker-base being reported as 514, even though the model can only handle sequences of up to 512 tokens.

from sentence_transformers import CrossEncoder

model = CrossEncoder("BAAI/bge-reranker-base")
print(model.max_length)
# => 512

# The texts for which to predict similarity scores
query = "How many people live in Berlin?"
passages = [
    "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
    "In 2013 around 600,000 Berliners were registered in one of the more than 2,300 sport and fitness clubs.",
]

scores = model.predict([(query, passage) for passage in passages])
print(scores)
# => [0.99953485 0.01062613]

# Or test really long inputs to ensure that there's no crash:
score = model.predict([["one " * 1000, "two " * 1000]])
print(score)
# => [0.95482624]
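
Under the hood, the new maximum is derived roughly as follows. This is a minimal sketch of the idea (not the library's exact code), using the Hugging Face transformers config and tokenizer:

from transformers import AutoConfig, AutoTokenizer

config = AutoConfig.from_pretrained("BAAI/bge-reranker-base")
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-reranker-base")

# Collect whichever limits exist for this model
limits = []
if getattr(config, "max_position_embeddings", None) is not None:
    limits.append(config.max_position_embeddings)  # 514 for this model
if getattr(tokenizer, "model_max_length", None) is not None:
    limits.append(tokenizer.model_max_length)  # 512 for this model

# The minimum of the two is the safe maximum sequence length
print(min(limits))
# => 512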

Note that you can set the activation_fn option to torch.nn.Identity() to avoid the default Sigmoid, which maps all scores to [0, 1]:

from sentence_transformers import CrossEncoder
import torch

model = CrossEncoder("BAAI/bge-reranker-base", activation_fn=torch.nn.Identity())

# The texts for which to predict similarity scores
query = "How many people live in Berlin?"
passages = [
    "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
    "In 2013 around 600,000 Berliners were registered in one of the more than 2,300 sport and fitness clubs.",
]

scores = model.predict([(query, passage) for passage in passages])
print(scores)
# => [ 7.672551  -4.5337563]

Default device placement (#3303)

By default, in a distributed training setup with multiple CUDA devices, each process now places the model on the CUDA device corresponding to its local rank. This should lower the VRAM usage on GPU 0 during distributed training.
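
For context, here is a minimal sketch of that placement logic, assuming the LOCAL_RANK environment variable set by launchers such as torchrun (not the library's exact code):

import os
import torch

# Each process selects the GPU matching its local rank,
# instead of every rank defaulting to cuda:0
local_rank = int(os.environ.get("LOCAL_RANK", 0))
device = torch.device(f"cuda:{local_rank}" if torch.cuda.is_available() else "cpu")
print(device)
# => cuda:1 on the process launched with LOCAL_RANK=1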

Minor patches of note

  • Resolved typing issues for the SentenceTransformer class outside of the encode method. In v4.0.1, IDEs could no longer provide hints for methods such as model.similarity. (#3297)
  • Improved FSDP training compatibility by avoiding a faulty "only if model is wrapped" check. The wrapped model is now always placed in the loss class instance when required for FSDP training; see the sketch after this list. (#3295)
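
For reference, FSDP training is enabled through the training arguments. A minimal sketch with an illustrative one-example dataset (the model, dataset, and FSDP options here are placeholders; a real run needs a multi-GPU launch such as torchrun --nproc_per_node=2 train_fsdp.py):

# train_fsdp.py
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("microsoft/mpnet-base")
loss = MultipleNegativesRankingLoss(model)

# Tiny illustrative dataset of (anchor, positive) pairs
train_dataset = Dataset.from_dict({
    "anchor": ["How many people live in Berlin?"],
    "positive": ["Berlin had a population of 3,520,031 registered inhabitants."],
})

training_args = SentenceTransformerTrainingArguments(
    output_dir="output",
    fsdp=["full_shard", "auto_wrap"],  # passed through to the underlying transformers TrainingArguments
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()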

All Changes

  • [docs]: update examples by @emmanuel-ferdman in #3292
  • Update htaccess, in-line comments were problematic by @tomaarsen in #3293
  • [docs] Resolve more broken links throughout the docs by @tomaarsen in #3294
  • [docs] Fix some broken docs redirects by @tomaarsen in #3296
  • [typing] Move encode typings back to .py from .pyi by @tomaarsen in #3297
  • [fix] Avoid "Only if model is wrapped" check which is faulty for FSDP by @tomaarsen in #3295
  • [cross-encoder] Set the tokenizer model_max_length to the min. of model_max_length & max_pos_embeds by @tomaarsen in #3304
  • [ci] Attempt to fix CI by @tomaarsen in #3305
  • Fix device assignment in get_device_name for distributed training by @uminaty in #3303
  • [docs] Add missing docstring for push_to_hub by @tomaarsen in #3306
  • [docs] Specify that exported ONNX/OpenVINO models don't include pooling/normalization by @tomaarsen in #3307

New Contributors

  • @emmanuel-ferdman made their first contribution in #3292
  • @uminaty made their first contribution in #3303

Full Changelog: v4.0.1...v4.0.2
