This patch release fixes the maximum sequence length logic for CrossEncoder models, typing issues, FSDP training compatibility, and device placement in distributed training.
Install this version with:

```bash
# Training + Inference
pip install sentence-transformers[train]==4.0.2

# Inference only, use one of:
pip install sentence-transformers==4.0.2
pip install sentence-transformers[onnx-gpu]==4.0.2
pip install sentence-transformers[onnx]==4.0.2
pip install sentence-transformers[openvino]==4.0.2
```
## Safer CrossEncoder (reranker) maximum sequence length
When loading `CrossEncoder` models, we now set the maximum sequence length to the minimum of the tokenizer `model_max_length` and the config `max_position_embeddings` (when both exist), rather than relying only on the latter. Previously, the maximum sequence length of `BAAI/bge-reranker-base` was reported as 514, even though the model can only handle sequences of up to 512 tokens.
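As a rough illustration of where the two limits come from, here is a minimal sketch using `transformers` directly (not the library's internal code):

```python
from transformers import AutoConfig, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-reranker-base")
config = AutoConfig.from_pretrained("BAAI/bge-reranker-base")

# The tokenizer allows 512 tokens, while the position embeddings report 514;
# the effective maximum sequence length is now the minimum of the two.
print(min(tokenizer.model_max_length, config.max_position_embeddings))
# => 512
```

The `CrossEncoder` itself now reports the corrected limit: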
```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("BAAI/bge-reranker-base")
print(model.max_length)
# => 512

# The texts for which to predict similarity scores
query = "How many people live in Berlin?"
passages = [
    "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
    "In 2013 around 600,000 Berliners were registered in one of the more than 2,300 sport and fitness clubs.",
]

scores = model.predict([(query, passage) for passage in passages])
print(scores)
# => [0.99953485 0.01062613]

# Or test really long inputs to ensure that there's no crash:
score = model.predict([["one " * 1000, "two " * 1000]])
print(score)
# => [0.95482624]
```
Note that you can use the `activation_fn` option with `torch.nn.Identity()` to avoid the default Sigmoid that maps all scores to [0, 1]:
```python
import torch
from sentence_transformers import CrossEncoder

model = CrossEncoder("BAAI/bge-reranker-base", activation_fn=torch.nn.Identity())

# The texts for which to predict similarity scores
query = "How many people live in Berlin?"
passages = [
    "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
    "In 2013 around 600,000 Berliners were registered in one of the more than 2,300 sport and fitness clubs.",
]

scores = model.predict([(query, passage) for passage in passages])
print(scores)
# => [ 7.672551 -4.5337563]
```
## Default device placement (#3303)
By default, in a distributed training setup with multiple CUDA devices, the model is now placed on the CUDA device corresponding to its local rank. This should lower the VRAM usage on GPU 0 when performing distributed training.
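A minimal sketch of the new placement behavior (hypothetical code, not the library's actual `get_device_name` implementation):

```python
import os

import torch

def pick_device() -> str:
    # Launchers like torchrun and accelerate set LOCAL_RANK per worker process,
    # so each worker can be placed on its own CUDA device instead of GPU 0.
    if torch.cuda.is_available():
        local_rank = int(os.environ.get("LOCAL_RANK", 0))
        return f"cuda:{local_rank}"
    return "cpu"

print(pick_device())  # e.g. "cuda:1" for the worker with LOCAL_RANK=1
```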
## Minor patches of note
- Resolved typing issues for the `SentenceTransformer` class outside of the `encode` method. In v4.0.1, your IDE could no longer offer help for e.g. `model.similarity`. (#3297)
- Improved FSDP training compatibility by avoiding a faulty "only if model is wrapped" check. Now, the wrapped model should always be placed in the `loss` class instance when required for FSDP training; see the sketch after this list. (#3295)
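For context, a minimal FSDP training sketch; the model, dataset, and loss below are illustrative choices rather than anything from this release, and the `fsdp` flag is the standard `transformers` training argument:

```python
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("microsoft/mpnet-base")
train_dataset = load_dataset("sentence-transformers/all-nli", "pair", split="train")
# The loss holds a reference to the (possibly FSDP-wrapped) model
loss = MultipleNegativesRankingLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="checkpoints",
    # Standard transformers FSDP option; requires a multi-GPU launcher
    fsdp="full_shard auto_wrap",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()  # launch with e.g. `accelerate launch train.py`
```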
## All Changes
- [docs]: update examples by @emmanuel-ferdman in #3292
- Update htaccess, in-line comments were problematic by @tomaarsen in #3293
- [`docs`] Resolve more broken links throughout the docs by @tomaarsen in #3294
- [`docs`] Fix some broken docs redirects by @tomaarsen in #3296
- [`typing`] Move encode typings back to .py from .pyi by @tomaarsen in #3297
- [`fix`] Avoid "Only if model is wrapped" check which is faulty for FSDP by @tomaarsen in #3295
- [`cross-encoder`] Set the tokenizer model_max_length to the min. of model_max_length & max_pos_embeds by @tomaarsen in #3304
- [`ci`] Attempt to fix CI by @tomaarsen in #3305
- Fix device assignment in `get_device_name` for distributed training by @uminaty in #3303
- [`docs`] Add missing docstring for push_to_hub by @tomaarsen in #3306
- [`docs`] Specify that exported ONNX/OpenVINO models don't include pooling/normalization by @tomaarsen in #3307
## New Contributors
- @emmanuel-ferdman made their first contribution in #3292
- @uminaty made their first contribution in #3303
**Full Changelog**: v4.0.1...v4.0.2