This patch release fixes several small bugs, including issues with loading CLIP models and with automatic model card generation, and improves compatibility with third-party libraries.
Install this version with:
```bash
# Training + Inference
pip install sentence-transformers[train]==3.2.1

# Inference only, use one of:
pip install sentence-transformers==3.2.1
pip install sentence-transformers[onnx-gpu]==3.2.1
pip install sentence-transformers[onnx]==3.2.1
pip install sentence-transformers[openvino]==3.2.1
```
Fix loading non-Transformer models
In v3.2.0, a non-Transformer-based model (e.g. CLIP) would not load correctly if it was saved in the root of the model repository or directory. This has been resolved in #3007.
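For context, a Sentence Transformer model lists its modules in `modules.json`, and an empty `"path"` means that module's files sit in the root of the repository or directory (as is common for the first `Transformer` module). The stdlib-only sketch below resolves those paths for a CLIP model saved in the root; `resolve_module_dirs` and the example directory names are hypothetical, and this is not the library's actual loading code.

```python
import json
import os


def resolve_module_dirs(model_dir: str, modules_json: str) -> list[tuple[str, str]]:
    """Hypothetical helper: return (module type, directory) pairs from modules.json.

    An empty "path" maps to the model root itself.
    """
    modules = json.loads(modules_json)
    resolved = []
    for module in sorted(modules, key=lambda m: m["idx"]):
        # os.path.join(root, "") gives "root/"; normpath drops the trailing separator
        module_dir = os.path.normpath(os.path.join(model_dir, module["path"]))
        resolved.append((module["type"], module_dir))
    return resolved


# Illustrative modules.json for a CLIP model saved in the root
clip_modules = json.dumps([
    {"idx": 0, "name": "0", "path": "", "type": "sentence_transformers.models.CLIPModel"}
])

for module_type, module_dir in resolve_module_dirs("my-clip-model", clip_modules):
    print(module_type, "->", module_dir)
```

Here the single CLIP module resolves to the model root, which is the layout that failed to load in v3.2.0.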
Throw error if StaticEmbedding-based model is finetuned with incompatible losses
The following losses are not compatible with StaticEmbedding-based models:
- CachedGISTEmbedLoss
- CachedMultipleNegativesRankingLoss
- CachedMultipleNegativesSymmetricRankingLoss
- DenoisingAutoEncoderLoss
- GISTEmbedLoss
An error is now thrown when one of these is used with a StaticEmbedding-based model. I recommend using MultipleNegativesRankingLoss to finetune these models, e.g. as in https://huggingface.co/tomaarsen/static-bert-uncased-gooaq.
Note: to get good performance, you must use much higher learning rates than usual. In my experiments, a learning rate of 2e-1 worked well.
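The compatibility rule for StaticEmbedding-based models boils down to a check before training starts. The plain-Python sketch below expresses it using class names as strings; `check_loss_compatibility` and `INCOMPATIBLE_WITH_STATIC_EMBEDDING` are hypothetical names, and this is not necessarily how Sentence Transformers implements the check internally.

```python
# Losses listed in these release notes as incompatible with StaticEmbedding-based models
INCOMPATIBLE_WITH_STATIC_EMBEDDING = frozenset({
    "CachedGISTEmbedLoss",
    "CachedMultipleNegativesRankingLoss",
    "CachedMultipleNegativesSymmetricRankingLoss",
    "DenoisingAutoEncoderLoss",
    "GISTEmbedLoss",
})


def check_loss_compatibility(first_module_name: str, loss_name: str) -> None:
    """Hypothetical guard: raise if a StaticEmbedding-based model meets an incompatible loss."""
    if first_module_name == "StaticEmbedding" and loss_name in INCOMPATIBLE_WITH_STATIC_EMBEDDING:
        raise ValueError(
            f"{loss_name} is not compatible with a StaticEmbedding-based model. "
            "Consider using MultipleNegativesRankingLoss instead."
        )


# The recommended loss passes the check silently
check_loss_compatibility("StaticEmbedding", "MultipleNegativesRankingLoss")

# An incompatible loss raises before any training happens
try:
    check_loss_compatibility("StaticEmbedding", "GISTEmbedLoss")
except ValueError as err:
    print(err)
```

Failing early like this is preferable to letting training run with a loss that cannot work with the model.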
Patch ONNX model when the model uses output_hidden_states
For example, this script used to fail, but passes now:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "distiluse-base-multilingual-cased",
    backend="onnx",
    model_kwargs={"provider": "CPUExecutionProvider"},
)

sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)
print(embeddings.shape)
```
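The IndexError suggests that code indexed into the model outputs expecting hidden states that the exported ONNX model did not return. The plain-Python sketch below shows one defensive way to read such outputs; `collect_features` is hypothetical, the tuple layouts described in its comments are assumptions, and this is not the actual patch from #3008.

```python
from typing import Any, Sequence


def collect_features(output_states: Sequence[Any], output_hidden_states: bool) -> dict[str, Any]:
    """Hypothetical sketch: map raw model outputs to named features.

    Token embeddings are always the first output. Hidden states are read only
    when the config asks for them *and* the backend actually returned them.
    """
    features: dict[str, Any] = {"token_embeddings": output_states[0]}
    if output_hidden_states and len(output_states) > 1:
        # Assumed layouts: (last_hidden_state, pooler_output, hidden_states) for models
        # with a pooler, (last_hidden_state, hidden_states) for models without one
        all_layer_idx = 2 if len(output_states) >= 3 else 1
        features["all_layer_embeddings"] = output_states[all_layer_idx]
    return features


# PyTorch-style output with hidden states vs. an ONNX-style output without them
print(sorted(collect_features(("tokens", "layers"), True)))  # ['all_layer_embeddings', 'token_embeddings']
print(sorted(collect_features(("tokens",), True)))  # ['token_embeddings']
```

Checking the output length before indexing means a config that requests hidden states no longer crashes when the backend returns only the last hidden state.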
All changes
- Bump optimum version by @echarlaix in #2984
- [docs] Update the training snippets for some losses that should use the v3 Trainer by @tomaarsen in #2987
- [enh] Throw error if StaticEmbedding-based model is trained with incompatible loss by @tomaarsen in #2990
- [fix] Fix semantic_search_usearch with 'binary' by @tomaarsen in #2989
- [enh] Add support for large_string in model card create by @yaohwang in #2999
- [model cards] Prevent crash on generating widgets if dataset column is empty by @tomaarsen in #2997
- [fix] Added model2vec import compatible with current and newer version by @Pringled in #2992
- Fix cache_dir issue with loading CLIPModel by @BoPeng in #3007
- [warn] Throw a warning if compute_metrics is set, as it's not used by @tomaarsen in #3002
- [fix] Prevent IndexError if output_hidden_states & ONNX by @tomaarsen in #3008
New Contributors
- @echarlaix made their first contribution in #2984
- @yaohwang made their first contribution in #2999
- @Pringled made their first contribution in #2992
- @BoPeng made their first contribution in #3007
Full Changelog: v3.2.0...v3.2.1