This patch release allows encode() and predict() to accept 1D numpy string arrays as inputs.
Install this version with
```shell
# Training + Inference
pip install sentence-transformers[train]==5.4.1

# Inference only, use one of:
pip install sentence-transformers==5.4.1
pip install sentence-transformers[onnx-gpu]==5.4.1
pip install sentence-transformers[onnx]==5.4.1
pip install sentence-transformers[openvino]==5.4.1

# Multimodal dependencies (optional):
pip install sentence-transformers[image]==5.4.1
pip install sentence-transformers[audio]==5.4.1
pip install sentence-transformers[video]==5.4.1

# Or combine as needed:
pip install sentence-transformers[train,onnx,image]==5.4.1
```

Numpy string/object arrays as batches (#3720)
encode() and predict() now correctly recognize 1D numpy string/object arrays as batches rather than singular inputs. Previously, something like model.encode(df["text"].to_numpy()) was silently treated as a single input and produced incorrect output. 1D numpy arrays with dtype.kind in ("U", "O") are now unpacked like lists, and 2D+ arrays are treated as batches of pairs (for CrossEncoder).
```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Previously treated as one input; now correctly encoded as 3 separate texts
embeddings = model.encode(np.array(["first", "second", "third"]))
print(embeddings.shape)
# (3, 384)
```

For CrossEncoder, a 1D numpy string array is still treated as a single [query, document] pair to match the existing list behavior, while a 2D array of shape (N, 2) is a batch of N pairs.
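The encode() side of this dispatch can be sketched as below. `unpack_numpy_batch` is a hypothetical name for illustration; the real check lives inside encode()/predict() rather than in a public helper.

```python
import numpy as np

# Hypothetical helper illustrating the unpacking rule described above;
# the actual logic is internal to encode()/predict().
def unpack_numpy_batch(inputs):
    if isinstance(inputs, np.ndarray):
        if inputs.ndim == 1 and inputs.dtype.kind in ("U", "O"):
            # 1D string/object array -> batch of N texts, unpacked like a list
            return list(inputs)
        if inputs.ndim == 2 and inputs.shape[1] == 2:
            # 2D (N, 2) array -> batch of N [query, document] pairs
            return [list(pair) for pair in inputs]
    return inputs

print(unpack_numpy_batch(np.array(["first", "second", "third"])))
# ['first', 'second', 'third']
```

Note that dtype.kind "U" covers unicode string arrays and "O" covers object arrays, which is what `df["text"].to_numpy()` typically produces for pandas string columns.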
Safer activation function loading in Dense (#3714)
The Dense module stores its activation function as a dotted import path in its saved config (e.g. "torch.nn.modules.activation.Tanh"), which was then resolved via import_from_string whenever the module was loaded. Because any importable Python callable could be referenced, a maliciously crafted config.json on the Hub could trigger arbitrary imports at model load time.
The loader now only resolves activation functions whose import path starts with `torch.`. Anything else is skipped with a warning and replaced by the default activation (Tanh). To load a model with a custom (non-torch) activation function, opt in explicitly with trust_remote_code=True:
```python
from sentence_transformers import SentenceTransformer

# Torch-provided activations load as before
model = SentenceTransformer("some/model-with-torch-activation")

# Non-torch activations now require explicit opt-in
model = SentenceTransformer("some/model-with-custom-activation", trust_remote_code=True)
```

This mirrors the opt-in trust model already used by transformers for custom code, and ensures untrusted model repositories cannot smuggle arbitrary imports through the Dense activation config.
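The safelist check itself can be sketched as follows. The function name and structure are illustrative, not the library's actual API; only the rule (torch-prefixed paths pass, everything else falls back to Tanh unless the user opts in) comes from the description above.

```python
# Hypothetical sketch of the activation-path safelist described above.
DEFAULT_ACTIVATION = "torch.nn.modules.activation.Tanh"

def resolve_activation_path(path: str, trust_remote_code: bool = False) -> str:
    # torch-provided activations are always trusted
    if path.startswith("torch.") or trust_remote_code:
        return path
    # any other import path is skipped (with a warning in the real loader)
    # and replaced by the default Tanh activation
    return DEFAULT_ACTIVATION

print(resolve_activation_path("torch.nn.modules.activation.ReLU"))  # allowed as-is
print(resolve_activation_path("os.system"))  # falls back to the default Tanh path
```

The prefix check runs before any import is attempted, so an untrusted config can no longer trigger arbitrary module imports at load time.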
What's Changed
- [tests] Fix test_trainer_prompts for SE and ST after prompt handling moved into Transformer.preprocess by @tomaarsen in #3710
- [chore] Increment dev version after v5.4 release by @tomaarsen in #3711
- [docs] No revision needed anymore for nvidia nemotron by @tomaarsen in #3712
- [chore] Replace evaluation_strategy with eval_strategy in a few more places by @tomaarsen in #3713
- [security] Only load activation functions starting with 'torch' in the Dense module by @tomaarsen in #3714
- [fix] Treat numpy string/object arrays as batches in encode/predict by @tomaarsen in #3720
Full Changelog: v5.4.0...v5.4.1