T5
The T5 encoder can now be used to learn text embeddings. Load it like any other transformer model:
from sentence_transformers import SentenceTransformer, models

# Load the T5 encoder; inputs longer than 256 tokens are truncated
word_embedding_model = models.Transformer('t5-base', max_seq_length=256)
# Pool the token embeddings into one fixed-size sentence vector
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
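To make the role of the pooling module concrete, here is a minimal pure-Python sketch of masked mean pooling, assuming the pooling layer is left in its default mean mode. The function name `mean_pool` and the toy vectors are hypothetical and only illustrate the idea; the library itself works on batched tensors.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    if count == 0:
        # No real tokens: return the zero vector instead of dividing by zero
        return total
    return [value / count for value in total]


# Toy example (hypothetical values): two real tokens and one padding token
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
sentence_vector = mean_pool(tokens, mask)
print(sentence_vector)  # [2.0, 3.0]
```

The padding token is ignored, so the sentence vector depends only on the real tokens.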
See the T5-Benchmark results: the T5 encoder is not the best choice for learning text embedding models. It requires a lot of training data and many training steps. Other models perform much better, at least in the given experiment with 560k training triplets.
New Models
The models from the papers "Sentence-T5: Scalable sentence encoders from pre-trained text-to-text models" and "Large Dual Encoders Are Generalizable Retrievers" have been added:
- gtr-t5-base
- gtr-t5-large
- gtr-t5-xl
- gtr-t5-xxl
- sentence-t5-base
- sentence-t5-large
- sentence-t5-xl
- sentence-t5-xxl
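The models listed above can be loaded by name like any other pre-trained model. The sketch below assumes they are published under the `sentence-transformers` organization on the hub; the helper names `hub_id` and `embed` are hypothetical. The library is imported lazily so the helpers can be defined without it installed.

```python
NEW_MODELS = [
    "gtr-t5-base", "gtr-t5-large", "gtr-t5-xl", "gtr-t5-xxl",
    "sentence-t5-base", "sentence-t5-large", "sentence-t5-xl", "sentence-t5-xxl",
]


def hub_id(short_name):
    """Build the full hub name (assumes the sentence-transformers organization)."""
    if short_name not in NEW_MODELS:
        raise ValueError(f"unknown model: {short_name}")
    return f"sentence-transformers/{short_name}"


def embed(short_name, sentences):
    """Download the model on first use and return one embedding per sentence."""
    # Imported here so the rest of the sketch runs without the package installed
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(hub_id(short_name))
    return model.encode(sentences)
```

For example, `embed("gtr-t5-base", ["How do I load a model?"])` would download the model and return an array with one row per input sentence. The larger variants need considerably more memory, so the base models are a sensible starting point.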
For benchmark results, see https://seb.sbert.net
Private Models
Thanks to #1406, you can now load private models from the hub:
# use_auth_token=True reads the token stored by `huggingface-cli login`
model = SentenceTransformer("your-username/your-model", use_auth_token=True)