v0.3.4 - Improved Documentation, Improved Tokenization Speed, Multi-GPU Encoding

  • The documentation has been substantially improved and can be found at www.SBERT.net - feedback is welcome
  • The dataset that holds training InputExamples (dataset.SentencesDataset) now uses lazy tokenization, i.e., examples are tokenized only when they are needed for a batch. If you set num_workers to a positive integer in your DataLoader, tokenization happens in background worker processes. This substantially reduces the start-up time for training (see the training sketch after this list).
  • model.encode() now also uses a PyTorch Dataset + DataLoader. If you set num_workers to a positive integer, tokenization happens in the background, leading to faster encoding of large corpora (see the encoding sketch after this list).
  • Added functions and an example for multi-GPU encoding - this can be used to encode a corpus with multiple GPUs in parallel (see the multi-process sketch after this list). There is no multi-GPU support for training yet.
  • Removed the parallel_tokenization parameter from encode & SentencesDataset - it is no longer needed with lazy tokenization and DataLoader worker processes.
  • Smaller bugfixes
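A minimal training sketch of the lazy-tokenization setup described above, using the fit-based API of this release line; the toy sentence pairs and the 'bert-base-nli-mean-tokens' model choice are illustrative assumptions:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, SentencesDataset, InputExample, losses

model = SentenceTransformer('bert-base-nli-mean-tokens')

# Toy training pairs; SentencesDataset tokenizes them lazily, batch by batch.
train_examples = [
    InputExample(texts=['A man is eating food.', 'A man is eating a meal.'], label=0.9),
    InputExample(texts=['A man is eating food.', 'The girl is carrying a baby.'], label=0.1),
]
train_dataset = SentencesDataset(train_examples, model)

# num_workers > 0 moves tokenization into background DataLoader workers.
train_dataloader = DataLoader(train_dataset, shuffle=True, batch_size=16, num_workers=2)
train_loss = losses.CosineSimilarityLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
```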
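For single-device encoding, the same background tokenization applies; a sketch assuming encode() accepts a num_workers parameter that it forwards to its internal DataLoader, as the note above implies:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('bert-base-nli-mean-tokens')
sentences = ['This is sentence {}'.format(i) for i in range(10000)]

# num_workers=2 (assumed parameter) lets tokenization run in background
# workers while the GPU computes embeddings for the previous batch.
embeddings = model.encode(sentences, batch_size=32, num_workers=2, show_progress_bar=True)
print(len(embeddings), embeddings[0].shape)
```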
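A sketch of the new multi-GPU encoding workflow, following the start_multi_process_pool / encode_multi_process / stop_multi_process_pool methods from the release's multi-GPU example; the corpus and model name are placeholders:

```python
from sentence_transformers import SentenceTransformer

# The main-module guard is required because worker processes are spawned.
if __name__ == '__main__':
    model = SentenceTransformer('bert-base-nli-mean-tokens')
    sentences = ['This is sentence {}'.format(i) for i in range(100000)]

    # Starts one worker process per available CUDA device.
    pool = model.start_multi_process_pool()

    # Sentences are chunked and distributed across the workers.
    embeddings = model.encode_multi_process(sentences, pool)
    print('Embeddings computed. Shape:', embeddings.shape)

    # Shut down the worker processes.
    model.stop_multi_process_pool(pool)
```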

Breaking changes:

  • Renamed evaluation.BinaryEmbeddingSimilarityEvaluator to evaluation.BinaryClassificationEvaluator (see the migration sketch below)
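Only the class name changes for existing code; a hedged migration sketch, assuming the evaluator takes two sentence lists plus binary labels:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import BinaryClassificationEvaluator

model = SentenceTransformer('bert-base-nli-mean-tokens')

# Old (pre-v0.3.4): evaluation.BinaryEmbeddingSimilarityEvaluator(...)
# New: same role, new name. Constructor arguments here are assumptions:
# two parallel sentence lists and 0/1 labels marking each pair as similar or not.
evaluator = BinaryClassificationEvaluator(
    sentences1=['A man is eating food.'],
    sentences2=['A man is eating a meal.'],
    labels=[1],
)
evaluator(model)
```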
