- The documentation is substantially improved and can be found at: www.SBERT.net - Feedback welcome
- The dataset used to hold training InputExamples (`dataset.SentencesDataset`) now uses lazy tokenization, i.e., examples are tokenized only when they are needed for a batch. If you set `num_workers` to a positive integer in your `DataLoader`, tokenization will happen in a background thread. This substantially reduces the start-up time for training (see the training sketch after this list).
- `model.encode()` also uses a PyTorch Dataset + DataLoader. If you set `num_workers` to a positive integer, tokenization will happen in the background, leading to faster encoding speed for large corpora (see the encoding sketch after this list).
- Added functions and an example for multi-GPU encoding - This method can be used to encode a corpus with multiple GPUs in parallel. No multi-GPU support for training yet (see the multi-GPU sketch after this list).
- Removed the `parallel_tokenization` parameter from `encode()` & `SentencesDataset` - No longer needed with lazy tokenization and DataLoader worker threads.
- Smaller bugfixes
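
A minimal training sketch showing where `num_workers` goes. The model name and example pairs are placeholders, and `SentencesDataset(examples, model)` is assumed to follow the API as it existed around this release:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, SentencesDataset, InputExample, losses

model = SentenceTransformer('bert-base-nli-mean-tokens')  # placeholder model name

# Training examples are no longer tokenized up-front, but lazily per batch
train_examples = [
    InputExample(texts=['A man is eating food.', 'Someone is eating a meal.'], label=0.9),
    InputExample(texts=['A plane is taking off.', 'A dog runs through a field.'], label=0.1),
]
train_dataset = SentencesDataset(train_examples, model)

# num_workers > 0 moves tokenization into background DataLoader workers
train_dataloader = DataLoader(train_dataset, shuffle=True, batch_size=16, num_workers=2)

train_loss = losses.CosineSimilarityLoss(model)
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)
```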
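An encoding sketch with background tokenization. The `num_workers` argument to `encode()` is assumed here from the note above and may not exist in other versions of the library:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('bert-base-nli-mean-tokens')  # placeholder model name

# Stand-in for a large corpus
corpus = ['This is sentence number {}.'.format(i) for i in range(50000)]

# num_workers > 0 lets tokenization run in the background while the model encodes
embeddings = model.encode(corpus, batch_size=32, num_workers=2, show_progress_bar=True)
print('Encoded {} sentences'.format(len(embeddings)))
```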
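A multi-GPU encoding sketch. The function names below (`start_multi_process_pool`, `encode_multi_process`, `stop_multi_process_pool`) are taken from the library's multi-process API in later versions and are assumed to match the functions added here:

```python
from sentence_transformers import SentenceTransformer

if __name__ == '__main__':  # required because worker processes are spawned
    model = SentenceTransformer('bert-base-nli-mean-tokens')  # placeholder model name
    sentences = ['This is sentence number {}.'.format(i) for i in range(100000)]

    # Start one worker process per GPU (or pass a list such as ['cuda:0', 'cuda:1'])
    pool = model.start_multi_process_pool()

    # The corpus is split into chunks and encoded in parallel by the workers
    embeddings = model.encode_multi_process(sentences, pool)
    print('Encoded {} sentences'.format(len(embeddings)))

    # Clean up the worker processes
    model.stop_multi_process_pool(pool)
```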
Breaking changes:
- Renamed `evaluation.BinaryEmbeddingSimilarityEvaluator` to `evaluation.BinaryClassificationEvaluator`
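
Code that imported the old class only needs the new name. A small sketch with made-up data; the constructor arguments follow the current `BinaryClassificationEvaluator` API and are assumptions:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import BinaryClassificationEvaluator  # formerly BinaryEmbeddingSimilarityEvaluator

model = SentenceTransformer('bert-base-nli-mean-tokens')  # placeholder model name

# Sentence pairs with binary labels: 1 = similar/duplicate, 0 = dissimilar
sentences1 = ['A man is eating food.', 'A plane is taking off.']
sentences2 = ['Someone is eating a meal.', 'A dog runs through a field.']
labels = [1, 0]

evaluator = BinaryClassificationEvaluator(sentences1, sentences2, labels)
score = evaluator(model)
print('Score:', score)
```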