huggingface/text-generation-inference v0.4.0


Features

  • router: support best_of sampling
  • router: support left truncation
  • server: support typical sampling
  • launcher: allow local models
  • clients: add text-generation Python client (usage sketch after this list)
  • launcher: allow parsing num_shard from CUDA_VISIBLE_DEVICES

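Several of these features are exposed through the new text-generation Python client. A minimal sketch, assuming a server already running locally on port 8080; the prompt and parameter values are illustrative:

```python
# Minimal sketch of the text-generation Python client added in this release.
# The endpoint URL, prompt, and parameter values are assumptions for illustration.
from text_generation import Client

client = Client("http://127.0.0.1:8080")

response = client.generate(
    "What is Deep Learning?",
    max_new_tokens=64,
    do_sample=True,
    best_of=2,       # best_of sampling handled by the router
    typical_p=0.9,   # typical sampling on the server
    truncate=1024,   # left-truncate long prompts to fit the context
)
print(response.generated_text)
```
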
Fixes

  • server: do not warp prefill logits
  • server: fix formatting issues in generate_stream tokens (streaming sketch after this list)
  • server: fix galactica batch
  • server: fix index out of range issue with watermarking

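As a sketch of how the generate_stream token output is consumed, here is a short streaming example with the same Python client; the endpoint URL and prompt are again illustrative assumptions:

```python
# Sketch of streaming generation with the text-generation Python client.
# Assumes a local server on port 8080; prompt and token limit are illustrative.
from text_generation import Client

client = Client("http://127.0.0.1:8080")

text = ""
for response in client.generate_stream("What is Deep Learning?", max_new_tokens=64):
    # Skip special tokens so only user-visible text is accumulated.
    if not response.token.special:
        text += response.token.text
print(text)
```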