github huggingface/text-generation-inference v2.4.1

13 months ago

Notable changes

  • Choose input/total tokens automatically based on available VRAM
  • Support Qwen2 VL
  • Decrease latency of very large batches (> 128)
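The first item sizes token limits from available VRAM. The sketch below shows the general arithmetic that makes this possible: estimate how many bytes of KV cache each token needs, then divide free memory by that figure. It is only an illustration and is not TGI's implementation. The function names `kv_bytes_per_token` and `max_total_tokens`, the `headroom` factor, and the assumption that the KV cache is the only per-token memory cost are all hypothetical simplifications.

```python
def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                       dtype_bytes: int = 2) -> int:
    """Hypothetical helper: bytes of KV cache needed for one token.

    Every layer stores one key and one value vector (factor 2) per KV head.
    dtype_bytes=2 assumes fp16/bf16 cache entries.
    """
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def max_total_tokens(free_vram_bytes: int, num_layers: int, num_kv_heads: int,
                     head_dim: int, dtype_bytes: int = 2,
                     headroom: float = 0.9) -> int:
    """Hypothetical helper: how many tokens fit in the KV cache.

    `headroom` is an assumed fraction of free memory given to the cache; the
    remainder is left for activations and fragmentation.
    """
    usable = int(free_vram_bytes * headroom)
    per_token = kv_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes)
    return usable // per_token


if __name__ == "__main__":
    # Dimensions of a Llama-2-7B-sized model: 32 layers, 32 KV heads, head_dim 128.
    per_token = kv_bytes_per_token(32, 32, 128)
    print(per_token)  # 524288 bytes (512 KiB) per token in fp16
    # With 20 GiB free and the whole budget given to the cache:
    print(max_total_tokens(20 * 1024**3, 32, 32, 128, headroom=1.0))  # 40960
```

A real server also has to reserve memory for weights, prefill activations and CUDA graphs, so the budget it picks will be smaller than this naive estimate.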

What's Changed

New Contributors

Full Changelog: v2.3.0...v2.4.1
