github huggingface/text-generation-inference v2.1.0

17 months ago

Notable changes

  • New models: gemma2

  • Multi-LoRA adapters: you can now run multiple LoRAs on the same TGI deployment (#2010).

  • Faster GPTQ inference and Marlin support (up to 2x speedup).

  • Reworked the entire scheduling logic, with better block allocation and room for further speedups in future releases.

  • Lots of ROCm support and bug fixes.

  • Lots of new contributors! Thanks a lot for these contributions.
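
To illustrate the multi-LoRA item above, here is a minimal sketch of how a client might choose an adapter per request. It assumes the server was started with several adapters preloaded (the TGI documentation describes a `LORA_ADAPTERS` setting for this) and that the `/generate` endpoint accepts an `adapter_id` field inside `parameters`. The adapter name `my-org/my-adapter`, the base URL, and the helper names `build_generate_request` and `generate` are hypothetical; check the documentation for your TGI version before relying on any of them.

```python
import json
import urllib.request


def build_generate_request(prompt, adapter_id=None, max_new_tokens=40):
    """Build the JSON body for a TGI /generate call.

    `adapter_id` is assumed to name one of the adapters the server
    preloaded; when it is None the request targets the base model.
    """
    parameters = {"max_new_tokens": max_new_tokens}
    if adapter_id is not None:
        parameters["adapter_id"] = adapter_id
    return {"inputs": prompt, "parameters": parameters}


def generate(base_url, payload, timeout=30.0):
    """POST the payload to <base_url>/generate and return the decoded JSON."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical adapter name; replace it with one your server has loaded.
    with_adapter = build_generate_request(
        "Hello, who are you?", adapter_id="my-org/my-adapter"
    )
    base_only = build_generate_request("Hello, who are you?")
    print(json.dumps(with_adapter, indent=2))
    print(json.dumps(base_only, indent=2))
    # To send a request to a running server (assumed address):
    # print(generate("http://127.0.0.1:3000", with_adapter))
```

Because the adapter is chosen per request, one deployment can serve the base model and several fine-tuned variants side by side, which is the point of the feature.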

What's Changed

New Contributors

Full Changelog: v2.0.3...v2.1.0
