github huggingface/text-generation-inference v2.2.0

latest releases: v3.3.6, v3.3.5, v3.3.4...
16 months ago

Notable changes

  • Llama 3.1 support (including 405B, FP8 support in a lot of mixed configurations, FP8, AWQ, GPTQ, FP8+FP16).
  • Gemma2 softcap support
  • Deepseek v2 support.
  • Lots of internal reworks/cleanup (allowing for cool features)
  • Lots of AWQ/GPTQ work with marlin kernels (everything should be faster by default)
  • Flash decoding support (FLASH_DECODING=1 environment variables which will probably enable some nice improvements in the future)

What's Changed

New Contributors

Full Changelog: v2.1.1...v2.2.0

Don't miss a new text-generation-inference release

NewReleases is sending notifications on new releases.