github huggingface/text-generation-inference v0.9.1

2 years ago

Highlights

  • server: non-flash MPT
  • server: decrease memory fragmentation

Features

  • server: use latest flash attention
  • router: add argument for hostname in router
  • docs: add help text for the options in text-generation-benchmark

Fix

  • makefile: update server/Makefile to include Makefile-vllm
  • server: handle loading from local files for MPT
  • server: avoid errors for very small top_p values
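
For context on the top_p fix, here is a minimal sketch of nucleus (top-p) filtering that stays well defined for very small top_p values. It is not the code from this release: the function name top_p_filter and its list-based interface are hypothetical, chosen only for illustration. The idea shown is to always keep the most probable token, so the candidate set is never empty.

```python
def top_p_filter(probs, top_p):
    """Return indices of tokens kept by nucleus (top-p) filtering.

    Hypothetical helper for illustration only; not the server's code.
    The most probable token is always kept, so a very small top_p
    cannot produce an empty candidate set.
    """
    if not probs:
        raise ValueError("probs must be non-empty")

    # Sort token indices by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    kept = []
    cumulative = 0.0
    for i in order:
        # Append before checking the threshold: this guarantees that
        # at least one token survives even when top_p is close to 0.
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.15, 0.05]
    print(top_p_filter(probs, 1e-9))
    print(top_p_filter(probs, 0.9))
```

Appending the token before testing the cumulative mass is the design choice that matters here: a filter that tests first and appends second can return nothing when top_p is smaller than every individual probability.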

Full Changelog: v0.9.0...v0.9.1
