huggingface/text-generation-inference v0.6.0


Features

  • server: flash attention past key values optimization (contributed by @njhill)
  • router: remove requests when client closes the connection (co-authored by @njhill)
  • server: support quantization for flash models
  • router: add info route
  • server: optimize token decode
  • server: support sharded SantaCoder with flash attention
  • security: image signing with cosign
  • security: image analysis with trivy
  • docker: improve image size
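The "optimize token decode" item refers to decoding text incrementally during generation instead of re-decoding the full sequence on every step. A minimal sketch of one common offset-based approach (the function name, parameters, and toy decoder below are illustrative, not TGI's exact implementation): decode the text up to the previous read offset and the text including the new tokens, and emit only the difference once it is stable.

```python
def decode_token(decode, all_ids, prefix_offset, read_offset):
    """Sketch of incremental decoding: return only the newly readable text.

    `decode` is any function mapping a list of token ids to a string
    (a tokenizer's decode method in practice; a toy stand-in here).
    """
    # Text already surfaced to the caller, and text including new tokens.
    prefix_text = decode(all_ids[prefix_offset:read_offset])
    new_text = decode(all_ids[prefix_offset:])
    # Only emit the suffix once it grew and is not a partial multi-byte
    # character (U+FFFD is the replacement character tokenizers emit then).
    if len(new_text) > len(prefix_text) and not new_text.endswith("\ufffd"):
        return new_text[len(prefix_text):], read_offset, len(all_ids)
    return "", prefix_offset, read_offset


# Toy decoder for illustration: one token id per character.
toy_decode = lambda ids: "".join(chr(i) for i in ids)
```

For example, after appending token 33 (`"!"`) to `[72, 105]` (`"Hi"`), `decode_token(toy_decode, [72, 105, 33], 0, 2)` yields only the new suffix `"!"` plus the advanced offsets.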

Fixes

  • server: check cuda capability before importing flash attention
  • server: fix hf_transfer issue with private repositories
  • router: add auth token for private tokenizers
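The first fix above gates the flash-attention import on the GPU's CUDA compute capability, since flash-attention kernels only run on sufficiently recent architectures. A hedged sketch of such a guard (the function name and the 7.5 threshold are assumptions for illustration; in practice the capability pair would come from `torch.cuda.get_device_capability()`):

```python
def supports_flash_attention(major: int, minor: int) -> bool:
    """Return True if a GPU with this compute capability can run flash attention.

    Assumed threshold for illustration: Turing (7.5) or newer. The real
    check in the server may differ; query the capability at runtime with
    torch.cuda.get_device_capability() before importing the kernels.
    """
    # Tuple comparison orders (major, minor) lexicographically.
    return (major, minor) >= (7, 5)
```

Guarding the import this way lets the server fall back to a non-flash code path on older GPUs instead of crashing at import time.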

Misc

  • rust: update to 1.69
