huggingface/text-generation-inference v0.5.0

Features

  • server: add a flash-attention-based version of Llama (see the request sketch after this list)
  • server: add a flash-attention-based version of Santacoder
  • server: support OPT models
  • router: make router input validation optional
  • docker: improve layer caching
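As a quick illustration of the newly supported models, here is a minimal sketch of querying a running server over HTTP with the router's /generate endpoint. It assumes a server from this release is already serving a Llama, Santacoder, or OPT checkpoint; the host, port, prompt, and parameter values are placeholders, not part of this release.

```python
# Minimal sketch: send a generation request to a running
# text-generation-inference server (host/port are assumptions).
import requests

response = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "What is Deep Learning?",
        "parameters": {"max_new_tokens": 32},
    },
    timeout=60,
)
response.raise_for_status()
# The /generate route returns the completed text in "generated_text".
print(response.json()["generated_text"])
```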

Fixes

  • server: improve token streaming decoding (see the streaming sketch after this list)
  • server: fix escape characters in stop sequences
  • router: fix NCCL desync issues
  • router: use buckets for metrics histograms
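The streaming-decoding and stop-sequence fixes can be exercised with the /generate_stream route. The sketch below streams tokens and passes a stop sequence containing an escape character; the server-sent-event "data:" framing and the token field names are assumptions based on the router's streaming API, and the host, port, and prompt are placeholders.

```python
# Minimal sketch: stream tokens from /generate_stream with a stop sequence
# that contains an escape character ("\n\n"). Host/port are assumptions.
import json
import requests

with requests.post(
    "http://localhost:8080/generate_stream",
    json={
        "inputs": "Write a haiku about GPUs:",
        "parameters": {"max_new_tokens": 64, "stop": ["\n\n"]},
    },
    stream=True,
    timeout=60,
) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        # Server-sent events arrive as lines prefixed with "data:".
        if not line or not line.startswith(b"data:"):
            continue
        event = json.loads(line[len(b"data:"):])
        # Each event carries the incrementally decoded token text.
        print(event["token"]["text"], end="", flush=True)
```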
