github huggingface/text-generation-inference v3.0.2


TL;DR

New transformers backend supporting flash attention, at roughly the same performance as native TGI, for all models that are not officially supported directly in TGI. Congrats @Cyrilvallez

New models unlocked: Cohere2, OLMo, OLMo2, Helium.
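
Once a v3.0.2 server is running with one of these models, it can be queried like any other TGI deployment. Below is a minimal sketch, assuming a server already listening locally on port 8080; the server URL, prompt, and generation parameters are illustrative and not taken from this release.

```python
# Minimal sketch: querying a local TGI v3.0.2 server with huggingface_hub.
# Assumes TGI is already serving a model (e.g. one of the newly unlocked ones)
# at http://localhost:8080; the URL and parameters are illustrative assumptions.
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")

# Standard TGI text-generation request; max_new_tokens is an example value.
output = client.text_generation(
    "Explain flash attention in one sentence.",
    max_new_tokens=64,
)
print(output)
```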

What's Changed

New Contributors

Full Changelog: v3.0.1...v3.0.2
