Major Changes
This version adds support for the OLMo and Gemma models, as well as a per-request seed parameter.
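As a quick illustration, the new per-request seed can be passed through `SamplingParams` when running one of the newly supported models offline. This is a minimal sketch, not an official example from this release; the model name and parameter values below are illustrative assumptions.

```python
from vllm import LLM, SamplingParams

# Load one of the newly supported models (Gemma shown here; OLMo works the same way).
llm = LLM(model="google/gemma-2b")

# The new per-request seed makes sampling reproducible for this request.
params = SamplingParams(temperature=0.8, top_p=0.95, seed=42)

outputs = llm.generate(["The capital of France is"], params)
for output in outputs:
    print(output.outputs[0].text)
```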
What's Changed
- Defensively copy `sampling_params` by @njhill in #2881
- multi-LoRA as extra models in OpenAI server by @jvmncs in #2775
- Add code-revision config argument for Hugging Face Hub by @mbm-ai in #2892
- [Minor] Small fix to make distributed init logic in worker look cleaner by @zhuohan123 in #2905
- [Test] Add basic correctness test by @zhuohan123 in #2908
- Support OLMo models. by @Isotr0py in #2832
- Add warning to prevent changes to benchmark api server by @simon-mo in #2858
- Fix `vllm:prompt_tokens_total` metric calculation by @ronensc in #2869
- [ROCm] include gfx908 as supported by @jamestwhedbee in #2792
- [FIX] Fix beam search test by @zhuohan123 in #2930
- Make vLLM logging formatting optional by @Yard1 in #2877
- Add metrics to RequestOutput by @Yard1 in #2876
- Add Gemma model by @xiangxu-google in #2964
- Upgrade transformers to v4.38.0 by @WoosukKwon in #2965
- [FIX] Add Gemma model to the doc by @zhuohan123 in #2966
- [ROCm] Upgrade transformers to v4.38.0 by @WoosukKwon in #2967
- Support per-request seed by @njhill in #2514
- Bump up version to v0.3.2 by @zhuohan123 in #2968
New Contributors
- @jvmncs made their first contribution in #2775
- @mbm-ai made their first contribution in #2892
- @Isotr0py made their first contribution in #2832
- @jamestwhedbee made their first contribution in #2792
Full Changelog: v0.3.1...v0.3.2