Major changes
- Mixtral model support (officially from @mistralai)
- AMD GPU support (collaboration with @EmbeddedLLM)
What's Changed
- add custom server params by @esmeetu in #1868
- support ChatGLMForConditionalGeneration by @dancingpipi in #1932
- Save pytorch profiler output for latency benchmark by @Yard1 in #1871
- Fix typo in adding_model.rst by @petergtz in #1947
- Make InternLM follow rope_scaling in config.json by @theFool32 in #1956
- Fix quickstart.rst example by @gottlike in #1964
- Adding number of nvcc_threads during build as envar by @AguirreNicolas in #1893
- fix typo in getenv call by @dskhudia in #1972
- [Continuation] Merge EmbeddedLLM/vllm-rocm into vLLM main by @tjtanaa in #1836
- Fix Baichuan2-7B-Chat by @firebook in #1987
- [Docker] Add cuda arch list as build option by @simon-mo in #1950
- Fix for KeyError on Loading LLaMA by @imgaojun in #1978
- [Minor] Fix code style for baichuan by @WoosukKwon in #2003
- Fix OpenAI server completion_tokens referenced before assignment by @js8544 in #1996
- [Minor] Add comment on skipping rope caches by @WoosukKwon in #2004
- Replace head_mapping params with num_kv_heads to attention kernel. by @wbn03 in #1997
- Fix completion API echo and logprob combo by @simon-mo in #1992
- Mixtral 8x7B support by @pierrestock in #2011
- Minor fixes for Mixtral by @WoosukKwon in #2015
- Change load format for Mixtral by @WoosukKwon in #2028
- Update run_on_sky.rst by @eltociear in #2025
- Update requirements.txt for mixtral by @0-hero in #2029
- Revert #2029 by @WoosukKwon in #2030
- [Minor] Fix latency benchmark script by @WoosukKwon in #2035
- [Minor] Fix type annotation in Mixtral by @WoosukKwon in #2036
- Update README.md to add megablocks requirement for mixtral by @0-hero in #2033
- [Minor] Fix import error msg for megablocks by @WoosukKwon in #2038
- Bump up to v0.2.4 by @WoosukKwon in #2034
New Contributors
- @dancingpipi made their first contribution in #1932
- @petergtz made their first contribution in #1947
- @theFool32 made their first contribution in #1956
- @gottlike made their first contribution in #1964
- @AguirreNicolas made their first contribution in #1893
- @dskhudia made their first contribution in #1972
- @tjtanaa made their first contribution in #1836
- @firebook made their first contribution in #1987
- @imgaojun made their first contribution in #1978
- @js8544 made their first contribution in #1996
- @wbn03 made their first contribution in #1997
- @pierrestock made their first contribution in #2011
- @0-hero made their first contribution in #2029
Full Changelog: v0.2.3...v0.2.4