Major changes
- From now on, vLLM is published with pre-built CUDA binaries, so users no longer have to compile vLLM's CUDA kernels on their machines.
- New models: InternLM, Qwen, Aquila.
- Optimized CUDA kernels for paged attention and GELU.
- Many bug fixes.
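Part of the GELU work in this release adds approximate GELU kernels. For readers unfamiliar with the approximation, the sketch below compares the exact erf-based GELU with the tanh approximation commonly used by GPT-style models. It assumes the tanh form; it is an illustrative plain-Python reference, not vLLM's CUDA kernel, and the function names are hypothetical.

```python
import math


def gelu_exact(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF (via erf).
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    # Tanh approximation of GELU (assumed form, as used by GPT-2-style models).
    k = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(k * (x + 0.044715 * x ** 3)))


if __name__ == "__main__":
    # Print both variants side by side for a few inputs.
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"{x:5.1f}  exact={gelu_exact(x):.6f}  tanh={gelu_tanh(x):.6f}")
```

The approximation avoids evaluating erf, which is why it is a popular target for fast kernels; the two curves stay very close over typical activation ranges.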
What's Changed
- Fix gibberish outputs of GPT-BigCode-based models by @HermitSun in #676
- [OPTIMIZATION] Optimizes the single_query_cached_kv_attention kernel by @naed90 in #420
- add QWen-7b support by @Sanster in #685
- add internlm model by @gqjia in #528
- Check the max prompt length for the OpenAI completions API by @nicobasile in #472
- [Fix] unwanted bias in InternLM Model by @wangruohui in #740
- Supports tokens and arrays of tokens as inputs to the OpenAI completion API by @wanmok in #715
- Fix baichuan doc style by @UranusSeven in #748
- Fix typo in tokenizer.py by @eltociear in #750
- Align with huggingface Top K sampling by @Abraham-Xu in #753
- explicitly del state by @cauyxy in #784
- Fix typo in sampling_params.py by @wangcx18 in #788
- [Feature | CI] Added a github action to build wheels by @Danielkinz in #746
- set default compute capability according to CUDA version by @zxdvd in #773
- Fix mqa is false case in gpt_bigcode by @zhaoyang-star in #806
- Add support for aquila by @shunxing1234 in #663
- Update Supported Model List by @zhuohan123 in #825
- Fix 'GPTBigCodeForCausalLM' object has no attribute 'tensor_model_parallel_world_size' by @HermitSun in #827
- Add compute capability 8.9 to default targets by @WoosukKwon in #829
- Implement approximate GELU kernels by @WoosukKwon in #828
- Fix typo of Aquila in README.md by @ftgreat in #836
- Fix for breaking changes in xformers 0.0.21 by @WoosukKwon in #834
- Clean up code by @wenjun93 in #844
- Set replacement=True in torch.multinomial by @WoosukKwon in #858
- Bump up the version to v0.1.4 by @WoosukKwon in #846
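Two of the changes above (#753 and #858) touch top-k sampling. As a rough illustration of the technique only, here is a plain-Python sketch of top-k sampling: keep every logit at least as large as the k-th largest (so ties at the cutoff survive, similar to the Hugging Face top-k warper), mask the rest with -inf, apply softmax, and draw a token. This is not vLLM's implementation, which works on PyTorch tensors; the helper names are hypothetical.

```python
import math
import random


def top_k_filter(logits, k):
    # Keep logits >= the k-th largest value; mask the others with -inf.
    if k <= 0 or k >= len(logits):
        return list(logits)
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else float("-inf") for x in logits]


def softmax(xs):
    # Numerically stable softmax; math.exp(-inf) evaluates to 0.0.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sample_top_k(logits, k, rng):
    # Draw one token index from the top-k renormalized distribution.
    probs = softmax(top_k_filter(logits, k))
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    print([sample_top_k([1.0, 3.0, 2.0], 2, rng) for _ in range(10)])
```

`random.Random.choices` samples with replacement, so repeated draws from the same filtered distribution are independent.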
New Contributors
- @naed90 made their first contribution in #420
- @gqjia made their first contribution in #528
- @nicobasile made their first contribution in #472
- @wanmok made their first contribution in #715
- @UranusSeven made their first contribution in #748
- @eltociear made their first contribution in #750
- @Abraham-Xu made their first contribution in #753
- @cauyxy made their first contribution in #784
- @wangcx18 made their first contribution in #788
- @Danielkinz made their first contribution in #746
- @zhaoyang-star made their first contribution in #806
- @shunxing1234 made their first contribution in #663
- @ftgreat made their first contribution in #836
- @wenjun93 made their first contribution in #844
Full Changelog: v0.1.3...v0.1.4