vllm-project/vllm v0.1.4 on GitHub

Major changes

From now on, vLLM is published with pre-built CUDA binaries. Users no longer have to compile vLLM's CUDA kernels on their own machines.
New models: InternLM, Qwen, Aquila.
Optimized CUDA kernels for paged attention and GELU.
Many bug fixes.
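
For readers unfamiliar with what an "approximate GELU" is, the sketch below compares the exact GELU with the widely used tanh-based approximation in plain Python. It is an illustration only: the release notes do not say which approximation the new CUDA kernels implement, so the tanh form here is an assumption, and the function names gelu_exact and gelu_tanh are hypothetical.

```python
import math


def gelu_exact(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh-based GELU approximation (assumed form, not taken from vLLM's source)."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


if __name__ == "__main__":
    # Print both values side by side to show how close the approximation is.
    for x in (-3.0, -1.0, 0.0, 0.5, 1.0, 3.0):
        exact = gelu_exact(x)
        approx = gelu_tanh(x)
        print(f"x={x:+.1f}  exact={exact:+.6f}  tanh={approx:+.6f}  diff={abs(exact - approx):.2e}")
```

The approximation trades a small numerical error for avoiding the error function, which is typically why GPU kernels offer it.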

What's Changed

Fix gibberish outputs of GPT-BigCode-based models by @HermitSun in #676
[OPTIMIZATION] Optimizes the single_query_cached_kv_attention kernel by @naed90 in #420
add QWen-7b support by @Sanster in #685
add internlm model by @gqjia in #528
Check the max prompt length for the OpenAI completions API by @nicobasile in #472
[Fix] unwanted bias in InternLM Model by @wangruohui in #740
Supports tokens and arrays of tokens as inputs to the OpenAI completion API by @wanmok in #715
Fix baichuan doc style by @UranusSeven in #748
Fix typo in tokenizer.py by @eltociear in #750
Align with huggingface Top K sampling by @Abraham-Xu in #753
explicitly del state by @cauyxy in #784
Fix typo in sampling_params.py by @wangcx18 in #788
[Feature | CI] Added a github action to build wheels by @Danielkinz in #746
set default compute capability according to cuda version by @zxdvd in #773
Fix mqa is false case in gpt_bigcode by @zhaoyang-star in #806
Add support for aquila by @shunxing1234 in #663
Update Supported Model List by @zhuohan123 in #825
Fix 'GPTBigCodeForCausalLM' object has no attribute 'tensor_model_parallel_world_size' by @HermitSun in #827
Add compute capability 8.9 to default targets by @WoosukKwon in #829
Implement approximate GELU kernels by @WoosukKwon in #828
Fix typo of Aquila in README.md by @ftgreat in #836
Fix for breaking changes in xformers 0.0.21 by @WoosukKwon in #834
Clean up code by @wenjun93 in #844
Set replacement=True in torch.multinomial by @WoosukKwon in #858
Bump up the version to v0.1.4 by @WoosukKwon in #846
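
As an illustration of the token-input change in #715, the sketch below builds request bodies for an OpenAI-style completions endpoint using each prompt shape the OpenAI API defines: a string, a list of strings, a list of token IDs, and a list of token-ID lists. The helper name build_completion_payload, the model name "my-model", and the token IDs are placeholders chosen for this example, not values from the release. The code only serializes JSON and does not contact a server.

```python
import json
from typing import List, Union

# Prompt shapes accepted by the OpenAI completions API.
Prompt = Union[str, List[str], List[int], List[List[int]]]


def build_completion_payload(model: str, prompt: Prompt, max_tokens: int = 16) -> str:
    """Serialize a completions request body (hypothetical helper for illustration)."""
    return json.dumps({"model": model, "prompt": prompt, "max_tokens": max_tokens})


if __name__ == "__main__":
    examples = [
        "Hello, my name is",            # plain text
        ["Hello", "Bonjour"],           # batch of text prompts
        [101, 2023, 2003],              # one prompt as token IDs (placeholder IDs)
        [[101, 2023], [101, 2003, 7]],  # batch of token-ID prompts (placeholder IDs)
    ]
    for prompt in examples:
        print(build_completion_payload("my-model", prompt))
```

Sending pre-tokenized prompts is useful when the client has already run the tokenizer and wants the server to skip that step.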

New Contributors

@naed90 made their first contribution in #420
@gqjia made their first contribution in #528
@nicobasile made their first contribution in #472
@wanmok made their first contribution in #715
@UranusSeven made their first contribution in #748
@eltociear made their first contribution in #750
@Abraham-Xu made their first contribution in #753
@cauyxy made their first contribution in #784
@wangcx18 made their first contribution in #788
@Danielkinz made their first contribution in #746
@zhaoyang-star made their first contribution in #806
@shunxing1234 made their first contribution in #663
@ftgreat made their first contribution in #836
@wenjun93 made their first contribution in #844

Full Changelog: v0.1.3...v0.1.4
