vllm-project/vllm v0.1.5 on GitHub

Major Changes

Align beam search with hf_model.generate.
Stabilize AsyncLLMEngine with a background engine loop.
Add support for CodeLlama.
Add many model correctness tests.
Many other correctness fixes.
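
To illustrate the general idea of a background engine loop, here is a minimal asyncio sketch. It is not vLLM's AsyncLLMEngine code: `ToyAsyncEngine`, its methods, and the upper-casing "engine step" are all hypothetical stand-ins. The point is only that one long-lived task owns all engine work, and callers enqueue requests and await futures instead of stepping the engine themselves.

```python
import asyncio


class ToyAsyncEngine:
    """Hypothetical stand-in for an async engine; not vLLM's AsyncLLMEngine."""

    def __init__(self):
        self._pending = asyncio.Queue()  # requests waiting for an engine step
        self._loop_task = None           # handle to the background loop task

    def start_background_loop(self):
        # Start the loop once; later calls are no-ops.
        if self._loop_task is None:
            self._loop_task = asyncio.create_task(self._run_loop())

    async def _run_loop(self):
        # A single task performs all engine work, one request at a time.
        while True:
            prompt, future = await self._pending.get()
            await asyncio.sleep(0)  # placeholder for one real engine step
            if not future.cancelled():
                future.set_result(prompt.upper())

    async def generate(self, prompt):
        # Callers only enqueue work and wait for the result.
        self.start_background_loop()
        future = asyncio.get_running_loop().create_future()
        await self._pending.put((prompt, future))
        return await future

    async def shutdown(self):
        # Cancel the background task and wait for it to finish unwinding.
        if self._loop_task is not None:
            self._loop_task.cancel()
            try:
                await self._loop_task
            except asyncio.CancelledError:
                pass
            self._loop_task = None


async def demo():
    engine = ToyAsyncEngine()
    results = await asyncio.gather(engine.generate("a"), engine.generate("b"))
    await engine.shutdown()
    return results
```

Running `asyncio.run(demo())` drives both requests through the same background task. Keeping the loop in one task makes startup, error handling, and shutdown easier to reason about than letting every request step the engine.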

What's Changed

Add support for CodeLlama by @Yard1 in #854
[Fix] Fix a condition for ignored sequences by @zhuohan123 in #867
use flash-attn via xformers by @tmm1 in #877
Enable request body OpenAPI spec for OpenAI endpoints by @Peilun-Li in #865
Accelerate LLaMA model loading by @JF-D in #234
Improve _prune_hidden_states micro-benchmark by @tmm1 in #707
fix: bug fix when penalties are negative by @pfldy2850 in #913
[Docs] Minor fixes in supported models by @WoosukKwon in #920
Fix README.md Link by @zhuohan123 in #927
Add tests for models by @WoosukKwon in #922
Avoid compiling kernels for double data type by @WoosukKwon in #933
[BugFix] Fix NaN errors in paged attention kernel by @WoosukKwon in #936
Refactor AsyncLLMEngine by @Yard1 in #880
Only emit warning about internal tokenizer if it isn't being used by @nelson-liu in #939
Align vLLM's beam search implementation with HF generate by @zhuohan123 in #857
Initialize AsyncLLMEngine bg loop correctly by @Yard1 in #943
Fix vLLM cannot launch by @HermitSun in #948
Clean up kernel unit tests by @WoosukKwon in #938
Use queue for finished requests by @Yard1 in #957
[BugFix] Implement RoPE for GPT-J by @WoosukKwon in #941
Set torch default dtype in a context manager by @Yard1 in #971
Bump up transformers version in requirements.txt by @WoosukKwon in #976
Make AsyncLLMEngine more robust & fix batched abort by @Yard1 in #969
Enable safetensors loading for all models by @zhuohan123 in #974
[FIX] Fix Alibi implementation in PagedAttention kernel by @zhuohan123 in #945
Bump up the version to v0.1.5 by @WoosukKwon in #944
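
As background for the negative-penalty fix in #913, here is a minimal sketch of how presence and frequency penalties are commonly applied, assuming the OpenAI-style convention in which both are subtracted from the logits of tokens that have already been generated. `apply_penalties` is a hypothetical helper written for illustration, not vLLM's sampler code. Under this convention a negative penalty raises the logit, so repetition is encouraged rather than discouraged.

```python
from collections import Counter


def apply_penalties(logits, output_token_ids, presence_penalty, frequency_penalty):
    """Return a penalized copy of `logits` (a list of floats indexed by token id).

    Hypothetical helper: assumes OpenAI-style subtractive penalties.
    """
    counts = Counter(output_token_ids)
    penalized = list(logits)
    for token_id, count in counts.items():
        # The frequency penalty scales with how often the token appeared.
        penalized[token_id] -= frequency_penalty * count
        # The presence penalty is applied once if the token appeared at all.
        penalized[token_id] -= presence_penalty
    return penalized


# Positive penalties lower the logits of repeated tokens.
positive = apply_penalties([1.0, 1.0, 1.0], [0, 0, 2], 0.5, 0.25)
# Negative penalties raise them instead.
negative = apply_penalties([1.0, 1.0], [0], -0.5, -0.25)
```

A check that skips the penalty step whenever the values are "small" has to compare absolute values, otherwise negative penalties would be silently ignored.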

New Contributors

@tmm1 made their first contribution in #877
@Peilun-Li made their first contribution in #865
@JF-D made their first contribution in #234
@pfldy2850 made their first contribution in #913
@nelson-liu made their first contribution in #939

Full Changelog: v0.1.4...v0.1.5