Major Changes
- Align beam search with hf_model.generate.
- Stabilize AsyncLLMEngine with a background engine loop.
- Add support for CodeLlama.
- Add many model correctness tests.
- Many other correctness fixes.
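As a rough illustration of the "background engine loop" idea listed above, here is a minimal sketch using only asyncio. It is not vLLM's implementation: `ToyAsyncEngine`, `_step`, `generate`, and `shutdown` are hypothetical names invented for this example, and the "model" simply upper-cases the prompt. The point it shows is that a single background task owns all engine stepping, while callers only enqueue requests and await their results.

```python
import asyncio


class ToyAsyncEngine:
    """Hypothetical stand-in for an async engine; NOT vLLM's AsyncLLMEngine."""

    def __init__(self):
        self._pending = asyncio.Queue()  # requests waiting to be processed
        self._loop_task = None           # handle to the background loop task

    def start(self):
        # Start the background loop at most once (needs a running event loop).
        if self._loop_task is None:
            self._loop_task = asyncio.create_task(self._run_loop())

    async def _run_loop(self):
        # One task performs every step, so callers never step the engine
        # concurrently with each other.
        while True:
            prompt, future = await self._pending.get()
            try:
                result = await self._step(prompt)
            except Exception as exc:
                # Report the failure to the waiting caller and keep looping.
                if not future.done():
                    future.set_exception(exc)
            else:
                if not future.done():
                    future.set_result(result)

    async def _step(self, prompt):
        # Placeholder for real model work (assumption: upper-casing the text).
        await asyncio.sleep(0)
        return prompt.upper()

    async def generate(self, prompt):
        # Enqueue the request and wait for the background loop to answer it.
        self.start()
        future = asyncio.get_running_loop().create_future()
        await self._pending.put((prompt, future))
        return await future

    async def shutdown(self):
        # Cancel the background loop and wait for it to finish unwinding.
        if self._loop_task is not None:
            self._loop_task.cancel()
            try:
                await self._loop_task
            except asyncio.CancelledError:
                pass
            self._loop_task = None


async def demo():
    engine = ToyAsyncEngine()
    results = await asyncio.gather(
        engine.generate("hello"), engine.generate("world")
    )
    await engine.shutdown()
    return results
```

Running `asyncio.run(demo())` issues two concurrent requests that are both served by the same background task. Routing errors through the per-request future is one way to keep a single failing request from stopping the loop.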
What's Changed
- Add support for CodeLlama by @Yard1 in #854
- [Fix] Fix a condition for ignored sequences by @zhuohan123 in #867
- use flash-attn via xformers by @tmm1 in #877
- Enable request body OpenAPI spec for OpenAI endpoints by @Peilun-Li in #865
- Accelerate LLaMA model loading by @JF-D in #234
- Improve _prune_hidden_states micro-benchmark by @tmm1 in #707
- fix: bug fix when penalties are negative by @pfldy2850 in #913
- [Docs] Minor fixes in supported models by @WoosukKwon in #920
- Fix README.md Link by @zhuohan123 in #927
- Add tests for models by @WoosukKwon in #922
- Avoid compiling kernels for double data type by @WoosukKwon in #933
- [BugFix] Fix NaN errors in paged attention kernel by @WoosukKwon in #936
- Refactor AsyncLLMEngine by @Yard1 in #880
- Only emit warning about internal tokenizer if it isn't being used by @nelson-liu in #939
- Align vLLM's beam search implementation with HF generate by @zhuohan123 in #857
- Initialize AsyncLLMEngine bg loop correctly by @Yard1 in #943
- Fix vLLM cannot launch by @HermitSun in #948
- Clean up kernel unit tests by @WoosukKwon in #938
- Use queue for finished requests by @Yard1 in #957
- [BugFix] Implement RoPE for GPT-J by @WoosukKwon in #941
- Set torch default dtype in a context manager by @Yard1 in #971
- Bump up transformers version in requirements.txt by @WoosukKwon in #976
- Make AsyncLLMEngine more robust & fix batched abort by @Yard1 in #969
- Enable safetensors loading for all models by @zhuohan123 in #974
- [FIX] Fix Alibi implementation in PagedAttention kernel by @zhuohan123 in #945
- Bump up the version to v0.1.5 by @WoosukKwon in #944
New Contributors
- @tmm1 made their first contribution in #877
- @Peilun-Li made their first contribution in #865
- @JF-D made their first contribution in #234
- @pfldy2850 made their first contribution in #913
- @nelson-liu made their first contribution in #939
Full Changelog: v0.1.4...v0.1.5