vllm 0.2.1 on Python PyPI

Major Changes

PagedAttention V2 kernel: Up to 20% end-to-end latency reduction
Support log probabilities for prompt tokens
AWQ support for Mistral 7B
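
To illustrate what "log probabilities for prompt tokens" means, here is a minimal sketch in plain Python. It is not vLLM's implementation or API. The helper names (`log_softmax`, `prompt_logprobs`) and the toy logits are assumptions made for illustration. The idea is that the score of each prompt token is the log-softmax probability the model assigned to it given the tokens before it. The first token has no preceding context, so it gets no score.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def prompt_logprobs(step_logits, prompt_token_ids):
    """Score each prompt token under the model's own predictions.

    step_logits[i] holds the (hypothetical) vocabulary logits produced
    after reading prompt_token_ids[: i + 1], i.e. the prediction for
    position i + 1. The first token has no context, so its entry is None.
    """
    scores = [None]
    for i in range(1, len(prompt_token_ids)):
        dist = log_softmax(step_logits[i - 1])
        scores.append(dist[prompt_token_ids[i]])
    return scores


# Toy example: vocabulary of 3 tokens, prompt of 3 tokens.
toy_logits = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
toy_prompt = [1, 2, 0]
scores = prompt_logprobs(toy_logits, toy_prompt)
```

With uniform logits at the first step, the second prompt token scores log(1/3). The third token scores higher because the toy logits favor it.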

What's Changed

fixing typo in tiiuae/falcon-rw-7b model name by @0ssamaak0 in #1226
Added dtype arg to benchmarks by @kg6-sleipnir in #1228
fix vulnerable memory modification to gpu shared memory by @soundOfDestiny in #1241
support sharding llama2-70b on more than 8 GPUs by @zhuohan123 in #1209
[Minor] Fix type annotations by @WoosukKwon in #1238
TP/quantization/weight loading refactor part 1 - Simplify parallel linear logic by @zhuohan123 in #1181
add support for tokenizer revision by @cassanof in #1163
Use monotonic time where appropriate by @Yard1 in #1249
API server support ipv4 / ipv6 dualstack by @yunfeng-scale in #1288
Move bfloat16 check to worker by @Yard1 in #1259
[FIX] Explain why the finished_reason of ignored sequences are length by @zhuohan123 in #1289
Update README.md by @zhuohan123 in #1292
[Minor] Fix comment in mistral.py by @zhuohan123 in #1303
lock torch version to 2.0.1 when build for #1283 by @yanxiyue in #1290
minor update by @WrRan in #1311
change the timing of sorting logits by @yhlskt23 in #1309
workaround of AWQ for Turing GPUs by @twaka in #1252
Fix overflow in awq kernel by @chu-tianxiang in #1295
Update model_loader.py by @AmaleshV in #1278
Add blacklist for model checkpoint by @WoosukKwon in #1325
Update README.md Aquila2. by @ftgreat in #1331
Improve detokenization performance by @Yard1 in #1338
Bump up transformers version & Remove MistralConfig by @WoosukKwon in #1254
Fix the issue for AquilaChat2-* models by @lu-wang-dl in #1339
Fix error message on TORCH_CUDA_ARCH_LIST by @WoosukKwon in #1239
Minor fix on AWQ kernel launch by @WoosukKwon in #1356
Implement PagedAttention V2 by @WoosukKwon in #1348
Implement prompt logprobs & Batched topk for computing logprobs by @zhuohan123 in #1328
Fix PyTorch version to 2.0.1 in workflow by @WoosukKwon in #1377
Fix PyTorch index URL in workflow by @WoosukKwon in #1378
Fix sampler test by @WoosukKwon in #1379
Bump up the version to v0.2.1 by @zhuohan123 in #1355
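
As a small illustration of the "Use monotonic time where appropriate" entry above, the sketch below measures elapsed time with Python's standard `time.monotonic()`, which never goes backwards when the system clock is adjusted. The `timed` helper is a hypothetical name for this example and is not taken from the vLLM code base.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.monotonic()
    result = fn(*args, **kwargs)
    elapsed = time.monotonic() - start
    return result, elapsed


total, seconds = timed(sum, range(1000))
```

Wall-clock time from `time.time()` can jump when the clock is changed (for example by NTP), so intervals computed from it may come out negative or inflated. A monotonic clock avoids that for duration measurements.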

New Contributors

@0ssamaak0 made their first contribution in #1226
@kg6-sleipnir made their first contribution in #1228
@soundOfDestiny made their first contribution in #1241
@cassanof made their first contribution in #1163
@yunfeng-scale made their first contribution in #1288
@yanxiyue made their first contribution in #1290
@yhlskt23 made their first contribution in #1309
@chu-tianxiang made their first contribution in #1295
@AmaleshV made their first contribution in #1278
@lu-wang-dl made their first contribution in #1339

Full Changelog: v0.2.0...v0.2.1
