Major Changes
- PagedAttention V2 kernel: Up to 20% end-to-end latency reduction
- Support log probabilities for prompt tokens
- AWQ support for Mistral 7B
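
As a rough illustration of what "log probabilities for prompt tokens" means, the sketch below scores each prompt token under the distribution predicted from the tokens before it. This is a minimal pure-Python sketch, not vLLM's implementation: the helper names `log_softmax` and `prompt_logprobs` and the toy logits are hypothetical. It also assumes the first prompt token gets no score because it has no preceding context.

```python
import math
from typing import List, Optional, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def prompt_logprobs(
    prompt_ids: Sequence[int],
    logits_per_position: Sequence[Sequence[float]],
) -> List[Optional[float]]:
    """Return the log probability of each prompt token given its prefix.

    Assumption (hypothetical layout): logits_per_position[i] holds the
    next-token logits produced after the model has read prompt_ids[: i + 1].
    """
    # The first token has no preceding context, so it has no log probability.
    result: List[Optional[float]] = [None]
    for i in range(1, len(prompt_ids)):
        logprobs = log_softmax(logits_per_position[i - 1])
        result.append(logprobs[prompt_ids[i]])
    return result


# Toy example: vocabulary of 3 tokens, uniform logits at every position.
toy_ids = [0, 1, 2]
toy_logits = [[0.0, 0.0, 0.0]] * 3
toy_scores = prompt_logprobs(toy_ids, toy_logits)
```

With uniform logits over three tokens, every scored position comes out to log(1/3), and the first entry stays `None`.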
What's Changed
- fixing typo in tiiuae/falcon-rw-7b model name by @0ssamaak0 in #1226
- Added dtype arg to benchmarks by @kg6-sleipnir in #1228
- fix vulnerable memory modification to gpu shared memory by @soundOfDestiny in #1241
- support sharding llama2-70b on more than 8 GPUs by @zhuohan123 in #1209
- [Minor] Fix type annotations by @WoosukKwon in #1238
- TP/quantization/weight loading refactor part 1 - Simplify parallel linear logic by @zhuohan123 in #1181
- add support for tokenizer revision by @cassanof in #1163
- Use monotonic time where appropriate by @Yard1 in #1249
- API server support ipv4 / ipv6 dualstack by @yunfeng-scale in #1288
- Move bfloat16 check to worker by @Yard1 in #1259
- [FIX] Explain why the finished_reason of ignored sequences are length by @zhuohan123 in #1289
- Update README.md by @zhuohan123 in #1292
- [Minor] Fix comment in mistral.py by @zhuohan123 in #1303
- lock torch version to 2.0.1 when build for #1283 by @yanxiyue in #1290
- minor update by @WrRan in #1311
- change the timing of sorting logits by @yhlskt23 in #1309
- workaround of AWQ for Turing GPUs by @twaka in #1252
- Fix overflow in awq kernel by @chu-tianxiang in #1295
- Update model_loader.py by @AmaleshV in #1278
- Add blacklist for model checkpoint by @WoosukKwon in #1325
- Update README.md Aquila2. by @ftgreat in #1331
- Improve detokenization performance by @Yard1 in #1338
- Bump up transformers version & Remove MistralConfig by @WoosukKwon in #1254
- Fix the issue for AquilaChat2-* models by @lu-wang-dl in #1339
- Fix error message on TORCH_CUDA_ARCH_LIST by @WoosukKwon in #1239
- Minor fix on AWQ kernel launch by @WoosukKwon in #1356
- Implement PagedAttention V2 by @WoosukKwon in #1348
- Implement prompt logprobs & Batched topk for computing logprobs by @zhuohan123 in #1328
- Fix PyTorch version to 2.0.1 in workflow by @WoosukKwon in #1377
- Fix PyTorch index URL in workflow by @WoosukKwon in #1378
- Fix sampler test by @WoosukKwon in #1379
- Bump up the version to v0.2.1 by @zhuohan123 in #1355
New Contributors
- @0ssamaak0 made their first contribution in #1226
- @kg6-sleipnir made their first contribution in #1228
- @soundOfDestiny made their first contribution in #1241
- @cassanof made their first contribution in #1163
- @yunfeng-scale made their first contribution in #1288
- @yanxiyue made their first contribution in #1290
- @yhlskt23 made their first contribution in #1309
- @chu-tianxiang made their first contribution in #1295
- @AmaleshV made their first contribution in #1278
- @lu-wang-dl made their first contribution in #1339
Full Changelog: v0.2.0...v0.2.1