vllm-project/vllm v0.2.5 on GitHub

Major changes

Update Dockerfile to support Mixtral by @simon-mo in #2027
Remove Python 3.10 requirement by @WoosukKwon in #2040
[CI/CD] Upgrade PyTorch version to v2.1.1 by @WoosukKwon in #2045
Upgrade transformers version to 4.36.0 by @WoosukKwon in #2046
Remove einops from dependencies by @WoosukKwon in #2049
Add grouped-query attention (GQA) to MPT attention by @megha95 in #1938
Update Dockerfile to build Megablocks by @simon-mo in #2042
Fix peak memory profiling by @WoosukKwon in #2031
Implement lazy model loader by @WoosukKwon in #2044
[ROCm] Upgrade xformers version dependency for ROCm; update documentation by @tjtanaa in #2079
Update installation instructions for CUDA 11.8 by @WoosukKwon in #2086
[Docs] Add notes on ROCm-supported models by @WoosukKwon in #2087
[BugFix] Fix input positions for long context with sliding window by @WoosukKwon in #2088
Mixtral expert parallelism by @Yard1 in #2090
Bump up to v0.2.5 by @WoosukKwon in #2095

Full Changelog: v0.2.4...v0.2.5