github vllm-project/vllm-omni v0.15.0rc1

Pre-release · one month ago

This pre-release aligns vLLM-Omni with upstream vLLM v0.15.0.

Highlights

  • Rebase to Upstream vLLM v0.15.0: vLLM-Omni is now fully aligned with the latest vLLM v0.15.0 core, bringing in all the latest upstream features, bug fixes, and performance improvements (#1159).
  • Tensor Parallelism for LongCat-Image: We have added Tensor Parallelism (TP) support for LongCat-Image and LongCat-Image-Edit models, significantly improving the inference speed and scalability of these vision-language models (#926).
  • TeaCache Optimization: Introduced coefficient estimation for TeaCache, further improving the efficiency of its caching mechanism during generation (#940).
  • Alignment & Stability:
    • Enhanced error handling logic to maintain consistency with upstream vLLM v0.14.0/v0.15.0 standards (#1122).
    • Integrated "Bagel" E2E Smoke Tests and refactored sequence parallel tests to ensure robust CI/CD and accurate performance benchmarking (#1074, #1165).
  • Paper link: An initial arXiv paper introducing our design and presenting some performance test results (#1169).

What's Changed

Features & Optimizations

Alignment & Integration

Infrastructure (CI/CD) & Documentation

New Contributors

Full Changelog: v0.14.0...v0.15.0rc1
