Highlights
This release features 327 commits from 99 contributors, including 36 new contributors.
vLLM-Omni v0.20.0rc1 is a release candidate aligned with upstream vLLM v0.20.0. It expands the model catalog across speech, omni, image, and video workloads; improves production readiness for TTS, diffusion, BAGEL, and multi-stage serving; and broadens hardware coverage across CUDA, ROCm, MUSA, NPU, XPU, and AMD validation paths. This release candidate is intended to validate the upstream rebase, the refreshed serving/runtime behavior, and the expanded model and platform matrix before the final v0.20.0 release.
Key Improvements
- Rebased to upstream vLLM v0.20.0, with entrypoint cleanup, stage CLI refactoring, sleep-mode support, coordinator reliability fixes, and removal of the vLLM entrypoint hijack for the 0.20.0 integration path. (#3232, #3082, #2020, #2022, #1899)
- Expanded model support across omni, TTS, image, and video workloads, including MagiHuman, Dynin-omni, InternVLA-A1, Ming-flash-omni-2.0, XiaomiMiMo/MiMo-V2.5-ASR, MOSS-TTS-Nano, VoxCPM2 native AR TTS, LTX-2.3, and FastGen Wan 2.1 pipelines. (#2301, #1759, #2737, #2890, #3089, #2753, #2658, #2893, #2749)
- Improved TTS and audio production behavior, with lower VRAM usage, CUDA graph reuse, voice-cloning fixes, deterministic Fish Speech generation, a unified TTS pipeline/deploy schema, and a new universal TTS benchmark. (#2480, #2429, #2430, #2520, #2609, #2624, #2676, #2690, #2835, #2958, #3253)
- Strengthened diffusion, image, and video generation, adding Z-Image image-to-image, FLUX.1/FLUX.2 TeaCache and CFG-parallel paths, Wan/LTX model support, VAE tiling, diffusion profiler/progress tooling, MP4 latency optimization, and HSDP coverage for LTX-2 and Stable-Audio-Open. (#1580, #1871, #2010, #2134, #2160, #2368, #2489, #2735, #2774, #2899, #2982)
- Made BAGEL more capable in serving and RL workflows, with LoRA support, think mode, fused projections, RDMA/connector work, TP/CFG parallel flow, layerwise offload, diffusion metrics, and rollout fixes. (#2490, #2494, #2503, #2546, #2650, #2705, #2731, #2734, #2932, #3258)
- Broadened quantization, memory, and hardware coverage, including OmniGen2 FP8, Qwen Omni W4A16, HunyuanImage3 NPU quantization, safer pre-quantized checkpoint handling, MUSA flash attention and `torch.accelerator` support, NPU graph/fused-op improvements, ROCm/AMD CI fixes, and XPU torch inductor support. (#2441, #2670, #2702, #2795, #2979, #2451, #2695, #2766, #3067, #3101, #3113, #3225)
Core Architecture & Runtime
- Rebased vLLM-Omni to upstream vLLM v0.20.0 and removed the old vLLM entrypoint hijack used before the 0.20.0 integration path. (#3232, #3082)
- Refreshed the runtime and stage lifecycle with a stage CLI refactor, omni sleep mode and acknowledgement protocol, coordinator reconnect/race/heartbeat fixes, stage launch-lock handling, and restored user-configurable stage initialization timeout behavior. (#2020, #2022, #1899, #2717, #2519)
- Improved request/runtime configuration behavior by preserving media access arguments, removing invalid LLM-only diffusion stage args, centralizing stage sampling parameter resolution, and guarding against silent config failures when `trust_remote_code` is unset. (#2956, #2622, #3153, #3241)
- Added lower-level runtime support such as inline-client health checks (see the probe sketch after this list), optional MROPE kwargs for HunyuanImage3, GPU-buffer accessor fixes, and clearer runtime failure reporting. (#3052, #2654, #2068, #2426)
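For the inline-client health check work above, a minimal client-side readiness probe is sketched below. It assumes the standard vLLM-style `/health` endpoint on a local deployment; the endpoint path, port, and polling cadence are assumptions for illustration, not the exact vLLM-Omni inline-client API.

```python
# Minimal readiness probe for a vLLM-Omni OpenAI-compatible server.
# Assumes the vLLM-style /health endpoint; adjust base_url for your deployment.
import time

import requests


def wait_until_healthy(base_url: str = "http://localhost:8000", timeout_s: float = 300.0) -> None:
    """Poll /health until the server reports ready or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if requests.get(f"{base_url}/health", timeout=5).status_code == 200:
                return
        except requests.ConnectionError:
            pass  # server process still bringing up its stages
        time.sleep(2.0)
    raise TimeoutError(f"{base_url} did not become healthy within {timeout_s}s")


wait_until_healthy()
```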
Model Support
- Added or expanded omni and speech model coverage for MagiHuman, Dynin-omni, InternVLA-A1, Ming-flash-omni-2.0, XiaomiMiMo/MiMo-V2.5-ASR, MOSS-TTS-Nano, VoxCPM2 native AR TTS, and OmniVoice voice cloning. (#2301, #2542, #2554, #1759, #2737, #2890, #3089, #2753, #2658, #2676)
- Expanded video and diffusion model support with Wan2.2-I2V-A14B, FastGen DMD2-distilled Wan 2.1 text-to-video and image-to-video pipelines, LTX-2.3, LTX-2 HSDP, and Stable-Audio-Open HSDP. (#2134, #2749, #2893, #2899, #2982)
- Improved GLM-Image, HunyuanImage3, FLUX, and Z-Image behavior with config migration, benchmark/bug fixes, multi-stage serving fixes, system prompt alignment, text encoder fixes, image-to-image support, and quantization support for GLM-Image. (#2977, #3024, #3084, #2270, #2760, #1580, #2292)
Audio, Speech & Omni Production Optimization
- Improved Qwen3-TTS serving quality and stability by fixing streaming chunk-boundary artifacts, aligning code predictor dtypes, handling missing `ref_text`, correcting Code2Wav length and eager-mode behavior, supporting local-path reference audio, and using float32 code prediction on fp16-only GPUs. (#2480, #2470, #2203, #2508, #2868, #2984, #3253)
- Reduced TTS and audio memory/latency overhead by freeing unused Qwen3-TTS and Fish Speech decoder/codec components, enabling Fish Speech CUDA graph capture and reference-audio caching, sharing CUDA graph memory pools, and optimizing VoxCPM2 streaming VAE/compile and manual CUDA graph paths. (#2429, #2430, #2520, #2609, #2386, #2758, #2803)
- Expanded voice and speech API behavior with `speaker` as a `voice` alias, deterministic Fish Speech seed support, raw-audio VoxCPM2 voice cloning, OmniVoice voice cloning, and speaker validation with case-insensitive lookup (see the request sketch after this list). (#2424, #2624, #2720, #2676, #2407)
- Migrated VoxCPM2, CosyVoice3, MiMo Audio, Voxtral TTS, and Fish Speech S2 Pro to the Pipeline + Deploy schema, added the universal TTS benchmark, and offloaded blocking TTS/speech work to avoid event-loop stalls. (#2958, #2835, #2511)
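As a usage sketch for the `speaker` alias and deterministic seeding above: the request below targets the OpenAI-compatible `/v1/audio/speech` endpoint; the model id, speaker name, and the exact placement of the `seed` field are assumptions for illustration.

```python
# Illustrative TTS request exercising the `speaker` alias (#2424) and
# deterministic seeding (#2624). Model id and speaker name are placeholders.
import requests

resp = requests.post(
    "http://localhost:8000/v1/audio/speech",
    json={
        "model": "fishaudio/fish-speech",  # hypothetical model id
        "input": "Hello from vLLM-Omni.",
        "speaker": "default",  # accepted as an alias for "voice"
        "seed": 42,            # repeatable Fish Speech generation
    },
    timeout=120,
)
resp.raise_for_status()
with open("speech.wav", "wb") as f:
    f.write(resp.content)  # response body is the rendered audio
```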
Diffusion, Image & Video Generation
- Expanded generation capabilities with Z-Image image-to-image, FLUX.1/FLUX.2 TeaCache, FLUX.2-dev CFG parallel, generalized diffusers adapter backend support, and profiler/progress tooling for diffusion pipelines. (#1580, #2774, #1871, #2010, #2724, #2489)
- Improved video generation with Wan2.2 BF16 VAE conversion, fused RMSNorm/AdaLayerNorm and NPU fused RMSNorm paths, LightX2V offline conversion, reduced duplicate preprocessing, MP4 encoding latency optimization, and FastGen Wan 2.1 pipeline support. (#2391, #2583, #2585, #3067, #2134, #2963, #2735, #2749)
- Strengthened distributed and memory-efficient diffusion with VAE tiling parallel encode, unified CFG parallel support for LTX2, 3/4-branch CFG dispatch, per-pipeline offloadable-module declarations, layerwise offload for additional diffusion models and BAGEL, HSDP support, and inline execution for single-stage diffusion. (#2368, #2160, #2423, #2427, #2339, #2734, #2899, #2982, #2736)
- Improved online image/video serving correctness with request cancellation for `/v1/images/generations`, max-generated-image-size enforcement, default video sampling fixes, ComfyUI image-to-image DALL-E endpoint fixes, media/default sampling preservation, and pure-diffusion offline example fixes (a client-side sketch follows this list). (#2621, #2599, #3049, #2980, #2780, #3181)
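A rough client-side sketch of the image-generation cancellation path above: dropping the HTTP connection is treated here as the cancellation signal, which is an assumption about how #2621 surfaces to clients; the model id and payload fields are placeholders.

```python
# Illustrative /v1/images/generations request with client-side cancellation.
import requests

try:
    resp = requests.post(
        "http://localhost:8000/v1/images/generations",
        json={
            "model": "Qwen/Qwen-Image",  # hypothetical model id
            "prompt": "a watercolor harbor at dawn",
            "size": "1024x1024",
            "n": 1,
        },
        timeout=(5, 60),  # (connect, read): a read timeout closes the connection
    )
    resp.raise_for_status()
    print("generated", len(resp.json()["data"]), "image(s)")
except requests.Timeout:
    # The dropped connection lets the server abort the in-flight generation.
    print("request timed out; generation treated as cancelled")
```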
Quantization & Memory Efficiency
- Added or improved quantization coverage for OmniGen2 FP8, Qwen Omni W4A16 via AutoRound, HunyuanImage3 offline quantization on the NPU diffusion path, and GLM-Image quantization, alongside updated quantization documentation (an AutoRound sketch follows this list). (#2441, #2670, #2979, #2292, #3200)
- Fixed pre-quantized checkpoint behavior by avoiding FP8 quant configs on vision/audio encoders and repairing broken FP8 quantization on Z-Image-Turbo, Qwen-Image, and FLUX.1-dev. (#2702, #2795)
- Improved memory efficiency across TTS and diffusion by combining CUDA graph reuse, codec/decoder cleanup, multi-block and layerwise CPU offloading, TeaCache/offload compatibility fixes, and pipeline-declared offloadable modules. (#2386, #2429, #2430, #1486, #2339, #2689, #2427)
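A rough outline of producing a W4A16 checkpoint with AutoRound, in the spirit of the Qwen Omni work above (#2670). The model id is a placeholder, the generic causal-LM loader stands in for whatever loader an omni checkpoint actually needs, and AutoRound argument names may vary by version; treat this as a sketch, not a recipe.

```python
# Sketch: 4-bit weight / 16-bit activation (W4A16) quantization with AutoRound.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_id = "Qwen/Qwen3-Omni"  # hypothetical; omni models may need a dedicated loader
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# bits=4 quantizes weights; activations stay in 16-bit (the "A16" half of W4A16).
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize_and_save("./qwen3-omni-w4a16")
```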
RL, Serving & Integrations
- Improved BAGEL serving and RL flows with LoRA adapter injection, end-to-end LoRA support, text2text/img2text think mode, single-stage think mode, fused `gate_proj`/`up_proj`, trajectory recording, RDMA flow updates, TP/CFG transfer-engine support, and rollout trajectory fixes. (#2490, #2494, #2503, #2650, #2546, #2483, #2000, #2705, #2731, #3258)
- Added serving controls and API reliability improvements, including least-queue-length and round-robin load balancers, OpenAI-compatible request cancellation for image generation, streaming delta messages (see the streaming example after this list), graceful multi-stage shutdown, guarded app-state access during shutdown, and response body fixes. (#2448, #2621, #2911, #3001, #2587, #3094)
- Improved diffusion and multimodal serving observability with diffusion metrics surfaced in chat completions, corrected metric keys, profiler output fixes, Nsight Systems support for serving, PyTorch profiler ops/memory recording, and multimodal benchmark token accounting fixes. (#2932, #2692, #2647, #1098, #2472, #2549)
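For the streaming delta-message path above (#2911), a standard OpenAI-SDK client is enough to exercise it; the sketch below assumes a local server and a placeholder model id.

```python
# Illustrative streaming chat-completion client using delta messages.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
stream = client.chat.completions.create(
    model="Qwen/Qwen3-Omni",  # hypothetical model id
    messages=[{"role": "user", "content": "Summarize this release in one sentence."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:  # deltas may also carry role/tool fields with no text
        print(delta.content, end="", flush=True)
print()
```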
Platforms, Distributed Execution & Hardware Coverage
- Expanded MUSA support with flash attention through MATE, `torch.accelerator` support (see the probe after this list), torchada updates, device capability/version APIs, and behavior matching the CUDA/ROCm flash attention paths. (#2451, #2766, #3101, #3132, #3179)
- Improved NPU coverage with code predictor graph support, MindIE SD fused RoPE/cache paths, Wan2.2 fused ops, VAE parallel gather performance fixes, HunyuanImage3 quantization, and Ascend NPU documentation for Wan2.2 image-to-video. (#2695, #2571, #2583, #2585, #3067, #2969, #2979, #2919)
- Strengthened ROCm/AMD and XPU readiness through ROCm CI signal restoration and environment fixes, AMD simple-unit-test fixes, XPU torch inductor support, and platform capability cleanup such as `supports_float64()` and flash-attention package detection. (#2340, #2708, #3225, #3113, #2488, #3068)
- Improved distributed execution paths with BAGEL TP/CFG transfer-engine support, diffusion TP-size propagation, non-contiguous gather fixes, and CFG companion/orchestrator cleanup. (#2731, #2867, #2367, #2623)
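The `torch.accelerator` support above builds on PyTorch's backend-agnostic accelerator API (PyTorch 2.6+). The probe below is a generic sketch; whether it reports a MUSA device depends on the build, which is an assumption here rather than something these notes guarantee.

```python
# Backend-agnostic device probe via the torch.accelerator API (PyTorch 2.6+).
import torch

if torch.accelerator.is_available():
    dev = torch.accelerator.current_accelerator()  # e.g. cuda, xpu, or a musa build
    print(f"accelerator: {dev}, visible devices: {torch.accelerator.device_count()}")
    x = torch.ones(4, device=dev)  # tensors allocate on whichever backend is active
    print(x * 2)
else:
    print("no accelerator available; falling back to CPU")
```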
CI, Benchmarks & Documentation
- Added a CUDA Dockerfile for NVIDIA GPU users, doc-only CI change detection, a Buildkite skip-CI upload pipeline, and reorganized Buildkite nightly/ready/merge coverage for Omni and Diffusion models. (#1439, #1284, #2582, #2620, #2945)
- Expanded validation with stability and reliability tests covering Wan2.2, Qwen3-Omni, Qwen3-TTS, Qwen-Image, Stable Audio TeaCache, Qwen image edit performance, and L5 reliability, while re-enabling selected previously skipped expansion tests. (#2377, #2216, #2817, #2972, #3211)
- Refreshed documentation for MUSA installation, multi-thread weight loading, expert parallelism, LTX-2 online serving, CLI usage, diffusion attention backends, profiling, quantization, and add-model skills for diffusion and TTS. (#2359, #2445, #2471, #1971, #2978, #3011, #3196, #3200, #2806)
- Replaced model-specific TTS benchmark folders with a more general TTS benchmark flow covering Qwen3-TTS and VoxCPM2 voice-clone/default/design tasks. (#2835)
Note
- `v0.20.0rc1` is a release candidate. Use it to validate the upstream vLLM 0.20.0 rebase, the refreshed runtime and stage configuration behavior, and the expanded model/platform matrix before the final release.
- Known remaining issues include #3268, #3266, #3264, #3257, #3256, #3255, and #2354.
- The generated What's Changed appendix includes several changes that were later reverted and did not ship in this release: the Qwen3-Omni performance optimization, the VoxCPM2 instructions/cfg_value change, FP8 online quantization for the Z-Image text encoder, and the deploy override field refactor. (#3202, #3204, #3272, #3287)
- Low-signal CI, documentation, typo, and release-script maintenance changes are grouped under the broader themes above rather than listed individually.
What's Changed
- [Model Support]: Magihuman support by @princepride in #2301
- [Docs] Update WeChat QR code for community support by @david6666666 in #2481
- [CI] Fix missing queue for Voxtral-TTS E2E test step by @linyueqian in #2484
- [CosyVoice3] Fix vLLM 0.19.0 compatibility issues by @linyueqian in #2486
- [Model][Core] Enable async_chunk streaming pipeline for CosyVoice3 by @indevn in #1703
- [Chore] Fix Bagel model import compatibility by @yuanheng-zhao in #2491
- ci: remove CosyVoice3 post-merge test by @linyueqian in #2492
- [Feat] add diffusion pipeline profiler and progress bar support to FluxKontextPipeline et.al by @RuixiangMa in #2489
- [Bugfix] Include uv.lock in .gitignore by @timzsu in #2493
- [Bugfix] Assign original prompt back to RequestOutput by @yuanheng-zhao in #2498
- [CI/Build] Add Dockerfile.cuda for NVIDIA GPU users [Skip-CI] by @loveysuby in #1439
- [Fix] [Qwen3-TTS] Qwen3-TTS streaming chunk-boundary artifacts by @Sy0307 in #2480
- [Perf][Qwen3-TTS] Free unused decoder in Talker SpeechTokenizer to VRAM by @Sy0307 in #2429
- [Perf][Fish Speech] Free unused DAC codec components to save VRAM by @Sy0307 in #2430
- fix(qwen3_tts): align code predictor buffer dtype with model parameters by @willamhou in #2470
- [Feat] support for multi-block layerwise offloading, fix top-level parameters/buffers staying on CPU by @RuixiangMa in #1486
- [Feature] Enable LoRA adapter injection for BAGEL by @timzsu in #2490
- [Feature] Support vae tiling parallel encode by @gcanlin in #2368
- [Bugfix] Fix load_weights fallback for non-fused stacked_params_mapping entries by @timzsu in #2523
- [BugFix] Add bagel text2text/img2text think mode support by @princepride in #2503
- [BugFix] Continue decode if don't need transfer kv cache between two … by @princepride in #2502
- [CI] Add doc-only change detection to skip Buildkite CI. by @congw729 in #1284
- [Test] Test whether CI can be correctly skipped when the committed files only contain documentation. by @yenuo26 in #2534
- Add supports_float64() to OmniPlatform and clean up MPS by @yeahdongcn in #2488
- [Bugfix] Fix DataType Handling in Default Diffusion Config by @alex-jw-brooks in #2530
- [Docs] Add installation guide for Moore Threads (MUSA) GPUs by @yeahdongcn in #2359
- [bugfix]bugfix dreamid by @erfgss in #2125
- [RFC] Offload blocking TTS/speech ops to thread pool to unblock event loop by @scyyh11 in #2511
- [Bugfix] To resolve timeout error, update nightly test commands for diffusion model by @yenuo26 in #2532
- [HunyuanImage3] Align system_prompt support with official implementation by @skf-1999 in #2270
- [daVinci-MagiHuman][Doc][BugFix] Update model support for daVici-MagiHuman and fix media utils bug by @princepride in #2542
- [Bagel]Fused gate_proj and up_proj by @princepride in #2546
- [Bugfix] Accept 'speaker' as alias for 'voice' in TTS speech API by @marksverdhei in #2424
- [Bugfix] Prevent Silent Stage Dropouts: fix coordinator reconnect bug, close/update race, and heartbeat stall by @pikaxinge in #1899
- [release] Fix release script by @khluu in #2566
- [release] Fix lint issue by @khluu in #2567
- [Feat] Enable Layerwise CPU offloading for SD3.5, Ovis-Image, Nextstep_1, LongCat-Image by @yuanheng-zhao in #2339
- [Docs] Add expert_parallel.md by @skf-1999 in #2471
- [Feature] Add trajectory recording to BAGEL denoising loop by @timzsu in #2483
- [Perf] Wan2.2 I2V optimization: convert datatype from FP32 to BF16 in vae by @Fishermanykx in #2391
- [Diffusion] Refactor LTX2 to use unified CFG parallel framework by @TKONIY in #2160
- [Feat] image2image for Z-Image by @RuixiangMa in #1580
- [Feature] Port Bagel RDMA flow to latest main by @ahengljh in #2000
- [Feat] Add MUSA flash attention support via mate package by @yeahdongcn in #2451
- [Fix] Align diffusion proc test mock with current output fields by @ahengljh in #2584
- [Bugfix] Fix benchmark Total input tokens for multimodal requests (#2540) by @Dnoob in #2549
- [Unit Test] Add unit tests for orchestrator by @yinpeiqi in #2096
- [TTS] Add missing _generate_pcm_chunks for OmniOpenAIServingSpeech streaming by @vveerrgg in #2569
- [Perf][Qwen3-TTS][Voxtral-TTS] Share CUDA graph memory pool across decoder capture sizes by @NickCao in #2386
- [Feature] End-to-end LoRA support for BAGEL by @timzsu in #2494
- [CI] Reorganize the L1 L2 use cases and add markers by @zhumingjue138 in #2449
- [Bugfix] Enforce --max-generated-image-size on /v1/images/generations by @NickCao in #2599
- [CI]Refactor nightly test configuration in Buildkite, Add group for Omni and Diffusion models by @yenuo26 in #2582
- [Bugfix] Guard app.state access during server shutdown by @pjh4993 in #2587
- [MagiHuman] Fix audio sample rate and fps propagation for online serving by @princepride in #2554
- [Misc] Clean up method name in BAGEL. by @timzsu in #2501
- [Feat] /v1/images/generations api supports request cancel by @Semmer2 in #2621
- [Bug] Lazy-import entrypoints to fix subprocess pynvml crash by @RGB-loop in #2187
- [Docs] Add multi-thread weight loading documentation by @SamitHuang in #2445
- [Model] Add Dynin-omni model in vllm-omni by @DOGEUNNKIM in #1759
- [Bugfix] Fix precedence between caller runtime args and default stage configs by @xiaohajiayou in #2076
- Revert "[Fix] Fix slow hasattr in CUDAGraphWrapper.getattr (#1982)" by @ZeldaHuang in #2639
- [Refactor] Use trajectory_* fields for Qwen-Image structured RL outputs by @SamitHuang in #2513
- [Bugfix] Fix Qwen-Image min-size normalization for tiny requests by @david6666666 in #2637
- [Bugfix] Fix Fish Speech voice clone FileNotFoundError on multi-GPU by @Sy0307 in #2606
- [CI][Bugfix] Update environment variables for test configurations in Buildkite YAML files to resolve HF timeout by @yenuo26 in #2628
- [Bugfix] restore legacy stage config precedence by @xiaohajiayou in #2663
- [Feat][FishSpeech] Cache DAC-encoded ref audio for voice cloning by @linyueqian in #2609
- [CI] Update merge condition in upload_pipeline_with_skip_ci.sh to include 'merge-test' label for non-main branches by @yenuo26 in #2666
- [Feature]: support Flux.2-dev CFG-Parallel by @nuclearwu in #2010
- [Entrypoint][Refactor]Stage CLI Refactor by @wuhang2014 in #2020
- [CI] Update merge condition in upload_pipeline_with_skip_ci.sh to include 'merge-test' label for non-main branches by @yenuo26 in #2667
- [Bugfix] fix mindiesd laserattention unsupported error by @fan2956 in #2673
- [Bugfix]: modify diffusion pipeline profiler result in videos by @bjf-frz in #2647
- [Profiler] Add Nsight Systems support for serving by @ahengljh in #1098
- [Config] Remove invalid LLM-only engine_args from diffusion stage configs by @ianliuy in #2622
- [Refactor] Remove dependency on librosa by @NickCao in #2273
- [Model] VoxCPM2 native AR TTS support by @linyueqian in #2658
- [BUG FIX]: prevent EngineCore crash when Qwen TTS Base task is missing ref_text by @teith in #2203
- [Doc] Add LTX-2 online serving deployment recipes with optimization benchmarks by @SamitHuang in #1971
- [feature] : add cache-dit for stable-audio-open-1.0 by @akshatvishu in #1341
- [ROCm] [CI] [Bugfix] Resurface CI Signal, fix MHA AR selection, sync with cuda tests by @tjtanaa in #2340
- [Perf] Use global CUDA graph pool for MiMo Audio by @NickCao in #2657
- [TTS][OmniVoice] Add voice cloning support for OmniVoice TTS by @JuanPZuluaga in #2676
- [CI] [Resource] Remove unused test cases to cutdown agent resources usage by @tjtanaa in #2688
- [Bugfix] Restore user config/runtime stage init timeout by @yuanheng-zhao in #2519
- [Bugfix] Validate speaker in chat endpoint and fix case-insensitive lookup by @reidliu41 in #2407
- [Docs] Update WeChat QR code for community support by @david6666666 in #2701
- [Log] Wire stat loggers into AsyncOmniEngine to match AsyncLLM by @gcanlin in #2551
- [Bugfix] Fix Incompatible Multihook Integration (TeaCache <-> CPU Offload) by @alex-jw-brooks in #2689
- [Refactor] Extend CFG Parallel to support 3 or 4 branch dispatch across M GPUs by @zzhuoxin1508 in #2423
- [Bugfix] Fix UT for the missing of log_stats in Engine by @gcanlin in #2706
- [ROCm] [CI] Fix environment issue by @tjtanaa in #2708
- [Feat] Override single stage CLI args when stage_configs_path is set in OmniEngineArgs by @timzsu in #2684
- [Bugfix] Fix Bagel online mode for 1. Hang after several requests 2. Non-deterministic image quality regression. by @natureofnature in #2458
- [Perf][Fish Speech] Enable CUDA Graph capture for Fast AR code predictor by @Sy0307 in #2520
- [Model] Adapt Wan2.2-I2V-A14B via LightX2V offline conversion path by @Celeste-jq in #2134
- [Fix] VoxCPM2: support raw audio for voice cloning via OpenAI API by @linyueqian in #2720
- [CI][Bugfix] Refactor the test case to add support for increasing init timeout and stage init timeout in order to resolve the CI timeout error. by @yenuo26 in #2711
- [Revert] Revert "[Log] Wire stat loggers into AsyncOmniEngine to match AsyncLL… by @amy-why-3459 in #2716
- [core]refactor communication layer: PR1(Added Refactor Infra Only) by @natureofnature in #1555
- [Feature]: support Flux.2-dev tea_cache by @nuclearwu in #1871
- [Bugfix] Release stage launch lock before handshake by @fake0fan in #2717
- [Tests][Qwen3-Omni]Modify Qwen3-Omni performance test cases by @amy-why-3459 in #2600
- [Bagel]: Support `think mode` in single stage deployment of Bagel by @princepride in #2650
- [Misc] Cleanup: use consistent pytest-mock in unit tests by @yuanheng-zhao in #2698
- [skip ci][doc]Update async_chunk design diagram by @amy-why-3459 in #2420
- [Bugfix] Update Flux2-dev & Dynin_omni L4 e2e test by @wtomin in #2723
- [Voxtral TTS] Correct decode steps param in Voxtral TTS by @y123456y78 in #2524
- [Perf]: Speedup VoxCPM2 TTS performance and Support PagedAttention by @Sy0307 in #2690
- [Voxtral TTS] Fix Voxtral TTS input with text and ref_audio by @y123456y78 in #2750
- [CI] Qwen image edit performance benckmark by @fhfuih in #2216
- [BugFix] Remove stage_configs_path validation by @amy-why-3459 in #2741
- [Perf] Optimize MP4 encoding latency in video generation by @SamitHuang in #2735
- [Qwen3-TTS] Remove hardcoded `distributed_executor_backend` to improve single-GPU performance by @iancarrasco-b10 in #2604
- [Test] Add Stable Audio offline e2e TeaCache Test by @zhangj1an in #2377
- [Omni Connector] Omni Transfer Engine Connector: Enable 1-receiver-to-N-senders to support Bagel TP/CFG parallel by @natureofnature in #2731
- [skip ci] fix docs, gdown remove --id param by @lengrongfu in #2787
- [Tests][Qwen3-Omni]Add test cases for long videos and long audios. by @amy-why-3459 in #2598
- [skip ci]add skills by @hsliuustc0106 in #2710
- [Misc] clean Temporary CI Configs by @n1ptune in #2784
- [CI][Bugfix] Update thresholds for accuracy tests by @yenuo26 in #2725
- [CI/BugFix] Fix Flaky Test for Qwen Omni Perf by @alex-jw-brooks in #2754
- [Bugfix] Reject /v1/audio/speech for Qwen omni models by @scyyh11 in #2763
- fix: do not apply FP8 quant config to vision/audio encoders for pre-quantized checkpoints by @ianliuy in #2702
- [BugFix] Fix NoneType' object has no attribute 'detach' by @amy-why-3459 in #2797
- [Bugfix] Make mrope kwargs optional in HunyuanImage3 get_mrope_input_positions by @ianliuy in #2654
- [Bugfix] Handle numpy array outputs when generate image by @lengrongfu in #1680
- [Perf] VoxCPM2: streaming VAE + compile optimization (45% RTF reduction) by @linyueqian in #2758
- [Perf] Enhance benchmark script to support baseline thresholds and proved result handling by @yenuo26 in #2789
- [Benchmark]Omni-modality model accuracy benchmark(Daily-Omni & seed-tts-eval) by @amy-why-3459 in #2558
- [CI] qwen image edit L4 accuracy test by @fhfuih in #2761
- [Perf] Eliminate Hop 3 IPC overhead for single-stage diffusion via inline execution by @SamitHuang in #2736
- [Feature] feat: add video frame interpolation postprocess by @david6666666 in #2555
- [Fix] HunyuanImage-3.0: unify naming hunyuan_image_3 → hunyuan_image3 by @TaffyOfficial in #2712
- [PERF] Wan2.2 support adalayernorm fused op by @fan2956 in #2585
- [hotfix] API connection error in CI by @fhfuih in #2810
- [Perf] VoxCPM2: Speedup by manual CUDA Graph capture for scaffold/residual forward by @Sy0307 in #2803
- Add voxcpm model support. by @IsleOfDawnlight in #2467
- [Feat][Qwen3-Omni] Shared code predictor module for Qwen3-TTS and Qwen3-Omni by @JuanPZuluaga in #2375
- [Feature] HunyuanImage3 allow guidance_scale<=1 in DiT stage by @Fishermanykx in #2762
- [Bugfix] Fix broken fp8 quantisation on Z-Image-Turbo, Qwen-Image, FLUX.1-dev by @zhangj1an in #2795
- [feature] Hidden State Prefix Caching by @alex-jw-brooks in #2164
- [Perf] Add Performance Test for Qwen-Image Step-Level Execution by @wtomin in #2707
- [CI] Skip test_thinker_prefix_caching in tests/e2e/online_serving/test_qwen3_omni.py by @yenuo26 in #2836
- [CI][Perf] Add nightly PR labels, consolidate pipeline, and switch benchmark flag to --test-config-file by @yenuo26 in #2816
- [Doc][Misc] Update DreamID-Omni Example; Add DreamID-Omni post process function by @yuanheng-zhao in #2809
- [Feat] add GLM-Image SP support by @RuixiangMa in #1983
- [CI] add qwen image and layered accuracy test by @david6666666 in #2772
- [Feature] Bagel: Support tp+cfg parallel using mooncake transfer engine connector by @natureofnature in #2705
- [PERF] Wan2.2 support rmsnorm fused op by @fan2956 in #2583
- [Test] Add performance tests for Qwen-Image-Layered model by @kechengliu97 in #2807
- [Fix][Fish Speech] Remove redundant get_vocab() in control token encoding by @Sy0307 in #2842
- [Test] Skip tests for known issues in audio and speaker recognition by @yenuo26 in #2851
- [FIX] Preserve YAML default stop words when request sends empty list by @QiuMike in #2855
- [BugFix][VoxCPM2]: split multichar Chinese tokens to match training tokenization by @Sy0307 in #2832
- Feat/Add HunyuanImage-3.0-Instruct ar part support: by @TaffyOfficial in #2713
- [Quantization] feat: add FP8 for Omnigen2 by @zhangj1an in #2441
- [Feature] Flux2 klein inpaint by @RuixiangMa in #1180
- [Refactor] Remove sox from dependencies by @NickCao in #2745
- [Bugfix] enforce max_sequence_length for Qwen-Image and Wan2.2 series before encoding by @david6666666 in #2847
- [Bugfix] Preserve default diffusion sampling params in default stage by @david6666666 in #2780
- [Model] Support Flux1 Schnell by @alex-jw-brooks in #2528
- [Core] Refactor CFG companion tracker and use in Orchestrator by @yinpeiqi in #2623
- [CI][Bugfix] Fix the error in generating the performance data table and add a fallback mechanism that prevents the result file from being generated when test case execution fails. by @yenuo26 in #2839
- [BugFix] Fixing occasional engine crashes caused by abort requests by @amy-why-3459 in #2871
- [Feature] Support Prefill-Decode disaggregation via vLLM KV transfer by @spencerr221 in #2220
- [Model] Add Ming-flash-omni-2.0 Thinker Stage by @yuanheng-zhao in #1822
- [Bugfix] Fix RIFE device selection for CPU-transported videos by @david6666666 in #2876
- [Bugfix] Limit Qwen-Image-Edit-2511 input image count by @david6666666 in #2840
- [Test] Add ModelRunner V2 with Qwen3-TTS Base E2E Test to CI pipeline by @tzhouam in #2321
- [Bugfix] Fix image quality in /v1/images/generations for multi-stage pipeline by @RuixiangMa in #2267
- Fix NoneType error of outputs by @QiuMike in #2315
- [Refactor] refactor wan2.2 diffuse && add ut by @bjf-frz in #2672
- [Misc] Warn When vLLM / vLLM-Omni Have Mismatched Versions by @alex-jw-brooks in #2691
- [Bugfix] Fix cache dit for Longcat & LTX2 by @alex-jw-brooks in #2860
- [CI] Skip test_bagel[parallel_tp_2] and test_wan22_i2v_online_serving_generates_video[wan22_i2v_usp2_hsdp2] by @yenuo26 in #2883
- [Bugfix] fix CI failure by @RuixiangMa in #2884
- [Cleanup] Remove dead runtime.defaults config parameters by @NickCao in #2343
- [skip CI][Docs] Add Qwen3-Omni and Qwen3-TTS performance blog and figures by @Shirley125 in #1837
- Nextstep online e2e by @Joshna-Medisetty in #2107
- Add Teacache Support for LongCat Image by @alex-jw-brooks in #1487
- [skip ci][recipe] draft vllm-omni recipes by @hsliuustc0106 in #2646
- [Docs] Update WeChat QR code for community support by @david6666666 in #2895
- [Refactor] Remove resampy dependency by @NickCao in #2891
- [Feature]Support audio streaming input and output-phase2 by @Shirley125 in #2581
- [BugFix]: Fix multi-stage cfg bug by @princepride in #2801
- [doc][skip ci] remove redundant content in readme by @Shirley125 in #2901
- [Feat] cache-dit for GLM-Image by @RuixiangMa in #1399
- [Agent] Add NPU main2main skill by @gcanlin in #2858
- [Bugfix][VoxCPM2] Fix voice-clone decode loop by padding prefill prompt by @Sy0307 in #2894
- [Config Refactor][2/N] Pipeline + Deploy Config Schema by @lishunyang12 in #2383
- [Bugfix][VoxCPM2]: Fix vectorized_gather OOB under concurrent prefill+decode batches by @Sy0307 in #2903
- perf(helios): replace strided RoPE with stack+flatten for contiguous memory by @willamhou in #2474
- [Bugfix] diffusion end points allow model mismatch by @xiaohajiayou in #2805
- [Feat] Support layerwise CPU offloading for more videogen models by @yuanheng-zhao in #2018
- [Config Refactor 2.5/N] Centralize pipeline registry by @lishunyang12 in #2915
- [Perf] Optimize Wan2.2 device free on image preprocess by @fan2956 in #2852
- [Docs] update documents by @R2-Y in #2921
- [BugFix] Fixed the issue where --no-async-chunk was not working. by @amy-why-3459 in #2934
- [CI] Restructure vLLM-Omni Test Layout, Fixture Scope, and Support Modules by @yenuo26 in #2620
- [Model] Add HSDP support for LTX-2 by @fywc in #2899
- [Revert] drop Wan2.2 prompt-length enforcement from #2847 by @david6666666 in #2877
- [Bugfix] Fix GLM-Image output dimensions and image edit pipeline by @JaredforReal in #2320
- [Docs] Add Wan2.2 image-to-video recipe for Ascend NPU (A2/A3) by @gcanlin in #2919
- [Example] Add Hunyuan-Image3 end2end.py and README.md by @kechengliu97 in #2590
- CI: publish Omni images to a separate Docker Hub repository by @sheralskumar in #2829
- [Enhancement] add pytorch profiler ops and memory record by @bjf-frz in #2472
- [Model] feat: FastGen DMD2-distilled Wan 2.1 pipelines (T2V, I2V) by @ayushag-nv in #2749
- [CI] Remove small resolution test in Qwen-Image Perf test when vae patch parallel is enabled by @wtomin in #2872
- [Bugfix] Truncate mimo-audio code2wav prompt to MAX_CODE2WAV_TOKENS by @lishunyang12 in #2693
- [Feat][sleepmode] add omni sleepmode and ack protocol by @Flink-ddd in #2022
- [CI][Bugfix] Improve cosine similarity calculation by incorporating length harmony adjustment in text comparison by @yenuo26 in #2964
- [BugFix] Fix the issue with stream=True by @amy-why-3459 in #2955
- [Enhancement] Engine runtime errors by @pi314ever in #2426
- [BugFix] add missing subtalker sampling config to Qwen3-TTS deploy YAML by @xiaohajiayou in #2940
- [Model] Add HSDP support for Stable-Audio-Open by @fywc in #2982
- [Enhancement]remove duplicate video preprocess in Wan2.2 pipeline by @bjf-frz in #2963
- [Bugfix] Fix VAE parallelism dist.gather performance bottleneck on NPU by @fan2956 in #2969
- [Config Refactor] Migrate 5 TTS models (VoxCPM2 / CosyVoice3 / MiMo Audio / Voxtral TTS / Fish Speech S2 Pro) to Pipeline + Deploy schema by @linyueqian in #2958
- [Bugfix] ComfyUI image-to-image DALL-E endpoint cases #2886 by @david6666666 in #2980
- Codex revert pr reviewer by @hsliuustc0106 in #2959
- [Bugfix] treewide: drop references to librosa by @NickCao in #2996
- [Feature] Load Balancer - Add LeastQueueLengthBalancer RoundRobinBalancer by @NumberWan in #2448
- Remove dead code by @dhonnappa-amd in #2998
- [Qwen3TTS][Bugfix] Guard inner CUDA graph replay during outer capture by @ChipMates in #2910
- [Bugfix][Model] Fix Qwen3-TTS Code2Wav max_model_len validation by @greenhandzpx in #2508
- [Feature][Voxtral TTS] Support per-model extra sampling params & add cfg_alpha for Voxtral TTS by @y123456y78 in #2338
- [ci]add merge ci for streaming input by @Shirley125 in #2965
- [BugFix][CI]: Fix test_omni_sleep_mode ci bug by @princepride in #3010
- [ConfigRefactor] GLM-Image by @JaredforReal in #2977
- [CI] Update test markers and configurations to use 'full_model' for L4 nightly tests by @yenuo26 in #2641
- [NPU] Support code predictor NPU graph by @gxxx-hum in #2695
- [Tests] Modify test cases by @amy-why-3459 in #2991
- [AutoRound] Support Qwen Omni W4A16 quantization model by @lvliang-intel in #2670
- Fix links in glm_image.md by @oglok in #3028
- [feat]: General diffusers adapter backend to run diffusion models by @fhfuih in #2724
- [BugFix] Surface diffusion metrics in chat completions; sanitize Bagel img2img mm kwargs by @NumberWan in #2932
- [Refactor] Let diffusion pipelines declare offloadable modules via SupportsModuleOffload by @NickCao in #2427
- [Feature] Failure message shows more details by @wuhang2014 in #2961
- [Tests]Add accuracy benchmark L4 test cases by @amy-why-3459 in #2843
- [MUSA][Feat] Upgrade MATE to match CUDA/ROCm behavior on FA by @yeahdongcn in #2766
- [Doc] Add diffusion attention backend docs by @david6666666 in #3011
- [Bugfix] Fix default sampling params for /v1/videos on main by @david6666666 in #3049
- [Model] Ming-flash-omni-2.0 Omni-Speech and TTS by @yuanheng-zhao in #2890
- [BugFix] Bagel img2img e2e: drop extra_body height/width by @NumberWan in #3054
- [BugFix] Preserve media access args for stage configs by @xiaohajiayou in #2956
- feat: add LTX-2.3 video generation model support by @oglok in #2893
- [Test] add stability test case for wan2.2, qwen-tts, qwen3-omni and qwen-image model and modified conftest.py in test/dfx/ by @zhumingjue138 in #2817
- [PERF]use mindiesd fused rope and rope cache by @Hu1Lcode in #2571
- [BugFix]: Fix Qwen3-TTS code2wav fails when enforce_eager: false by @ChefWu551 in #2868
- [Perf][Wan2.2] Add fused RMSNorm replace WanRMS_norm on npu by @lyj-jjj in #3067
- [Docs] CLI Docs updates by @wuhang2014 in #2978
- [CI] Re-enable Prefix Cache Test by @alex-jw-brooks in #2869
- [Chore] refine the offline examples by @RuixiangMa in #3095
- [Feat] support quantization for GLM-IMAGE by @RuixiangMa in #2292
- [Feature] XiaomiMiMo/MiMo-V2.5-ASR support by @qibaoyuan in #3089
- [Feature] Coordinator PUB mechanism optimization by @NumberWan in #2442
- [Enhancement]modify profiling.md by @bjf-frz in #3051
- [CI][Perf] Add Wan22 i2v perf nightly ci by @bjf-frz in #3063
- [Bugfix] T5 text encoder to render correct text in FLUX.1-dev by @RuixiangMa in https://github.com/vllm-project/vllm-omni/pull/2760
- [Bugfix] Fix Flux2klein Text Input Processing by @alex-jw-brooks in https://github.com/vllm-project/vllm-omni/pull/3098
- docs: update add-tts-model skill and contributing guide with single-stage patterns by @linyueqian in https://github.com/vllm-project/vllm-omni/pull/2806
- [Bugfix] fix GLM-Image multi-stage online generation fail by @RuixiangMa in https://github.com/vllm-project/vllm-omni/pull/3084
- [Bugfix] graceful shutdown for multi-stage engine processes by @RuixiangMa in https://github.com/vllm-project/vllm-omni/pull/3001
- [Config Refactor] sentinel default precedence by @xiaohajiayou in https://github.com/vllm-project/vllm-omni/pull/3078
- [Config Refactor]: Remove bagel yaml by @princepride in https://github.com/vllm-project/vllm-omni/pull/2936
- [Refactor] Remove redundant benchmarks from Qwen3-Omni by @amy-why-3459 in https://github.com/vllm-project/vllm-omni/pull/3108
- [Feature] Streaming video input with EVS frame filtering (RFC #2201 Phase 2-4) by @Sy0307 in https://github.com/vllm-project/vllm-omni/pull/2342
- [MUSA][Feat] torch.accelerator support by @yeahdongcn in https://github.com/vllm-project/vllm-omni/pull/3101
- [Benchmark] Universal TTS benchmark: Qwen3-TTS + VoxCPM2 with 3 task types (voice-clone/default/design) by @linyueqian in https://github.com/vllm-project/vllm-omni/pull/2835
- [XPU] Enable torch inductor for xpu by @xuechendi in https://github.com/vllm-project/vllm-omni/pull/3113
- fix generations reponse body by @FrosterHan in https://github.com/vllm-project/vllm-omni/pull/3094
- [CI] Fix Seed TTS Simple Unit Tests by @alex-jw-brooks in https://github.com/vllm-project/vllm-omni/pull/3126
- [Refactor] Remove the default value "mp" from distributed_executor_backend. by @amy-why-3459 in https://github.com/vllm-project/vllm-omni/pull/3007
- [NPU] [Quant] Support HunyuanImage3 offline quantization with vLLM-Ascend on diffusion path by @jiangmengyu18 in https://github.com/vllm-project/vllm-omni/pull/2979
- [Feat] support layerwise offload for Bagel by @lsyyysky in https://github.com/vllm-project/vllm-omni/pull/2734
- [Refactor] Standardize data entry key names to {type}.{qualifier} format by @divyanshsinghvi in https://github.com/vllm-project/vllm-omni/pull/1829
- [Bugfix][Refactor] Migrate Voxtral TTS config and parser registry by @yuanheng-zhao in https://github.com/vllm-project/vllm-omni/pull/3065
- Benchmark and Bugfix for GLM-Image by @JaredforReal in https://github.com/vllm-project/vllm-omni/pull/3024
- [Refactor/Bugfix] Use Delta Messages for Streaming by @alex-jw-brooks in https://github.com/vllm-project/vllm-omni/pull/2911
- [Cleanup] Drop empty _DIFFUSION_PIPELINES placeholder by @lishunyang12 in https://github.com/vllm-project/vllm-omni/pull/3040
- [Refactor] Centralize stage sampling params resolution by @reidliu41 in https://github.com/vllm-project/vllm-omni/pull/3153
- [Feature]: support Flux.1-dev tea_cache by @nuclearwu in https://github.com/vllm-project/vllm-omni/pull/2774
- [Bugfix] Add seed support to TTS API for deterministic Fish Speech voice generation by @ianliuy in https://github.com/vllm-project/vllm-omni/pull/2624
- feat: add MOSS-TTS-Nano single-stage TTS support by @linyueqian in https://github.com/vllm-project/vllm-omni/pull/2753
- [Refactor] Extract has_flash_attn_pkg in OmniPlatform by @yeahdongcn in https://github.com/vllm-project/vllm-omni/pull/3068
- [Test] L5 reliability test for wan2.2 and qwen3_omni model by @zhumingjue138 in https://github.com/vllm-project/vllm-omni/pull/2972
- [CI] Update nightly/ready test scheduling and unskip selected cases by @yenuo26 in https://github.com/vllm-project/vllm-omni/pull/2945
- [MUSA][Chore] Bump torchada by @yeahdongcn in https://github.com/vllm-project/vllm-omni/pull/3132
- [BugFix] qwen3_tts: build speaker_encoder in init so load_format: dummy works by @leohuang257 in https://github.com/vllm-project/vllm-omni/pull/3117
- [CI] skip MOSS-TTS-Nano E2E Test by @yenuo26 in https://github.com/vllm-project/vllm-omni/pull/3171
- [Model] Add InternVLA-A1 offline inference support by @Greyman-Seu in https://github.com/vllm-project/vllm-omni/pull/2737
- fix(diffusion): correct metric keys, remove duplication, minor cleanup by @willamhou in https://github.com/vllm-project/vllm-omni/pull/2692
- [Docs] Modify Qwen3-Omni's recipe by @amy-why-3459 in https://github.com/vllm-project/vllm-omni/pull/3109
- [CI]Skip failing text_to_image README examples in CI due to subprocess exit issue by @yenuo26 in https://github.com/vllm-project/vllm-omni/pull/3190
- [Docs] Add performance profiling step to add-diffusion-model skill by @SamitHuang in https://github.com/vllm-project/vllm-omni/pull/3196
- [Doc][Frontend][Model][VoxCPM2] Support instructions and per-request cfg_value by @gnomefin in https://github.com/vllm-project/vllm-omni/pull/3118
- [perf]Qwen3-Omni performance optimization by @amy-why-3459 in https://github.com/vllm-project/vllm-omni/pull/3164
- Revert "[perf]Qwen3-Omni performance optimization" by @hsliuustc0106 in https://github.com/vllm-project/vllm-omni/pull/3202
- Revert "[Doc][Frontend][Model][VoxCPM2] Support instructions and per-request cfg_value" by @hsliuustc0106 in https://github.com/vllm-project/vllm-omni/pull/3204
- [BugFix]: recalculate model_config to fix FA3 scheduler_metadata shape mismatches by @ZhengWG in https://github.com/vllm-project/vllm-omni/pull/3110
- Update WeChat QR code by @david6666666 in https://github.com/vllm-project/vllm-omni/pull/3213
- [Docs] Update quantization guides by @david6666666 in https://github.com/vllm-project/vllm-omni/pull/3200
- [Bugfix][CI] fix file name too long error in Omni · Doc Test with H100 by @yenuo26 in https://github.com/vllm-project/vllm-omni/pull/3209
- Fix "Add online serving to Stable Audio Diffusion and introduce v1/audio/generate endpoint" by @ekagra-ranjan in https://github.com/vllm-project/vllm-omni/pull/1794
- [Bugfix] Fix Qwen3-TTS Base ICL garbled output when ref_audio is a local path by @Dnoob in https://github.com/vllm-project/vllm-omni/pull/2984
- [Bugfix] Add missing .gpu accessor for inputs_embeds CpuGpuBuffer in prefill overlay by @dubin555 in https://github.com/vllm-project/vllm-omni/pull/2068
- [Rebase] Rebase to vllm 0.20.0 by @tzhouam in https://github.com/vllm-project/vllm-omni/pull/3232
- [Bugfix][Qwen3TTS] Use float32 for code predictor on fp16-only GPUs by @NickCao in https://github.com/vllm-project/vllm-omni/pull/3253
- [Bugfix][CI] Vendor Qwen3-TTS and CosyVoice3 reference audio fixtures by @linyueqian in https://github.com/vllm-project/vllm-omni/pull/3267
- [MUSA] Add get_device_capability/get_device_version and remove get_diffusion_model_impl_qualname/prepare_diffusion_op_runtime in musa/platform.py by @yeahdongcn in https://github.com/vllm-project/vllm-omni/pull/3179
- [Bugfix]pass TP size to diffusion config by @natureofnature in https://github.com/vllm-project/vllm-omni/pull/2867
- [Quantization] Enable FP8 online quantization for Z-image text encoder by @Isotr0py in https://github.com/vllm-project/vllm-omni/pull/1338
- [Bugfix][CI] Enhance get_open_port function to handle port binding more robustly by @yenuo26 in https://github.com/vllm-project/vllm-omni/pull/3216
- Revert "[Quantization] Enable FP8 online quantization for Z-image text encoder" by @hsliuustc0106 in https://github.com/vllm-project/vllm-omni/pull/3272
- [Mergify] Init config by @gcanlin in https://github.com/vllm-project/vllm-omni/pull/812
- [BugFix] Fix nullify regressions in pure diffusion offline examples by @xiaohajiayou in https://github.com/vllm-project/vllm-omni/pull/3181
- [CI] remove skip in test_z_image_expansion and test_qwen3_omni_expansion by @yenuo26 in https://github.com/vllm-project/vllm-omni/pull/3211
- [Bugfix] Prevent silent failure of get_config when trust_remote_code passed as None by @yuanheng-zhao in https://github.com/vllm-project/vllm-omni/pull/3241
- add check_health in inline client by @lengrongfu in https://github.com/vllm-project/vllm-omni/pull/3052
- [Misc] Remove Entrypoint Hijack for vLLM / 0.20.0 Changes by @alex-jw-brooks in https://github.com/vllm-project/vllm-omni/pull/3082
- [AMD][CI][Bugfix] Fix "simple unit test" by @tjtanaa in https://github.com/vllm-project/vllm-omni/pull/3225
- [BugFix][Bagel]: Fix vLLM-Omni as rollout bug: number of trajectory_latents count less by @princepride in https://github.com/vllm-project/vllm-omni/pull/3258
- [Config Refactor] Derive deploy override fields from stage config by @xiaohajiayou in https://github.com/vllm-project/vllm-omni/pull/3162
- [CI]Change pixels tolerance from 5 to 10 by @princepride in https://github.com/vllm-project/vllm-omni/pull/3289
- [CI failed]Revert "[Config Refactor] Derive deploy override fields from stage config" by @Gaohan123 in https://github.com/vllm-project/vllm-omni/pull/3287
New Contributors
- @indevn made their first contribution in #1703
- @timzsu made their first contribution in #2493
- @loveysuby made their first contribution in #1439
- @willamhou made their first contribution in #2470
- @scyyh11 made their first contribution in #2511
- @skf-1999 made their first contribution in #2270
- @pikaxinge made their first contribution in #1899
- @vveerrgg made their first contribution in #2569
- @pjh4993 made their first contribution in #2587
- @RGB-loop made their first contribution in #2187
- @DOGEUNNKIM made their first contribution in #1759
- @xiaohajiayou made their first contribution in #2076
- @ianliuy made their first contribution in #2622
- @teith made their first contribution in #2203
- @Celeste-jq made their first contribution in #2134
- @iancarrasco-b10 made their first contribution in #2604
- @zhangj1an made their first contribution in #2377
- @n1ptune made their first contribution in #2784
- @TaffyOfficial made their first contribution in #2712
- @IsleOfDawnlight made their first contribution in #2467
- @QiuMike made their first contribution in #2855
- @fywc made their first contribution in #2899
- @sheralskumar made their first contribution in #2829
- @ayushag-nv made their first contribution in #2749
- @dhonnappa-amd made their first contribution in #2998
- @ChipMates made their first contribution in #2910
- @greenhandzpx made their first contribution in #2508
- @gxxx-hum made their first contribution in #2695
- @lvliang-intel made their first contribution in #2670
- @ChefWu551 made their first contribution in #2868
- @FrosterHan made their first contribution in https://github.com/vllm-project/vllm-omni/pull/3094
- @lsyyysky made their first contribution in https://github.com/vllm-project/vllm-omni/pull/2734
- @leohuang257 made their first contribution in https://github.com/vllm-project/vllm-omni/pull/3117
- @Greyman-Seu made their first contribution in https://github.com/vllm-project/vllm-omni/pull/2737
- @gnomefin made their first contribution in https://github.com/vllm-project/vllm-omni/pull/3118
- @ZhengWG made their first contribution in https://github.com/vllm-project/vllm-omni/pull/3110
Full Changelog: v0.19.0rc1...v0.20.0rc1