vllm-project/vllm-omni v0.22.0 on GitHub

Highlights

This release features 339 commits from 124 contributors, including 52 new contributors.

vLLM-Omni v0.22.0 is a major world-model, diffusion, and omni-serving release aligned with the vLLM 0.22 release line. It adds Cosmos3 and DreamZero world-model support, broadens speech and multimodal model coverage, and improves production serving across multistage runtime, OpenPI robot serving, diffusion acceleration, quantization, and hardware backends.

Key Improvements

Added world-model support, including Cosmos3 base model support, sound generation, and action modality, plus DreamZero integration with CFG parallel and OpenPI serving. (#3454, #4073, #4102, #2162, #3673)
Aligned with the vLLM 0.22 release line, including the vLLM 0.21 and 0.22 rebases, dependency compatibility updates, release image builds, and PyPI upload support. (#3530, #3891, #4022, #3428, #3667)
Expanded speech and omni model coverage, adding MiniCPM-o 4.5, Lance, MOSS-TTS, GLM-TTS, Higgs Audio v2, Covo-Audio-Chat, and new recipes/deploy profiles for several speech and omni models. (#3642, #4067, #3710, #3420, #3141, #3762, #2293)
Improved diffusion acceleration and parallel execution, including Wan2.2 pipeline parallelism, HunyuanImage3 VAE parallelism, HunyuanVideo 1.5 USP/VAE patch parallel, LTX-2.3 CFG parallel, step-wise LoRA, MagCache, CacheDiT coverage, and prompt-embedding caching. (#2322, #3091, #3979, #3905, #3639, #1287, #3265, #3906, #2962)
Made audio and TTS serving more production-ready, with Qwen3-TTS, Qwen3-Omni, VoxCPM2, Fish Speech S2 Pro, OmniVoice, async audio input, custom voices, ref-context cache, and high-concurrency improvements. (#3662, #3492, #3322, #3592, #4054, #3882, #3773, #3336, #3614)
Expanded quantization and hardware coverage, including Blackwell diffusion attention backends, W4A16, FP8/INT8, MXFP4, MXFP8, ModelOpt mixed FP8/NVFP4, batched ModelOpt FP8, ROCm AITER, Intel XPU, and Ascend NPU updates. (#3353, #3059, #3700, #3902, #3578, #3570, #3782, #3943, #4155, #3079, #3015, #3419, #3511, #2325)

Core Architecture & Runtime

Integrated OmniCoordinator into the stage engine pipeline and continued the communication-layer refactor across non-async omni paths, improving multistage orchestration, request routing, and model-runner reuse. (#3569, #2677, #3719, #3476)
Hardened stage and diffusion lifecycle behavior with worker dead detection, cleanup fixes, safer subprocess shutdown, SIGINT cleanup for NCCL/ZMQ resources, master-port selection fixes, and diffusion prefetch protection for newer transformers shard-resolution behavior. (#3214, #3494, #3751, #3872, #3803, #4076)
Improved request and scheduler correctness through unified diffusion request identity, prefix-cache and token-history fixes, streaming finish reasons, Qwen3-Omni sampling alignment with transformers, and deterministic media-path handling in mixed-modality examples. (#3744, #3665, #3681, #3374, #4137, #3355)
Added TrackingArgumentParser and refreshed configuration behavior around recursive engine-arg merging, deploy-config field allowlisting, concrete entrypoint typing, and single-stage/multistage test coverage. (#3369, #3009, #3483, #3139)
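
The recursive engine-arg merging and deploy-config field allowlisting mentioned above can be pictured with a small sketch. This is not the project's implementation: `merge_recursive`, `filter_allowed`, and the `ALLOWED_FIELDS` set are hypothetical names chosen for illustration, and the real allowlist is not given in these notes.

```python
from copy import deepcopy
from typing import Any, Dict, Mapping, Set

# Hypothetical allowlist; the actual permitted deploy-config fields are not listed in the notes.
ALLOWED_FIELDS: Set[str] = {"engine_args", "runtime"}


def merge_recursive(base: Mapping[str, Any], override: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a new dict in which nested mappings are merged key by key."""
    merged: Dict[str, Any] = deepcopy(dict(base))
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            # Both sides are mappings: descend instead of replacing the whole subtree.
            merged[key] = merge_recursive(merged[key], value)
        else:
            # Scalars, lists, and type mismatches: the override wins.
            merged[key] = deepcopy(value)
    return merged


def filter_allowed(config: Mapping[str, Any], allowed: Set[str] = ALLOWED_FIELDS) -> Dict[str, Any]:
    """Raise on top-level fields outside the allowlist rather than silently dropping them."""
    unknown = sorted(set(config) - allowed)
    if unknown:
        raise ValueError(f"unsupported deploy-config fields: {unknown}")
    return dict(config)
```

With a shallow merge, overriding one nested key would discard its siblings; the recursive form keeps them, which is the behavior a layered deploy config usually wants.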

Model Support

Added Cosmos3 support across model execution, recipes, tests, and accuracy coverage, including base model support, sound generation, and action modality support. (#3454, #4073, #4102)
Added DreamZero world-model integration with CFG parallel, OpenPI serving, deployment configs, online examples, OpenPI client helpers, and source-parity tests. (#2162, #3673)
Added or expanded omni and multimodal model support for MiniCPM-o 4.5, Lance, MOSS-TTS, GLM-TTS, Higgs Audio v2, Covo-Audio-Chat, HiDream-I1-Full, Ming-flash-omni-2.0 image generation, SenseNova U1, and Qwen3-Omni Thinker LoRA for RL training. (#3642, #4067, #3710, #3420, #3141, #3762, #2293, #2572, #2875, #3319, #3915)
Improved model-family behavior across Qwen-Image, Qwen-Image-Edit, BAGEL, HunyuanImage3, HunyuanVideo 1.5, FLUX.2-dev, LTX-2/LTX-2.3, DreamID-Omni, Helios, Ovis image, MiMo-Audio, and Ming-flash-omni. (#3608, #3219, #3933, #3728, #3857, #3979, #3244, #3621, #3905, #3265, #3470, #3876, #3686, #4080)

Audio, Speech & Omni Production Optimization

Optimized Qwen3-TTS for high-concurrency serving with precomputed custom voices, ref-context cache, restored cross-request Code2Wav batching, persistent prompt-embedding helpers, reduced CUDA Graph buckets, and compatibility fixes for newer transformers versions. (#3662, #3492, #3322, #3992, #3932, #3880)
Improved Qwen3-Omni performance and correctness with TTFP optimization, sampling alignment with transformers, prefix-cache correctness, long-output correctness tests, torch.compile accuracy fixes, and streaming-input fixes after the v0.22 rebase. (#4054, #4137, #3665, #3539, #3885, #4085)
Improved Fish Speech S2 Pro, VoxCPM2, OmniVoice, GLM-TTS, Higgs Audio v2, and MOSS-TTS serving paths through high-concurrency decode work, Triton/CUDA Graph acceleration, voice clone serving, reproducible seeds, nonverbal tags, and broader offline/online examples. (#3773, #3882, #3336, #3668, #3968, #3141, #3762, #3420)
Added audio SLO metrics, cross-stage transfer metric families, audio streaming continuity metrics, and per-stage/per-replica metric wrapping for upstream vllm:* metrics. (#3576, #3618)
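
As a rough illustration of per-stage/per-replica wrapping of upstream `vllm:*` metrics, the sketch below appends extra labels to samples in the Prometheus text exposition format. The function name `wrap_sample` and the label names `stage` and `replica` are assumptions made for this example; the notes do not describe how vLLM-Omni actually implements the wrapping.

```python
import re
from typing import Dict

# One sample line: name, optional {labels}, then the value (and optional timestamp).
_SAMPLE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<rest>\S.*)$'
)


def wrap_sample(line: str, extra: Dict[str, str]) -> str:
    """Append extra labels to a vllm:* sample line; leave every other line untouched."""
    if not line or line.startswith("#"):
        return line  # HELP/TYPE comments and blank lines pass through
    match = _SAMPLE.match(line)
    if match is None or not match.group("name").startswith("vllm:"):
        return line
    added = ",".join(f'{key}="{value}"' for key, value in extra.items())
    existing = match.group("labels")
    labels = f"{existing},{added}" if existing else added
    return f'{match.group("name")}{{{labels}}} {match.group("rest")}'
```

Tagging samples this way lets a single scrape distinguish the same upstream metric emitted by different stages or replicas. The sketch does not escape label values, so it assumes the added values contain no quotes or backslashes.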

Diffusion, Image & Video Generation

Added and expanded diffusion parallel execution with Wan2.2 pipeline parallelism, HunyuanImage3 VAE parallelism, HunyuanVideo 1.5 USP plus VAE patch parallel, LTX-2.3 CFG parallel, BAGEL VAE parallel, and HunyuanVideo/HunyuanImage3 NPU performance work. (#2322, #3091, #3979, #3905, #3982, #3178)
Expanded diffusion acceleration with CacheDiT for Helios, DreamID-Omni, SenseNova U1, and LTX-2; prompt-embedding caching; MagCache; step-wise LoRA; and CacheDiT-related correctness fixes. (#3470, #3265, #3906, #3621, #2962, #1287, #3639, #3219)
Improved image and video generation correctness and serving behavior across HunyuanImage3, Qwen-Image, Qwen-Image-Edit, Flux2 Klein, GLM-Image, SD3, SenseNova U1, and /v1/videos, including long-prompt/device fixes and safer bf16 video frame conversion before NumPy output. (#4145, #3933, #4074, #3711, #3717, #3451, #3949, #4114)
Improved diffusion serving and benchmark behavior with endpoint routing for image edits, benchmark endpoint naming, output comparison tooling, performance quality gates, and stage-level benchmark statistics. (#3693, #3137, #3175, #3851, #3628)
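
The image-edit endpoint routing can be sketched as a small lookup. The notes state that i2i and ti2i benchmark tasks are sent to POST /v1/images/edits; mapping t2i to /v1/images/generations is an assumption based on the OpenAI Images API layout, and `endpoint_for_task` is a hypothetical helper, not a function from the benchmark tool.

```python
# Tasks that carry an input image and therefore go to the edits endpoint (per the notes).
EDIT_TASKS = {"i2i", "ti2i"}


def endpoint_for_task(task: str) -> str:
    """Pick the HTTP path a diffusion benchmark request should be POSTed to."""
    normalized = task.strip().lower()
    if normalized in EDIT_TASKS:
        return "/v1/images/edits"
    if normalized == "t2i":
        # Assumption: text-to-image uses the OpenAI-style generations path.
        return "/v1/images/generations"
    raise ValueError(f"unknown diffusion benchmark task: {task!r}")
```

Raising on unknown tasks, rather than falling back to a default path, keeps a mistyped task name from silently benchmarking the wrong endpoint.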

Quantization & Memory Efficiency

Added broader diffusion quantization support, including Wan2.2 W4A16, GLM-Image W4A16, LTX-2 online FP8/INT8, DreamID-Omni online FP8/INT8, NPU MXFP4 online/offline quantization, XPU MXFP8, ModelOpt mixed FP8/NVFP4, and batched ModelOpt FP8 serving support. (#3353, #3059, #3700, #3902, #3578, #3782, #3570, #3943, #4155)
Added quantization quality and trajectory comparison tooling for diffusion outputs, improved quantization benchmark handling for omni outputs, and expanded quality-gate coverage for FP8 Z-Image and related diffusion tests. (#3175, #3653, #3929)
Improved memory and cache behavior through Qwen-Image text encoder cleanup, prompt-embedding cache support, custom pipeline sleep memory release fixes, global CUDA graph pool reuse, BAGEL per-step sync removal, and AR prefix hidden-state CPU staging deduplication. (#3608, #2962, #3818, #3361, #3987, #3734)
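
To make the idea behind prompt-embedding caching concrete, here is a minimal sketch of a bounded LRU cache that skips the text encoder when the same prompt is seen again. The class name, the hashing scheme, and the `encode` callable are all hypothetical; the notes do not specify how vLLM-Omni keys or evicts cached embeddings.

```python
import hashlib
from collections import OrderedDict
from typing import Callable, List


class PromptEmbeddingCache:
    """Bounded LRU cache in front of an expensive prompt encoder (illustrative only)."""

    def __init__(self, encode: Callable[[str], List[float]], max_entries: int = 128) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._encode = encode
        self._max_entries = max_entries
        self._entries: "OrderedDict[str, List[float]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt: str) -> str:
        # Hash the prompt so long prompts do not become long dictionary keys.
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get(self, prompt: str) -> List[float]:
        key = self._key(prompt)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        embedding = self._encode(prompt)
        self._entries[key] = embedding
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return embedding
```

A real cache would likely also key on the model, negative prompt, and any encoder settings that change the embedding; this sketch keys on the prompt text alone.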

RL, Serving & Integrations

Added DreamZero/OpenPI serving and a realtime OpenPI robot serving API, including online DreamZero examples, OpenPI client helpers, connection tests, and serving tests. (#2162, #3673)
Added Qwen3-Omni Thinker LoRA support for RL training and improved custom pipeline argument handling, sleep/wakeup memory behavior, and multistage deployment coverage. (#3915, #2973, #3818, #3610)
Improved OpenAI-compatible serving behavior for image edits, speech generation, realtime audio, chat/multistage generation, invalid parameter handling, stream finish reasons, and frontend audio engine errors. (#3693, #2849, #3614, #3374, #3652, #3316)
Added Yuanrong TransferEngine connector support for NPU and improved connector/runtime infrastructure around chunk transfer, memory pools, local-rank handling, distributed KV flow, and multi-replica GPU device mapping. (#3180, #3569, #3740, #4132)
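
The invalid-parameter handling mentioned for OpenAI-compatible serving, together with the Flux2 Klein fixes that reject empty prompts and non-positive inference steps, amounts to validating requests before they reach the pipeline. The following is a hedged sketch of that kind of check; `validate_generation_request` and its error messages are hypothetical and are not taken from the vLLM-Omni codebase.

```python
from typing import Optional


def validate_generation_request(prompt: str, num_inference_steps: Optional[int] = None) -> None:
    """Raise ValueError for requests that should be rejected before any GPU work starts."""
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    if num_inference_steps is None:
        return  # let the pipeline apply its own default
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(num_inference_steps, bool) or not isinstance(num_inference_steps, int):
        raise ValueError("num_inference_steps must be an integer")
    if num_inference_steps <= 0:
        raise ValueError("num_inference_steps must be positive")
```

A serving layer would typically translate such a `ValueError` into a 4xx response instead of letting the request fail deep inside the diffusion worker.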

Platforms, Distributed Execution & Hardware Coverage

Expanded Blackwell diffusion support with CUDNN attention, FlashInfer attention auto-routing, and SageAttention3 backend support for GB200/B200/RTX 5090/PRO 6000/DGX Spark class systems. (#3079, #3015)
Improved ROCm coverage with AITER GroupNorm, AITER backend support for ring attention, and v0.22-era ROCm CI fixes. (#3419, #3511, #3946)
Improved Intel XPU coverage with CosyVoice3 support, MXFP8 support through the vLLM main-repo method, diffusion attention defaults, Docker/CI updates, v0.22 rebase fixes, and Wan2.2 S2V RoPE/cache_dit optimization. (#2325, #3782, #3525, #3675, #4059, #4062)
Improved Ascend NPU coverage with Wan2.2 MXFP4 quantization, HunyuanImage3 FA-FP8, GLM-Image stage configs and HCCL runtime environment fixes, Yuanrong connector support, sampler/runtime fixes, and v0.22 ModelRunner updates. (#3578, #3540, #3235, #3180, #3517, #4130)

CI, Benchmarks & Documentation

Unified the release pipeline around a NIGHTLY=1 option, added x86_64/aarch64 image builds, enabled twine upload to PyPI, refreshed Docker bases, and updated CUDA/ROCm/XPU installation docs for the current release line. (#3428, #3667, #3859, #4059)
Added or improved reliability, invalid-parameter, nightly parity, accuracy, and performance coverage for Cosmos3, DreamZero, HunyuanImage3, HunyuanVideo 1.5, GLM-Image, BAGEL, VoxCPM2, Qwen3-Omni, Wan2.2, MOSS-TTS, and multistage deployment. (#3454, #2162, #3790, #3852, #3451, #2175, #4055, #3729, #4097, #3610)
Improved benchmarking and observability infrastructure with audio SLOs, cross-stage transfer metrics, modality metrics, Prometheus/stat-logger tests, audio-streaming continuity metrics, diffusion benchmark endpoint routing, optional baseline assertions, and repo-wide benchmark documentation. (#3576, #3618, #3693, #3695, #1939)
Refreshed docs and recipes for quantization, diffusion performance, CosyVoice3 online serving, GLM-Image, Helios, Qwen Image Edit, VACE, MiniCPM-o 4.5, Lance, MOSS-TTS, VoxCPM2, Cosmos3, and CUDA image commands. (#3764, #3851, #3748, #2950, #3114, #3684, #3584, #4067, #3710, #3420, #3850, #3454, #3836)

What's Changed

[chore] Update command to download dataset from huggingface-cli to hf by @Gaohan123 in #3403
[Refactor] Replace and ban a few torch.cuda functions in favor of torch.accelerator replacements. by @NickCao in #3365
[Clean] Remove multi-replica Bagel CI and related docs/configs by @fake0fan in #3407
Update CODEOWNERS feature reviewers by @david6666666 in #3378
[Test] Unify L2/L3 test layout, Buildkite steps, and test helpers by @yenuo26 in #2556
[Hardware] Extend diffusion engine plugin extensibility for out-of-tree hardware backends by @yuchenjiangyj in #3239
[Feat] support hsdp for Bagel by @lsyyysky in #3150
[Bugfix] Fix the issue where the seed parameter does not take effect when using the OpenAI Python client by @Phi-C in #3436
[Bugfix] Fix Dtype Crashes in SD3 by @alex-jw-brooks in #2526
[Feature][Hunyuan image 3.0] AR + DIT with kv reuse. by @Bounty-hunter in #3346
[Test][HunyuanImage3] Add e2e offline I2T smoke test by @TaffyOfficial in #3332
[BugFix]Fix default stage config path in voxcpm2 by @sphinxkkkbc in #3447
[Feat] Add Sequence Parallelism (USP) support for HunyuanVideo 1.5 transformer by @daixinning in #2444
[Feature] online HunyuanImage-3.0 IT2I (image editing) support by @skf-1999 in #3410
enhancement: extend dmd2 to image generation + add flux, qwen image pipelines by @ayushag-nv in #2974
[Refactor] Rename SupportsModuleOffload to SupportsComponentDiscovery by @NickCao in #3354
Add Qwen3 TTS Model recipe by @chzhang2021 in #3130
[Bugfix][StableAudio] Pass model_class_name to Omni() and declare audio class attrs by @linyueqian in #3405
[Bugfix] Fix crash when serving Qwen-Image with TeaCache by @lengrongfu in #3450
[Perf] Optimize VoxCPM2 first-request latency via startup warmup by @Dan250124 in #3424
[Bugfix] fix OmniGen2 offload and dtype mismatch by @RuixiangMa in #2560
[Feature] Add FP8 quantization for Voxtral TTS by @akshatvishu in #3036
Fix NPU code predictor device mismatch in concurrent mode by @Wallbreazzz in #3453
[Test] Restore tts mark and omni_runner_function fixture for Voxtral TTS by @linyueqian in #3462
[CI] Update merge condition to skip L3 merges during weekly test and update doc by @zhumingjue138 in #3197
[CI] Refine nightly pytest command in Omni · Function Test with H100 to avoid duplicate testing. by @yenuo26 in #3459
(Phase 1) Add ModelOpt FP8 auto-detect support for diffusion checkpoints #2709 by @baonudesifeizhai in #2913
[CI][Nightly] Shard nightly Diffusion X2I H100 lanes and centralize shard definitions by @wuhang2014 in #3455
[CI] Remove VLLM_TEST_CLEAN_GPU_MEMORY to avoid environment variable pollution that causes unnecessary GPU detection, thereby slowing down test case execution. by @yenuo26 in #3446
[Diffusion][Attention] Support per-role attention backend via CLI by @gcanlin in #2681
[Feature] hunyuanimage support flash attn by @Bounty-hunter in #2981
[Perf] Fix Qwen3-TTS latency regression by @Sy0307 in #3485
[ROCm] [CI] Add the same skip ci logic as CUDA CI by @tjtanaa in #3482
[Docs] Refactor the attention backend docs/skill by @gcanlin in #3475
[Performance] Improve MiMo-Audio tokenizer decoding performance by @qibaoyuan in #2183
[BugFix] Rename attention_config to diffusion_attention_config by @gcanlin in #3489
[Bug][Hunyuanimage 3.0] fix different AR encode behavior between online and offline by @Bounty-hunter in #3500
[Misc] Clean logs for image gen task by @wuhang2014 in #3414
[CI] skip failing diffusion and accuracy cases (#3432, #3256, #3257, #3488) by @yenuo26 in #3507
[New Model]: Add sensenova u1 support by @princepride in #3319
[Config] Add HunyuanImage3 deploy configs by @Fishermanykx in #3172
[Fix] Fix RMSNorm inductor KeyError under HSDP + torch.compile by @LJH-LBJ in #3460
[Perf] Remove dead audio_tower and visual from Qwen3-Omni talker stage by @NickCao in #3296
[bugfix][ci] avoid Whisper transcript deduplication in realtime audio test by @Shirley125 in #3417
[Chore] explicit .float() conversion in Helios's optimized_scale function by @RuixiangMa in #3529
[CI][Bugfix] Improve e2e latency logging, update response classes to include detailed latency documentation and add startup time logging by @yenuo26 in #3246
[Recipes]update Wan2.2-I2V gpu part by @bjf-frz in #3271
[BugFix] Modify the splicing method of streaming audio output. by @amy-why-3459 in #3438
[Bugfix] Align the AR and DiT prompt formatting across both online and offline modes. by @Bounty-hunter in #3516
[FIX] Ensure extra_params are correctly merged into sampling params in _create_diffusion_speech() by @saadaltohamy in #3320
[Nightly CI] Remove TP case by @NumberWan in #3534
[Refactor] msgspec standardisation for data entry key names and improved type checks by @divyanshsinghvi in #3149
[New Model] Add support for tencent/Covo-Audio-Chat by @Dnoob in #2293
[bugfix, rl] Fix race condition bug on async running for diffusion model by @knlnguyen1802 in #3379
[CI] update daily omni min accuracy by @R2-Y in #3536
[Perf] Remove dead audio_tower and visual from Qwen2.5-Omni talker stage by @NickCao in #3425
[Bugfix] Fix the issue where the qwen3-omni model long-term stability test sometimes gets stuck without sending requests. by @zhumingjue138 in #3468
[Bugfix] Fix omni processing test for non-multimodal talker stage by @NickCao in #3559
Bump diffusers minimum version to >=0.38.0 by @oglok in #3349
support online FP8 quantization for FA on NPU #2236 by @lyj-jjj in #2640
[CI][Test] Add NPU nightly tests by @gcanlin in #3480
[CI][Bugfix] skip fp8 Z-Image quality gate (#3531) and add torchdiffeq dev extra by @yenuo26 in #3563
[Bugfix, rl] Diffusion worker SIGKILL under Ray actor (exitcode -9) by @knlnguyen1802 in #3533
Fix: NPU AR model runner prefix cache key flattening by @weizhoublue in #3568
[NPU][Quant] Add W8A8 MXFP8 online/offline quantization support for Wan2.2 T2V / I2V / TI2V inference on Ascend NPU by @hxhhhlalala in #3140
[skip ci][Tests] Splitting Qwen3-omni's performance test cases by @amy-why-3459 in #3501
[ROCm] Bugfix wan22 by @tjtanaa in #3463
[Bugfix] Add bot_task option of think_recaption for hunyuanimage3 it2i by @zengchuang-hw in #3551
[Feat][Config] Support additional_config for diffusion worker by @Fishermanykx in #3020
[Bugfix][HunyuanImage3.0] Fix KV reuse compatibility in SP scenarios by @Bounty-hunter in #3546
[Model] Add TP-aware MistralEncoder for FLUX.2-dev TP by @vraiti in #2465
[BugFix] Refresh TeaCache when num_inference_steps=None by @alex-jw-brooks in #2240
[Test] Add stability tests for HunyuanImage-3-Instruct by @zhumingjue138 in #3504
[Bugfix]: Fix online serving failure when using deploy config by @Fishermanykx in #3537
[Entrypoint][Refactor] Make field type hint more concrete by @wuhang2014 in #3139
[CI] Harden Qwen3-TTS perf nightly: enable Base voice_clone, add c=64/128, 2-GPU split by @linyueqian in #3491
[Feature] HunyuanImage-3.0 IT2I: multi-image input + prompt API cleanup by @TaffyOfficial in #3444
update v0.20.0 readme by @hsliuustc0106 in #3594
[Bugfix]Allow HunyuanImage3 AR sampler batching by @bjf-frz in #3590
[BugFix] fix shm connector by @Bounty-hunter in #3583
[CI] Add Qwen3-TTS tests for ready tag by @gcanlin in #3600
Update WeChat group QR code by @david6666666 in #3624
[BugFix] fix(omni): isolate diffusion KV-cache dtype from vLLM --kv-cache-dtype #3585 by @lyj-jjj in #3596
Update streaming_speech_client.py to solve Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice voice problem by @keeper-jie in #3380
[CI] add cuda marker to Diffusion X2V function pytest by @yenuo26 in #3625
[Bugfix] UnspecifiedOmniPlatform.get_device_count returns 0 by @princepride in #3636
[2/5] [core]refactor communication layer: PR 2 of 5 Qwen3 Omni non async by @natureofnature in #2677
[Bugfix]Fix multimodal cache routing for AR replicas by @bjf-frz in #3605
[BugFix] Fix the issue of thinker requests being preempted, causing shape mismatch. by @amy-why-3459 in #3147
[Bugfix] fix compatibility of _hunyuan_image3_unpack_packed_topk between vllm / vllm ascend by @Fishermanykx in #3640
[bugfix] Fix diffusers backend input bug after #2913 by @fhfuih in #3644
[BugFix] fix ci by @amy-why-3459 in #3650
[CI] Replace c=128 perf cell with c=16; loosen new-cell baselines by @linyueqian in #3637
[Rebase] Rebase to vllm v0.21.0 by @tzhouam in #3530
[BugFix] Finish async_chunk requests without pad-token injection by @NickCao in #3613
[Hunyuanimage 3.0] hunyuan accuracy test by @Bounty-hunter in #3655
[CI][Accuracy] Add Qwen-Image-2512 Qwen-Image-Edit-2511 pixel accuracy tests by @david6666666 in #3502
[Bugfix] Support diffusion worker dead detect when use inline engine by @wuhang2014 in #3214
[Bugfix]update process name for dit stage by @zengchuang-hw in #3602
[Feat] Add helios support cache dit by @lengrongfu in #3470
[ROCm] [CI] [Bugfix] Upgrade vllm version to v0.21.0 and ROCm 7.2.2 by @tjtanaa in #3659
[Refactor] Migrate and clean up TTS configs: CosyVoice3, OmniVoice, VoxCPM by @yuanheng-zhao in #3338
[Config Refactor] Support Recursive Merging for Engine Args by @alex-jw-brooks in #3009
[CI/Build] Unify release pipeline with NIGHTLY=1 option, add x86_64/aarch64 image builds by @khluu in #3428
[CI/Build] Enable twine upload to PyPI by @khluu in #3667
[Bugfix] Adapt LTX-2 connector arg with diffusers 0.38.0 by @yuanheng-zhao in #3661
[Frontend]Handle audio generate engine errors consistently by @reidliu41 in #3316
[BugFix][HunyuanImage3] Set MRoPE dynamic_arg_dims so graph mode can compile by @TaffyOfficial in #3630
Fix output finish reason issue for audio chunk in stream mode by @QiuMike in #2849
Fix reasoning_parser crash: reconstruct StructuredOutputsConfig from dict by @QiuMike in #2845
[Doc] Simplify template example subtitle by @hsliuustc0106 in #3669
[Doc] Reorganize available recipes into a table by @hsliuustc0106 in #3671
[SKILL]Add diffusion perf skill by @bjf-frz in #3461
[TTS][Perf] Optimize Qwen3-TTS high-concurrency serving by @Sy0307 in #3662
Fix diffusion engine cleanup lifecycle by @wuhang2014 in #3494
[XPU] update dockerfile and CI to 0.21.0 by @xuechendi in #3675
[Bugfix][TTS] Drop meaningless TTFT from speech-endpoint benchmarks by @linyueqian in #3674
[Bugfix] fix diffusion quantization benchmarking for Omni outputs by @RuixiangMa in #3653
[Bugfix] Fix SenseNova U1 broken import after SupportsModuleOffload by @nussejzz in #3691
[BugFix][CI]Fixing occasional CI failures by @amy-why-3459 in #3623
[HY-Image3.0] support hunyuan image3 dit fa-fp8 on npu by @lyj-jjj in #3540
[Bugfix][Qwen3-Omni] Handle short Code2Wav chunk outputs by @Sy0307 in #3687
[XPU] set flash_attn as default diffusion attn backend and fix k_len for cross_attn by @xuechendi in #3525
[Feature] Add support for Pipeline Parallel and integrate it into Wan 2.2 by @hadipash in #2322
Disable sampler kernel for XPU test by @pi314ever in #3718
[Bugfix] Fix hunyuanimage3 dit quant storageshape mismatch error by @fan2956 in #3694
[Refactor]Rename diffusion benchmark backend to endpoint by @bjf-frz in #3137
[Bugfix] Reject empty prompts in Flux2 Klein diffusion pipeline by @MmMaiIIi in #3711
Reject non-positive Flux2 Klein inference steps by @MmMaiIIi in #3717
[large-scale-serving] Integrate OmniCoordinator into stage engine pipeline by @chickeyton in #3569
[CI] invalid_param reliability suite and weekly http_invalid jobs by @yenuo26 in #3652
[CI] improve Buildkite testcase statistics reports by @yenuo26 in #3543
[Qwen-Image] Drop unused vision tower from text encoder by @lulugoodcoder in #3608
[Cleanup] Remove unused build_base_engine_args after #1115 by @bitborne in #3720
[Recipe] Qwen/Qwen-Image-Edit by @yixiaoer in #3684
[BugFix] fix mult cli timeout with get kv by @Bounty-hunter in #3741
[Quantization][tools] Add diffusion quantization output comparison tool by @david6666666 in #3175
[CI] optional --assert-baseline and update perf JSON baselines by @yenuo26 in #3695
[Feat] Enable VAE parallel in HunyuanImage3 by @Fishermanykx in #3091
[Bugfix][TTS] Only populate voice_name for uploaded voices without inline ref_audio by @NickCao in #3523
[XPU][CI] fix test_qwen2_5_omni_expansion.py::test_mix_to_audio by @xuechendi in #3761
[Perf][VoxCPM2][Ming-Flash-Omni] Use global CUDA graph pool by @NickCao in #3361
[Bench] Add audio-streaming continuity metric for TTS by @linyueqian in #3618
[Bugfix] Treat kv_cache_dtype=auto as unset for ring attention by @RuixiangMa in #3622
[NPU][Quant] Add W4A4 MXFP4 online & MXFP4 dual-scale online/offline quantization support for Wan2.2 T2V / I2V inference on Ascend NPU by @hxhhhlalala in #3578
Yuanrong TransferEngine Connector for NPU by @yangsonglin13 in #3180
[Doc][TTS] CosyVoice3 online docs + residual TTS yaml cleanup + remove VoxCPM v1 by @linyueqian in #3748
[Test] add run_nightly_jobs.sh for local nightly pytest parity by @yenuo26 in #3670
[Bugfix]Fix distributed stage0 multimodal cache routing by @bjf-frz in #3740
[Perf] Optimize sampler D2H sync for HY-Image by @gcanlin in #3617
[Docs] Complete quantization nav and online guide by @david6666666 in #3764
[Diffusion] Support LoRA in step-wise execution by @SamitHuang in #3639
[Bugfix] Fix qwen2_5_omni weight loading by @ksiyuan in #3598
[Benchmark] Route i2i/ti2i to POST /v1/images/edits in diffusion_benchmark_serving by @NumberWan in #3693
[AutoRound] Support WAN2.2 W4A16 quantization model by @lvliang-intel in #3353
[Feat] Support online quantization (fp8/int8) for LTX-2 by @yuanheng-zhao in #3700
Add new committers to governance page by @hsliuustc0106 in #3749
[Bugfix] Fix MiMo-Audio voice instability: stochastic local_sampler + codec streaming context by @Galleons2029 in #3686
Update WeChat group QR code by @david6666666 in #3806
[Bugfix] Fix Hunyuan worker device context by @fake0fan in #3768
(Phase 2) Add ModelOpt mixed FP8/NVFP4 support for image generation by @baonudesifeizhai in #3570
Fix OmniDiffusionConfig master_port selection for parallel launches by @SamitHuang in #3803
[Bugfix] Remove stale OmniStage import and type annotation by @qidaye in #3541
[BugFix] Fix prefer_model_sampler token history in async scheduling by @zengchuang-hw in #3681
[feature]: support Hidream-I1-Full model by @ANHDY in #2572
[Bugfix] Align Offline and Online Inference by @skf-1999 in #3506
[CI] Fix email bug & skip email distribution. by @congw729 in #3814
[Bugfix] Revert MiMo-Audio local_sampler to greedy to fix text truncation under concurrent batching (followup to #3686) by @Galleons2029 in #3817
[Bugfix] Set separate CFG flag in Helios for CacheDiT by @alex-jw-brooks in #3756
[Recipe] Add Fish Speech S2 Pro 2-GPU deploy profile by @linyueqian in #3323
[Perf] [OmniVoice] Triton kernel fusion + CUDA Graph acceleration by @univa-HARRY in #3336
[Bugfix][CI] Run Whisper validation on CPU for single-GPU runners by @linyueqian in #3822
[Feat] support cache-dit for DreamID-Omni by @fywc in #3265
[BugFix] code2wav supports disabling CUDA graph. by @amy-why-3459 in #3732
[Model] Add GLM-TTS text-to-speech model support by @BeatSeat in #3141
[Bugfix] Fix LTX2 CacheDiT Integration by @alex-jw-brooks in #3621
docs: fix CUDA pre-built image command by @akshatvishu in #3836
[BugFix][NPU] Honor prefer_model_sampler in NPU AR runner by @gcanlin in #3517
[Bugfix][Example][OmniVoice] Drop hardcoded "voice": "default" from speech_client.py by @nagisa-kunhah in #3829
Add hunyuan online accuracy test by @BLANKETusers in #3795
[CI] Increase timeout for Quantization Test in nightly build to 60 minutes by @zhumingjue138 in #3845
[Bugfix] Fix Qwen3-TTS Stage 0 prefix-caching correctness by @linyueqian in #3665
[Bugfix] fix when diffusion model not set sleeping_stages by @lengrongfu in #3023
[Higgs-Audio] bosonai/higgs-audio-v2-generation-3B-base TTS model support by @yuekaizhang in #3762
[UX] Rename default config to hunyuan_image_3_moe by @gcanlin in #3835
[Test] Qwen-Image Perf Test with High Concurrency by @wtomin in #2822
[BugFix]: CUDA device-side assert failures on single-stage BAGEL i2i requests by @NumberWan in #3680
[CI] Add nightly-ci for multi-stage deployment by @ZhengWG in #3610
[CI][Bugfix]Fix Wan2.2 I2V reference image upload by @bjf-frz in #3869
[HunyuanImage][End2End Performance CI] Add hunyuan end2end test by @Bounty-hunter in #3849
[BugFix] Fix LTX-2.3 audio latent padding for sequence parallelism by @mglyn in #3854
Update CUDA Docker base image to vLLM v0.21.0 by @hsliuustc0106 in #3859
[Docs] Strengthen diffusion perf optimization quality gate by @david6666666 in #3851
[bugfix] fix default deploy config in hunyuan_image offline example by @zengchuang-hw in #3879
[BugFix] Fix Qwen3-TTS Code2Wav compatibility with transformers >= 5.9.0 by @Dan250124 in #3880
glm-image: fix(npu)per-stage runtime env for HCCL ports + GLM-Image NPU stage config by @lyj-jjj in #3235
[Feat][HunyuanImage3] Stream AR text for IT2I image edits by @TaffyOfficial in #3723
[Doc][Benchmark] Rewrite benchmarks/README.md as repo-wide index by @Dnoob in #1939
[Bugfix] Fix Qwen-Image-Edit-2511 TeaCache zero_cond_t handling by @JasonJ2021 in #3219
[Perf] Trim HunyuanVideo encoder padding tokens by @david6666666 in #3844
[Feat] opt qwen image model load use ColumnParallelLinear replace ReplicatedLinear by @lengrongfu in #3875
[Bugfix]Fix Hunyuan Image3 denoise flow alignment by @bjf-frz in #3857
[ROCm] Add support for AITER GroupNorm by @avjves in #3419
[Feature] support SP for FLUX.2-dev by @nuclearwu in #3244
[Model] Add Ming-flash-omni-2.0 Image Generation (Diffusion) Stage by @ZhengWG in #2875
[BugFix] Fix diffusion parallel_config YAML override and add deploy config field allowlist by @xiaohajiayou in #3483
[TTS][Perf] Optimize Fish Speech S2 Pro high-concurrency serving by @Sy0307 in #3773
Fix Ovis image text encoder dtype by @akshatvishu in #3876
[Bugfix] Ensure stage and diffusion subprocesses exit when parent dies unexpectedly by @RuixiangMa in #3751
[Test] Add long text output correctness test for Qwen3-Omni by @ZeldaHuang in #3539
Fix image edit docs that use an incorrect image URL by @lengrongfu in #3873
[Perf] Bagel Performance Nightly CI test by @NumberWan in #2175
[Feat] Support online quantization (fp8/int8) for DreamID-Omni by @yuanheng-zhao in #3902
[MXFP8][XPU] enable mxfp8 using vLLM main repo method by @xuechendi in #3782
[Blackwell] Add CUDNN_ATTN and FLASHINFER_ATTN backends for diffusion (auto-route) by @lishunyang12 in #3079
[CI] Add HunyuanVideo 1.5 X2V accuracy tests by @david6666666 in #3852
[Feature] Add cfg-parallel for LTX-2.3 by @mglyn in #3905
[Refactor] Unify Snake/SnakeBeta and alias-free activation into common modules by @BeatSeat in #3886
[Perf][Bugfix] cache hot buffers in qwen3_tts talker; fall back on evicted state by @JuanPZuluaga in #3688
[3/5][core]refactor communication layer: PR 3 of 5, all other models in non async mode by @natureofnature in #3719
[Doc] Refine vace offline inference example README by @blondeCS in #3584
[Diffusion] Unify diffusion request identity on request_id by @yJader in #3744
[Bugfix] Remove duplicate ffmpeg options in random video generation by @JLiu4Coding in #3923
[AutoRound] Support GLM-Image W4A16 quantization model by @lvliang-intel in #3059
[Doc] Reduce browser memory usage for docs by @david6666666 in #3870
[Refactor][Qwen3-TTS] Construct speech tokenizer encoder natively by @NickCao in #3360
[CI][Bugfix] Add request id to LTX2.3 CFG parallel test by @mglyn in #3934
[Perf] Trim Code2Wav CUDA Graph buckets for Qwen3-TTS single-GPU deploy by @R2-Y in #3932
[CI] Rectify L2~L4 Qwen Image Edit series tests by @fhfuih in #3901
[Docs]Add recipe for GLM-Image on 2x A800 GPUs and 1x A800 GPU by @nainiu258 in #2950
[CI][BugFix] Fix and Validate FP8 Z-Image quality gate by @david6666666 in #3929
[Test] Add scenarios for L5 reliability test by @zhumingjue138 in #3729
[Blackwell][1/N] Add SageAttention3 diffusion backend on blackwell(GB200/B200/RTX5090/PRO6000/DGX Spark available) by @david6666666 in #3015
[bugfix, rl] Fix sleep not releasing full memory in custom pipeline by @knlnguyen1802 in #3818
Fix Qwen3-omni accuracy degradation from deepstack inputs under torch.compile by @andakai in #3885
[Bugfix] Fix Triton SnakeBeta kernel for bf16/fp16 inputs by @wuli666 in #3472
[XPU] Add CosyVoice3Model support on Intel XPU by @Liangyx2 in #2325
[Docs] Add recipe for Helios by @JasonJ2021 in #3114
[ci][nightly] Voxcpm2 performance benchmark by @Shirley125 in #3864
[BugFix] Fix prefix-caching issue by @amy-why-3459 in #3726
[bugfix] Solve Nightly / CI failed - tests/e2e/online_serving/test_bagel_expansion.py #3918 by @natureofnature in #3936
[BugFix] Avoid Voxtral TTS loading error msg by @y123456y78 in #3951
[Bugfix/Feature] Remove Hardcoded Flash Attention in Bagel & Support GQA in SDPA Backend by @alex-jw-brooks in #3728
[feat] Support prompt embedding caching for diffusion model by @knlnguyen1802 in #2962
[Feat]Support voice clone for omnivoice in online serving & add seed parameter for reproducible by @sphinxkkkbc in #3668
[CI][Bugfix] Fix LTX audio-video warmup output typing by @david6666666 in #3964
[Bugfix] Fix IndexError in DistributedVaeExecutor when vae_patch_parallel_size < world_size by @QingZhou-YangHY in #3928
Temp skip TEST - Entrypoint Test with H100 by @congw729 in #3989
[Perf] Qwen3-Omni performance optimization by @amy-why-3459 in #3878
[ROCm] Enable AITER backend with ring attention by @avjves in #3511
[Feat] Support MagCache by @RuixiangMa in #1287
[Perf][Qwen3-TTS] Restore Code2Wav cross-request batching (RFC #3163 P0) by @ischencheng in #3322
[Bugfix][Model] Qwen3-TTS: don't collapse 2D ref_code list when estimating prompt length by @nperraud in #3940
[minor, fix] Allow passing class interface as custom pipeline argument by @knlnguyen1802 in #2973
[Feat] support cache-dit for SenseNova-U1 by @fywc in #3906
[CI][XPU]Fix sage_attn hard-code import for cuda by @xuechendi in #3994
[Diffusion] Support USP and VAE patch parallel for HunyuanVideo 1.5 by @david6666666 in https://github.com/vllm-project/vllm-omni/pull/3979
[HunyuanImage][Perf] adapt to deploy config changes by @Bounty-hunter in https://github.com/vllm-project/vllm-omni/pull/3996
[Refactor][Qwen3-TTS] Extract reusable prompt-embeds builder and make tts_pad_embed a persistent buffer by @vklimkov-nvidia in https://github.com/vllm-project/vllm-omni/pull/3992
docs: update WeChat QR code by @david6666666 in https://github.com/vllm-project/vllm-omni/pull/4003
[Config Refactor] Migrate Ming-flash-omni-2.0 Image-Gen deploy configs by @yuanheng-zhao in https://github.com/vllm-project/vllm-omni/pull/3975
[Bugfix][Tests] Remove unnecessary device map in tests init by @wuhang2014 in https://github.com/vllm-project/vllm-omni/pull/3958
[CI/Bugfix] Async Request ID Aliasing by @alex-jw-brooks in https://github.com/vllm-project/vllm-omni/pull/3953
[CI] Temporarily skip failing Bagel connector tests by @david6666666 in https://github.com/vllm-project/vllm-omni/pull/4005
[Bugfix] Fix DiffusionWorker crash on SIGINT: ensure NCCL/ZMQ cleanup on shutdown by @wuhang2014 in https://github.com/vllm-project/vllm-omni/pull/3872
[Recipe] add mistralai voxtral tts recipe by @Dmaner in https://github.com/vllm-project/vllm-omni/pull/3498
Fix hunyuan resolve stop token ids by @BLANKETusers in https://github.com/vllm-project/vllm-omni/pull/3896
[Refactor] Unify _talker_mtp_forward across GPU and NPU model runners by @gcanlin in https://github.com/vllm-project/vllm-omni/pull/3476
[BugFix]Qwen-Image performance regression by using omni RMSNorm(RMSNorm backend) by @NumberWan in https://github.com/vllm-project/vllm-omni/pull/3933
[Feat]audio streaming input for async chunk by @Shirley125 in https://github.com/vllm-project/vllm-omni/pull/3614
[model, omni] feat: Qwen3-Omni Thinker LoRA for RL training by @qinganrice in https://github.com/vllm-project/vllm-omni/pull/3915
[Feature] Add precomputed custom voices and Qwen3-TTS ref-context cache by @Sy0307 in https://github.com/vllm-project/vllm-omni/pull/3492
[Rebase] Rebase to vllm releases/v0.22.0 by @tzhouam in https://github.com/vllm-project/vllm-omni/pull/3891
[Bugfix] Fix FLUX W4A16/AutoRound quant_config propagation by @yiliu30 in https://github.com/vllm-project/vllm-omni/pull/3587
[Feat]upgrade vllm version [skip-ci] by @lengrongfu in https://github.com/vllm-project/vllm-omni/pull/4022
[skip ci][Recipe] OpenBMB/VoxCPM2 by @wjinxu in https://github.com/vllm-project/vllm-omni/pull/3850
[Entrypoint] Add realtime OpenPI robot serving API by @TKONIY in https://github.com/vllm-project/vllm-omni/pull/3673
[Feat]Support Nonverbal Tags in OmniVoice by @sphinxkkkbc in https://github.com/vllm-project/vllm-omni/pull/3968
[New Model] Add Lance (ByteDance) by @lishunyang12 in https://github.com/vllm-project/vllm-omni/pull/3710
[Perf] Deduplicate AR prefix cache hidden-state CPU staging by @TaffyOfficial in https://github.com/vllm-project/vllm-omni/pull/3734
[Metrics] Add audio SLOs + cross-stage transfer families + per-(stage, replica) wrap for upstream vllm:* by @LHXuuu in https://github.com/vllm-project/vllm-omni/pull/3576
[BugFix] Fix two stop reasons for multimodal output by @QiuMike in https://github.com/vllm-project/vllm-omni/pull/3374
[Perf][TTS] Bounded-K active-stream window for Stage 1 (RFC #3535) by @linyueqian in https://github.com/vllm-project/vllm-omni/pull/3592
[ROCm] [CI] Bugfix Existing CI cases by @tjtanaa in https://github.com/vllm-project/vllm-omni/pull/3946
[Model]Support MiniCPM-o 4.5 by @tc-mb in https://github.com/vllm-project/vllm-omni/pull/3642
Add Cosmos3 model support by @MaciejBalaNV in https://github.com/vllm-project/vllm-omni/pull/3454
[XPU][Rebase v0.22] Fix for 0.22 rebase by @xuechendi in https://github.com/vllm-project/vllm-omni/pull/4059
[Perf][Bagel] Avoid per-step device syncs in Bagel img2img by @natureofnature in https://github.com/vllm-project/vllm-omni/pull/3987
add MiniCPM-o 4.5 recipe under recipes/OpenBMB by @tc-mb in https://github.com/vllm-project/vllm-omni/pull/4067
[TTS][Model] support MOSS-TTS series by @zhangj1an in https://github.com/vllm-project/vllm-omni/pull/3420
[Bugfix] Fix SD3 T5 truncation check device mismatch on long prompts by @bkdoeng in https://github.com/vllm-project/vllm-omni/pull/3949
Support VAE parallel for Bagel by @lsyyysky in https://github.com/vllm-project/vllm-omni/pull/3982
[Core] Integrate TrackingArgumentParser by @alex-jw-brooks in https://github.com/vllm-project/vllm-omni/pull/3369
[Bugfix] fix qwen3-omni performance regression by @R2-Y in https://github.com/vllm-project/vllm-omni/pull/3575
[BugFix]Qwen-Image performance regression by using torch RMSNorm(RMSNorm backend) by @NumberWan in https://github.com/vllm-project/vllm-omni/pull/4074
[NPU] [Perf] Adjust flash_attn mask shape for hunyuanvideo1.5 on npu by @vasede in https://github.com/vllm-project/vllm-omni/pull/3178
[Diffusion] DreamZero world model integration with CFG parallel + OpenPI serving by @TKONIY in https://github.com/vllm-project/vllm-omni/pull/2162
[Bugfix] Pass media paths to use_mixed_modalities in example script by @NickCao in https://github.com/vllm-project/vllm-omni/pull/3355
[Refactor] Migrate dynin_omni to pipeline registry, drop legacy stage… by @AbelSara in https://github.com/vllm-project/vllm-omni/pull/4078
Add Cosmos3 sound generation by @MaciejBalaNV in https://github.com/vllm-project/vllm-omni/pull/4073
[ci] add Voxcpm2 accuracy tests by @Shirley125 in https://github.com/vllm-project/vllm-omni/pull/4055
[BugFix] Fix the issue of dataset names not being resolved by @amy-why-3459 in https://github.com/vllm-project/vllm-omni/pull/4094
[bugfix] fix streaming input issue after rebase 0.22.0 by @Shirley125 in https://github.com/vllm-project/vllm-omni/pull/4085
[CI][Accuracy] Add HunyuanImage3 pixel accuracy test and nightly CI by @BLANKETusers in https://github.com/vllm-project/vllm-omni/pull/3790
[Test] Add prefix caching + audio output regression test (#3510) by @oglok in https://github.com/vllm-project/vllm-omni/pull/3604
[Refactor] Refactor HunyuanImage3 SigLIP2 ViT to vLLM layers by @gcanlin in https://github.com/vllm-project/vllm-omni/pull/3297
[ci] add merge/ready ci for audio realtime api by @Shirley125 in https://github.com/vllm-project/vllm-omni/pull/4069
Update qwen3_tts_code2wav.py by @tanhaoan333 in https://github.com/vllm-project/vllm-omni/pull/4075
[Perf][VoxCPM2] Optimize VoxCPM2 high-concurrency decode throughput. by @Sy0307 in https://github.com/vllm-project/vllm-omni/pull/3882
[CI] Remove omni mark for MOSS-TTS and temporarily skipped by @gcanlin in https://github.com/vllm-project/vllm-omni/pull/4097
[BugFix] Fix the issue of vllm failing to start. by @amy-why-3459 in https://github.com/vllm-project/vllm-omni/pull/4105
Add Cosmos3 action modality by @bastefaniak in https://github.com/vllm-project/vllm-omni/pull/4102
[Perf][Qwen3-Omni]Optimize TTFP using initial_codec_chunk_frames by @amy-why-3459 in https://github.com/vllm-project/vllm-omni/pull/4054
[CI] Skip online moss test temp by @gcanlin in https://github.com/vllm-project/vllm-omni/pull/4122
[bugfix]qwen3tts code2wav by @tanhaoan333 in https://github.com/vllm-project/vllm-omni/pull/4123
[NPU][BugFix] Upgrade parts of ModelRunner to v0.22.0 by @tanhaoan333 in https://github.com/vllm-project/vllm-omni/pull/4130
[Bugfix] harden diffusion model prefetch against transformers v5 shard-resolution race by @tzhouam in https://github.com/vllm-project/vllm-omni/pull/4076
[Test] Add L4 diffusion feature test for GLM-Image by @herotai214 in https://github.com/vllm-project/vllm-omni/pull/3451
[HunyuanImage3][CI] fix ci by @Bounty-hunter in https://github.com/vllm-project/vllm-omni/pull/4134
fix: cosyvoice3 batch>1 inference by @yuekaizhang in https://github.com/vllm-project/vllm-omni/pull/3910
[BugFix] Cast bf16 video frames to float32 before .numpy() in /v1/videos by @BruceLoveDecimal in https://github.com/vllm-project/vllm-omni/pull/4114
Add dependency FlagEmbedding by @congw729 in https://github.com/vllm-project/vllm-omni/pull/3980
[CI] Update Bagel Pixels by @alex-jw-brooks in https://github.com/vllm-project/vllm-omni/pull/4081
Benchmark data statistics for each stage of omni models by @ZacheryAU in https://github.com/vllm-project/vllm-omni/pull/3628
[Refactor] [Qwen3-Omni]Modify the thinker's sampling parameters to align with transformers. by @amy-why-3459 in https://github.com/vllm-project/vllm-omni/pull/4137
[HunyuanImage3.0][Performance][Optimization]Adjust perf config by @Bounty-hunter in https://github.com/vllm-project/vllm-omni/pull/4149
[BugFix] Fix incorrect GPU device mapping in multi-replica stages by @ZhengWG in https://github.com/vllm-project/vllm-omni/pull/4132
[XPU][S2V] Optimize Wan2.2 S2V: RoPE refactor + cache_dit enabling by @xuechendi in https://github.com/vllm-project/vllm-omni/pull/4062
[BugFix] Support ModelOpt FP8 under batched diffusion serving by @david6666666 in https://github.com/vllm-project/vllm-omni/pull/4155
[Bugfix] Fix Compatibility of Ming-flash-omni-2.0 on transformers 5.X and vllm 0.22 by @yuanheng-zhao in https://github.com/vllm-project/vllm-omni/pull/4080
[Feature] gen-only FP8 Quantization Support for SenseNova-U1 by @leohuang257 in https://github.com/vllm-project/vllm-omni/pull/3943
[skip ci]cleanup(assets): remove dead vllm_omni/assets/video.py by @Shylin26 in https://github.com/vllm-project/vllm-omni/pull/4120
[Bugfix] Update the value of --max-seed-tts-mean-wer in the accuracy test. by @amy-why-3459 in https://github.com/vllm-project/vllm-omni/pull/4160
[Bugfix]Fix HunyuanImage3 conditional image and prompt kwargs alignment by @Yaegaki1Erika in https://github.com/vllm-project/vllm-omni/pull/4145
[Perf] Enable fused RMSNorm for HunyuanImage3 by @Bill845514379 in https://github.com/vllm-project/vllm-omni/pull/3959
[Fix] Update Qwen3 Omni multi-replica perf baselines by @fake0fan in https://github.com/vllm-project/vllm-omni/pull/4175
[CI][bugfix]: Improve Qwen Image accuracy test with diffusers attn alignment by @fhfuih in https://github.com/vllm-project/vllm-omni/pull/4143
[bugfix] fix realtime ci timeout error by @Shirley125 in https://github.com/vllm-project/vllm-omni/pull/4187
[CI] Revert "[Feature] gen-only FP8 Quantization Support for SenseNova-U1" by @Gaohan123 in https://github.com/vllm-project/vllm-omni/pull/4196

New Contributors

@yuchenjiangyj made their first contribution in #3239
@Phi-C made their first contribution in #3436
@chzhang2021 made their first contribution in #3130
@Wallbreazzz made their first contribution in #3453
@baonudesifeizhai made their first contribution in #2913
@saadaltohamy made their first contribution in #3320
@weizhoublue made their first contribution in #3568
@hxhhhlalala made their first contribution in #3140
@zengchuang-hw made their first contribution in #3551
@keeper-jie made their first contribution in #3380
@MmMaiIIi made their first contribution in #3711
@lulugoodcoder made their first contribution in #3608
@bitborne made their first contribution in #3720
@yixiaoer made their first contribution in #3684
@ksiyuan made their first contribution in #3598
@Galleons2029 made their first contribution in #3686
@qidaye made their first contribution in #3541
@ANHDY made their first contribution in #2572
@univa-HARRY made their first contribution in #3336
@BeatSeat made their first contribution in #3141
@nagisa-kunhah made their first contribution in #3829
@BLANKETusers made their first contribution in #3795
@yuekaizhang made their first contribution in #3762
@mglyn made their first contribution in #3854
@avjves made their first contribution in #3419
@blondeCS made their first contribution in #3584
@JLiu4Coding made their first contribution in #3923
@nainiu258 made their first contribution in #2950
@andakai made their first contribution in #3885
@wuli666 made their first contribution in #3472
@Liangyx2 made their first contribution in #2325
@QingZhou-YangHY made their first contribution in #3928
@ischencheng made their first contribution in #3322
@nperraud made their first contribution in #3940
@vklimkov-nvidia made their first contribution in https://github.com/vllm-project/vllm-omni/pull/3992
@Dmaner made their first contribution in https://github.com/vllm-project/vllm-omni/pull/3498
@qinganrice made their first contribution in https://github.com/vllm-project/vllm-omni/pull/3915
@wjinxu made their first contribution in https://github.com/vllm-project/vllm-omni/pull/3850
@LHXuuu made their first contribution in https://github.com/vllm-project/vllm-omni/pull/3576
@tc-mb made their first contribution in https://github.com/vllm-project/vllm-omni/pull/3642
@MaciejBalaNV made their first contribution in https://github.com/vllm-project/vllm-omni/pull/3454
@bkdoeng made their first contribution in https://github.com/vllm-project/vllm-omni/pull/3949
@vasede made their first contribution in https://github.com/vllm-project/vllm-omni/pull/3178
@AbelSara made their first contribution in https://github.com/vllm-project/vllm-omni/pull/4078
@tanhaoan333 made their first contribution in https://github.com/vllm-project/vllm-omni/pull/4075
@bastefaniak made their first contribution in https://github.com/vllm-project/vllm-omni/pull/4102
@herotai214 made their first contribution in https://github.com/vllm-project/vllm-omni/pull/3451
@BruceLoveDecimal made their first contribution in https://github.com/vllm-project/vllm-omni/pull/4114
@ZacheryAU made their first contribution in https://github.com/vllm-project/vllm-omni/pull/3628
@Shylin26 made their first contribution in https://github.com/vllm-project/vllm-omni/pull/4120
@Yaegaki1Erika made their first contribution in https://github.com/vllm-project/vllm-omni/pull/4145
@Bill845514379 made their first contribution in https://github.com/vllm-project/vllm-omni/pull/3959

Full Changelog: v0.20.0...v0.22.0