Highlights
This release features 339 commits from 124 contributors, including 52 new contributors.
vLLM-Omni v0.22.0 is a major world-model, diffusion, and omni-serving release aligned with the vLLM 0.22 release line. It adds Cosmos 3 and DreamZero world-model support, broadens speech and multimodal model coverage, and improves production serving across multistage runtime, OpenPI robot serving, diffusion acceleration, quantization, and hardware backends.
Key Improvements
- Added world-model support, including Cosmos3 base model support, sound generation, action modality, and DreamZero integration with CFG parallel plus OpenPI serving. (#3454, #4073, #4102, #2162, #3673)
- Aligned with the vLLM 0.22 release line, including the vLLM 0.21 and 0.22 rebases, dependency compatibility updates, release image builds, and PyPI upload support. (#3530, #3891, #4022, #3428, #3667)
- Expanded speech and omni model coverage, adding MiniCPM-o 4.5, Lance, MOSS-TTS, GLM-TTS, Higgs Audio v2, Covo-Audio-Chat, and new recipes/deploy profiles for several speech and omni models. (#3642, #4067, #3710, #3420, #3141, #3762, #2293)
- Improved diffusion acceleration and parallel execution, including Wan2.2 pipeline parallelism, HunyuanImage3 VAE parallelism, HunyuanVideo 1.5 USP/VAE patch parallel, LTX-2.3 CFG parallel, step-wise LoRA, MagCache, CacheDiT coverage, and prompt-embedding caching. (#2322, #3091, #3979, #3905, #3639, #1287, #3265, #3906, #2962)
- Made audio and TTS serving more production-ready, with Qwen3-TTS, Qwen3-Omni, VoxCPM2, Fish Speech S2 Pro, OmniVoice, async audio input, custom voices, ref-context cache, and high-concurrency improvements. (#3662, #3492, #3322, #3592, #4054, #3882, #3773, #3336, #3614)
- Expanded quantization and hardware coverage, including Blackwell diffusion attention backends, W4A16, FP8/INT8, MXFP4, MXFP8, ModelOpt mixed FP8/NVFP4, batched ModelOpt FP8, ROCm AITER, Intel XPU, and Ascend NPU updates. (#3353, #3059, #3700, #3902, #3578, #3570, #3782, #3943, #4155, #3079, #3015, #3419, #3511, #2325)
Core Architecture & Runtime
- Integrated OmniCoordinator into the stage engine pipeline and continued the communication-layer refactor across non-async omni paths, improving multistage orchestration, request routing, and model-runner reuse. (#3569, #2677, #3719, #3476)
- Hardened stage and diffusion lifecycle behavior with worker dead detection, cleanup fixes, safer subprocess shutdown, SIGINT cleanup for NCCL/ZMQ resources, master-port selection fixes, and diffusion prefetch protection for newer transformers shard-resolution behavior. (#3214, #3494, #3751, #3872, #3803, #4076)
- Improved request and scheduler correctness through unified diffusion request identity, prefix-cache and token-history fixes, streaming finish reasons, Qwen3-Omni sampling alignment with transformers, and deterministic media-path handling in mixed-modality examples. (#3744, #3665, #3681, #3374, #4137, #3355)
- Added TrackingArgumentParser and refreshed configuration behavior around recursive engine-arg merging, deploy-config field allowlisting, concrete entrypoint typing, and single-stage/multistage test coverage. (#3369, #3009, #3483, #3139)
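As a rough illustration of what recursive engine-arg merging means in general, the sketch below merges a nested override into a set of defaults without discarding sibling keys. The function name `merge_engine_args` and the config keys are hypothetical and chosen only for this example; this is not the implementation from #3009.

```python
from copy import deepcopy
from typing import Any, Mapping


def merge_engine_args(base: Mapping[str, Any], override: Mapping[str, Any]) -> dict[str, Any]:
    """Recursively merge `override` into a copy of `base` (hypothetical helper)."""
    merged = deepcopy(dict(base))
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            # Both sides are nested mappings: merge them key by key
            # instead of replacing the whole sub-dictionary.
            merged[key] = merge_engine_args(merged[key], value)
        else:
            # Scalars, lists, and type mismatches: the override wins.
            merged[key] = deepcopy(value)
    return merged


# Illustrative keys only; real engine args may be named differently.
defaults = {
    "parallel_config": {"tensor_parallel_size": 1, "cfg_parallel_size": 1},
    "dtype": "bfloat16",
}
stage_override = {"parallel_config": {"cfg_parallel_size": 2}}

merged = merge_engine_args(defaults, stage_override)
print(merged)
```

With a shallow merge, the override above would drop `tensor_parallel_size`; the recursive version keeps it and changes only `cfg_parallel_size`, leaving `defaults` untouched.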
Model Support
- Added Cosmos3 support across model execution, recipes, tests, and accuracy coverage, including base model support, sound generation, and action modality support. (#3454, #4073, #4102)
- Added DreamZero world-model integration with CFG parallel, OpenPI serving, deployment configs, online examples, OpenPI client helpers, and source-parity tests. (#2162, #3673)
- Added or expanded omni and multimodal model support for MiniCPM-o 4.5, Lance, MOSS-TTS, GLM-TTS, Higgs Audio v2, Covo-Audio-Chat, HiDream-I1-Full, Ming-flash-omni-2.0 image generation, SenseNova U1, and Qwen3-Omni Thinker LoRA for RL training. (#3642, #4067, #3710, #3420, #3141, #3762, #2293, #2572, #2875, #3319, #3915)
- Improved model-family behavior across Qwen-Image, Qwen-Image-Edit, BAGEL, HunyuanImage3, HunyuanVideo 1.5, FLUX.2-dev, LTX-2/LTX-2.3, DreamID-Omni, Helios, Ovis image, MiMo-Audio, and Ming-flash-omni. (#3608, #3219, #3933, #3728, #3857, #3979, #3244, #3621, #3905, #3265, #3470, #3876, #3686, #4080)
Audio, Speech & Omni Production Optimization
- Optimized Qwen3-TTS for high-concurrency serving with precomputed custom voices, ref-context cache, restored cross-request Code2Wav batching, persistent prompt-embedding helpers, reduced CUDA Graph buckets, and compatibility fixes for newer transformers versions. (#3662, #3492, #3322, #3992, #3932, #3880)
- Improved Qwen3-Omni performance and correctness with TTFP optimization, sampling alignment with transformers, prefix-cache correctness, long-output correctness tests, torch.compile accuracy fixes, and streaming-input fixes after the v0.22 rebase. (#4054, #4137, #3665, #3539, #3885, #4085)
- Improved Fish Speech S2 Pro, VoxCPM2, OmniVoice, GLM-TTS, Higgs Audio v2, and MOSS-TTS serving paths through high-concurrency decode work, Triton/CUDA Graph acceleration, voice clone serving, reproducible seeds, nonverbal tags, and broader offline/online examples. (#3773, #3882, #3336, #3668, #3968, #3141, #3762, #3420)
- Added audio SLO metrics, cross-stage transfer metric families, audio streaming continuity metrics, and per-stage/per-replica metric wrapping for upstream vllm:* metrics. (#3576, #3618)
Diffusion, Image & Video Generation
- Added and expanded diffusion parallel execution with Wan2.2 pipeline parallelism, HunyuanImage3 VAE parallelism, HunyuanVideo 1.5 USP plus VAE patch parallel, LTX-2.3 CFG parallel, BAGEL VAE parallel, and HunyuanVideo/HunyuanImage3 NPU performance work. (#2322, #3091, #3979, #3905, #3982, #3178)
- Expanded diffusion acceleration with CacheDiT for Helios, DreamID-Omni, SenseNova U1, and LTX-2; prompt-embedding caching; MagCache; step-wise LoRA; and CacheDiT-related correctness fixes. (#3470, #3265, #3906, #3621, #2962, #1287, #3639, #3219)
- Improved image and video generation correctness and serving behavior across HunyuanImage3, Qwen-Image, Qwen-Image-Edit, Flux2 Klein, GLM-Image, SD3, SenseNova U1, and /v1/videos, including long-prompt/device fixes and safer bf16 video frame conversion before NumPy output. (#4145, #3933, #4074, #3711, #3717, #3451, #3949, #4114)
- Improved diffusion serving and benchmark behavior with endpoint routing for image edits, benchmark endpoint naming, output comparison tooling, performance quality gates, and stage-level benchmark statistics. (#3693, #3137, #3175, #3851, #3628)
Quantization & Memory Efficiency
- Added broader diffusion quantization support, including Wan2.2 W4A16, GLM-Image W4A16, LTX-2 online FP8/INT8, DreamID-Omni online FP8/INT8, NPU MXFP4 online/offline quantization, XPU MXFP8, ModelOpt mixed FP8/NVFP4, and batched ModelOpt FP8 serving support. (#3353, #3059, #3700, #3902, #3578, #3782, #3570, #3943, #4155)
- Added quantization quality and trajectory comparison tooling for diffusion outputs, improved quantization benchmark handling for omni outputs, and expanded quality-gate coverage for FP8 Z-Image and related diffusion tests. (#3175, #3653, #3929)
- Improved memory and cache behavior through Qwen-Image text encoder cleanup, prompt-embedding cache support, custom pipeline sleep memory release fixes, global CUDA graph pool reuse, BAGEL per-step sync removal, and AR prefix hidden-state CPU staging deduplication. (#3608, #2962, #3818, #3361, #3987, #3734)
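To make the idea behind prompt-embedding caching concrete, here is a minimal, generic sketch: an LRU cache that skips re-running a text encoder when the same prompt is seen again. The class `PromptEmbeddingCache` and the stand-in encoder are assumptions made for illustration and do not reflect how #2962 implements the feature.

```python
from collections import OrderedDict
from typing import Callable


class PromptEmbeddingCache:
    """LRU cache keyed on prompt text (hypothetical, for illustration)."""

    def __init__(self, encode: Callable[[str], list[float]], max_entries: int = 128) -> None:
        self._encode = encode
        self._max_entries = max_entries
        self._entries: OrderedDict[str, list[float]] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, prompt: str) -> list[float]:
        if prompt in self._entries:
            # Cache hit: mark the entry as most recently used.
            self._entries.move_to_end(prompt)
            self.hits += 1
            return self._entries[prompt]
        # Cache miss: run the (expensive) encoder and store the result.
        self.misses += 1
        embedding = self._encode(prompt)
        self._entries[prompt] = embedding
        if len(self._entries) > self._max_entries:
            # Evict the least recently used prompt.
            self._entries.popitem(last=False)
        return embedding


def fake_text_encoder(prompt: str) -> list[float]:
    """Stand-in for a real text encoder; returns a toy two-value vector."""
    return [float(len(prompt)), float(sum(map(ord, prompt)) % 97)]


cache = PromptEmbeddingCache(fake_text_encoder, max_entries=2)
cache.get("a cat")   # miss
cache.get("a cat")   # hit
cache.get("a dog")   # miss
cache.get("a bird")  # miss, evicts "a cat"
print(cache.hits, cache.misses)
```

A real cache would likely also key on anything that changes the encoder output, such as the model, dtype, or negative prompt; the sketch keys on the prompt string alone to stay short.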
RL, Serving & Integrations
- Added DreamZero/OpenPI serving and a realtime OpenPI robot serving API, including online DreamZero examples, OpenPI client helpers, connection tests, and serving tests. (#2162, #3673)
- Added Qwen3-Omni Thinker LoRA support for RL training and improved custom pipeline argument handling, sleep/wakeup memory behavior, and multistage deployment coverage. (#3915, #2973, #3818, #3610)
- Improved OpenAI-compatible serving behavior for image edits, speech generation, realtime audio, chat/multistage generation, invalid parameter handling, stream finish reasons, and frontend audio engine errors. (#3693, #2849, #3614, #3374, #3652, #3316)
- Added Yuanrong TransferEngine connector support for NPU and improved connector/runtime infrastructure around chunk transfer, memory pools, local-rank handling, distributed KV flow, and multi-replica GPU device mapping. (#3180, #3569, #3740, #4132)
Platforms, Distributed Execution & Hardware Coverage
- Expanded Blackwell diffusion support with CUDNN attention, FlashInfer attention auto-routing, and SageAttention3 backend support for GB200/B200/RTX 5090/PRO 6000/DGX Spark class systems. (#3079, #3015)
- Improved ROCm coverage with AITER GroupNorm, AITER backend support for ring attention, and v0.22-era ROCm CI fixes. (#3419, #3511, #3946)
- Improved Intel XPU coverage with CosyVoice3 support, MXFP8 support through the vLLM main-repo method, diffusion attention defaults, Docker/CI updates, v0.22 rebase fixes, and Wan2.2 S2V RoPE/cache_dit optimization. (#2325, #3782, #3525, #3675, #4059, #4062)
- Improved Ascend NPU coverage with Wan2.2 MXFP4 quantization, HunyuanImage3 FA-FP8, GLM-Image stage configs and HCCL runtime environment fixes, Yuanrong connector support, sampler/runtime fixes, and v0.22 ModelRunner updates. (#3578, #3540, #3235, #3180, #3517, #4130)
CI, Benchmarks & Documentation
- Unified the release pipeline around a NIGHTLY=1 option, added x86_64/aarch64 image builds, enabled twine upload to PyPI, refreshed Docker bases, and updated CUDA/ROCm/XPU installation docs for the current release line. (#3428, #3667, #3859, #4059)
- Added or improved reliability, invalid-parameter, nightly parity, accuracy, and performance coverage for Cosmos3, DreamZero, HunyuanImage3, HunyuanVideo 1.5, GLM-Image, BAGEL, VoxCPM2, Qwen3-Omni, Wan2.2, MOSS-TTS, and multistage deployment. (#3454, #2162, #3790, #3852, #3451, #2175, #4055, #3729, #4097, #3610)
- Improved benchmarking and observability infrastructure with audio SLOs, cross-stage transfer metrics, modality metrics, Prometheus/stat-logger tests, audio-streaming continuity metrics, diffusion benchmark endpoint routing, optional baseline assertions, and repo-wide benchmark documentation. (#3576, #3618, #3693, #3695, #1939)
- Refreshed docs and recipes for quantization, diffusion performance, CosyVoice3 online serving, GLM-Image, Helios, Qwen Image Edit, VACE, MiniCPM-o 4.5, Lance, MOSS-TTS, VoxCPM2, Cosmos3, and CUDA image commands. (#3764, #3851, #3748, #2950, #3114, #3684, #3584, #4067, #3710, #3420, #3850, #3454, #3836)
What's Changed
- [chore] Update command to download dataset from huggingface-cli to hf by @Gaohan123 in #3403
- [Refactor] Replace and ban a few torch.cuda functions in favor of torch.accelerator replacements. by @NickCao in #3365
- [Clean] Remove multi-replica Bagel CI and related docs/configs by @fake0fan in #3407
- Update CODEOWNERS feature reviewers by @david6666666 in #3378
- [Test] Unify L2/L3 test layout, Buildkite steps, and test helpers by @yenuo26 in #2556
- [Hardware] Extend diffusion engine plugin extensibility for out-of-tree hardware backends by @yuchenjiangyj in #3239
- [Feat] support hsdp for Bagel by @lsyyysky in #3150
- [Bugfix] Fix the issue where the seed parameter does not take effect when using the OpenAI Python client by @Phi-C in #3436
- [Bugfix] Fix Dtype Crashes in SD3 by @alex-jw-brooks in #2526
- [Feature][Hunyuan image 3.0] AR + DIT with kv reuse. by @Bounty-hunter in #3346
- [Test][HunyuanImage3] Add e2e offline I2T smoke test by @TaffyOfficial in #3332
- [BugFix]Fix default stage config path in voxcpm2 by @sphinxkkkbc in #3447
- [Feat] Add Sequence Parallelism (USP) support for HunyuanVideo 1.5 transformer by @daixinning in #2444
- [Feature] online HunyuanImage-3.0 IT2I (image editing) support by @skf-1999 in #3410
- enhancement: extend to dmd2 to image generation + add flux, qwen image pipelines by @ayushag-nv in #2974
- [Refactor] Rename SupportsModuleOffload to SupportsComponentDiscovery by @NickCao in #3354
- Add Qwen3 TTS Model recipe by @chzhang2021 in #3130
- [Bugfix][StableAudio] Pass model_class_name to Omni() and declare audio class attrs by @linyueqian in #3405
- [Bugfix] Qwen-Image use teachche serve will crash by @lengrongfu in #3450
- [Perf] Optimize VoxCPM2 first-request latency via startup warmup by @Dan250124 in #3424
- [Bugfix] fix OmniGen2 offload and dtype mismatch by @RuixiangMa in #2560
- [Feature] Add FP8 quantization for Voxtral TTS by @akshatvishu in #3036
- Fix NPU code predictor device mismatch in concurrent mode by @Wallbreazzz in #3453
- [Test] Restore tts mark and omni_runner_function fixture for Voxtral TTS by @linyueqian in #3462
- [CI] Update merge condition to skip L3 merges during weekly test and update doc by @zhumingjue138 in #3197
- [CI] Refine nightly pytest command in Omni · Function Test with H100 to avoid duplicate testing. by @yenuo26 in #3459
- (Phase 1)Add ModelOpt FP8 auto-detect support for diffusion checkpoints #2709 by @baonudesifeizhai in #2913
- [CI][Nightly] Shard nightly Diffusion X2I H100 lanes and centralize shard definitions by @wuhang2014 in #3455
- [CI] Remove VLLM_TEST_CLEAN_GPU_MEMORY to avoid environment variable pollution that causes unnecessary GPU detection, thereby slowing down test case execution. by @yenuo26 in #3446
- [Diffusion][Attention] Support per-role attention backend via CLI by @gcanlin in #2681
- [Feature] hunyuanimage support flash attn by @Bounty-hunter in #2981
- [Perf] Fix Qwen3-TTS latency regression by @Sy0307 in #3485
- [ROCm] [CI] Add the same skip ci logic as CUDA CI by @tjtanaa in #3482
- [Docs] Refactor the attention backend docs/skill by @gcanlin in #3475
- [Performance] Improve MiMo-Audio tokenizer decoding performance by @qibaoyuan in #2183
- [BugFix] Rename attention_config to diffusion_attention_config by @gcanlin in #3489
- [Bug][Hunyuanimage 3.0] fix different AR encode behavior between online and offline by @Bounty-hunter in #3500
- [Misc] Clean logs for image gen task by @wuhang2014 in #3414
- [CI] skip failing diffusion and accuracy cases (#3432, #3256, #3257, #3488) by @yenuo26 in #3507
- [New Model]: Add sensenova u1 support by @princepride in #3319
- [Config] Add HunyuanImage3 deploy configs by @Fishermanykx in #3172
- [Fix] Fix RMSNorm inductor KeyError under HSDP + torch.compile by @LJH-LBJ in #3460
- [Perf] Remove dead audio_tower and visual from Qwen3-Omni talker stage by @NickCao in #3296
- [bugfix][ci] avoid Whisper transcript deduplication in realtime audio test by @Shirley125 in #3417
- [Chore] explicit .float() conversion in Helios's optimized_scale function by @RuixiangMa in #3529
- [CI][Bugfix] Improve e2e latency logging, update response classes to include detailed latency documentation and add startup time logging by @yenuo26 in #3246
- [Recipes]update Wan2.2-I2V gpu part by @bjf-frz in #3271
- [BugFix] Modify the splicing method of streaming audio output. by @amy-why-3459 in #3438
- [Bugfix] Align the AR and DiT prompt formatting across both online and offline modes. by @Bounty-hunter in #3516
- [FIX] Ensure extra_params are correctly merged into sampling params in _create_diffusion_speech() by @saadaltohamy in #3320
- [Nightly CI] Remove TP case by @NumberWan in #3534
- [Refactor] msgspec standardisation for data entry key names and improved type checks by @divyanshsinghvi in #3149
- [New Model] Add support for tencent/Covo-Audio-Chat by @Dnoob in #2293
- [bugfix, rl] Fix race condition bug on async running for diffusion model by @knlnguyen1802 in #3379
- [CI] update daily omni min accuracy by @R2-Y in #3536
- [Perf] Remove dead audio_tower and visual from Qwen2.5-Omni talker stage by @NickCao in #3425
- [Bugfix] Fix the issue where the qwen3-omni model long-term stability test sometimes gets stuck without sending requests. by @zhumingjue138 in #3468
- [Bugfix] Fix omni processing test for non-multimodal talker stage by @NickCao in #3559
- Bump diffusers minimum version to >=0.38.0 by @oglok in #3349
- support online FP8 quantization for FA on NPU #2236 by @lyj-jjj in #2640
- [CI][Test] Add NPU nightly tests by @gcanlin in #3480
- [CI][Bugfix] skip fp8 Z-Image quality gate (#3531) and add torchdiffeq dev extra by @yenuo26 in #3563
- [Bugfix, rl] Diffusion worker SIGKILL under Ray actor (exitcode -9) by @knlnguyen1802 in #3533
- Fix: NPU AR model runner prefix cache key flattening by @weizhoublue in #3568
- [NPU][Quant] Add W8A8 MXFP8 online/offline quantization support for Wan2.2 T2V / I2V / TI2V inference on Ascend NPU by @hxhhhlalala in #3140
- [skip ci][Tests] Splitting Qwen3-omni's performance test cases by @amy-why-3459 in #3501
- [ROCm] Bugfix wan22 by @tjtanaa in #3463
- [Bugfix] Add bot_task option of think_recaption for hunyuanimage3 it2i by @zengchuang-hw in #3551
- [Feat][Config] Support additional_config for diffusion worker by @Fishermanykx in #3020
- [Bugfix][HunyuanImage3.0] Fix KV reuse compatibility in SP scenarios by @Bounty-hunter in #3546
- [Model] Add TP-aware MistralEncoder for FLUX.2-dev TP by @vraiti in #2465
- [BugFix] Refresh TeaCache when num_inference_steps=None by @alex-jw-brooks in #2240
- [Test] Add stability tests for HunyuanImage-3-Instruct by @zhumingjue138 in #3504
- [Bugfix]: Fix online serving failure when using deploy config by @Fishermanykx in #3537
- [Entrypoint][Refactor] Make field type hint more concrete by @wuhang2014 in #3139
- [CI] Harden Qwen3-TTS perf nightly: enable Base voice_clone, add c=64/128, 2-GPU split by @linyueqian in #3491
- [Feature] HunyuanImage-3.0 IT2I: multi-image input + prompt API cleanup by @TaffyOfficial in #3444
- update v0.20.0 readme by @hsliuustc0106 in #3594
- [Bugfix]Allow HunyuanImage3 AR sampler batching by @bjf-frz in #3590
- [BugFix] fix shm connector by @Bounty-hunter in #3583
- [CI] Add Qwen3-TTS tests for ready tag by @gcanlin in #3600
- Update WeChat group QR code by @david6666666 in #3624
- [BugFix] fix(omni): isolate diffusion KV-cache dtype from vLLM --kv-cache-dtype #3585 by @lyj-jjj in #3596
- Update streaming_speech_client.py to solve Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice voice problem by @keeper-jie in #3380
- [CI] add cuda marker to Diffusion X2V function pytest by @yenuo26 in #3625
- [Bugfix] UnspecifiedOmniPlatform.get_device_count returns 0 by @princepride in #3636
- [2/5] [core]refactor communication layer: PR 2 of 5 Qwen3 Omni non async by @natureofnature in #2677
- [Bugfix]Fix multimodal cache routing for AR replicas by @bjf-frz in #3605
- [BugFix] Fix the issue of thinker requests being preempted, causing shape mismatch. by @amy-why-3459 in #3147
- [Bugfix] fix compatibility of _hunyuan_image3_unpack_packed_topk between vllm / vllm ascend by @Fishermanykx in #3640
- [bugfix] Fix diffusers backend input bug after #2913 by @fhfuih in #3644
- [BugFix] fix ci by @amy-why-3459 in #3650
- [CI] Replace c=128 perf cell with c=16; loosen new-cell baselines by @linyueqian in #3637
- [Rebase] Rebase to vllm v0.21.0 by @tzhouam in #3530
- [BugFix] Finish async_chunk requests without pad-token injection by @NickCao in #3613
- [Hunyuanimage 3.0] hunyuan accuracy test by @Bounty-hunter in #3655
- [CI][Accuracy] Add Qwen-Image-2512 Qwen-Image-Edit-2511 pixel accuracy tests by @david6666666 in #3502
- [Bugfix] Support diffusion worker dead detect when use inline engine by @wuhang2014 in #3214
- [Bugfix]update process name for dit stage by @zengchuang-hw in #3602
- [Feat] Add helios support cache dit by @lengrongfu in #3470
- [ROCm] [CI] [Bugfix] Upgrade vllm version to v0.21.0 and ROCm 7.2.2 by @tjtanaa in #3659
- [Refactor] Migrate and clean up TTS configs: CosyVoice3, OmniVoice, VoxCPM by @yuanheng-zhao in #3338
- [Config Refactor] Support Recursive Merging for Engine Args by @alex-jw-brooks in #3009
- [CI/Build] Unify release pipeline with NIGHTLY=1 option, add x86_64/aarch64 image builds by @khluu in #3428
- [CI/Build] Enable twine upload to PyPI by @khluu in #3667
- [Bugfix] Adapt LTX-2 connector arg with diffusers 0.38.0 by @yuanheng-zhao in #3661
- [Frontend]Handle audio generate engine errors consistently by @reidliu41 in #3316
- [BugFix][HunyuanImage3] Set MRoPE dynamic_arg_dims so graph mode can compile by @TaffyOfficial in #3630
- Fix output finish reason issue for audio chunk in stream mode by @QiuMike in #2849
- Fix reasoning_parser crash: reconstruct StructuredOutputsConfig from dict by @QiuMike in #2845
- [Doc] Simplify template example subtitle by @hsliuustc0106 in #3669
- [Doc] Reorganize available recipes into a table by @hsliuustc0106 in #3671
- [SKILL]Add diffusion perf skill by @bjf-frz in #3461
- [TTS][Perf] Optimize Qwen3-TTS high-concurrency serving by @Sy0307 in #3662
- Fix diffusion engine cleanup lifecycle by @wuhang2014 in #3494
- [XPU] update dockerfile and CI to 0.21.0 by @xuechendi in #3675
- [Bugfix][TTS] Drop meaningless TTFT from speech-endpoint benchmarks by @linyueqian in #3674
- [Bugfix] fix diffusion quantization benchmarking for Omni outputs by @RuixiangMa in #3653
- [Bugfix] Fix SenseNova U1 broken import after SupportsModuleOffload by @nussejzz in #3691
- [BugFix][CI]Fixing occasional CI failures by @amy-why-3459 in #3623
- [HY-Imgae3.0] support hunyuan image3 dit fa-fp8 on npu by @lyj-jjj in #3540
- [Bugfix][Qwen3-Omni] Handle short Code2Wav chunk outputs by @Sy0307 in #3687
- [XPU] set flash_attn as default diffusion attn backend and fix k_len for cross_attn by @xuechendi in #3525
- [Feature] Add support for Pipeline Parallel and integrate it into Wan 2.2 by @hadipash in #2322
- Disable sampler kernel for XPU test by @pi314ever in #3718
- [Bugfix] Fix hunyuanimage3 dit quant storageshape mismatch error by @fan2956 in #3694
- [Refactor]Rename diffusion benchmark backend to endpoint by @bjf-frz in #3137
- [Bugfix] Reject empty prompts in Flux2 Klein diffusion pipeline by @MmMaiIIi in #3711
- Reject non-positive Flux2 Klein inference steps by @MmMaiIIi in #3717
- [large-scale-serving] Integrate OmniCoordinator into stage engine pipeline by @chickeyton in #3569
- [CI] invalid_param reliability suite and weekly http_invalid jobs by @yenuo26 in #3652
- [CI] improve Buildkite testcase statistics reports by @yenuo26 in #3543
- [Qwen-Image] Drop unused vision tower from text encoder by @lulugoodcoder in #3608
- [Cleanup] Remove unused build_base_engine_args after #1115 by @bitborne in #3720
- [Recipe] Qwen/Qwen-Image-Edit by @yixiaoer in #3684
- [BugFix] fix mult cli timeout with get kv by @Bounty-hunter in #3741
- [Quantization][tools] Add diffusion quantization output comparison tool by @david6666666 in #3175
- [CI] optional --assert-baseline and update perf JSON baselines by @yenuo26 in #3695
- [Feat] Enable VAE parallel in HunyuanImage3 by @Fishermanykx in #3091
- [Bugfix][TTS] Only populate voice_name for uploaded voices without inline ref_audio by @NickCao in #3523
- [XPU][CI] fix test_qwen2_5_omni_expansion.py::test_mix_to_audio by @xuechendi in #3761
- [Perf][VoxCPM2][Ming-Flash-Omni] Use global CUDA graph pool by @NickCao in #3361
- [Bench] Add audio-streaming continuity metric for TTS by @linyueqian in #3618
- [Bugfix] Treat kv_cache_dtype=auto as unset for ring attention by @RuixiangMa in #3622
- [NPU][Quant] Add W4A4 MXFP4 online & MXFP4 dual-scale online/offline quantization support for Wan2.2 T2V / I2V inference on Ascend NPU by @hxhhhlalala in #3578
- Yuanrong TransferEngine Connector for NPU by @yangsonglin13 in #3180
- [Doc][TTS] CosyVoice3 online docs + residual TTS yaml cleanup + remove VoxCPM v1 by @linyueqian in #3748
- [Test] add run_nightly_jobs.sh for local nightly pytest parity by @yenuo26 in #3670
- [Bugfix]Fix distributed stage0 multimodal cache routing by @bjf-frz in #3740
- [Perf] Optimize sampler D2H sync for HY-Image by @gcanlin in #3617
- [Docs] Complete quantization nav and online guide by @david6666666 in #3764
- [Diffusion] Support LoRA in step-wise execution by @SamitHuang in #3639
- [Bugfix] Fix qwen2_5_omni weight loading by @ksiyuan in #3598
- [Benchmark] Route i2i/ti2i to POST /v1/images/edits in diffusion_benchmark_serving by @NumberWan in #3693
- [AutoRound] Support WAN2.2 W4A16 quantization model by @lvliang-intel in #3353
- [Feat] Support online quantization (fp8/int8) for LTX-2 by @yuanheng-zhao in #3700
- Add new committers to governance page by @hsliuustc0106 in #3749
- [Bugfix] Fix MiMo-Audio voice instability: stochastic local_sampler + codec streaming context by @Galleons2029 in #3686
- Update WeChat group QR code by @david6666666 in #3806
- [Bugfix] Fix Hunyuan worker device context by @fake0fan in #3768
- (Phase 2)Add ModelOpt mixed FP8/NVFP4 support for image generation by @baonudesifeizhai in #3570
- Fix OmniDiffusionConfig master_port selection for parallel launches by @SamitHuang in #3803
- [Bugfix] Remove stale OmniStage import and type annotation by @qidaye in #3541
- [BugFix] Fix prefer_model_sampler token history in async scheduling by @zengchuang-hw in #3681
- [feature]: support Hidream-I1-Full model by @ANHDY in #2572
- [Bugfix] Align Offline and Online Inference by @skf-1999 in #3506
- [CI] Fix email bug & skip email distribution. by @congw729 in #3814
- [Bugfix] Revert MiMo-Audio local_sampler to greedy to fix text truncation under concurrent batching (followup to #3686) by @Galleons2029 in #3817
- [Bugfix] Set separate CFG flag in Helios for CacheDiT by @alex-jw-brooks in #3756
- [Recipe] Add Fish Speech S2 Pro 2-GPU deploy profile by @linyueqian in #3323
- [Perf] [OmniVoice] Triton kernel fusion + CUDA Graph acceleration by @univa-HARRY in #3336
- [Bugfix][CI] Run Whisper validation on CPU for single-GPU runners by @linyueqian in #3822
- [Feat] support cache-dit for DreamID-Omni by @fywc in #3265
- [BugFix] code2wav supports disabling CUDA graph. by @amy-why-3459 in #3732
- [Model] Add GLM-TTS text-to-speech model support by @BeatSeat in #3141
- [Bugfix] Fix LTX2 CacheDiT Integration by @alex-jw-brooks in #3621
- docs: fix CUDA pre-built image command by @akshatvishu in #3836
- [BugFix][NPU] Honor prefer_model_sampler in NPU AR runner by @gcanlin in #3517
- [Bugfix][Example][OmniVoice] Drop hardcoded "voice": "default" from speech_client.py by @nagisa-kunhah in #3829
- Add hunyuan online accuracy test by @BLANKETusers in #3795
- [CI] Increase timeout for Quantization Test in nightly build to 60 minutes by @zhumingjue138 in #3845
- [Bugfix] Fix Qwen3-TTS Stage 0 prefix-caching correctness by @linyueqian in #3665
- [Bugfix] fix when diffusion model not set sleeping_stages by @lengrongfu in #3023
- [Higgs-Audio] bosonai/higgs-audio-v2-generation-3B-base TTS model support by @yuekaizhang in #3762
- [UX] Rename default config to hunyuan_image_3_moe by @gcanlin in #3835
- [Test] Qwen-Image Perf Test with High Concurrency by @wtomin in #2822
- [BugFix]: CUDA device-side assert failures on single-stage BAGEL i2i requests by @NumberWan in #3680
- [CI] Add nightly-ci for multi-stage deployment by @ZhengWG in #3610
- [CI][Bugfix]Fix Wan2.2 I2V reference image upload by @bjf-frz in #3869
- [HunyuanImage][End2End Performance CI] Add hunyuan end2end test by @Bounty-hunter in #3849
- [BugFix] Fix LTX-2.3 audio latent padding for sequence parallelism by @mglyn in #3854
- Update CUDA Docker base image to vLLM v0.21.0 by @hsliuustc0106 in #3859
- [Docs] Strengthen diffusion perf optimization quality gate by @david6666666 in #3851
- [bugfix] fix default deploy config in hunyuan_image offline example by @zengchuang-hw in #3879
- [BugFix] Fix Qwen3-TTS Code2Wav compatibility with transformers >= 5.9.0 by @Dan250124 in #3880
- glm-image: fix(npu)per-stage runtime env for HCCL ports + GLM-Image NPU stage config by @lyj-jjj in #3235
- [Feat][HunyuanImage3] Stream AR text for IT2I image edits by @TaffyOfficial in #3723
- [Doc][Benchmark] Rewrite benchmarks/README.md as repo-wide index by @Dnoob in #1939
- [Bugfix] Fix Qwen-Image-Edit-2511 TeaCache zero_cond_t handling by @JasonJ2021 in #3219
- [Perf] Trim HunyuanVideo encoder padding tokens by @david6666666 in #3844
- [Feat] opt qwen image model load use ColumnParallelLinear replace ReplicatedLinear by @lengrongfu in #3875
- [Bugfix]Fix Hunyuan Image3 denoise flow alignment by @bjf-frz in #3857
- [ROCm] Add support for AITER GroupNorm by @avjves in #3419
- [Feature] support SP for FLUX.2-dev by @nuclearwu in #3244
- [Model] Add Ming-flash-omni-2.0 Image Generation (Diffusion) Stage by @ZhengWG in #2875
- [BugFix] Fix diffusion parallel_config YAML override and add deploy config field allowlist by @xiaohajiayou in #3483
- [TTS][Perf] Optimize Fish Speech S2 Pro high-concurrency serving by @Sy0307 in #3773
- Fix Ovis image text encoder dtype by @akshatvishu in #3876
- [Bugfix] Ensure stage and diffusion subprocesses exit when parent dies unexpectedly by @RuixiangMa in #3751
- [Test] Add long text output correctness test for Qwen3-Omni by @ZeldaHuang in #3539
- fix image edit docs about use error image url by @lengrongfu in #3873
- [Perf] Bagel Performance Nightly CI test by @NumberWan in #2175
- [Feat] Support online quantization (fp8/int8) for DreamID-Omni by @yuanheng-zhao in #3902
- [MXFP8][XPU] enable mxfp8 using vLLM main repo method by @xuechendi in #3782
- [Blackwell] Add CUDNN_ATTN and FLASHINFER_ATTN backends for diffusion (auto-route) by @lishunyang12 in #3079
- [CI] Add HunyuanVideo 1.5 X2V accuracy tests by @david6666666 in #3852
- [Feature] Add cfg-parallel for LTX-2.3 by @mglyn in #3905
- [Refactor] Unify Snake/SnakeBeta and alias-free activation into common modules by @BeatSeat in #3886
- [Perf][Bugfix] cache hot buffers in qwen3_tts talker; fall back on evicted state by @JuanPZuluaga in #3688
- [3/5][core]refactor communication layer: PR 3 of 5, all other models in non async mode by @natureofnature in #3719
- [Doc] Refine vace offline inference example README by @blondeCS in #3584
- [Diffusion] Unify diffusion request identity on request_id by @yJader in #3744
- [Bugfix] Remove duplicate ffmpeg options in random video generation by @JLiu4Coding in #3923
- [AutoRound] Support GLM-Image W4A16 quantization model by @lvliang-intel in #3059
- [Doc] Reduce browser memory usage for docs by @david6666666 in #3870
- [Refactor][Qwen3-TTS] Construct speech tokenizer encoder natively by @NickCao in #3360
- [CI][Bugfix] Add request id to LTX2.3 CFG parallel test by @mglyn in #3934
- [Perf] Trim Code2Wav CUDA Graph buckets for Qwen3-TTS single-GPU deploy by @R2-Y in #3932
- [CI] Rectify L2~L4 Qwen Image Edit series tests by @fhfuih in #3901
- [Docs]Add recipe for GLM-Image on 2x A800 GPUs and 1x A800 GPU by @nainiu258 in #2950
- [CI][BugFix] Fix and Validate FP8 Z-Image quality gate by @david6666666 in #3929
- [Test] Add scenarios for L5 reliability test by @zhumingjue138 in #3729
- [Blackwell][1/N] Add SageAttention3 diffusion backend on blackwell(GB200/B200/RTX5090/PRO6000/DGX Spark available) by @david6666666 in #3015
- [bugfix, rl] Fix sleep do not release full memory in custom pipeline by @knlnguyen1802 in #3818
- Fix Qwen3-omni accuracy degradation from deepstack inputs under torch.compile by @andakai in #3885
- [Bugfix] Fix Triton SnakeBeta kernel for bf16/fp16 inputs by @wuli666 in #3472
- [XPU] Add CosyVoice3Model support on Intel XPU by @Liangyx2 in #2325
- [Docs] Add recipe for Helios by @JasonJ2021 in #3114
- [ci][nightly] Voxcpm2 performance benchmark by @Shirley125 in #3864
- [BugFix] Fix prefix-caching issue by @amy-why-3459 in #3726
- [bugfix] Solve Nightly / CI failed - tests/e2e/online_serving/test_bagel_expansion.py #3918 by @natureofnature in #3936
- [BugFix] Avoid Voxtral TTS loading error msg by @y123456y78 in #3951
- [Bugfix/Feature] Remove Hardcoded Flash Attention in Bagel & Support GQA in SDPA Backend by @alex-jw-brooks in #3728
- [feat] Support prompt embedding caching for diffusion model by @knlnguyen1802 in #2962
- [Feat]Support voice clone for omnivoice in online serving & add seed parameter for reproducible by @sphinxkkkbc in #3668
- [CI][Bugfix] Fix LTX audio-video warmup output typing by @david6666666 in #3964
- [Bugfix] Fix IndexError in DistributedVaeExecutor when vae_patch_parallel_size < world_size by @QingZhou-YangHY in #3928
- Temp skip TEST - Entrypoint Test with H100 by @congw729 in #3989
- [Perf] Qwen3-Omni performance optimization by @amy-why-3459 in #3878
- [ROCm] Enable AITER backend with ring attention by @avjves in #3511
- [Feat] Support MagCache by @RuixiangMa in #1287
- [Perf][Qwen3-TTS] Restore Code2Wav cross-request batching (RFC #3163 P0) by @ischencheng in #3322
- [Bugfix][Model] Qwen3-TTS: don't collapse 2D ref_code list when estimating prompt length by @nperraud in #3940
- [minor, fix] Allow passing class interface as custom pipeline argument by @knlnguyen1802 in #2973
- [Feat] support cache-dit for SenseNova-U1 by @fywc in #3906
- [CI][XPU]Fix sage_attn hard-code import for cuda by @xuechendi in #3994
- [Diffusion] Support USP and VAE patch parallel for HunyuanVideo 1.5 by @david6666666 in https://github.com/vllm-project/vllm-omni/pull/3979
- [HunyuanImage][Perf] adapt to deploy config changes by @Bounty-hunter in https://github.com/vllm-project/vllm-omni/pull/3996
- [Refactor][Qwen3-TTS] Extract reusable prompt-embeds builder and make tts_pad_embed a persistent buffer by @vklimkov-nvidia in https://github.com/vllm-project/vllm-omni/pull/3992
- docs: update WeChat QR code by @david6666666 in https://github.com/vllm-project/vllm-omni/pull/4003
- [Config Refactor] Migrate Ming-flash-omni-2.0 Image-Gen deploy configs by @yuanheng-zhao in https://github.com/vllm-project/vllm-omni/pull/3975
- [Bugfix][Tests] Remove unnecessary device map in tests init by @wuhang2014 in https://github.com/vllm-project/vllm-omni/pull/3958
- [CI/Bugfix] Async Request ID Aliasing by @alex-jw-brooks in https://github.com/vllm-project/vllm-omni/pull/3953
- [CI] Temporarily skip failing Bagel connector tests by @david6666666 in https://github.com/vllm-project/vllm-omni/pull/4005
- [Bugfix] Fix DiffusionWorker crash on SIGINT: ensure NCCL/ZMQ cleanup on shutdown by @wuhang2014 in https://github.com/vllm-project/vllm-omni/pull/3872
- [Recipe] add mistralai voxtral tts recipe by @Dmaner in https://github.com/vllm-project/vllm-omni/pull/3498
- Fix hunyuan resolve stop token ids by @BLANKETusers in https://github.com/vllm-project/vllm-omni/pull/3896
- [Refactor] Unify _talker_mtp_forward across GPU and NPU model runners by @gcanlin in https://github.com/vllm-project/vllm-omni/pull/3476
- [BugFix]Fix Qwen-Image performance regression by using omni RMSNorm (RMSNorm backend) by @NumberWan in https://github.com/vllm-project/vllm-omni/pull/3933
- [Feat]audio streaming input for async chunk by @Shirley125 in https://github.com/vllm-project/vllm-omni/pull/3614
- [model, omni] feat: Qwen3-Omni Thinker LoRA for RL training by @qinganrice in https://github.com/vllm-project/vllm-omni/pull/3915
- [Feature] Add precomputed custom voices and Qwen3-TTS ref-context cache by @Sy0307 in https://github.com/vllm-project/vllm-omni/pull/3492
- [Rebase] Rebase to vllm releases/v0.22.0 by @tzhouam in https://github.com/vllm-project/vllm-omni/pull/3891
- [Bugfix] Fix FLUX W4A16/AutoRound quant_config propagation by @yiliu30 in https://github.com/vllm-project/vllm-omni/pull/3587
- [Feat]upgrade vllm version [skip-ci] by @lengrongfu in https://github.com/vllm-project/vllm-omni/pull/4022
- [skip ci][Recipe] OpenBMB/VoxCPM2 by @wjinxu in https://github.com/vllm-project/vllm-omni/pull/3850
- [Entrypoint] Add realtime OpenPI robot serving API by @TKONIY in https://github.com/vllm-project/vllm-omni/pull/3673
- [Feat]Support Nonverbal Tags in OmniVoice by @sphinxkkkbc in https://github.com/vllm-project/vllm-omni/pull/3968
- [New Model] Add Lance (ByteDance) by @lishunyang12 in https://github.com/vllm-project/vllm-omni/pull/3710
- [Perf] Deduplicate AR prefix cache hidden-state CPU staging by @TaffyOfficial in https://github.com/vllm-project/vllm-omni/pull/3734
- [Metrics] Add audio SLOs + cross-stage transfer families + per-(stage, replica) wrap for upstream vllm:* by @LHXuuu in https://github.com/vllm-project/vllm-omni/pull/3576
- [BugFix] Fix two stop reason for multimodal output by @QiuMike in https://github.com/vllm-project/vllm-omni/pull/3374
- [Perf][TTS] Bounded-K active-stream window for Stage 1 (RFC #3535) by @linyueqian in https://github.com/vllm-project/vllm-omni/pull/3592
- [ROCm] [CI] Bugfix Existing CI cases by @tjtanaa in https://github.com/vllm-project/vllm-omni/pull/3946
- [Model]Support MiniCPM-o 4.5 by @tc-mb in https://github.com/vllm-project/vllm-omni/pull/3642
- Add Cosmos3 model support by @MaciejBalaNV in https://github.com/vllm-project/vllm-omni/pull/3454
- [XPU][Rebase v0.22] Fix for 0.22 rebase by @xuechendi in https://github.com/vllm-project/vllm-omni/pull/4059
- [Perf][Bagel] Avoid per-step device syncs in Bagel img2img by @natureofnature in https://github.com/vllm-project/vllm-omni/pull/3987
- add MiniCPM-o 4.5 recipe under recipes/OpenBMB by @tc-mb in https://github.com/vllm-project/vllm-omni/pull/4067
- [TTS][Model] support MOSS-TTS series by @zhangj1an in https://github.com/vllm-project/vllm-omni/pull/3420
- [Bugfix] Fix SD3 T5 truncation check device mismatch on long prompts by @bkdoeng in https://github.com/vllm-project/vllm-omni/pull/3949
- Support VAE parallel for Bagel by @lsyyysky in https://github.com/vllm-project/vllm-omni/pull/3982
- [Core] Integrate TrackingArgumentParser by @alex-jw-brooks in https://github.com/vllm-project/vllm-omni/pull/3369
- [Bugfix] fix qwen3-omni performance regression by @R2-Y in https://github.com/vllm-project/vllm-omni/pull/3575
- [BugFix]Fix Qwen-Image performance regression by using torch RMSNorm (RMSNorm backend) by @NumberWan in https://github.com/vllm-project/vllm-omni/pull/4074
- [NPU] [Perf] Adjust flash_attn mask shape for hunyuanvideo1.5 on npu by @vasede in https://github.com/vllm-project/vllm-omni/pull/3178
- [Diffusion] DreamZero world model integration with CFG parallel + OpenPI serving by @TKONIY in https://github.com/vllm-project/vllm-omni/pull/2162
- [Bugfix] Pass media paths to use_mixed_modalities in example script by @NickCao in https://github.com/vllm-project/vllm-omni/pull/3355
- [Refactor] Migrate dynin_omni to pipeline registry, drop legacy stage… by @AbelSara in https://github.com/vllm-project/vllm-omni/pull/4078
- Add Cosmos3 sound generation by @MaciejBalaNV in https://github.com/vllm-project/vllm-omni/pull/4073
- [ci] add Voxcpm2 accuracy tests by @Shirley125 in https://github.com/vllm-project/vllm-omni/pull/4055
- [BugFix] Fix the issue of dataset names not being resolved by @amy-why-3459 in https://github.com/vllm-project/vllm-omni/pull/4094
- [bugfix] fix streaming input issue after rebase 0.22.0 by @Shirley125 in https://github.com/vllm-project/vllm-omni/pull/4085
- [CI][Accuracy] Add HunyuanImage3 pixel accuracy test and nightly CI by @BLANKETusers in https://github.com/vllm-project/vllm-omni/pull/3790
- [Test] Add prefix caching + audio output regression test (#3510) by @oglok in https://github.com/vllm-project/vllm-omni/pull/3604
- [Refactor] Refactor HunyuanImage3 SigLIP2 ViT to vLLM layers by @gcanlin in https://github.com/vllm-project/vllm-omni/pull/3297
- [ci] add merge/ready ci for audio realtime api by @Shirley125 in https://github.com/vllm-project/vllm-omni/pull/4069
- Update qwen3_tts_code2wav.py by @tanhaoan333 in https://github.com/vllm-project/vllm-omni/pull/4075
- [Perf][VoxCPM2] Optimize VoxCPM2 high-concurrency decode throughput. by @Sy0307 in https://github.com/vllm-project/vllm-omni/pull/3882
- [CI] Remove omni mark for MOSS-TTS and temporarily skip it by @gcanlin in https://github.com/vllm-project/vllm-omni/pull/4097
- [BugFix] Fix the issue of vllm failing to start. by @amy-why-3459 in https://github.com/vllm-project/vllm-omni/pull/4105
- Add Cosmos3 action modality by @bastefaniak in https://github.com/vllm-project/vllm-omni/pull/4102
- [Perf][Qwen3-Omni]Optimize TTFP using initial_codec_chunk_frames by @amy-why-3459 in https://github.com/vllm-project/vllm-omni/pull/4054
- [CI] Temporarily skip online MOSS test by @gcanlin in https://github.com/vllm-project/vllm-omni/pull/4122
- [bugfix]qwen3tts code2wav by @tanhaoan333 in https://github.com/vllm-project/vllm-omni/pull/4123
- [NPU][BugFix] Upgrade parts of ModelRunner to v0.22.0 by @tanhaoan333 in https://github.com/vllm-project/vllm-omni/pull/4130
- [Bugfix] harden diffusion model prefetch against transformers v5 shard-resolution race by @tzhouam in https://github.com/vllm-project/vllm-omni/pull/4076
- [Test] Add L4 diffusion feature test for GLM-Image by @herotai214 in https://github.com/vllm-project/vllm-omni/pull/3451
- [HunyuanImage3][CI] fix ci by @Bounty-hunter in https://github.com/vllm-project/vllm-omni/pull/4134
- fix: cosyvoice3 batch>1 inference by @yuekaizhang in https://github.com/vllm-project/vllm-omni/pull/3910
- [BugFix] Cast bf16 video frames to float32 before .numpy() in /v1/videos by @BruceLoveDecimal in https://github.com/vllm-project/vllm-omni/pull/4114
- Add dependency FlagEmbedding by @congw729 in https://github.com/vllm-project/vllm-omni/pull/3980
- [CI] Update Bagel Pixels by @alex-jw-brooks in https://github.com/vllm-project/vllm-omni/pull/4081
- Benchmark data statistics for each stage of omni models by @ZacheryAU in https://github.com/vllm-project/vllm-omni/pull/3628
- [Refactor] [Qwen3-Omni]Modify the thinker's sampling parameters to align with transformers. by @amy-why-3459 in https://github.com/vllm-project/vllm-omni/pull/4137
- [HunyuanImage3.0][Performance][Optimization]Adjust perf config by @Bounty-hunter in https://github.com/vllm-project/vllm-omni/pull/4149
- [BugFix] Fix incorrect GPU device mapping in multi-replica stages by @ZhengWG in https://github.com/vllm-project/vllm-omni/pull/4132
- [XPU][S2V] Optimize Wan2.2 S2V: RoPE refactor + cache_dit enabling by @xuechendi in https://github.com/vllm-project/vllm-omni/pull/4062
- [BugFix] Support ModelOpt FP8 under batched diffusion serving by @david6666666 in https://github.com/vllm-project/vllm-omni/pull/4155
- [Bugfix] Fix Compatibility of Ming-flash-omni-2.0 on transformers 5.X and vllm 0.22 by @yuanheng-zhao in https://github.com/vllm-project/vllm-omni/pull/4080
- [Feature] gen-only FP8 Quantization Support for SenseNova-U1 by @leohuang257 in https://github.com/vllm-project/vllm-omni/pull/3943
- [skip ci]cleanup(assets): remove dead vllm_omni/assets/video.py by @Shylin26 in https://github.com/vllm-project/vllm-omni/pull/4120
- [Bugfix] Update the value of --max-seed-tts-mean-wer in the accuracy test. by @amy-why-3459 in https://github.com/vllm-project/vllm-omni/pull/4160
- [Bugfix]Fix HunyuanImage3 conditional image and prompt kwargs alignment by @Yaegaki1Erika in https://github.com/vllm-project/vllm-omni/pull/4145
- [Perf] Enable fused RMSNorm for HunyuanImage3 by @Bill845514379 in https://github.com/vllm-project/vllm-omni/pull/3959
- [Fix] Update Qwen3 Omni multi-replica perf baselines by @fake0fan in https://github.com/vllm-project/vllm-omni/pull/4175
- [CI][bugfix]: Improve Qwen Image accuracy test with diffusers attn alignment by @fhfuih in https://github.com/vllm-project/vllm-omni/pull/4143
- [bugfix] fix realtime ci timeout error by @Shirley125 in https://github.com/vllm-project/vllm-omni/pull/4187
- [CI] Revert "[Feature] gen-only FP8 Quantization Support for SenseNova-U1" by @Gaohan123 in https://github.com/vllm-project/vllm-omni/pull/4196
New Contributors
- @yuchenjiangyj made their first contribution in #3239
- @Phi-C made their first contribution in #3436
- @chzhang2021 made their first contribution in #3130
- @Wallbreazzz made their first contribution in #3453
- @baonudesifeizhai made their first contribution in #2913
- @saadaltohamy made their first contribution in #3320
- @weizhoublue made their first contribution in #3568
- @hxhhhlalala made their first contribution in #3140
- @zengchuang-hw made their first contribution in #3551
- @keeper-jie made their first contribution in #3380
- @MmMaiIIi made their first contribution in #3711
- @lulugoodcoder made their first contribution in #3608
- @bitborne made their first contribution in #3720
- @yixiaoer made their first contribution in #3684
- @ksiyuan made their first contribution in #3598
- @Galleons2029 made their first contribution in #3686
- @qidaye made their first contribution in #3541
- @ANHDY made their first contribution in #2572
- @univa-HARRY made their first contribution in #3336
- @BeatSeat made their first contribution in #3141
- @nagisa-kunhah made their first contribution in #3829
- @BLANKETusers made their first contribution in #3795
- @yuekaizhang made their first contribution in #3762
- @mglyn made their first contribution in #3854
- @avjves made their first contribution in #3419
- @blondeCS made their first contribution in #3584
- @JLiu4Coding made their first contribution in #3923
- @nainiu258 made their first contribution in #2950
- @andakai made their first contribution in #3885
- @wuli666 made their first contribution in #3472
- @Liangyx2 made their first contribution in #2325
- @QingZhou-YangHY made their first contribution in #3928
- @ischencheng made their first contribution in #3322
- @nperraud made their first contribution in #3940
- @vklimkov-nvidia made their first contribution in https://github.com/vllm-project/vllm-omni/pull/3992
- @Dmaner made their first contribution in https://github.com/vllm-project/vllm-omni/pull/3498
- @qinganrice made their first contribution in https://github.com/vllm-project/vllm-omni/pull/3915
- @wjinxu made their first contribution in https://github.com/vllm-project/vllm-omni/pull/3850
- @LHXuuu made their first contribution in https://github.com/vllm-project/vllm-omni/pull/3576
- @tc-mb made their first contribution in https://github.com/vllm-project/vllm-omni/pull/3642
- @MaciejBalaNV made their first contribution in https://github.com/vllm-project/vllm-omni/pull/3454
- @bkdoeng made their first contribution in https://github.com/vllm-project/vllm-omni/pull/3949
- @vasede made their first contribution in https://github.com/vllm-project/vllm-omni/pull/3178
- @AbelSara made their first contribution in https://github.com/vllm-project/vllm-omni/pull/4078
- @tanhaoan333 made their first contribution in https://github.com/vllm-project/vllm-omni/pull/4075
- @bastefaniak made their first contribution in https://github.com/vllm-project/vllm-omni/pull/4102
- @herotai214 made their first contribution in https://github.com/vllm-project/vllm-omni/pull/3451
- @BruceLoveDecimal made their first contribution in https://github.com/vllm-project/vllm-omni/pull/4114
- @ZacheryAU made their first contribution in https://github.com/vllm-project/vllm-omni/pull/3628
- @Shylin26 made their first contribution in https://github.com/vllm-project/vllm-omni/pull/4120
- @Yaegaki1Erika made their first contribution in https://github.com/vllm-project/vllm-omni/pull/4145
- @Bill845514379 made their first contribution in https://github.com/vllm-project/vllm-omni/pull/3959
Full Changelog: v0.20.0...v0.22.0