vllm-project/vllm-omni v0.23.0rc1 on GitHub

What's Changed

[TTS][New Model] support bosonai/higgs-audio-v3-tts-4b by @yuekaizhang in #4169
[Perf][Higgs-Audio-V3] LRU cache for voice-clone ref-audio encode by @linyueqian in #4200
[Perf][Higgs-Audio-V3] Turn on Stage-0 prefix caching by default by @linyueqian in #4199
[Doc] Add Stable-Diffusion-3.5 recipe for 1x RTX A6000 48GB (#2645) by @yangyonggit in #4052
[Bugfix] Fish Speech Gradio hardcoded default voice causes 400 error by @nagelanping in #3941
[Model] Support languages added by fine-tuned Qwen3-TTS checkpoints by @n0n4m39911 in #4210
[Quant] ModelOpt FP8 for Wan2.2 & HunyuanVideo-1.5 video-gen by @lishunyang12 in #3305
[BugFix] fix hunyuan image3 offline cot by @BLANKETusers in #4174
[skip ci] docs: update WeChat QR code by @david6666666 in #4242
[Refactor] Refactoring audio_in_video implementation by @amy-why-3459 in #3566
[Doc] Add precheck-pr Claude Code skill by @hsliuustc0106 in #4216
[Perf] Keep LTX2.3 auxiliary modules resident by default by @mglyn in #4144
[Fix][HiggsAudioV3] Fix ramp-down off-by-one crash and buffer state bugs in Stage0 talker by @Sy0307 in #4219
Cosmos3 video to video generation by @MaciejBalaNV in #4266
[Quant][Perf] Default Blackwell FP8 GEMM to quack CuteDSL fused-bias kernel by @lishunyang12 in #4241
[Refactor] Remove dead legacy pipeline.yaml loader and duplicate diff… by @AbelSara in #4023
[Core] Cleanup DiffusersPipelineLoader by @alex-jw-brooks in #1932
docs: update README and supported models for v0.22.0 release by @hsliuustc0106 in #4233
fix: Propagate Quantization Configuration to HunyuanVideo-1.5 I2V Transformer to Enable FP8 Layers by @weizhoublue in #4245
[Perf] lance perf optimize (t2i & i2i) by @yangjianjuan in #4214
[Bugfix] Register LTX RMSNorm identity weight as buffer by @mglyn in #4278
Fix README.md typo by @SamitHuang in #4292
[XPU] Add sage_attn backend by @xuechendi in #3785
[Feat] Add vae-decode-parallel for LTX-2.3 by @mglyn in #4277
[ROCm][CI] Add group feature and envs feature by @tjtanaa in #4208
[Feature] LoRA support for SenseNova-U1 by @leohuang257 in #3971
[BugFix] Fix HunyuanImage3 CoT truncation when stop token stripped by detokenizer by @zengchuang-hw in #4260
[Perf][Bugfix] qwen3-tts hot path: prefix-cache OOM guards + talker/orchestrator micro-opts by @JuanPZuluaga in #3689
add moriio transfer engine by @inkcherry in #1742
[skip ci] fix(config): pin VoxCPM2 KV cache to avoid OOM on small GPUs by @linyueqian in #4279
[Bugfix] purge chunk-transfer zombies on every schedule tick to keep engine-core alive on aborts (fixes #3736) by @abinggo in #3774
[BugFix] Remove pydub dependency for Python 3.13 compatibility by @FED4 in #4035
[CI] skip unrelated L3 merge tests via diff-aware upload by @yenuo26 in #4291
[Perf][TTS] Optimize cosyvoice TTFP and throughput using TensorRT by @yuekaizhang in #4168
[Bugfix] Fix Fish Speech prefix cache collision from missing cache_salt by @nagelanping in #4008
[Test] skip oom test case for issue #4285 by @zhumingjue138 in #4311
[Docs] Update README supported models section with TTS and Diffusion categories by @hsliuustc0106 in #4300
[Model] Add Ming-omni-tts dense 0.5B pipeline by @akshatvishu in #2906
[CI/Build] Voxtral TTS Tests by @clodaghwalsh17 in #3738
[BugFix] moss_tts_nano: eager-init lm + audio_tokenizer in init so load_format: dummy works by @leohuang257 in #3230
[skip ci] Add width and height args to offline i2i example script by @fhfuih in #4031
[Bugfix] qwen3-tts prefix cache: drop per-key size cap that corrupted… by @JuanPZuluaga in #4317
Step audio R1 reasoning parser by @QiuMike in #2846
[Test] Automatically clean up audio files generated from requests, and realtime invalid-param coverage by @yenuo26 in #4294
[Skills] Add quantization Claude skill by @david6666666 in #4252
[Docs] add doc for failure mode by @zhumingjue138 in #3926
[Bugfix][HunyuanImage] fix accuracy in stream mode by @Bounty-hunter in #4265
[Bugfix][VoxCPM2] Fix VoxCPM2 concurrent speech quality by @Shirley125 in #4319
[Bugfix] Fix SP denoise indentation bug on BAGEL by @kushanam in #4328
[CI] skip unrelated L2 ready tests via diff-aware upload, aligned L3 tweak and fix issue 4334 by @yenuo26 in #4313
[NPU] Support VoxCPM2 model by @tanhaoan333 in #4310
[Doc] Clean up PR template by @hsliuustc0106 in #4336
[Bugfix] Add Cosmos3-Nano baselines and fix USP gather by @david6666666 in #4301
[WAN2.2-S2V] Add server API for image + audio by @xuechendi in #3394
[HunyuanImage][Perf] opt prepare_attention_mask for e2e latency 6% reduction by @Bounty-hunter in #4333
[Perf] Optimize Higgs Audio v3 serving by @Sy0307 in #4204
[refactor] Refactor guardrail error handling - add 400 error code by @MaciejBalaNV in #4297
Feat: non_streaming_mode for Qwen3-TTS Base Models During Online Inference by @nagisa-kunhah in #4198
feat(moss-tts): add CUDA Graph support for codec decoder by @yangyonggit in #4157
[Bugfix] Fix Qwen3-TTS gradio streaming TTFP by using audio/pcm by @yuxinyuan in #4346
[XPU][COSMOS3] Remove CUDA hardcoding and make VLLM_VIDEO_SYNC_TIMEOUT a tunable config by @xuechendi in #4360
[Model] Add more resolution support for HunyuanImage3.0 by @Semmer2 in #4004
[Refactor] Output Processor Phase 2: separate multimodal output channel (#1601) by @meghaagr13 in #2744
[platform] fix: set UnspecifiedOmniPlatform device_type to cpu by @SamitHuang in #4357
[Bugfix] Adapt VoxCPM2 audio encoder to devices other than CUDA by @tanhaoan333 in #4374
[CI] skip voxcpm2 pcm hnr test by @Shirley125 in #4375
[Hardware][Ascend] Adapt Qwen3 TTS for 310P by @zyz111222 in #4283
[CI] Diff-gate L2/L3 E2E jobs and migrate single-GPU tests to gpu_1_queue by @yenuo26 in #4365
[CI] skip MOSS-TTS-Nano E2E tests pending issue #4361 by @yenuo26 in #4391
[Perf][Wan2.2] Skip attention mask for zero-padded SP sequences to avoid varlen path by @jjuvonen-amd in #3763
[Bugfix] Fix cross-request codes.ref leak in Qwen3-TTS make_omni_output by @henryj in #4373
[New Model] Step-Audio2 by @wuli666 in #464
[Bugfix] Ming-omni-tts: generalize dict checks to Mapping by @akshatvishu in #4397
[XPU][CI] Fix docker build slowness by @xuechendi in #4402
[Refactor] TTS serving adapter framework + migrate TTS models (excl. ming_flash_omni) by @linyueqian in #4330
[Bugfix] Surface all-rank diffusion RPC failures by @hsliuustc0106 in #4403
[Doc][Recipe] Update CUDA verifications for inclusionAI/Ming-omni-tts-0.5B by @yuanheng-zhao in #4324
[XPU] update Dreamzero to support any HW by @xuechendi in #4399
[Rebase] Rebase to vllm 0.23.0 by @tzhouam in #4286

New Contributors

@yangyonggit made their first contribution in #4052
@nagelanping made their first contribution in #3941
@n0n4m39911 made their first contribution in #4210
@inkcherry made their first contribution in #1742
@abinggo made their first contribution in #3774
@FED4 made their first contribution in #4035
@clodaghwalsh17 made their first contribution in #3738
@kushanam made their first contribution in #4328
@yuxinyuan made their first contribution in #4346
@zyz111222 made their first contribution in #4283
@jjuvonen-amd made their first contribution in #3763
@henryj made their first contribution in #4373

Full Changelog: v0.22.0...v0.23.0rc1