github vllm-project/vllm-omni v0.23.0rc1

pre-release, one hour ago

What's Changed

  • [TTS][New Model] support bosonai/higgs-audio-v3-tts-4b by @yuekaizhang in #4169
  • [Perf][Higgs-Audio-V3] LRU cache for voice-clone ref-audio encode by @linyueqian in #4200
  • [Perf][Higgs-Audio-V3] Turn on Stage-0 prefix caching by default by @linyueqian in #4199
  • [Doc] Add Stable-Diffusion-3.5 recipe for 1x RTX A6000 48GB (#2645) by @yangyonggit in #4052
  • [Bugfix] Fish Speech Gradio hardcoded default voice causes 400 error by @nagelanping in #3941
  • [Model] Support languages added by fine-tuned Qwen3-TTS checkpoints by @n0n4m39911 in #4210
  • [Quant] ModelOpt FP8 for Wan2.2 & HunyuanVideo-1.5 video-gen by @lishunyang12 in #3305
  • [BugFix] fix hunyuan image3 offline cot by @BLANKETusers in #4174
  • [skip ci] docs: update WeChat QR code by @david6666666 in #4242
  • [Refactor]Refactoring audio_in_video implementation by @amy-why-3459 in #3566
  • [Doc] Add precheck-pr Claude Code skill by @hsliuustc0106 in #4216
  • [Perf] Keep LTX2.3 auxiliary modules resident by default by @mglyn in #4144
  • [Fix][HiggsAudioV3] Fix ramp-down off-by-one crash and buffer state bugs in Stage0 talker by @Sy0307 in #4219
  • Cosmos3 video to video generation by @MaciejBalaNV in #4266
  • [Quant][Perf] Default Blackwell FP8 GEMM to quack CuteDSL fused-bias kernel by @lishunyang12 in #4241
  • [Refactor] Remove dead legacy pipeline.yaml loader and duplicate diff… by @AbelSara in #4023
  • [Core] Cleanup DiffusersPipelineLoader by @alex-jw-brooks in #1932
  • docs: update README and supported models for v0.22.0 release by @hsliuustc0106 in #4233
  • fix: Propagate Quantization Configuration to HunyuanVideo-1.5 I2V Transformer to Enable FP8 Layers by @weizhoublue in #4245
  • [Perf] lance perf optimize (t2i & i2i) by @yangjianjuan in #4214
  • [Bugfix] Register LTX RMSNorm identity weight as buffer by @mglyn in #4278
  • Fix README.md typo by @SamitHuang in #4292
  • [XPU] Add sage_attn backend by @xuechendi in #3785
  • [Feat] Add vae-decode-parallel for LTX-2.3 by @mglyn in #4277
  • [ROCm] [CI] Add group feature and envs feature by @tjtanaa in #4208
  • [Feature] LoRA support for SenseNova-U1 by @leohuang257 in #3971
  • [BugFix] Fix HunyuanImage3 CoT truncation when stop token stripped by detokenizer by @zengchuang-hw in #4260
  • [Perf][Bugfix] qwen3-tts hot path: prefix-cache OOM guards + talker/orchestrator micro-opts by @JuanPZuluaga in #3689
  • add moriio transfer engine by @inkcherry in #1742
  • [skip ci] fix(config): pin VoxCPM2 KV cache to avoid OOM on small GPUs by @linyueqian in #4279
  • [Bugfix] purge chunk-transfer zombies on every schedule tick to keep engine-core alive on aborts (fixes #3736) by @abinggo in #3774
  • [BugFix] Remove pydub dependency for Python 3.13 compatibility by @FED4 in #4035
  • [CI] skip unrelated L3 merge tests via diff-aware upload by @yenuo26 in #4291
  • [Perf][TTS] Optimize cosyvoice TTFP and throughput using TensorRT by @yuekaizhang in #4168
  • [Bugfix] Fix Fish Speech prefix cache collision from missing cache_salt by @nagelanping in #4008
  • [Test] skip oom test case for issue #4285 by @zhumingjue138 in #4311
  • [Docs] Update README supported models section with TTS and Diffusion categories by @hsliuustc0106 in #4300
  • [Model] Add Ming-omni-tts dense 0.5B pipeline by @akshatvishu in #2906
  • [CI/Build] Voxtral TTS Tests by @clodaghwalsh17 in #3738
  • [BugFix] moss_tts_nano: eager-init lm + audio_tokenizer in init so load_format: dummy works by @leohuang257 in #3230
  • [skip ci] Add width and height args to offline i2i example script by @fhfuih in #4031
  • [Bugfix] qwen3-tts prefix cache: drop per-key size cap that corrupted… by @JuanPZuluaga in #4317
  • Step audio R1 reasoning parser by @QiuMike in #2846
  • [Test] Automatically clean up audio files generated from requests, and realtime invalid-param coverage by @yenuo26 in #4294
  • [Skills] Add quantization Claude skill by @david6666666 in #4252
  • [Docs] add doc for failure mode by @zhumingjue138 in #3926
  • [Bugfix][HunyuanImage] fix accuracy in stream mode by @Bounty-hunter in #4265
  • [Bugfix][VoxCPM2] Fix VoxCPM2 concurrent speech quality by @Shirley125 in #4319
  • [Bugfix] Fix SP denoise indentation bug on BAGEL by @kushanam in #4328
  • [CI] skip unrelated L2 ready tests via diff-aware upload, aligned L3 tweak and fix issue 4334 by @yenuo26 in #4313
  • [NPU] Support VoxCPM2 model by @tanhaoan333 in #4310
  • [Doc] Clean up PR template by @hsliuustc0106 in #4336
  • [Bugfix] Add Cosmos3-Nano baselines and fix USP gather by @david6666666 in #4301
  • [WAN2.2-S2V] Add server API for image + audio by @xuechendi in #3394
  • [HunyuanImage][Perf] opt prepare_attention_mask for e2e latency 6% reduction by @Bounty-hunter in #4333
  • [Perf] Optimize Higgs Audio v3 serving by @Sy0307 in #4204
  • [refactor] Refactor guardrail error handling - add 400 error code by @MaciejBalaNV in #4297
  • Feat: non_streaming_mode for Qwen3-TTS Base Models During Online Inference by @nagisa-kunhah in #4198
  • feat(moss-tts): add CUDA Graph support for codec decoder by @yangyonggit in #4157
  • [Bugfix] Fix Qwen3-TTS gradio streaming TTFP by using audio/pcm by @yuxinyuan in #4346
  • [XPU][COSMOS3] removing cuda hardcode and make VLLM_VIDEO_SYNC_TIMEOUT a tunable config by @xuechendi in #4360
  • [Model] Add more resolution support for HunyuanImage3.0 by @Semmer2 in #4004
  • [Refactor] Output Processor Phase 2: separate multimodal output channel (#1601) by @meghaagr13 in #2744
  • [platform] fix: set UnspecifiedOmniPlatform device_type to cpu by @SamitHuang in #4357
  • [bugfix]VoxCPM2 audio encoder adapt other than CUDA by @tanhaoan333 in #4374
  • [ci]skip voxcpm2 pcm hnr test by @Shirley125 in #4375
  • [Hardware][Ascend] Adapt Qwen3 TTS for 310P by @zyz111222 in #4283
  • [CI] Diff-gate L2/L3 E2E jobs and migrate single-GPU tests to gpu_1_queue by @yenuo26 in #4365
  • [CI] skip MOSS-TTS-Nano E2E tests pending issue#4361 by @yenuo26 in #4391
  • [Perf][Wan2.2] Skip attention mask for zero-padded SP sequences to avoid varlen path by @jjuvonen-amd in #3763
  • [Bugfix] Fix cross-request codes.ref leak in Qwen3-TTS make_omni_output by @henryj in #4373
  • [New Model] Step-Audio2 by @wuli666 in #464
  • [Bugfix] Ming-omni-tts: generalize dict checks to Mapping by @akshatvishu in #4397
  • [XPU][CI] Fix docker build slowness by @xuechendi in #4402
  • [Refactor] TTS serving adapter framework + migrate TTS models (excl. ming_flash_omni) by @linyueqian in #4330
  • [Bugfix] Surface all-rank diffusion RPC failures by @hsliuustc0106 in #4403
  • [Doc][Recipe] Update CUDA verifications for inclusionAI/Ming-omni-tts-0.5B by @yuanheng-zhao in #4324
  • [XPU] update Dreamzero to support any HW by @xuechendi in #4399
  • [Rebase] Rebase to vllm 0.23.0 by @tzhouam in #4286

Full Changelog: v0.22.0...v0.23.0rc1
