What's Changed
- [TTS][New Model] support bosonai/higgs-audio-v3-tts-4b by @yuekaizhang in #4169
- [Perf][Higgs-Audio-V3] LRU cache for voice-clone ref-audio encode by @linyueqian in #4200
- [Perf][Higgs-Audio-V3] Turn on Stage-0 prefix caching by default by @linyueqian in #4199
- [Doc] Add Stable-Diffusion-3.5 recipe for 1x RTX A6000 48GB (#2645) by @yangyonggit in #4052
- [Bugfix] Fish Speech Gradio hardcoded default voice causes 400 error by @nagelanping in #3941
- [Model] Support languages added by fine-tuned Qwen3-TTS checkpoints by @n0n4m39911 in #4210
- [Quant] ModelOpt FP8 for Wan2.2 & HunyuanVideo-1.5 video-gen by @lishunyang12 in #3305
- [BugFix] fix hunyuan image3 offline cot by @BLANKETusers in #4174
- [skip ci] docs: update WeChat QR code by @david6666666 in #4242
- [Refactor] Refactoring audio_in_video implementation by @amy-why-3459 in #3566
- [Doc] Add precheck-pr Claude Code skill by @hsliuustc0106 in #4216
- [Perf] Keep LTX2.3 auxiliary modules resident by default by @mglyn in #4144
- [Fix][HiggsAudioV3] Fix ramp-down off-by-one crash and buffer state bugs in Stage0 talker by @Sy0307 in #4219
- Cosmos3 video to video generation by @MaciejBalaNV in #4266
- [Quant][Perf] Default Blackwell FP8 GEMM to quack CuteDSL fused-bias kernel by @lishunyang12 in #4241
- [Refactor] Remove dead legacy pipeline.yaml loader and duplicate diff… by @AbelSara in #4023
- [Core] Cleanup DiffusersPipelineLoader by @alex-jw-brooks in #1932
- docs: update README and supported models for v0.22.0 release by @hsliuustc0106 in #4233
- fix: Propagate Quantization Configuration to HunyuanVideo-1.5 I2V Transformer to Enable FP8 Layers by @weizhoublue in #4245
- [Perf] lance perf optimize (t2i & i2i) by @yangjianjuan in #4214
- [Bugfix] Register LTX RMSNorm identity weight as buffer by @mglyn in #4278
- Fix README.md typo by @SamitHuang in #4292
- [XPU] Add sage_attn backend by @xuechendi in #3785
- [Feat] Add vae-decode-parallel for LTX-2.3 by @mglyn in #4277
- [ROCm] [CI] Add group feature and envs feature by @tjtanaa in #4208
- [Feature] LoRA support for SenseNova-U1 by @leohuang257 in #3971
- [BugFix] Fix HunyuanImage3 CoT truncation when stop token stripped by detokenizer by @zengchuang-hw in #4260
- [Perf][Bugfix] qwen3-tts hot path: prefix-cache OOM guards + talker/orchestrator micro-opts by @JuanPZuluaga in #3689
- add moriio transfer engine by @inkcherry in #1742
- [skip ci] fix(config): pin VoxCPM2 KV cache to avoid OOM on small GPUs by @linyueqian in #4279
- [Bugfix] purge chunk-transfer zombies on every schedule tick to keep engine-core alive on aborts (fixes #3736) by @abinggo in #3774
- [BugFix] Remove pydub dependency for Python 3.13 compatibility by @FED4 in #4035
- [CI] skip unrelated L3 merge tests via diff-aware upload by @yenuo26 in #4291
- [Perf][TTS] Optimize cosyvoice TTFP and throughput using TensorRT by @yuekaizhang in #4168
- [Bugfix] Fix Fish Speech prefix cache collision from missing cache_salt by @nagelanping in #4008
- [Test] skip oom test case for issue #4285 by @zhumingjue138 in #4311
- [Docs] Update README supported models section with TTS and Diffusion categories by @hsliuustc0106 in #4300
- [Model] Add Ming-omni-tts dense 0.5B pipeline by @akshatvishu in #2906
- [CI/Build] Voxtral TTS Tests by @clodaghwalsh17 in #3738
- [BugFix] moss_tts_nano: eager-init lm + audio_tokenizer in init so load_format: dummy works by @leohuang257 in #3230
- [skip ci] Add width and height args to offline i2i example script by @fhfuih in #4031
- [Bugfix] qwen3-tts prefix cache: drop per-key size cap that corrupted… by @JuanPZuluaga in #4317
- Step audio R1 reasoning parser by @QiuMike in #2846
- [Test] Automatically clean up audio files generated from requests, and realtime invalid-param coverage by @yenuo26 in #4294
- [Skills] Add quantization Claude skill by @david6666666 in #4252
- [Docs] add doc for failure mode by @zhumingjue138 in #3926
- [Bugfix][HunyuanImage] fix accuracy in stream mode by @Bounty-hunter in #4265
- [Bugfix][VoxCPM2] Fix VoxCPM2 concurrent speech quality by @Shirley125 in #4319
- [Bugfix] Fix SP denoise indentation bug on BAGEL by @kushanam in #4328
- [CI] skip unrelated L2 ready tests via diff-aware upload, aligned L3 tweak and fix issue 4334 by @yenuo26 in #4313
- [NPU] Support VoxCPM2 model by @tanhaoan333 in #4310
- [Doc] Clean up PR template by @hsliuustc0106 in #4336
- [Bugfix] Add Cosmos3-Nano baselines and fix USP gather by @david6666666 in #4301
- [WAN2.2-S2V] Add server API for image + audio by @xuechendi in #3394
- [HunyuanImage][Perf] opt prepare_attention_mask for e2e latency 6% reduction by @Bounty-hunter in #4333
- [Perf] Optimize Higgs Audio v3 serving by @Sy0307 in #4204
- [refactor] Refactor guardrail error handling - add 400 error code by @MaciejBalaNV in #4297
- Feat: `non_streaming_mode` for Qwen3-TTS Base Models During Online Inference by @nagisa-kunhah in #4198
- feat(moss-tts): add CUDA Graph support for codec decoder by @yangyonggit in #4157
- [Bugfix] Fix Qwen3-TTS gradio streaming TTFP by using audio/pcm by @yuxinyuan in #4346
- [XPU][COSMOS3] removing cuda hardcode and make VLLM_VIDEO_SYNC_TIMEOUT a tunable config by @xuechendi in #4360
- [Model] Add more resolution support for HunyuanImage3.0 by @Semmer2 in #4004
- [Refactor] Output Processor Phase 2: separate multimodal output channel (#1601) by @meghaagr13 in #2744
- [platform] fix: set UnspecifiedOmniPlatform device_type to cpu by @SamitHuang in #4357
- [bugfix] VoxCPM2 audio encoder adapt other than CUDA by @tanhaoan333 in #4374
- [ci] skip voxcpm2 pcm hnr test by @Shirley125 in #4375
- [Hardware][Ascend] Adapt Qwen3 TTS for 310P by @zyz111222 in #4283
- [CI] Diff-gate L2/L3 E2E jobs and migrate single-GPU tests to gpu_1_queue by @yenuo26 in #4365
- [CI] skip MOSS-TTS-Nano E2E tests pending issue#4361 by @yenuo26 in #4391
- [Perf][Wan2.2] Skip attention mask for zero-padded SP sequences to avoid varlen path by @jjuvonen-amd in #3763
- [Bugfix] Fix cross-request codes.ref leak in Qwen3-TTS make_omni_output by @henryj in #4373
- [New Model] Step-Audio2 by @wuli666 in #464
- [Bugfix] Ming-omni-tts: generalize dict checks to Mapping by @akshatvishu in #4397
- [XPU][CI] Fix docker build slowness by @xuechendi in #4402
- [Refactor] TTS serving adapter framework + migrate TTS models (excl. ming_flash_omni) by @linyueqian in #4330
- [Bugfix] Surface all-rank diffusion RPC failures by @hsliuustc0106 in #4403
- [Doc][Recipe] Update CUDA verifications for inclusionAI/Ming-omni-tts-0.5B by @yuanheng-zhao in #4324
- [XPU] update Dreamzero to support any HW by @xuechendi in #4399
- [Rebase] Rebase to vllm 0.23.0 by @tzhouam in #4286
New Contributors
- @yangyonggit made their first contribution in #4052
- @nagelanping made their first contribution in #3941
- @n0n4m39911 made their first contribution in #4210
- @inkcherry made their first contribution in #1742
- @abinggo made their first contribution in #3774
- @FED4 made their first contribution in #4035
- @clodaghwalsh17 made their first contribution in #3738
- @kushanam made their first contribution in #4328
- @yuxinyuan made their first contribution in #4346
- @zyz111222 made their first contribution in #4283
- @jjuvonen-amd made their first contribution in #3763
- @henryj made their first contribution in #4373
Full Changelog: v0.22.0...v0.23.0rc1