VoxCPM v2.0.3
This release focuses on fine-tuning usability, runtime stability, safer LoRA loading, and faster streaming inference.
Highlights
- Added `voxcpm validate` for pre-flight JSONL training manifest validation.
- Added optional `ref_audio` support in the fine-tuning data pipeline.
- Improved runtime device handling with explicit `--device` support and safer MPS dtype behavior.
- Improved VoxCPM2 streaming VAE decoding by avoiding redundant overlap decoding.
- Hardened legacy LoRA checkpoint loading with `weights_only=True`.
- Fixed LoRA rank mismatch handling in `lora_ft_webui.py`.
New Features
- Add `voxcpm validate --manifest train.jsonl` to catch training data issues before fine-tuning.
  - Validates JSONL format, required `text`/`audio` fields, audio existence/readability, sample rate, duration stats, text length stats, and optional `ref_audio`.
- Add optional `ref_audio` support for fine-tuning manifests.
  - Training packing now supports `[103, ref_audio, 104, text, 101, target_audio, 102]`.
  - Loss is applied only to the target audio segment.
- Add `--device` CLI argument for model inference commands.
  - Supports `auto`, `cpu`, `mps`, `cuda`, and indexed CUDA devices such as `cuda:0`.
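
The `ref_audio` packing layout can be pictured with a small sketch. This is an illustration only, not the training code: `pack_example`, the marker constant names, and the placeholder audio frames are hypothetical, and the layout without `ref_audio` (text, `101`, target audio, `102`) is assumed from the list above. Only the marker IDs and the "loss on target audio only" rule come from these notes.

```python
from typing import List, Optional, Sequence, Tuple

# Marker IDs from the packing layout above. The constant names are
# hypothetical labels chosen for this sketch.
REF_START, REF_END = 103, 104
AUDIO_START, AUDIO_END = 101, 102


def pack_example(
    text_ids: Sequence[int],
    target_audio: Sequence[object],
    ref_audio: Optional[Sequence[object]] = None,
) -> Tuple[List[object], List[int]]:
    """Return (sequence, loss_mask); loss_mask is 1 only on target audio."""
    sequence: List[object] = []
    loss_mask: List[int] = []

    def extend(items: Sequence[object], with_loss: bool) -> None:
        sequence.extend(items)
        loss_mask.extend([1 if with_loss else 0] * len(items))

    # Optional reference audio, wrapped in its own markers, never trained on.
    if ref_audio is not None:
        extend([REF_START], False)
        extend(ref_audio, False)
        extend([REF_END], False)

    extend(text_ids, False)
    extend([AUDIO_START], False)
    extend(target_audio, True)  # the only segment that contributes to the loss
    extend([AUDIO_END], False)
    return sequence, loss_mask


if __name__ == "__main__":
    seq, mask = pack_example([7, 8], ["a0", "a1"], ref_audio=["r0"])
    print(seq)
    print(mask)
```

In this sketch the reference audio, the text, and all marker positions get a mask value of `0`, so they act as conditioning context only.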
Performance
- Improve VoxCPM2 streaming VAE decode with a stateful `StreamingVAEDecoder`.
  - Streaming decode now processes only the newest latent patch and carries causal convolution state internally.
  - This removes redundant overlap decoding and reduces streaming VAE decode overhead.
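
Why carried state makes overlap decoding unnecessary can be shown with a toy one-dimensional causal convolution. This is a minimal sketch, not the `StreamingVAEDecoder` implementation: `causal_conv1d` and `StreamingCausalConv1d` are hypothetical names, and a real decoder would hold one such state per convolution layer.

```python
from typing import List, Sequence


def causal_conv1d(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Full-sequence causal convolution with zero padding on the left."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(x)
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(x))
    ]


class StreamingCausalConv1d:
    """Same convolution, fed chunk by chunk, keeping the last k-1 inputs."""

    def __init__(self, kernel: Sequence[float]) -> None:
        self.kernel = list(kernel)
        # Initial state matches the zero left-padding of the full pass.
        self.state: List[float] = [0.0] * (len(self.kernel) - 1)

    def push(self, chunk: Sequence[float]) -> List[float]:
        k = len(self.kernel)
        buf = self.state + list(chunk)
        out = [
            sum(self.kernel[j] * buf[t + j] for j in range(k))
            for t in range(len(chunk))
        ]
        # Keep only the trailing k-1 inputs for the next chunk.
        self.state = buf[len(buf) - (k - 1):]
        return out


if __name__ == "__main__":
    kernel = [0.5, 0.25, 0.25]
    signal = [float(i) for i in range(10)]
    stream = StreamingCausalConv1d(kernel)
    streamed = stream.push(signal[:3]) + stream.push(signal[3:6]) + stream.push(signal[6:])
    print(streamed == causal_conv1d(signal, kernel))
```

Each chunk is processed once, and the carried state supplies the left context that would otherwise have to be recomputed by decoding an overlapping window.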
Fixes
- Fix CUDA Graph dynamic-shape accumulation by using the uncompiled feature encoder for prefill.
- Fix CPU SDPA attention mask broadcasting by using an explicit broadcastable mask shape.
- Fix non-string text validation order to raise the intended `ValueError` instead of `AttributeError`.
- Fix file descriptor leaks when loading `config.json` in local model loaders.
- Fix MPS audio quality issues by promoting low-precision dtypes to `float32` on Apple Silicon by default.
- Fix `VOXCPM_MPS_DTYPE` override validation to match supported dtype aliases.
- Fix LoRA rank mismatch in `lora_ft_webui.py` by reloading the model when checkpoint rank differs.
- Fix Web Demo control text handling by stripping parentheses before constructing the model prompt.
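
The text validation order fix follows a common pattern: check the type before calling any string method. The sketch below is illustrative only. `validate_text` is a hypothetical helper, and the empty-text check is an assumption added for the example rather than documented behavior.

```python
def validate_text(text: object) -> str:
    """Return the stripped text, raising ValueError on invalid input."""
    # Type check first: calling .strip() on a non-string (e.g. None or 123)
    # would raise AttributeError before any validation message is produced.
    if not isinstance(text, str):
        raise ValueError(f"text must be a string, got {type(text).__name__}")
    stripped = text.strip()
    # Assumed rule for this sketch: whitespace-only input is rejected too.
    if not stripped:
        raise ValueError("text must not be empty")
    return stripped


if __name__ == "__main__":
    print(validate_text("  hello  "))
    try:
        validate_text(None)
    except ValueError as err:
        print(f"rejected: {err}")
```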
Security
- Legacy LoRA `.ckpt`/`.pth` loading now uses `torch.load(..., weights_only=True)`.
- This reduces the risk of arbitrary pickle payload execution while preserving tensor-only checkpoint compatibility.
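
The idea behind restricted loading can be illustrated with the standard library alone. The sketch below is not how `torch.load` is implemented. It swaps in the allowlisting pattern from the Python `pickle` documentation, and `AllowlistUnpickler`, `safe_loads`, and `UntrustedPayload` are hypothetical names used only for this example.

```python
import collections
import io
import os
import pickle


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global not on a small allowlist."""

    # Assumed allowlist for this sketch: plain containers only.
    ALLOWED = {("collections", "OrderedDict")}

    def find_class(self, module: str, name: str):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def safe_loads(data: bytes):
    """Deserialize bytes, rejecting payloads that reference other globals."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


class UntrustedPayload:
    """Stands in for a malicious checkpoint: unpickling it would run a command."""

    def __reduce__(self):
        return (os.system, ("echo pwned",))


if __name__ == "__main__":
    weights = collections.OrderedDict(layer=[1.0, 2.0])
    print(safe_loads(pickle.dumps(weights)))
    try:
        safe_loads(pickle.dumps(UntrustedPayload()))
    except pickle.UnpicklingError as err:
        print(f"rejected: {err}")
```

Plain data round-trips unchanged, while the payload is rejected at the point where the unpickler tries to resolve the callable, so the command never runs.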
Documentation
- Document vLLM-Omni as a production serving option for VoxCPM2.
- Update Web Demo usage to `python app.py --port 8808`.
- Update ModelScope local download example.
- Clarify Python requirement as `>=3.10,<3.13`.
- Add ComfyUI_RH_VoxCPM to the ecosystem list.
Tests
- Added coverage for training manifest validation, including sample-rate mismatch, missing audio, relative paths, `ref_audio`, and CLI exit codes.
- Added runtime device selection tests.
- Added LoRA checkpoint safety tests for tensor-only checkpoints and malicious pickle payloads.
- Added CLI tests for `--device` defaults and argument forwarding.
Contributors
Thanks to the contributors included in this release:
Full Changelog: 2.0.2...2.0.3