🎉 Complete VibeVoice Integration Release
🆕 VibeVoice Engine - Now Fully Integrated!
This release completes the integration of Microsoft's VibeVoice engine into TTS Audio Suite, bringing high-quality English and Chinese text-to-speech with advanced multi-speaker capabilities.
✨ What's New in VibeVoice
🎭 Dual Multi-Speaker Modes
- Native Multi-Speaker Mode: Use VibeVoice's built-in system of up to 4 speakers, labeling each turn with "Speaker 1:", "Speaker 2:", and so on
- Custom Character Switching: Full character voice management with unlimited speakers using your own voice references
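To illustrate the native format, here is a minimal Python sketch that maps named characters onto the "Speaker 1:" to "Speaker 4:" labels in order of first appearance. The helper `to_native_script` is hypothetical (it is not part of TTS Audio Suite), and it assumes one labeled line per speaker turn. Dialogue with more than four characters is rejected, since that case is what Custom Character Switching is for.

```python
from typing import Iterable

# Native mode supports at most 4 speakers (per the release notes).
MAX_NATIVE_SPEAKERS = 4


def to_native_script(dialogue: Iterable[tuple[str, str]]) -> str:
    """Hypothetical helper: map named characters to "Speaker N:" labels.

    Characters are numbered in order of first appearance, one labeled
    line per turn. Not part of TTS Audio Suite; illustration only.
    """
    speaker_ids: dict[str, int] = {}
    lines: list[str] = []
    for name, text in dialogue:
        if name not in speaker_ids:
            if len(speaker_ids) == MAX_NATIVE_SPEAKERS:
                raise ValueError(
                    f"native mode supports at most {MAX_NATIVE_SPEAKERS} speakers; "
                    f"use Custom Character Switching for {name!r}"
                )
            speaker_ids[name] = len(speaker_ids) + 1
        lines.append(f"Speaker {speaker_ids[name]}: {text.strip()}")
    return "\n".join(lines)


script = to_native_script([
    ("Alice", "Did you try the new engine?"),
    ("Bob", "Yes, the multi-speaker mode works well."),
    ("Alice", "Great, let's render the whole episode."),
])
print(script)
```

Alice becomes "Speaker 1" and keeps that label on her second turn; Bob becomes "Speaker 2".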
📝 Complete SRT Subtitle Support
- Subtitle timing in all four modes: stretch_to_fit, pad_with_silence, smart_natural, concatenate
- Multi-character subtitle processing with proper timing
- Seamless integration with existing SRT workflows
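As a rough illustration of what the timing modes have to compute, the sketch below parses an SRT timing line and compares a generated clip with its subtitle slot. It assumes pad_with_silence appends silence up to the slot length and stretch_to_fit time-scales the clip by the slot/clip ratio; the helper names (`srt_seconds`, `fit_clip`, `SlotFit`) are hypothetical and do not reflect the suite's actual implementation.

```python
import re
from dataclasses import dataclass

# SRT timestamps look like HH:MM:SS,mmm
_TS = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def srt_seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as '00:01:02,500' to seconds."""
    m = _TS.fullmatch(stamp.strip())
    if m is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    h, mnt, s, ms = (int(g) for g in m.groups())
    return h * 3600 + mnt * 60 + s + ms / 1000


@dataclass
class SlotFit:
    slot_s: float         # subtitle slot duration in seconds
    pad_samples: int      # silence to append (assumed pad_with_silence); 0 if clip is too long
    stretch_ratio: float  # slot/clip length (assumed stretch_to_fit); < 1 means compress


def fit_clip(timing_line: str, clip_samples: int, sample_rate: int) -> SlotFit:
    """Hypothetical helper: relate a generated clip to its subtitle slot."""
    if clip_samples <= 0:
        raise ValueError("clip must contain at least one sample")
    start, end = (srt_seconds(part) for part in timing_line.split("-->"))
    slot_s = end - start
    slot_samples = round(slot_s * sample_rate)
    pad = max(0, slot_samples - clip_samples)
    return SlotFit(slot_s, pad, slot_samples / clip_samples)
```

For a 2.0 s clip at 24 kHz placed in a 2.5 s slot, `fit_clip("00:00:01,500 --> 00:00:04,000", 48000, 24000)` reports 12000 samples of padding and a stretch ratio of 1.25.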
🤖 Two Model Options Available
- vibevoice-1.5B (~5.4GB) - Faster inference, great quality
- vibevoice-7B (~18GB) - Maximum quality, slower inference
- Automatic download from Hugging Face, with support for legacy model paths
🧠 Smart Memory Management
- Proper integration with ComfyUI's "Clear VRAM" button
- Automatic model unloading when memory is low
- Consistent architecture with other TTS engines
💡 VibeVoice Pro Tips
⚠️ Text Length Matters: VibeVoice works best with medium to long texts. Short phrases may not reproduce the voice reference faithfully; aim for at least 2-3 sentences for best results.
🎵 Watch for Music Mode: VibeVoice has built-in music/podcast detection. Avoid starting text with greetings such as "Hello!" or "Welcome!", which may trigger a different speaking style than intended.
🎯 Best Practices:
- Use complete sentences rather than short phrases
- Provide context in your text for better voice matching
- Test different text lengths to find the sweet spot for your voice references
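If you script your inputs, the length and greeting tips above can be turned into a quick pre-flight check. This is a hypothetical helper, not a TTS Audio Suite feature: the sentence count is a rough regex split (abbreviations like "e.g." will inflate it), and the greeting list only contains the two examples named above.

```python
import re

# Openers named in the tips above; extend as needed (illustrative, not official).
GREETING_OPENERS = ("hello", "welcome")


def preflight_warnings(text: str, min_sentences: int = 2) -> list[str]:
    """Hypothetical pre-flight check for VibeVoice input text."""
    warnings: list[str] = []
    # Rough sentence count: split on runs of . ! ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text.strip()) if s]
    if len(sentences) < min_sentences:
        warnings.append(
            f"only {len(sentences)} sentence(s); aim for at least {min_sentences}"
        )
    # First word, ignoring leading punctuation such as quotes.
    first_word = re.match(r"\W*(\w+)", text)
    if first_word and first_word.group(1).lower() in GREETING_OPENERS:
        warnings.append(f"starts with greeting {first_word.group(1)!r}")
    return warnings


print(preflight_warnings("Hello!"))
```

Here `preflight_warnings("Hello!")` returns two warnings: the text is a single sentence, and it opens with a greeting.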
🌍 Supported Languages
VibeVoice supports English and Chinese with high-quality synthesis for both languages.
📋 How to Use VibeVoice
- Basic TTS: Use the "TTS Text" node and select the VibeVoice engine
- SRT Subtitles: Use the "TTS SRT" node with the VibeVoice engine
- Multi-Speaker: Choose between Native (4 speakers max) or Custom Character modes
- Voice References: Add your own voice samples via the Character Voices node
🔧 Full Engine Lineup
TTS Audio Suite now includes 5 complete engines (four TTS engines plus RVC voice conversion):
- ✅ ChatterBox - Fast, efficient TTS with voice conversion
- ✅ F5-TTS - Zero-shot voice cloning with reference audio
- ✅ Higgs Audio 2 - Professional voice cloning and synthesis
- ✅ VibeVoice - Multilingual TTS with multi-speaker support
- ✅ RVC Integration - Voice conversion post-processing
🐛 Bug Fixes & Improvements
- Improved VibeVoice memory management and unloading
- Better integration with ComfyUI's memory management system
- More reliable model unloading when using 'Clear VRAM' button
- Consistent architecture across all TTS engines for better maintainability
- Enhanced stability when switching between different models
Full Changelog: https://github.com/diodiogod/TTS-Audio-Suite/blob/main/CHANGELOG.md
Download: Install via ComfyUI Manager or clone from GitHub
Documentation: Check the folder and example workflows
Support: Report issues on the GitHub Issues page