🎉 Complete VibeVoice Integration Release

🆕 VibeVoice Engine - Now Fully Integrated!

This release marks the complete integration of Microsoft's VibeVoice engine into TTS Audio Suite, bringing professional-quality multilingual text-to-speech with advanced multi-speaker capabilities.

✨ What's New in VibeVoice

🎭 Dual Multi-Speaker Modes

Native Multi-Speaker Mode: Use VibeVoice's built-in 4-speaker system by prefixing lines with "Speaker 1:", "Speaker 2:", and so on
Custom Character Switching: Full character voice management with unlimited speakers using your own voice references
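
As a rough illustration of the "Speaker N:" script format, the sketch below splits a script into speaker turns and checks it against the 4-speaker limit of native mode. The regular expression and the helper names `split_speaker_script` and `fits_native_mode` are hypothetical and written for this example; they are not the suite's actual parser, and the exact label syntax the engine accepts is an assumption.

```python
import re
from typing import List, Tuple

# Matches lines such as "Speaker 1: Hello there".
# Assumption: labels look exactly like the examples in these notes.
SPEAKER_LINE = re.compile(r"^\s*Speaker\s+(\d+)\s*:\s*(.*)$")
MAX_NATIVE_SPEAKERS = 4  # native mode limit stated in the notes


def split_speaker_script(script: str) -> List[Tuple[int, str]]:
    """Split a script into (speaker_number, text) turns.

    Unlabeled lines are appended to the previous turn; unlabeled text
    before any label is assigned to speaker 1 (an arbitrary choice).
    """
    turns: List[Tuple[int, str]] = []
    for line in script.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        match = SPEAKER_LINE.match(line)
        if match:
            turns.append((int(match.group(1)), match.group(2).strip()))
        elif turns:
            speaker, text = turns[-1]
            turns[-1] = (speaker, f"{text} {stripped}".strip())
        else:
            turns.append((1, stripped))
    return turns


def fits_native_mode(turns: List[Tuple[int, str]]) -> bool:
    """Return True if the script uses no more than 4 distinct speakers."""
    return len({speaker for speaker, _ in turns}) <= MAX_NATIVE_SPEAKERS
```

A script that fails `fits_native_mode` would be a candidate for Custom Character Switching instead, since that mode has no speaker limit.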

📝 Complete SRT Subtitle Support

Full subtitle timing with all modes: stretch_to_fit, pad_with_silence, smart_natural, concatenate
Multi-character subtitle processing with proper timing
Seamless integration with existing SRT workflows
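
To make the timing modes more concrete, here is a minimal sketch of the arithmetic that names like pad_with_silence and stretch_to_fit suggest: compare the length of a generated clip with the length of its subtitle slot. How the suite actually implements each mode is not described in these notes, so the functions below are illustrative assumptions, not the suite's code.

```python
import re

# SRT timestamps look like "00:00:01,000" (hours:minutes:seconds,millis).
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})")


def srt_time_to_seconds(stamp: str) -> float:
    """Convert one SRT timestamp to seconds."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000.0


def slot_duration(timing_line: str) -> float:
    """Length in seconds of a line like '00:00:01,000 --> 00:00:04,500'."""
    start, end = timing_line.split("-->")
    return srt_time_to_seconds(end) - srt_time_to_seconds(start)


def silence_needed(slot: float, audio: float) -> float:
    """Seconds of silence to append so a short clip fills its slot."""
    return max(0.0, slot - audio)


def stretch_factor(slot: float, audio: float) -> float:
    """Factor to multiply the clip's duration by so it exactly fills the slot."""
    if audio <= 0:
        raise ValueError("audio duration must be positive")
    return slot / audio
```

For example, a 3.0-second clip in a 3.5-second slot would need 0.5 seconds of padding, or a stretch factor of about 1.17.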

🤖 Two Model Options Available

vibevoice-1.5B (~5.4GB) - Faster inference, great quality
vibevoice-7B (~18GB) - Maximum quality, slower inference
Models download automatically from Hugging Face; legacy model paths are still supported

🧠 Smart Memory Management

Proper integration with ComfyUI's "Clear VRAM" button
Automatic model unloading when memory is low
Consistent architecture with other TTS engines

💡 VibeVoice Pro Tips

⚠️ Text Length Matters: VibeVoice works best with medium to long texts. Short phrases may not capture the voice reference quality well - aim for at least 2-3 sentences for optimal results.

🎵 Watch for Music Mode: VibeVoice has built-in music/podcast detection. Avoid starting text with greetings like "Hello!" or "Welcome!" as these may trigger a different speaking style than intended.

🎯 Best Practices:

Use complete sentences rather than short phrases
Provide context in your text for better voice matching
Test different text lengths to find the sweet spot for your voice references
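
The tips above can be turned into a quick pre-check before sending text to the engine. The sketch below is a hypothetical helper, not part of TTS Audio Suite: it warns when the text has fewer than two sentences or opens with one of the two greetings named above. The sentence counter is deliberately rough and will miscount text containing decimals or abbreviations.

```python
import re
from typing import List

MIN_SENTENCES = 2  # lower bound of the "2-3 sentences" guideline
GREETING_OPENERS = ("hello", "welcome")  # the two examples from the notes


def count_sentences(text: str) -> int:
    """Rough sentence count: split on Latin and Chinese end punctuation."""
    parts = re.split(r"[.!?。！？]+", text)
    return sum(1 for part in parts if part.strip())


def check_text(text: str) -> List[str]:
    """Return a list of warnings for text that may synthesize poorly."""
    warnings: List[str] = []
    if count_sentences(text) < MIN_SENTENCES:
        warnings.append("text is short; aim for at least 2-3 sentences")
    words = text.strip().split()
    # Strip punctuation so "Hello!" and "Hello," both match "hello".
    first_word = re.sub(r"[^\w]", "", words[0]).lower() if words else ""
    if first_word in GREETING_OPENERS:
        warnings.append("text starts with a greeting; this may change the speaking style")
    return warnings
```

An empty list means neither tip was triggered; it does not guarantee good output, so testing different text lengths against your own voice references is still worthwhile.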

🌍 Supported Languages

VibeVoice supports high-quality synthesis in English and Chinese.

📋 How to Use VibeVoice

Basic TTS: Use the "TTS Text" node and select the VibeVoice engine
SRT Subtitles: Use the "TTS SRT" node with the VibeVoice engine
Multi-Speaker: Choose between Native (4 speakers max) or Custom Character modes
Voice References: Add your own voice samples via the Character Voices node

🔧 Full Engine Lineup

TTS Audio Suite now includes 5 complete engines:

✅ ChatterBox - Fast, efficient TTS with voice conversion
✅ F5-TTS - Zero-shot voice cloning with reference audio
✅ Higgs Audio 2 - Professional voice cloning and synthesis
✅ VibeVoice - Multilingual TTS with multi-speaker support
✅ RVC Integration - Voice conversion post-processing

🐛 Bug Fixes & Improvements

Improved VibeVoice memory management and unloading
Better integration with ComfyUI's memory management system
More reliable model unloading when using 'Clear VRAM' button
Consistent architecture across all TTS engines for better maintainability
Enhanced stability when switching between different models

Full Changelog: https://github.com/diodiogod/TTS-Audio-Suite/blob/main/CHANGELOG.md

Download: Install via ComfyUI Manager or clone from GitHub
Documentation: Check the folder and example workflows
Support: Report issues on the GitHub Issues page

TTS Audio Suite v4.6.16 - Complete VibeVoice Integration (diodiogod/TTS-Audio-Suite on GitHub)