github diodiogod/TTS-Audio-Suite v4.6.16
TTS Audio Suite v4.6.16 - Complete VibeVoice Integration

🎉 Complete VibeVoice Integration Release

🆕 VibeVoice Engine - Now Fully Integrated!

This release marks the complete integration of Microsoft's VibeVoice engine into TTS Audio Suite, bringing professional-quality multilingual text-to-speech with advanced multi-speaker capabilities.

✨ What's New in VibeVoice

🎭 Dual Multi-Speaker Modes

  • Native Multi-Speaker Mode: Use VibeVoice's built-in 4-speaker system with the "Speaker 1:", "Speaker 2:" line format
  • Custom Character Switching: Full character voice management with unlimited speakers using your own voice references
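The Python sketch below shows the difference between the two modes. It converts a list of (character, line) pairs into the native "Speaker N:" script format. Speakers are numbered in order of first appearance, and the function raises an error on a fifth distinct speaker. Only the "Speaker N:" format and the 4-speaker limit come from these notes. The function name `to_native_script` and the numbering scheme are hypothetical illustrations, not part of TTS Audio Suite's API.

```python
from typing import Iterable, Tuple

# Native-mode limit stated in these release notes.
MAX_NATIVE_SPEAKERS = 4


def to_native_script(lines: Iterable[Tuple[str, str]]) -> str:
    """Hypothetical helper: map character names to 'Speaker N:' lines.

    Speakers are numbered 1..4 in order of first appearance.
    """
    speaker_ids: dict[str, int] = {}
    out = []
    for name, text in lines:
        if name not in speaker_ids:
            if len(speaker_ids) == MAX_NATIVE_SPEAKERS:
                # Beyond 4 voices, the notes point to Custom Character Switching instead.
                raise ValueError(
                    f"native mode supports at most {MAX_NATIVE_SPEAKERS} speakers; "
                    "use Custom Character Switching for more"
                )
            speaker_ids[name] = len(speaker_ids) + 1
        out.append(f"Speaker {speaker_ids[name]}: {text.strip()}")
    return "\n".join(out)


if __name__ == "__main__":
    print(to_native_script([
        ("Alice", "Hello there."),
        ("Bob", "Hi Alice."),
        ("Alice", "How are you?"),
    ]))
```

Each distinct character consumes one of the four native slots. Custom Character Switching, which uses your own voice references, has no such cap.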

📝 Complete SRT Subtitle Support

  • Full subtitle timing support in all timing modes: stretch_to_fit, pad_with_silence, smart_natural, concatenate
  • Multi-character subtitle processing with proper timing
  • Seamless integration with existing SRT workflows
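Every timing mode must fit generated audio into the slot between a subtitle's start and end timestamps. The Python sketch below parses standard SRT cues and computes that slot. It also adds two hypothetical helpers suggested by the mode names: `silence_to_pad` returns how much silence a pad_with_silence-style mode would append, and `stretch_factor` returns the time-stretch ratio (generated length divided by slot length) a stretch_to_fit-style mode would need. These helper names and formulas are assumptions for illustration only. The suite's actual timing logic, especially smart_natural, may differ.

```python
import re
from dataclasses import dataclass

# Standard SRT timestamp: HH:MM:SS,mmm
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def to_seconds(ts: str) -> float:
    match = TIMESTAMP.fullmatch(ts.strip())
    if match is None:
        raise ValueError(f"bad SRT timestamp: {ts!r}")
    h, m, s, ms = map(int, match.groups())
    return h * 3600 + m * 60 + s + ms / 1000


@dataclass
class Cue:
    index: int
    start: float
    end: float
    text: str

    @property
    def slot(self) -> float:
        """Seconds available for this subtitle's audio."""
        return self.end - self.start


def parse_srt(srt: str) -> list[Cue]:
    cues = []
    # Cues are separated by blank lines: index, "start --> end", then text lines.
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.splitlines()
        start, end = (to_seconds(t) for t in lines[1].split("-->"))
        cues.append(Cue(int(lines[0]), start, end, "\n".join(lines[2:])))
    return cues


def silence_to_pad(cue: Cue, generated_seconds: float) -> float:
    """Hypothetical pad_with_silence rule: fill whatever is left of the slot."""
    return max(0.0, cue.slot - generated_seconds)


def stretch_factor(cue: Cue, generated_seconds: float) -> float:
    """Hypothetical stretch_to_fit rule: >1 means speed up, <1 means slow down."""
    return generated_seconds / cue.slot
```

For example, a cue from 00:00:01,000 to 00:00:04,500 has a 3.5 s slot. If the engine produced 2.0 s of speech, a padding mode would add 1.5 s of silence.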

🤖 Two Model Options Available

  • vibevoice-1.5B (~5.4GB) - Faster inference, great quality
  • vibevoice-7B (~18GB) - Maximum quality, slower inference
  • Auto-download with HuggingFace integration and legacy path support

🧠 Smart Memory Management

  • Proper integration with ComfyUI's "Clear VRAM" button
  • Automatic model unloading when memory is low
  • Consistent architecture with other TTS engines

💡 VibeVoice Pro Tips

⚠️ Text Length Matters: VibeVoice works best with medium to long texts. Short phrases may not capture the voice reference quality well; aim for at least 2-3 sentences for optimal results.

🎵 Watch for Music Mode: VibeVoice has built-in music/podcast detection. Avoid starting text with greetings like "Hello!" or "Welcome!" as these may trigger a different speaking style than intended.

🎯 Best Practices:

  • Use complete sentences rather than short phrases
  • Provide context in your text for better voice matching
  • Test different text lengths to find the sweet spot for your voice references
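The tips above can be turned into a quick pre-flight check before sending text to VibeVoice. This Python sketch is a heuristic, not part of the suite. It counts sentences with a simple punctuation split, including Chinese sentence punctuation since Chinese is supported. It also flags text whose first word is a greeting. The two-sentence minimum follows the "at least 2-3 sentences" advice. The name `preflight_warnings` and the greeting list beyond "Hello" and "Welcome" are assumptions.

```python
import re

# Hypothetical greeting list: "hello" and "welcome" come from the tip above, the rest are guesses.
GREETINGS = {"hello", "hi", "hey", "welcome"}


def preflight_warnings(text: str, min_sentences: int = 2) -> list[str]:
    """Return human-readable warnings for text that may not suit VibeVoice well."""
    warnings = []
    # Naive sentence split on Western and Chinese end punctuation.
    sentences = [s for s in re.split(r"[.!?。！？]+", text) if s.strip()]
    if len(sentences) < min_sentences:
        warnings.append(
            f"only {len(sentences)} sentence(s); aim for at least 2-3 for better voice matching"
        )
    # Check the first word, skipping any leading punctuation or whitespace.
    first = re.match(r"\W*(\w+)", text)
    if first and first.group(1).lower() in GREETINGS:
        warnings.append(
            f"text starts with greeting {first.group(1)!r}; this may trigger a different speaking style"
        )
    return warnings


if __name__ == "__main__":
    for sample in ["Hello!", "The storm rolled in at dusk. Nobody on the pier noticed."]:
        print(sample, "->", preflight_warnings(sample))
```

For "Hello!" both checks fire. The two-sentence narrative line passes cleanly. The sentence split is deliberately naive: abbreviations such as "Dr." will be over-counted, so treat the result as a hint rather than a rule.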

🌍 Supported Languages

VibeVoice supports English and Chinese with high-quality synthesis for both languages.

📋 How to Use VibeVoice

  1. Basic TTS: Use the "TTS Text" node and select the VibeVoice engine
  2. SRT Subtitles: Use the "TTS SRT" node with the VibeVoice engine
  3. Multi-Speaker: Choose between Native (4 speakers max) and Custom Character modes
  4. Voice References: Add your own voice samples via the Character Voices node

🔧 Full Engine Lineup

TTS Audio Suite now includes 5 complete TTS engines:

  • ChatterBox - Fast, efficient TTS with voice conversion
  • F5-TTS - Zero-shot voice cloning with reference audio
  • Higgs Audio 2 - Professional voice cloning and synthesis
  • VibeVoice - Multilingual TTS with multi-speaker support
  • RVC Integration - Voice conversion post-processing

🐛 Bug Fixes & Improvements

  • Improved VibeVoice memory management and unloading
  • Better integration with ComfyUI's memory management system
  • More reliable model unloading when using 'Clear VRAM' button
  • Consistent architecture across all TTS engines for better maintainability
  • Enhanced stability when switching between different models

Full Changelog: https://github.com/diodiogod/TTS-Audio-Suite/blob/main/CHANGELOG.md

Download: Install via ComfyUI Manager or clone from GitHub
Documentation: Check the folder and example workflows
Support: Report issues on GitHub Issues page