TTS Audio Suite v4.15.0
🎉 Major New Features
⚙️ Step Audio EditX TTS Engine
A new AI-powered text-to-speech engine with zero-shot voice cloning:
- Clone any voice from just 3-10 seconds of audio
- Natural-sounding speech generation
- Memory-efficient with int4/int8 quantization options (uses less VRAM)
- Character switching and per-segment parameter support
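To put the quantization bullet in rough numbers, here is a weights-only estimate of how bit width affects memory. It is a back-of-the-envelope sketch, not a measurement of this engine: the parameter count is a hypothetical placeholder, and real VRAM use also includes activations, caches, and framework overhead.

```python
def weight_memory_gib(num_params: int, bits_per_weight: int) -> float:
    """Weights-only footprint in GiB; ignores activations, caches, and overhead."""
    return num_params * bits_per_weight / 8 / 1024**3


# Hypothetical placeholder, NOT a figure taken from these release notes.
HYPOTHETICAL_PARAMS = 3_000_000_000

if __name__ == "__main__":
    for label, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
        size = weight_memory_gib(HYPOTHETICAL_PARAMS, bits)
        print(f"{label}: {size:.2f} GiB for weights alone")
```

Halving the bits per weight halves the weight footprint, which is why int8 and int4 modes fit on smaller GPUs, usually at some cost in output quality.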
🎨 Step Audio EditX Audio Editor
Transform any TTS engine's output with AI-powered audio editing (post-processing):
- 14 emotions: happy, sad, angry, surprised, fearful, disgusted, contempt, neutral, etc.
- 32 speaking styles: whisper, serious, child, elderly, neutral, and more
- Speed control: make speech faster or slower
- 10 paralinguistic effects: laughter, breathing, sigh, gasp, crying, sniff, cough, yawn, scream, moan
- Audio cleanup: denoise and voice activity detection
- Universal compatibility: Works with audio from any TTS engine, such as ChatterBox, F5-TTS, Higgs Audio, and VibeVoice
🏷️ Universal Inline Edit Tags
Add audio effects directly in your text across all TTS engines:
- Easy syntax: "Hello <Laughter> this is amazing!"
- Works everywhere: Compatible with all TTS engines using Step Audio EditX post-processing
- Multiple tag types: <emotion>, <style>, <speed>, and paralinguistic effects
- Control intensity: <Laughter:2> for stronger effect, <Laughter:3> for maximum
- Voice restoration: <restore> tag to return to original voice after edits
- 📖 Read the complete Inline Edit Tags guide
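As an illustration of how tags like these could be separated from the spoken text, here is a minimal sketch. It is not the suite's actual parser: the names `split_edit_tags` and `EditTag` are hypothetical, it only handles the two forms shown above (`<Name>` and `<Name:N>`), and the default intensity of 1 is an assumption.

```python
import re
from typing import List, NamedTuple, Tuple

# Matches <Name> or <Name:N>, the two tag forms shown in these notes.
TAG_PATTERN = re.compile(r"<([A-Za-z_]+)(?::(\d+))?>")


class EditTag(NamedTuple):
    name: str       # tag name as written, e.g. "Laughter" or "restore"
    intensity: int  # assumed to default to 1 when no ":N" suffix is given


def split_edit_tags(text: str) -> Tuple[str, List[EditTag]]:
    """Return the text with tags stripped, plus the tags in order of appearance."""
    tags = [
        EditTag(m.group(1), int(m.group(2) or 1))
        for m in TAG_PATTERN.finditer(text)
    ]
    stripped = TAG_PATTERN.sub("", text)
    # Collapse the double spaces left behind where a tag was removed.
    clean = re.sub(r"\s{2,}", " ", stripped).strip()
    return clean, tags


if __name__ == "__main__":
    clean, tags = split_edit_tags("Hello <Laughter:2> this is amazing! <restore>")
    print(clean)
    print(tags)
```

Splitting the text this way would let the clean sentence go to any TTS engine first, with the collected tags applied afterwards as a post-processing pass, which matches the post-processing design described above.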
📝 Multiline TTS Tag Editor Enhancements
- New tabbed interface for inline edit tag controls
- Quick-insert buttons for emotions, styles, and effects
- Better copy/paste compatibility with ComfyUI v0.3.75+
- Improved syntax highlighting and text formatting
📦 New Example Workflows
- Step Audio EditX Integration - Basic TTS usage examples
- Audio Editor + Inline Edit Tags - Advanced editing demonstrations
- Updated Voice Cleaning workflow with Step Audio EditX denoise option
🔧 Improvements
- Better memory management and model caching across all engines