This release rolls up all updates and fixes from the v0.5.3 beta series (beta1 and beta2).
New Features
- Built-in Reference Audio: Three preset TTS voices are imported automatically on first install, so you can try voice cloning right away without recording anything.
- Short Text Pronunciation Fix: Before text is sent to the model, a period is automatically appended to any fragment that does not end with sentence-final punctuation. This reduces abrupt cutoffs and rushed pronunciation at the end of short phrases.
- Multi-select ZIP Import: The file picker for voice import now supports multi-select, so several ZIP files can be imported in a single batch.
- Selective Overwrite Import: Overwrite import now also shows the voice list first and lets you choose which voices to import, instead of overwriting everything.
- Import File Validation: Creating a new voice now checks the file format (audio files only) and caps reference audio at 30 seconds. ZIP imports are checked for a valid content structure, and invalid files are rejected with a clear error message.
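Two of the rules above are simple enough to sketch. The Python below is an illustration under stated assumptions, not the app's code: the set of sentence-final punctuation marks, the choice of a full-width `。` after Chinese characters, and the list of accepted audio extensions are guesses; only the 30-second cap comes from these notes. `ensure_sentence_end` and `validate_reference_audio` are hypothetical names, and the clip duration is passed in rather than decoded from the file.

```python
import os
from typing import Optional

# Sentence-final punctuation recognized by this sketch (assumed set).
SENTENCE_FINAL = set(".!?。！？…")

# Accepted audio extensions (hypothetical list; not documented in the notes).
ALLOWED_AUDIO_EXTS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}
MAX_REFERENCE_SECONDS = 30.0  # cap stated in the release notes


def _is_cjk(ch: str) -> bool:
    """Rough check for the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def ensure_sentence_end(text: str) -> str:
    """Append a period to a fragment lacking sentence-final punctuation.

    Trailing whitespace is dropped. A full-width period is used after a
    Chinese character and an ASCII period otherwise (an assumption).
    """
    stripped = text.rstrip()
    if not stripped or stripped[-1] in SENTENCE_FINAL:
        return stripped
    return stripped + ("。" if _is_cjk(stripped[-1]) else ".")


def validate_reference_audio(filename: str, duration_s: float) -> Optional[str]:
    """Return an error message for an invalid reference clip, or None if OK."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_AUDIO_EXTS:
        return f"Unsupported file type '{ext or filename}': audio files only."
    if duration_s > MAX_REFERENCE_SECONDS:
        return (f"Reference audio is {duration_s:.1f} s; "
                f"the limit is {MAX_REFERENCE_SECONDS:.0f} s.")
    return None
```

Returning a message instead of raising keeps the check easy to surface directly in an import dialog.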
UI & Interaction Improvements
- Drag-to-Reorder Overhaul: Drag-and-drop sorting for voice lists and replacement rule lists has been fully rewritten, completely eliminating edge-case jitter.
- Smoother Scrolling: Fixed an issue where the entire Voice Tab page scrolled together as a single unit.
- Benchmark Button Optimization: Engine initialization now runs on a background thread, so the first tap no longer freezes the UI.
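Moving initialization off the UI thread typically means the button handler only schedules work and returns at once. The Python sketch below shows that pattern with `concurrent.futures`; `LazyEngine`, `on_benchmark_tap`, and the slow loader are hypothetical stand-ins, since the notes do not describe the app's actual threading code. The engine is loaded at most once, and repeated taps reuse the same pending `Future`.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Optional


class LazyEngine:
    """Loads a heavy engine on a background thread, at most once."""

    def __init__(self, init_fn: Callable[[], Any]) -> None:
        self._init_fn = init_fn  # slow loader, e.g. reading model weights (hypothetical)
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._lock = threading.Lock()
        self._future: Optional[Future] = None

    def get_async(self) -> Future:
        """Return a Future for the engine, starting the load on the first call."""
        with self._lock:
            if self._future is None:
                self._future = self._executor.submit(self._init_fn)
            return self._future


def on_benchmark_tap(engine: LazyEngine, run_benchmark: Callable[[Any], None]) -> None:
    """Button handler: returns immediately instead of blocking on engine init."""
    # The callback runs on the loader thread when init finishes, or right away
    # if the engine is already loaded. A real UI would post it to the main thread.
    engine.get_async().add_done_callback(lambda f: run_benchmark(f.result()))


if __name__ == "__main__":
    def slow_init() -> str:
        time.sleep(0.5)  # simulate loading model weights
        return "engine"

    eng = LazyEngine(slow_init)
    start = time.perf_counter()
    on_benchmark_tap(eng, lambda e: print("benchmark running on", e))
    print(f"handler returned after {time.perf_counter() - start:.3f} s")
    eng.get_async().result()
```

Caching the `Future` rather than a finished engine object also means a second tap during loading does not start a second load.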
Bug Fixes & Core Changes
- System TTS Playback Artifact Fix: Removed the redundant RNNoise denoising pass that ran at runtime, eliminating the glitch at the start of playback caused by GRU initialization artifacts.
- Dual-Engine Consolidation: Voice preview, System TTS, and Benchmark now share a single engine instance instead of two, saving about 150 MB of memory.
- "Restore Original Audio" Button Fix: Added applied-state tracking so the restore button only appears after audio has actually been processed.
- Overwrite Import Fix: Fixed an issue where overwrite-importing multiple files at once caused them to overwrite one another.
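The engine consolidation and the restore-button fix follow two common patterns, sketched here in Python with hypothetical names (`TtsEngine`, `shared_engine`, `ReferenceAudio`, `processed_applied`); the app's real classes are not described in these notes. A lock-guarded accessor creates the engine once, so every caller receives the same instance and the model is held in memory only once. A boolean flag records whether processing has actually been applied, and the restore button's visibility is derived from that flag.

```python
import threading
from dataclasses import dataclass
from typing import Optional


class TtsEngine:
    """Stand-in for the real synthesis engine (hypothetical)."""
    instances_created = 0

    def __init__(self) -> None:
        TtsEngine.instances_created += 1  # counts loads, for illustration


_engine: Optional[TtsEngine] = None
_engine_lock = threading.Lock()


def shared_engine() -> TtsEngine:
    """Single engine shared by voice preview, System TTS, and Benchmark."""
    global _engine
    with _engine_lock:  # prevents two callers racing to create an engine
        if _engine is None:
            _engine = TtsEngine()
        return _engine


@dataclass
class ReferenceAudio:
    original: bytes
    current: bytes
    processed_applied: bool = False  # the "applied" state tracked by the fix

    def apply_processing(self, processed: bytes) -> None:
        self.current = processed
        self.processed_applied = True

    def restore_original(self) -> None:
        self.current = self.original
        self.processed_applied = False

    @property
    def show_restore_button(self) -> bool:
        # Visible only after processing has actually changed the audio.
        return self.processed_applied
```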