TGSpeechBox v3.0-beta5
Changes since v3.0-beta4. 66 commits!
⚠️ Last build before CSUN 2026! The team will be at the conference next week — no new builds or updates for about a week. We'll be back at it after CSUN. If you're there, come say hi!
New Features
- Windows 7 compatibility (issue #45): All Windows binaries (SAPI, NVDA DLLs, phoneme editor, settings app) now run on Windows 7. Switched to the static CRT and integrated YY-Thunks v1.1.9, which provides fallback implementations for the Win8+ APIs the CRT uses internally. Binaries are fully self-contained, so no MSVC redistributable is needed. Thank you @aksel for reporting this.
- Head Size slider: New vocal tract length control with graduated formant scaling (F1×s^0.2 through F6×s^1.0) for realistic head-size variation. Available on NVDA, SAPI, iOS, and Android. David voice defaults to headSize 100 (large pharynx).
- Voice profiles on mobile: Beth (female) and Bobby (child) voices now available on iOS and Android, with dynamic discovery from phonemes.yaml.
- Pack settings editor on mobile: New Editor tab on iOS and Android lets you browse and override any language pack setting with live reload. Changes are stored as overrides and auto-removed when they match the pack default.
- Year splitting: 4-digit numbers are split into two 2-digit pairs ("1995" → "nineteen ninety-five"), with "oh" for leading-zero pairs ("1906" → "nineteen oh six"). Handles edge cases: X000 numbers go to eSpeak ("4000" → "four thousand"), 200X numbers stay whole. NVDA checkbox to toggle. Enabled for all English dialects.
- English date ordinals: "June 6" → "June 6th", "March 1" → "March 1st" — ordinal suffixes added before eSpeak phonemization. Wired through SAPI, NVDA, and mobile.
- Sonorant-context vowel protection: Unstressed vowels flanked by nasals, liquids, or semivowels get an extra duration floor (8 ms) and an amplitude boost (1.15×). Fixes "animals" losing its middle syllable at rate 15+. The rule is cross-linguistic, so all 26 languages benefit.
- Cross-word allophone conditions: New rule conditions that look across word boundaries. Used to fix Spanish word-final /ɾ/ trilling before vowels (issue #42).
- en-gb stress dictionary: 2,005 entries derived from Reece H. Dunn's RP dictionary (BSD 2-clause).
- Vocal tract shape parameters: Three new VoicingTone fields — nasalBwScale, f4FreqScale, nasalGainScale — for finer voice character control.
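As a rough illustration of the year-splitting and date-ordinal rules described above, here is a small Python sketch of that kind of text normalization. It is not TGSpeechBox's implementation. The names `split_year`, `ordinal_suffix`, and `add_date_ordinals` are hypothetical, and the output keeps digits (for example "19 95") on the assumption that the phonemizer reads each pair. Two cases are also assumptions, because the notes do not cover them: XX00 numbers such as "1900" and numbers with a leading zero are left whole.

```python
import re
from typing import Optional

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")

# A month name, one space, then a 1-2 digit day that is not already
# followed by more digits or letters (so "June 6th" and "June 1995" are skipped).
_DATE_RE = re.compile(r"\b(" + "|".join(MONTHS) + r") (\d{1,2})\b")


def split_year(token: str) -> Optional[str]:
    """Return the token rewritten as two pairs, or None to leave it whole."""
    if not re.fullmatch(r"[1-9]\d{3}", token):
        return None  # not a 4-digit number (leading zero left whole: assumption)
    if token.endswith("000"):
        return None  # X000 goes to the phonemizer as-is ("4000")
    if token.startswith("200"):
        return None  # 200X stays whole
    first, second = token[:2], token[2:]
    if second == "00":
        return None  # XX00 is not covered by the notes; left whole (assumption)
    if second[0] == "0":
        return f"{first} oh {second[1]}"  # "1906" -> "19 oh 6"
    return f"{first} {second}"  # "1995" -> "19 95"


def ordinal_suffix(day: int) -> str:
    """English ordinal suffix for a day number."""
    if 11 <= day % 100 <= 13:
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")


def add_date_ordinals(text: str) -> str:
    """Append ordinal suffixes to bare day numbers that follow a month name."""
    def repl(match: "re.Match[str]") -> str:
        day = int(match.group(2))
        if not 1 <= day <= 31:
            return match.group(0)  # not a plausible day; leave untouched
        return f"{match.group(1)} {match.group(2)}{ordinal_suffix(day)}"
    return _DATE_RE.sub(repl, text)
```

Returning `None` for the cases that should stay whole lets the caller pass the original token through unchanged, which matches the "X000 numbers go to eSpeak" behaviour described in the notes.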
Join TestFlight for macOS and iOS!
Join TestFlight here by clicking this link from your mobile device.
Bug Fixes
- SAPI clipping at 100% volume: Output gain boost was 1.95× at max volume and is now 1.50×. Previously, volume had to be set to 85% to avoid clipping.
- en-us.yaml indent broke parser: sonorantContextAmplitudeScale had wrong indentation, making everything after it invisible to the YAML parser. Result: half-British-sounding US English.
- Word-final stop sounded like affricate (issue from Dane Stange): pa6 was untouched by the unreleased_word_final rule, leaving it as the dominant burst component with upward spectral tilt. /t/ sounded like /t͡ʃ/ ("eight" → "eitch"). Added pa6 rolloff to all 3 English dialects.
- Spanish word-final /ɾ/ double trill (issue #42): /ɾ/ before vowels across word boundaries was incorrectly trilled. Fixed with cross-word allophone conditions.
- iOS VoiceOver settings reset on restart: Multiple fixes — AU extension applied settings before voice/language (wiping them), host app wasn't flushing UserDefaults to disk, pitch mode lost on restart.
- Voice profile reset on language change: pack reload wiped voiceProfileName. Fixed in C frontend and NVDA Python driver.
- Function-word reduction broken: "for" and "or" were promoted like content words because the wordShape builder omitted length marks, breaking the entire long-vowel exclusion list.
- Year splitting edge cases: X000 numbers ("4000" was read as "forty oh zero"), 200X numbers, and 0X second pairs were all split incorrectly.
- 44 kHz aspiration thinness: White noise spectral density spreads thinner at higher sample rates. Added fourth-root amplitude scaling for aspiration noise.
- Diphthong micro-frame buffer overflow: Fixed buffer overrun and scaled micro-frame count with duration.
- TTS service crash on Android: Infinite recursion in applyCurrentVoice.
- Low-pitch choppiness: Fixed, along with a formant compression issue in the David voice.
- Strip PUA-A codepoints: Stray Private Use Area codepoints in IPA input no longer corrupt allophone processing.
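The 44 kHz aspiration fix above can be read as follows. White noise generated with a fixed per-sample amplitude has its power spread over the whole band up to the Nyquist frequency, so its density per hertz falls as the sample rate rises. Fully restoring the density would mean scaling the amplitude by the square root of the rate ratio; a fourth root is a gentler, partial boost. The Python sketch below shows that scaling only. The reference rate of 22050 Hz and the names `aspiration_gain` and `aspiration_noise` are assumptions for illustration, not values or functions taken from the engine.

```python
import random

# Hypothetical reference rate; the release notes do not state which rate
# the aspiration level was originally tuned at.
REFERENCE_RATE_HZ = 22050.0


def aspiration_gain(sample_rate_hz: float,
                    reference_rate_hz: float = REFERENCE_RATE_HZ) -> float:
    """Fourth-root amplitude scaling relative to the reference sample rate."""
    return (sample_rate_hz / reference_rate_hz) ** 0.25


def aspiration_noise(n_samples: int, sample_rate_hz: float,
                     amplitude: float = 1.0, seed: int = 0) -> list:
    """Uniform white noise whose amplitude is scaled by aspiration_gain."""
    rng = random.Random(seed)
    gain = amplitude * aspiration_gain(sample_rate_hz)
    return [gain * rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
```

Under these assumptions, doubling the sample rate raises the noise amplitude by a factor of 2^0.25, roughly 1.19, instead of the 1.41 that full density compensation would use.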
Language Pack Improvements
- GenAm vowel formant tuning: Conservative 50–70% moves toward Hillenbrand (1995) male speaker targets for /ɪ/, /æ/, /ɔ/, and /o/.
- /d/ frication reduced and /ɹ/ F3 lowered: Clearer stop quality and warmer R sound.
- Tap /ɾ/ frication softened: Reduced gritty quality in common words like "accessibility".
- RP MOUTH onset retuned and US THOUGHT voiceAmplitude adjusted.
- Hungarian word-final /t/ frication reduced: It previously sounded like "cs".
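A "50–70% move toward a target", as in the GenAm vowel tuning above, is plain linear interpolation between the current formant value and the reference value. The Python sketch below assumes that reading. The function names `move_toward` and `retune` are hypothetical, and any numbers passed to them in examples are placeholders, not the pack's formants or the published Hillenbrand measurements.

```python
def move_toward(current_hz: float, target_hz: float, fraction: float) -> float:
    """Move a value part of the way toward a target (0.0 = stay, 1.0 = target)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return current_hz + fraction * (target_hz - current_hz)


def retune(current: dict, target: dict, fraction: float) -> dict:
    """Apply the same partial move to every formant present in `current`."""
    return {name: move_toward(hz, target[name], fraction)
            for name, hz in current.items()}
```

For instance, with placeholder values, `retune({"F1": 400.0, "F2": 2000.0}, {"F1": 500.0, "F2": 1800.0}, 0.5)` lands each formant halfway between its old value and the target.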
Platform Improvements
- NVDA: Year splitting checkbox in settings, pitch mode added to settings ring.
- Android: Engine Settings voice picker decoupled from TalkBack active voice. Tab subtitles added.
- iOS: Dynamic voice profile discovery, editor auto-start, VoiceOver fixes, tab subtitles.
- All platforms: Cascade BW scale slider position 50 now maps to 1.0 (was 0.9).