TGSpeechBox v3.0-beta4
Changes since v3.0-beta3. 43 commits later!
New Features
- Compound word splitting: 3,686 English compound words (butterfly, postman, lighthouse, etc.) are now split before eSpeak phonemization and merged back in the text parser. This gives eSpeak individual words it knows how to pronounce: "popcorn" split to "pop corn" gets correct vowel quality on both halves. The merge step uses ASCII Unit Separator (\x1F) tracking to distinguish compounds we split from user-written separate words: "butterfly" merges back into one prosodic word, but "butter fly" keeps its word boundary. Works on all platforms (NVDA, SAPI, Android, iOS).
- YAML-based number expansion: Numbers are now expanded to words using language-specific YAML rules in the text parser, so "24" becomes "twenty four", matching eSpeak's two IPA words. This fixes stress alignment for utterances containing numbers.
- Word boundary amplitude dips: Brief reduced-amplitude micro-frames at word onsets (6ms, 55% depth for English) improve inter-word clarity in connected speech. "Lighthouse keeper" vs "light housekeeper" now have perceptibly different prosody.
- Impulse pitch improvements: Progressive terminal gesture with clause differentiation — declaratives get falling pitch contour, questions rise, exclamations get a 10% register boost. Single-letter words no longer get a false terminal pitch peak.
- Staircase diphthong micro-frames: Diphthong formant sweeps now use fixed N=5 staircase frames with endCf=NAN (disabling per-sample smoothing) and 3ms snap fades. This eliminates the long-standing diphthong shimmer caused by IIR resonators being driven by continuously-changing formant targets.
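The compound split and merge described above can be sketched roughly as follows. This is a minimal illustration in Python, not the shipped C++ code: the tiny `COMPOUNDS` table and the helper names `split_compounds`, `phonemizer_input`, and `merge_plan` are all hypothetical, and the real text parser works on eSpeak's IPA output rather than on plain tokens.

```python
US = "\x1f"  # ASCII Unit Separator, used here to mark splits we made ourselves

# Hypothetical miniature compound map: word -> index where it is split.
# The real list has 3,686 entries.
COMPOUNDS = {"butterfly": 6, "popcorn": 3, "postman": 4}


def split_compounds(text):
    """Insert a Unit Separator inside every known compound word."""
    out = []
    for word in text.split(" "):
        cut = COMPOUNDS.get(word.lower())
        out.append(word[:cut] + US + word[cut:] if cut else word)
    return " ".join(out)


def phonemizer_input(marked):
    """Text handed to the phonemizer: our markers become ordinary spaces."""
    return marked.replace(US, " ")


def merge_plan(marked):
    """One flag per phonemized word: True means 'merge into the previous word'."""
    plan = []
    for token in marked.split(" "):
        pieces = token.split(US)
        plan.append(False)                        # first piece starts a new word
        plan.extend([True] * (len(pieces) - 1))   # later pieces were split by us
    return plan
```

With this sketch, "butterfly" and "butter fly" both reach the phonemizer as "butter fly", but only the first yields a merge flag, so only the split compound is rejoined into one prosodic word.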
Join TestFlight for macOS and iOS!
Join TestFlight here by clicking this link from your mobile device.
Bug Fixes
- Compound merge for languages without stress dict (en-gb): Both nvspFrontend.cpp and text_parser.cpp gated the entire text parser on having a stress dictionary. Languages like en-gb that have compound maps but no stress dict never ran compound IPA merge, so "postman" and "post man" sounded identical. Now the text parser runs whenever either a stress dict or compound map is available.
- Cascade corruption in growth replacements: Commit cf072ee's PUA over-protection fix left position 0 of growth replacements unprotected, so s→s_es followed by s→s_mx in Mexican Spanish produced garbled "ce-ese" for the letter C. Reverted to protect ALL positions of growth replacements; cross-phase visibility is handled separately by PUA-A escaping.
- iOS clause boundary leaking (issue #40): eSpeak's opaque clause chunking caused numbers at end of text to bleed into the next VoiceOver utterance ("Unread: 1" → "1" heard with next phrase). iOS bridge now pre-splits text at punctuation boundaries like Android and NVDA, with colon/semicolon guard (only split when followed by whitespace, so "5:44" stays intact). Thank you @fastfinge for this issue find.
- PUA over-protection hiding allophones: Growth replacements like fɔːɹ→fɔːᵊɹ were PUA-escaping inherited characters (the ɔ at position 1), blocking downstream allophone rules like ɔ→ᴐ. Fix: only protect genuinely new growth positions.
- Voiceless fricative onset swallowed at high rates: /s/ in "star" was too short at rate 100 because boundary smoothing's aspiration-dominant bypass didn't check fricationAmplitude. Added curFricationDominant check.
- Rate-dependent diphthong flattening: MOUTH onset bandwidths were too narrow, causing clipping on mobile speakers. Widened to Hillenbrand (1995) values.
- Diphthong duration scale context guard: diphthongDurationScale was smearing bare diphthongs ("I", "out"). Now only applies when consonants are present on both sides within the same word.
- Crash on trillion+ numbers: expandNumber() could crash on extremely large numbers. Added bounds checking.
- Android language restore after process kill: Language setting was lost when Android killed and restarted the TTS service process.
- Platform output gain before limiter: Moved gain application into the DSP layer (before the limiter) so clipping is caught consistently across all platforms.
- Sustained frication cutoff at 44.1 kHz: Lowered from 10 kHz to 8 kHz to reduce harsh high-frequency energy at the highest sample rate.
- ʊ diphthong offglide clipping: Widened ʊ offglide bandwidths (cb1=100, cb2=100, cb3=180) to reduce clipping during diphthong offglides.
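To illustrate the number expansion and the bounds check mentioned above, here is a rough Python sketch. It hardcodes English words instead of loading language-specific YAML rules, `expand_number` is a hypothetical stand-in for the real expandNumber(), and the `MAX_SUPPORTED` limit is an assumption chosen for the example, not the shipped value.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
SCALES = [(10**9, "billion"), (10**6, "million"),
          (1000, "thousand"), (100, "hundred")]

MAX_SUPPORTED = 10**12 - 1  # assumed limit: anything from a trillion up is rejected


def expand_number(n):
    """Return n spelled out as space-separated words, or None if out of range."""
    if n < 0 or n > MAX_SUPPORTED:
        return None  # bounds check: caller keeps the digits instead of crashing
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + (" " + ONES[ones] if ones else "")
    for value, name in SCALES:
        if n >= value:
            head, rest = divmod(n, value)
            words = expand_number(head) + " " + name
            return words + (" " + expand_number(rest) if rest else "")
```

Here `expand_number(24)` gives "twenty four", two words to line up with eSpeak's two IPA words, while an out-of-range value returns `None` so the caller can fall back to the original digits.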
Language Pack Improvements
- Spanish tuning round 2 (thanks Mateo / @rmcpantoja — PR #35): Fixes for /r/ pronunciation, diphthong corrections, and ñ workaround for eSpeak. Demonstrates the power of the YAML rule engine for community-driven language tuning.
- Mexican Spanish skipReplacements: es-mx.yaml now properly overrides parent s→s_es with s→s_mx via skipReplacements, preventing double-substitution.
- RP MOUTH onset tuning: New å phoneme with F2=1650 (200 Hz above GenAm) gives RP its characteristic fronted-but-rounded onset quality. eSpeak produces identical /aʊ/ for both en-gb and en; the å/ä distinction is our compensation. voiceAmplitude lowered to 0.50 and cb1 widened to 200 to prevent onset clipping.
- GenAm MOUTH onset: New ä phoneme (F1=730, F2=1450) based on Hillenbrand (1995) male data, replacing the old onset that was only 30 Hz from schwa.
- en-gb GOAT and MOUTH diphthong retuning: Restored diphthong glides, adjusted onset hold exponents.
- en-us GOAT /oʊ/ diphthong glide restoration: Reversed over-aggressive monophthongization, reported on issue #38.
- en-us lengthenedScale: Tuned 1.35→1.28 for more natural tense vowel duration.
- GenAm -og words: LOT→THOUGHT vowel shift (ɑːɡ→ɔːɡ) for words like "dog", "log", "fog".
- RP LOT /ɒ/ widened bandwidths: Lower amplitude for better mobile speaker performance.
- Polish phoneme tuning: Fixed ɨ formants, added nasal vowel phonemes. Thank you Spacepup and @pitermach for helping me start some Polish tuning work.
- English (AU): Added to Android and iOS language dropdowns.
Platform Improvements
- iOS clause pre-splitting: Matches the Android/NVDA pattern — explicit punctuation scanning with per-clause compound splitting and combined IPA accumulation. Fixes issue #40.
- Android battery optimization exemption: TTS service requests exemption to prevent the OS from killing it during long reads.
- Licensing info updated: Clarified MIT source / GPL3 binary distinction based on community feedback.
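The clause pre-splitting with the colon/semicolon guard can be sketched as below. This is an illustrative Python version under stated assumptions, not the iOS bridge code: the function name `split_clauses` is hypothetical, and which characters always end a clause (`ALWAYS_SPLIT`) is a guess, since only the colon/semicolon rule is described in these notes.

```python
ALWAYS_SPLIT = ".!?"   # assumed set of characters that always end a clause
GUARDED_SPLIT = ":;"   # split here only when whitespace follows


def split_clauses(text):
    """Split text into clauses at punctuation, keeping times like 5:44 intact."""
    clauses = []
    start = 0
    for i, ch in enumerate(text):
        followed_by_space = i + 1 < len(text) and text[i + 1].isspace()
        if ch in ALWAYS_SPLIT or (ch in GUARDED_SPLIT and followed_by_space):
            clause = text[start:i + 1].strip()
            if clause:
                clauses.append(clause)
            start = i + 1
    tail = text[start:].strip()  # whatever remains after the last boundary
    if tail:
        clauses.append(tail)
    return clauses
```

With this sketch, "Unread: 1" becomes two clauses, so the trailing "1" is spoken within its own utterance, while "It is 5:44 now." stays a single clause because the colon is not followed by whitespace.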