TGSpeechBox v3.01
47 commits since v3.0. Focused on consonant clarity, diphthong accuracy, DSP depth, and Spanish pronunciation quality.
DSP Engine (v8)
- Higher cascade formants F7 (6500 Hz) and F8 (7500 Hz) for spectral presence above F6 (Rabiner 1968 defaults). Nyquist-proximity fade auto-mutes them at low sample rates.
- Parallel F7/F8 resonators for fricative high-frequency presence (amp 0.15/0.07).
- 2x source oversampling for cleaner harmonics at lower sample rates.
- Dual-oscillator chorus for vocal fold asymmetry (VoicingTone V5): chorus depth and variation sliders for natural vocal fold jitter.
- Brown noise aspiration blend: white/brown noise mix by sample rate (44k=100% white, 22k=70/30, 16k=50/50). Fixes aspiration thinness at lower sample rates.
- Glottal sharpness rebalanced for F7/F8 spectral dynamics.
Per-Parameter Envelope Timing
New DSP architecture for consonant-to-vowel transitions. Instead of a single crossfade ratio, each source type (voicing, frication) transitions on its own schedule:
transSourceHoldRatio: delays noise source fadeout during crossfades, keeping frication audible longer.transVoicingHoldRatio: delays voicing onset, creating temporal separation between consonant release and vowel onset.- Combined effect: 3-phase envelope (pure release, overlap, transition to vowel). All ratios scale proportionally with speech rate.
- FrameEx extended to 29 fields (was 23). All consumers synced: NVDA, SAPI, iOS, Android, Linux, phoneme editor.
Consonant Clarity
- Stop burst locus-theory F2/F3 blending: burst frequencies shift toward following vowel (velars 55%, alveolars 30%, labials 20%).
- Transition bandwidth widening after obstruents: decaying BW boost across first 3 micro-frames, scaled by F2 sweep width.
- Word-final stop audibility improved: stop floor 4ms to 6ms, word-final bonus 5ms to 8ms, voiced alveolar softening eased.
- Voiced consonant bandwidths widened for smoother transitions.
Diphthong Improvements
- Rhotic diphthong collapse: vowel+liquid tied pairs (/ER/ in "shared", "scared", "stairs") now collapse into micro-frame diphthongs with proper formant sweep. Previously rendered as two crushed tokens with no glide.
- SQUARE diphthong pairScale 1.40 for the wide F3 sweep (2500 to 1620 Hz).
- Diphthong rate compensation bumped 0.15 to 0.30; onset hold exponent 1.3 to 1.6 for better vowel identity at high rates.
Spanish Pronunciation (issues #81, #82)
- Fixed diphthong tie bar blocking dialect replacements: eSpeak marks diphthongs with tie bars (e-tied-I) which blocked I-to-I_es substitution. Vowel-vowel tie bars are now stripped early; affricate tie bars preserved. (issue #82, reported by @gregodejesus2)
- New
I_esparallel formants tuned to Spanish-peripheral [i] target (Quilis 1981, Martinez Celdran): pf1 330, pf2 2150, pf3 2850. Previously carried English-lax values. - New
U_esphoneme for Spanish [u] diphthong offglides (/au/, /eu/): F1=340, F2=780. Base /U/ was English-lax (F1=405, F2=900). autoDiphthongOffglideToSemivowel: falsefor Spanish to preserve dialect-specific offglide phonemes.- Geminate /ss/ deduplication from eSpeak number expansion ("200" no longer "dosssientos").
- Square wave trill gating: full closure (was 22% residual), 50/50 duty cycle (was 72/28), 0.25ms edges (was 2ms). "Pero" vs "perro" contrast is now clear. (issue #84, proposed by @LeonardoBlancoSerna)
22050 Hz Quality Fix
- Fixed "behind a wall" quality at 22050 Hz: disabled oversampling decimation and anti-alias LP at 22050+ Hz. Sharpness raised 2.0 to 3.0. This was the most-reported audio quality issue since v3.0.
Text Processing
- German ordinal dot clause split fixed on all platforms ("3. Mai" no longer splits).
- Thousands separator comma fixed for Hungarian/European locales.
- LOT-to-THOUGHT toggle fixed for -og derivatives (froggy, frogs, doggo).
- Year splitting default set to false for non-English languages.
- Smart quote apostrophe added to dictSuffixes for right-tick possessives.
- Dict suffix stripping: inflected forms now inherit stem dictionary entries ("running" finds "run").
- dictSuffixes added to all 20 language packs.
Linux / Speech Dispatcher
- sd_tgsb: config file support for voicing tone, FrameEx, and pitch mode settings. Per-user config at
~/.config/tgspeechbox/sd_tgsb.conf; system-wide (for lightdm/root) at/etc/speech-dispatcher/modules/tgsb-native.conf. - sd_tgsb: YAML voice profile discovery (Beth, Bobby, user-defined profiles).
- sd_tgsb: fixed rate/pitch/volume mapping, CHAR/KEY commands, shared read buffer.
- sd_tgsb: fixed voice switching crash and unsafe pointer arithmetic.
- sd_tgsb: fixed pitch mapping (SD sends -100..+100, not 0..100).
- sd_tgsb: filter root language tags from voice list.
- sd_tgsb: moved from android/ to platforms/speechd/ (correct source tree location).
- sd_tgsb: license clarified as GPL-3.0 (requires espeak-ng); rest of project remains MIT.
- Documentation updates for config files and dictionary editing.
NVDA
- Chorus depth/variation stored per-voice (like other tone sliders).
- Fix ordinal dot clause split in text_utils.py.
iOS
- Fix chorus sliders not applying.
- Strip thousands commas before sending to eSpeak.
Pronunciation Dictionaries
- New entries: Hormuz, Houthi, Iran, Kamala, aria, donkey, Costco.
- en-us dictionary cleaned up.
Documentation
- Tuning.md: added per-parameter envelope timing (source hold + voicing hold) config, tuning guidance, and F7/F8 field reference.
- 5 boundary smoothing hold settings synced to all consumers with localization (Spanish, Hungarian, Portuguese).
Thank You
- @gregodejesus2 — for reporting Spanish diphthong and number pronunciation issues (#81, #82) with clear reproduction steps across all platforms.
- @LeonardoBlancoSerna — for the trill gating proposal (#84) that led to the square wave modulation redesign. Your amplitude gating idea was exactly right.
- @yaresDg — for the comprehensive audio recording on #81 testing "version" across dialects and speech rates, and for responsibly flagging the /rs/ regression rather than letting us ship a risky fix.
- @fastfinge — for continued dictionary contributions and help.
Links
- Android (Google Play)
- iOS / macOS (App Store)
- GitHub
- License: MIT (sd_tgsb module: GPL-3.0)
Join the Test
Want to help test future updates before they ship?