tgeczy/TGSpeechBox v-301 on GitHub

TGSpeechBox v3.01

47 commits since v3.0. Focused on consonant clarity, diphthong accuracy, DSP depth, and Spanish pronunciation quality.

DSP Engine (v8)

Higher cascade formants F7 (6500 Hz) and F8 (7500 Hz) for spectral presence above F6 (Rabiner 1968 defaults). Nyquist-proximity fade auto-mutes them at low sample rates.
Parallel F7/F8 resonators for fricative high-frequency presence (amp 0.15/0.07).
2x source oversampling for cleaner harmonics at lower sample rates.
Dual-oscillator chorus for vocal fold asymmetry (VoicingTone V5): chorus depth and variation sliders for natural vocal fold jitter.
Brown noise aspiration blend: white/brown noise mix by sample rate (44k=100% white, 22k=70/30, 16k=50/50). Fixes aspiration thinness at lower sample rates.
Glottal sharpness rebalanced for F7/F8 spectral dynamics.

Per-Parameter Envelope Timing

New DSP architecture for consonant-to-vowel transitions. Instead of a single crossfade ratio, each source type (voicing, frication) transitions on its own schedule:

transSourceHoldRatio: delays noise source fadeout during crossfades, keeping frication audible longer.
transVoicingHoldRatio: delays voicing onset, creating temporal separation between consonant release and vowel onset.
Combined effect: 3-phase envelope (pure release, overlap, transition to vowel). All ratios scale proportionally with speech rate.
FrameEx extended to 29 fields (was 23). All consumers synced: NVDA, SAPI, iOS, Android, Linux, phoneme editor.

Consonant Clarity

Stop burst locus-theory F2/F3 blending: burst frequencies shift toward following vowel (velars 55%, alveolars 30%, labials 20%).
Transition bandwidth widening after obstruents: decaying BW boost across first 3 micro-frames, scaled by F2 sweep width.
Word-final stop audibility improved: stop floor 4ms to 6ms, word-final bonus 5ms to 8ms, voiced alveolar softening eased.
Voiced consonant bandwidths widened for smoother transitions.

Diphthong Improvements

Rhotic diphthong collapse: vowel+liquid tied pairs (/ER/ in "shared", "scared", "stairs") now collapse into micro-frame diphthongs with proper formant sweep. Previously rendered as two crushed tokens with no glide.
SQUARE diphthong pairScale 1.40 for the wide F3 sweep (2500 to 1620 Hz).
Diphthong rate compensation bumped 0.15 to 0.30; onset hold exponent 1.3 to 1.6 for better vowel identity at high rates.

Spanish Pronunciation (issues #81, #82)

Fixed diphthong tie bar blocking dialect replacements: eSpeak marks diphthongs with tie bars (e-tied-I) which blocked I-to-I_es substitution. Vowel-vowel tie bars are now stripped early; affricate tie bars preserved. (issue #82, reported by @gregodejesus2)
New I_es parallel formants tuned to Spanish-peripheral [i] target (Quilis 1981, Martinez Celdran): pf1 330, pf2 2150, pf3 2850. Previously carried English-lax values.
New U_es phoneme for Spanish [u] diphthong offglides (/au/, /eu/): F1=340, F2=780. Base /U/ was English-lax (F1=405, F2=900).
autoDiphthongOffglideToSemivowel: false for Spanish to preserve dialect-specific offglide phonemes.
Geminate /ss/ deduplication from eSpeak number expansion ("200" no longer "dosssientos").
Square wave trill gating: full closure (was 22% residual), 50/50 duty cycle (was 72/28), 0.25ms edges (was 2ms). "Pero" vs "perro" contrast is now clear. (issue #84, proposed by @LeonardoBlancoSerna)

22050 Hz Quality Fix

Fixed "behind a wall" quality at 22050 Hz: disabled oversampling decimation and anti-alias LP at 22050+ Hz. Sharpness raised 2.0 to 3.0. This was the most-reported audio quality issue since v3.0.

Text Processing

German ordinal dot clause split fixed on all platforms ("3. Mai" no longer splits).
Thousands separator comma fixed for Hungarian/European locales.
LOT-to-THOUGHT toggle fixed for -og derivatives (froggy, frogs, doggo).
Year splitting default set to false for non-English languages.
Smart quote apostrophe added to dictSuffixes for right-tick possessives.
Dict suffix stripping: inflected forms now inherit stem dictionary entries ("running" finds "run").
dictSuffixes added to all 20 language packs.

Linux / Speech Dispatcher

sd_tgsb: config file support for voicing tone, FrameEx, and pitch mode settings. Per-user config at ~/.config/tgspeechbox/sd_tgsb.conf; system-wide (for lightdm/root) at /etc/speech-dispatcher/modules/tgsb-native.conf.
sd_tgsb: YAML voice profile discovery (Beth, Bobby, user-defined profiles).
sd_tgsb: fixed rate/pitch/volume mapping, CHAR/KEY commands, shared read buffer.
sd_tgsb: fixed voice switching crash and unsafe pointer arithmetic.
sd_tgsb: fixed pitch mapping (SD sends -100..+100, not 0..100).
sd_tgsb: filter root language tags from voice list.
sd_tgsb: moved from android/ to platforms/speechd/ (correct source tree location).
sd_tgsb: license clarified as GPL-3.0 (requires espeak-ng); rest of project remains MIT.
Documentation updates for config files and dictionary editing.

NVDA

Chorus depth/variation stored per-voice (like other tone sliders).
Fix ordinal dot clause split in text_utils.py.

iOS

Fix chorus sliders not applying.
Strip thousands commas before sending to eSpeak.

Pronunciation Dictionaries

New entries: Hormuz, Houthi, Iran, Kamala, aria, donkey, Costco.
en-us dictionary cleaned up.

Documentation

Tuning.md: added per-parameter envelope timing (source hold + voicing hold) config, tuning guidance, and F7/F8 field reference.
5 boundary smoothing hold settings synced to all consumers with localization (Spanish, Hungarian, Portuguese).

Thank You

@gregodejesus2 — for reporting Spanish diphthong and number pronunciation issues (#81, #82) with clear reproduction steps across all platforms.
@LeonardoBlancoSerna — for the trill gating proposal (#84) that led to the square wave modulation redesign. Your amplitude gating idea was exactly right.
@yaresDg — for the comprehensive audio recording on #81 testing "version" across dialects and speech rates, and for responsibly flagging the /rs/ regression rather than letting us ship a risky fix.
@fastfinge — for continued dictionary contributions and help.

Join the Test

Want to help test future updates before they ship?

tgeczy/TGSpeechBox v-301
TG SpeechBox with phoneme editor, NVDA Addon, SAPI5, Linux, Android, iOS, Mac OS, version 3.01

on GitHub

DSP Engine (v8)

Per-Parameter Envelope Timing

Consonant Clarity

Diphthong Improvements

Spanish Pronunciation (issues #81, #82)

22050 Hz Quality Fix

Text Processing

Linux / Speech Dispatcher

NVDA

iOS

Pronunciation Dictionaries

Documentation

Thank You

Links

Join the Test

tgeczy/TGSpeechBox v-301 TG SpeechBox with phoneme editor, NVDA Addon, SAPI5, Linux, Android, iOS, Mac OS, version 3.01 on GitHub

DSP Engine (v8)

Per-Parameter Envelope Timing

Consonant Clarity

Diphthong Improvements

Spanish Pronunciation (issues #81, #82)

22050 Hz Quality Fix

Text Processing

Linux / Speech Dispatcher

NVDA

iOS

Pronunciation Dictionaries

Documentation

Thank You

Links

Join the Test

tgeczy/TGSpeechBox v-301
TG SpeechBox with phoneme editor, NVDA Addon, SAPI5, Linux, Android, iOS, Mac OS, version 3.01

on GitHub