github tgeczy/TGSpeechBox v-299
TG SpeechBox with phoneme editor, NVDA Addon, SAPI5, Speech Dispatcher module version 2.99

TGSpeechBox v2.99 — Release Notes

v2.99 is a prosody and quality release that brings major improvements to stress realization, pitch modeling, British English support, and the first non-European language pack. Every English-speaking user benefits from more natural rhythm, livelier intonation, and clearer compound words — whether reading documents, navigating interfaces, or skimming code at high speed.

Stress and Rhythm

Continuous duration scaling — Stressed and unstressed syllables now receive proportional acoustic contrast at every prominence level. The previous system used fixed brackets that created hidden cliffs where a tiny change in prominence caused a large jump in duration. The new linear interpolation means primary stress, secondary stress, and reduced vowels each sound distinct without abrupt transitions.
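
The contrast between fixed brackets and linear interpolation can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names, the bracket thresholds, and the scale factors are hypothetical and are not taken from the TGSpeechBox source.

```python
def bracket_scale(prominence: float) -> float:
    """Step-style mapping with hypothetical thresholds (the 'hidden cliff' case)."""
    if prominence >= 0.7:
        return 1.30
    if prominence >= 0.4:
        return 1.00
    return 0.75


def linear_scale(prominence: float, lo: float = 0.75, hi: float = 1.30) -> float:
    """Continuous mapping: interpolate linearly between lo and hi (assumed values)."""
    p = min(1.0, max(0.0, prominence))  # clamp prominence to [0, 1]
    return lo + (hi - lo) * p


if __name__ == "__main__":
    # A 0.01 change in prominence across a bracket edge vs. the same change when interpolating.
    print(bracket_scale(0.70) - bracket_scale(0.69))
    print(linear_scale(0.70) - linear_scale(0.69))
```

With the step mapping, a prominence change of 0.01 across a threshold moves the duration scale by a whole bracket; with interpolation, the same change moves it by a proportionally small amount.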

Monosyllable prominence floor — Single-syllable content words like "box", "top", and "lock" now receive near-full prominence even when eSpeak omits the stress mark. Function words ("the", "of", "in", "at") are excluded via accent-specific lists so they stay appropriately reduced.
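
A minimal sketch of how such a floor might work, assuming prominence is a value between 0 and 1. The floor value, the function name, and the contents of the function-word list are hypothetical; only the four example function words come from the notes above.

```python
# Hypothetical per-accent function-word lists; the real lists are accent-specific and longer.
FUNCTION_WORDS = {
    "en-us": {"the", "of", "in", "at"},
    "en-gb": {"the", "of", "in", "at"},
}


def apply_monosyllable_floor(word, syllable_count, prominence,
                             accent="en-us", floor=0.9):
    """Raise prominence of one-syllable content words to at least `floor` (assumed value)."""
    if syllable_count != 1:
        return prominence  # polysyllabic words keep whatever stress marks gave them
    if word.lower() in FUNCTION_WORDS.get(accent, set()):
        return prominence  # function words stay reduced
    return max(prominence, floor)


if __name__ == "__main__":
    print(apply_monosyllable_floor("box", 1, 0.2))
    print(apply_monosyllable_floor("the", 1, 0.2))
```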

Compound word stress dictionary — 353 new entries added to the English stress dictionary, covering productive compounds that were missing from the CMU Dict base: lockbox, checkbox, combobox, textbox, hotfix, screenshot, and hundreds more. The dictionary now contains over 110,000 entries. A defensive merge path in the text parser handles cases where eSpeak splits compound IPA that the input text sends as a single word.
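
The merge idea can be sketched as follows. This is a simplified illustration, not the parser's actual logic: it assumes each split compound arrives as exactly two IPA chunks, and the `COMPOUNDS` set, the function name, and the IPA strings are hypothetical.

```python
# Hypothetical subset of the compound dictionary.
COMPOUNDS = {"lockbox", "checkbox", "screenshot"}


def merge_split_compounds(text_words, ipa_words):
    """Re-join two IPA chunks when they belong to one known compound in the text."""
    merged = []
    i = 0
    surplus = len(ipa_words) - len(text_words)  # extra IPA chunks hint at a split
    for word in text_words:
        if i >= len(ipa_words):
            break  # fewer IPA chunks than words: nothing safe to do
        can_merge = (
            surplus > 0
            and word.lower() in COMPOUNDS
            and i + 2 <= len(ipa_words)
        )
        if can_merge:
            merged.append(ipa_words[i] + ipa_words[i + 1])
            i += 2
            surplus -= 1
        else:
            merged.append(ipa_words[i])
            i += 1
    return merged


if __name__ == "__main__":
    print(merge_split_compounds(["open", "lockbox"],
                                ["\u02c8o\u028ap\u0259n", "l\u02c8\u0251\u02d0k", "b\u02c8\u0251\u02d0ks"]))
```

The merge only fires when there are more IPA chunks than text words and the text word is a known compound, so ordinary input passes through unchanged.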

Pitch Modeling

Impulse pitch mode rewrite — The impulse pitch contour has been rebuilt with a multi-layer additive architecture. Four independent pitch layers now contribute to the final contour: proportional declination across the clause, hat-pattern rise and fall around stressed words, count-based stress peaks on vowel nuclei, and terminal boundary gestures. Stressed words get a pitch rise at their onset and a fall at the next word boundary, creating the natural up-down movement that makes speech sound engaged rather than monotone. Terminal stress is inverted on statement-final syllables — pitch drops instead of boosting, giving declarative sentences a decisive ending. Long phrases no longer bottom out thanks to hat damping at word boundaries and a raised pitch floor.
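
The additive structure can be sketched per syllable as a sum of four independent offsets on top of a base pitch. All numbers, field names, and the function name below are assumptions for illustration. In particular, the sketch uses a fixed-size stress peak rather than the count-based peaks described above, and it omits hat damping.

```python
from dataclasses import dataclass


@dataclass
class Syllable:
    position: float            # 0.0 at clause start, 1.0 at clause end
    in_stressed_word: bool     # inside the "hat" of a stressed word
    is_stressed_nucleus: bool  # carries a stress peak
    is_final: bool             # last syllable of the clause


def pitch_hz(s: Syllable, base: float = 120.0, floor: float = 80.0,
             statement: bool = True) -> float:
    """Sum four hypothetical layers and clamp to a pitch floor."""
    declination = -20.0 * s.position             # layer 1: proportional declination
    hat = 8.0 if s.in_stressed_word else 0.0     # layer 2: hat-pattern plateau
    peak = 0.0                                   # layer 3: stress peak on the nucleus
    if s.is_stressed_nucleus:
        # Terminal stress is inverted on statement-final syllables.
        peak = -6.0 if (s.is_final and statement) else 6.0
    boundary = 0.0                               # layer 4: terminal boundary gesture
    if s.is_final:
        boundary = -10.0 if statement else 15.0
    return max(floor, base + declination + hat + peak + boundary)


if __name__ == "__main__":
    print(pitch_hz(Syllable(0.5, True, True, False)))
    print(pitch_hz(Syllable(1.0, True, True, True)))
```

Because each layer is computed independently and then summed, one layer can be retuned without touching the others, which is the main appeal of an additive design.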

Fujisaki pitch improvements — The Fujisaki pitch mode now uses a power-curve mapping from prominence to accent amplitude, giving better perceptual separation between primary and secondary stress. Monosyllable content words receive tighter, punchier accent timing, and compound word boundaries trigger micro-declination resets for more natural pitch contours within long words.
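
A power-curve mapping of this kind can be sketched in a few lines. The exponent, the maximum amplitude, and the function name are assumptions; the notes do not state the actual curve parameters, and the sketch assumes an exponent greater than 1.

```python
def accent_amplitude(prominence: float, max_amp: float = 1.0,
                     gamma: float = 2.0) -> float:
    """Map prominence in [0, 1] to an accent amplitude via a power curve (assumed gamma)."""
    p = min(1.0, max(0.0, prominence))  # clamp to the valid range
    return max_amp * p ** gamma


if __name__ == "__main__":
    primary, secondary = 1.0, 0.5  # hypothetical prominence values
    linear_gap = primary - secondary
    power_gap = accent_amplitude(primary) - accent_amplitude(secondary)
    print(linear_gap, power_gap)
```

With an exponent above 1, mid-range prominence values are pushed down relative to full prominence, so the gap between primary and secondary stress is wider than under a linear mapping.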

British English

en-gb overhaul — UK English now benefits from the same post-processing passes that have been improving US English over the past several releases. Boundary smoothing, trajectory limiting, cluster timing, prominence, microprosody, coarticulation, phrase-final lengthening, and rate compensation are all now active with RP-appropriate tuning. Key differences from US are preserved: no American flapping, no rhotic vowel coloring, stronger pre-fortis clipping, crisper consonant transitions, more conservative cluster timing, and yod fronting for words like "tune" and "new."

Affricate fix — A normalization rule was stripping the tie bar from /d͡ʒ/, causing the phoneme matcher to see /d/ and /ʒ/ as separate sounds. Words like "edge", "judge", and "bridge" sounded French. The rule has been removed and affricates now render correctly.
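
The effect of stripping the tie bar can be reproduced with a toy longest-match phoneme matcher. The inventory and the function name are hypothetical; the sketch only shows why removing U+0361 changes what the matcher sees.

```python
TIE_BAR = "\u0361"  # COMBINING DOUBLE INVERTED BREVE, the IPA tie bar

# Hypothetical inventory: the affricate, its two parts, and one vowel.
INVENTORY = {"d\u0361\u0292", "d", "\u0292", "\u025b"}


def match_phonemes(ipa, inventory=INVENTORY):
    """Greedy longest-match segmentation of an IPA string into known phonemes."""
    longest = max(len(p) for p in inventory)
    out, i = [], 0
    while i < len(ipa):
        for size in range(min(longest, len(ipa) - i), 0, -1):
            chunk = ipa[i:i + size]
            if chunk in inventory:
                out.append(chunk)
                i += size
                break
        else:
            raise ValueError(f"no phoneme matches at index {i}")
    return out


if __name__ == "__main__":
    edge = "\u025bd\u0361\u0292"                      # "edge" with the tie bar intact
    print(match_phonemes(edge))                       # affricate kept as one phoneme
    print(match_phonemes(edge.replace(TIE_BAR, "")))  # split into /d/ and /ʒ/
```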

Turkish — New Language

Initial Turkish support — TGSpeechBox now speaks Turkish. The language pack includes 21 dedicated phoneme entries tuned to Turkish acoustic targets: 8 core vowels, 6 allophonic variants, 5 long vowels for yumuşak ge (ğ) compensation, and dental sibilant variants for Turkish /s/ and /z/. Prosody settings are tuned for Turkish syllable-timed rhythm with less stress contrast than English, unaspirated voiceless stops, word-final /z/ devoicing, and ğ-aware vowel lengthening so words like "dağ", "öğretmen", and "doğru" sound natural rather than clipped.
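
The ğ-aware lengthening idea can be sketched at the spelling level. This is a deliberate simplification and an assumption about the approach: the real pack works on phonemes with dedicated long-vowel entries, and the rule below (any vowel followed by ğ becomes long and ğ is dropped) ignores contexts where ğ behaves differently. The function name is hypothetical, and input is assumed to be lowercase.

```python
import unicodedata

SOFT_G = "\u011f"      # ğ
LENGTH_MARK = "\u02d0"  # IPA length mark ː
TURKISH_VOWELS = set(unicodedata.normalize("NFC", "aeıioöuü"))


def lengthen_before_soft_g(word: str) -> str:
    """Replace vowel + ğ with a long vowel (simplified, spelling-level rule)."""
    out = []
    for ch in unicodedata.normalize("NFC", word):
        if ch == SOFT_G and out and out[-1] in TURKISH_VOWELS:
            out[-1] += LENGTH_MARK  # lengthen the preceding vowel, drop ğ
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    for w in ("da\u011f", "\u00f6\u011fretmen", "do\u011fru"):
        print(w, "->", lengthen_before_soft_g(w))
```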

This is a first release and should be considered a starting point. We need Turkish-speaking testers — if you use NVDA in Turkish or know someone who does, please try the pack and share feedback on vowel quality, consonant clarity, and overall naturalness. Native speaker ears are essential for tuning a language pack beyond what acoustic research and eSpeak comparison can achieve on their own. Feedback can be sent through the project's issue tracker or community channels.

NVDA Compatibility Notice

Deprecation warning — v2.99 is the last release of the TGSpeechBox NVDA add-on to support NVDA 2023.x and 2024.x. Starting with v3.0, the add-on will require the current year's NVDA release. v2.99 will continue to work and will still receive language pack updates for existing languages for a while, but as the engine evolves, newer processing rules may not be available on older add-on versions. If you are still on NVDA 2023 or 2024, we recommend updating NVDA before upgrading to TGSpeechBox v3.0 when it arrives.

What's Next — v3.0 Sneak Peek 👀

v3.0 is going mobile. TGSpeechBox is on TestFlight for Mac and iOS, and reviews are ongoing for new builds. Beta APKs are available, and the Play Store listing is open to closed testers. The roadmap includes:

  • Android TTS Service — a system-level speech engine that any Android screen reader or app can use, bringing formant synthesis to a platform that has never had it as an option.
  • iOS AU Speech Synthesis Provider — a proper Audio Unit extension for system-wide TTS on iPhone and iPad, with process separation for App Store compliance.
  • The case for choice — on Windows, blind users pick from half a dozen synth engines because the voice you spend all day with is deeply personal. On mobile, that choice barely exists. TGSpeechBox aims to change that: a lightweight, tunable, open-source formant synthesizer that stays crisp at most speech rates, available everywhere.

More details as development progresses. Feedback and testing from the community are always welcome.
