tgeczy/TGSpeechBox v-300b6 on GitHub

TGSpeechBox v3.0-beta6

Changes since v3.0-beta5. 57 commits!

New Features

Phoneme editor on mobile: New Phonemes tab alongside the existing Packs editor on both iOS and Android. Browse all phonemes for the current language, tap into a detail view with sliders for every formant, bandwidth, and amplitude field. Live preview button plays the phoneme through the DSP engine directly — bypassing allophone normalization so you hear exactly what you're editing. Fine-grained slider steps (100 Hz for formants, 25 Hz for bandwidths, 0.1 for amplitudes), edit icon for exact numeric input, per-phoneme reset and reset-all with confirmation dialog. Fully accessible with TalkBack and VoiceOver.
Pack YAML import/export: Share your language pack overrides as YAML files from the mobile editor — useful for backing up custom tuning or sharing with others.
Canadian English (en-ca): New dialect with Canadian Raising (PRICE/MOUTH onset raised before voiceless consonants), COT-CAUGHT merger, and GOAT backing. Mapped to the en-us eSpeak voice on all platforms; previously it incorrectly fell back to British English.
SAPI JAWS support (issue #45): JAWS inserts a bookmark between every word, creating separate synthesis fragments. The engine now batches consecutive fragments into a single utterance for natural connected speech, and fires bookmark events at proportional positions during playback. Tested with JAWS 2026 on Windows 11. JAWS 18 on Windows 7 remains under investigation — see note below.
Text parser improvements: Time expressions ("6:03" → "six oh three"), hyphenated number separation for year splitting, ordinal suffix scope tightened to month context only (fixes false positives like "column 2 March"), and NVDA-expanded punctuation handled correctly ("6 colon 03").
Generic data query API (ABI v5): New queryData/setData/getDataCount functions replace the old YAML override path. All mobile consumers migrated. This is the foundation for the phoneme editor and future dictionary editing.
iOS voice preview: Tapping a voice in iOS Settings now plays "Hello, this is [Voice]" so you can audition voices without leaving the picker.
Dropdown selection for enum settings: Pack settings that have a fixed set of values (like pitch mode) now show as dropdown menus on both mobile platforms instead of raw number sliders.
Smaller NVDA addon: New build option excludes Windows 7 compatibility shims from NVDA DLLs, producing 46% smaller binaries. Win7 users use the SAPI build; NVDA addon targets modern Windows.
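The Canadian English entry describes Canadian Raising as a context rule: the PRICE and MOUTH onsets are raised before voiceless consonants. Here is a minimal sketch of that rule, assuming phonemes arrive as a list of IPA strings. The voiceless set and the raised forms [ʌɪ]/[ʌʊ] are hypothetical placeholders, not TGSpeechBox's rule syntax or phoneme inventory.

```python
# Hypothetical voiceless-consonant set (IPA); not the engine's actual tables.
VOICELESS = {"p", "t", "k", "f", "θ", "s", "ʃ", "tʃ", "h"}

# Hypothetical raised variants: the diphthong onset /a/ moves up to [ʌ].
RAISED = {"aɪ": "ʌɪ", "aʊ": "ʌʊ"}


def canadian_raising(phonemes: list[str]) -> list[str]:
    """Raise PRICE/MOUTH onsets when the next phoneme is a voiceless consonant."""
    out = list(phonemes)
    for i, ph in enumerate(phonemes[:-1]):
        if ph in RAISED and phonemes[i + 1] in VOICELESS:
            out[i] = RAISED[ph]
    return out
```

For example, "write" (/r aɪ t/) gets the raised onset, but "ride" (/r aɪ d/) keeps /aɪ/. This sketch ignores word boundaries, which a real allophone rule would probably need to respect.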
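The SAPI JAWS entry says consecutive fragments are batched into one utterance, and that bookmarks fire at proportional positions during playback. The sketch below shows one way to do that. It assumes each fragment carries its text plus an optional bookmark that precedes it, and that "proportional" means proportional to character offset. The names `Fragment`, `batch_fragments`, and `bookmark_sample_positions` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fragment:
    text: str
    bookmark: Optional[str] = None  # bookmark that precedes this fragment's text


def batch_fragments(fragments: list[Fragment]) -> tuple[str, list[tuple[int, str]]]:
    """Join consecutive fragments into one utterance.

    Returns the combined text and (character_offset, bookmark) pairs, where the
    offset is the position in the combined text at which each bookmark applies.
    """
    text = ""
    marks: list[tuple[int, str]] = []
    for frag in fragments:
        if frag.text and text:
            text += " "  # keep words separated across fragment boundaries
        if frag.bookmark is not None:
            marks.append((len(text), frag.bookmark))
        text += frag.text
    return text, marks


def bookmark_sample_positions(text: str, marks: list[tuple[int, str]],
                              total_samples: int) -> list[tuple[int, str]]:
    """Place each bookmark at the audio sample proportional to its text offset."""
    if not text:
        return [(0, name) for _, name in marks]
    return [(offset * total_samples // len(text), name) for offset, name in marks]
```

A player would then fire each bookmark once playback passes its sample index. Character proportion only estimates timing; per-phoneme timing from the synthesizer would be more precise.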
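The time-expression rule in the text parser entry ("6:03" → "six oh three", including NVDA's "6 colon 03") can be sketched with a regular expression. This is an illustration, not TGSpeechBox's parser. In particular, reading ":00" as "o'clock" is an assumption, because the notes don't say how whole hours are spoken.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty"}


def number_words(n: int) -> str:
    """Spell out 0-59 in English words."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def minute_words(mm: int) -> str:
    if mm == 0:
        return "o'clock"  # assumption: the notes don't specify how ":00" is read
    if mm < 10:
        return "oh " + ONES[mm]  # "6:03" -> "six oh three"
    return number_words(mm)


# Hour 0-23, then ":" or NVDA's expanded " colon ", then a two-digit minute.
TIME_RE = re.compile(r"\b([01]?\d|2[0-3])(?::| colon )([0-5]\d)\b")


def expand_times(text: str) -> str:
    def spell(m: re.Match) -> str:
        return f"{number_words(int(m.group(1)))} {minute_words(int(m.group(2)))}"
    return TIME_RE.sub(spell, text)
```

Handling both the literal colon and the spelled-out " colon " in one pattern means text already expanded by NVDA's punctuation settings takes the same path as raw text.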

Join TestFlight for macOS and iOS!

Join TestFlight here by clicking this link from your mobile device.

Bug Fixes

Win7 SAPI freeze (issue #45): Write() returns 0 bytes on Windows 7 when the audio buffer is full. The old code spun in a tight loop consuming 100% CPU. Fixed with flow-controlled retry and proper abort handling so speech can be interrupted normally.
SAPI bookmark indexing (issue #45): Bookmark events were missing the numeric ID (wParam), so NVDA and JAWS couldn't track speech progress — the screen reader would speak the first item but never advance to the next. Fixed.
SAPI startup slowness (issue #45): Full engine initialization ran for every registered voice during SAPI enumeration (~1 second each, 26 voices = 26 seconds). Now cached — first voice does full init, the rest reuse it instantly.
Affricate word-boundary collision: Word-final affricate frication (e.g. /dʒ/ in "image") collided with the following consonant onset, causing a pop. Added micro-silence gap before word-initial consonants after affricates.
Number expansion misalignment: Numbers like "6402" in running text expanded to multiple IPA words but counted as one text word, shifting all subsequent stress assignments. "Volume" was reading as "velume". Fixed with chunk splitting on merged IPA.
Semivowel glides swallowed at high rates: Glide phonemes (/w/, /j/) were being starved of duration at fast speech rates. Added 30ms floor and boundary smoothing exemption.
en-gb PRICE vowel: Was using semivowel /j/ as offglide instead of vowel /ɪ/. Reverted from compound phoneme to collapse pair with proper RP onset.
Compound diphthong detection: Prosody and rate compensation passes didn't recognize compound-form diphthongs, applying wrong duration and prominence.
Uppercase AM expansion: "AM" now expands to "A.M." so eSpeak reads it as letter names instead of the word "am".
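The Win7 SAPI freeze entry replaces a tight retry loop with flow-controlled retry plus abort handling. The notes don't show the fix itself, so the following is a minimal sketch of the general pattern. It assumes a `write` callable that returns the number of bytes accepted (0 when the buffer is full) and a `threading.Event` as the abort flag. A native implementation would more likely wait on a buffer-space notification than poll with a fixed delay.

```python
import threading
from typing import Callable


def write_all(data: bytes,
              write: Callable[[bytes], int],
              abort: threading.Event,
              retry_delay: float = 0.005) -> bool:
    """Write all of `data`, backing off instead of spinning when write() takes 0 bytes.

    Returns True when everything was written, False if `abort` was set first.
    """
    offset = 0
    while offset < len(data):
        if abort.is_set():
            return False  # honor interruption promptly so speech can be stopped
        accepted = write(data[offset:])
        if accepted > 0:
            offset += accepted
        else:
            # Buffer full: wait briefly, but wake immediately if abort is signaled.
            abort.wait(retry_delay)
    return True
```

Using `abort.wait()` as the back-off both yields the CPU and keeps interruption responsive. That matters because the old busy loop consumed 100% CPU and could not be interrupted normally.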
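For the SAPI bookmark indexing entry: as we read the SAPI 5 documentation, an `SPEI_TTS_BOOKMARK` event is expected to carry the bookmark string in `lParam` and its numeric value, as converted by the C runtime's `_wtol`, in `wParam`. The Python sketch below only illustrates that conversion and the two populated fields. The dict is a stand-in whose keys borrow `SPEVENT` field names. It is not a SAPI binding.

```python
def wtol(s: str) -> int:
    """Rough Python equivalent of C's _wtol for bookmark strings.

    Skips leading whitespace, accepts an optional sign, then reads ASCII digits
    up to the first non-digit. Returns 0 if no digits are found. Unlike C, this
    does not overflow on very large numbers.
    """
    i = 0
    while i < len(s) and s[i] in " \t\n\r\f\v":
        i += 1
    sign = 1
    if i < len(s) and s[i] in "+-":
        if s[i] == "-":
            sign = -1
        i += 1
    start = i
    while i < len(s) and s[i] in "0123456789":
        i += 1
    return sign * int(s[start:i]) if i > start else 0


def bookmark_event(mark: str, audio_offset: int) -> dict:
    """Illustrative bookmark event with both the numeric ID and the string set."""
    return {
        "eEventId": "SPEI_TTS_BOOKMARK",
        "ullAudioStreamOffset": audio_offset,
        "wParam": wtol(mark),  # numeric ID that screen readers use to track progress
        "lParam": mark,        # the bookmark string itself
    }
```

Screen readers typically insert numeric bookmarks such as "3", so leaving `wParam` at 0 hides exactly the value they match on.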
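The SAPI startup entry says full initialization now runs once and later voices reuse it. A minimal sketch of that caching, assuming the expensive state can be shared by every voice that uses the same data directory: `EngineRuntime`, `get_runtime`, and `Voice` are hypothetical names, and both the cache key and the lock are assumptions.

```python
import threading


class EngineRuntime:
    """Hypothetical stand-in for the expensive, shareable engine state."""
    instances = 0  # counts full initializations, for demonstration

    def __init__(self, data_dir: str):
        EngineRuntime.instances += 1
        self.data_dir = data_dir
        # Loading language packs and building phoneme tables would happen here
        # (the ~1 second step).


_cache: dict[str, EngineRuntime] = {}
_cache_lock = threading.Lock()


def get_runtime(data_dir: str) -> EngineRuntime:
    """Return the shared runtime for data_dir, creating it only on first use."""
    with _cache_lock:  # voice enumeration might happen on more than one thread
        runtime = _cache.get(data_dir)
        if runtime is None:
            runtime = EngineRuntime(data_dir)
            _cache[data_dir] = runtime
        return runtime


class Voice:
    """Per-voice object; cheap to create because it reuses the cached runtime."""

    def __init__(self, name: str, data_dir: str):
        self.name = name
        self.runtime = get_runtime(data_dir)
```

With 26 registered voices, only the first `Voice` pays the initialization cost.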
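The affricate word-boundary fix adds a micro-silence after a word-final affricate when the next word begins with a consonant. The sketch below assumes words arrive as lists of IPA phonemes. The vowel subset and the `"_"` silence token are hypothetical, and so is the gap length: the notes give no value.

```python
AFFRICATES = {"tʃ", "dʒ"}
# Hypothetical vowel subset; a real engine would consult its own phoneme tables.
VOWELS = {"i", "ɪ", "e", "ɛ", "æ", "a", "ɑ", "ɒ", "ɔ", "o", "ʊ", "u", "ʌ", "ə"}
SILENCE = "_"  # hypothetical token the synthesizer would render as a few ms of silence


def insert_affricate_gaps(words: list[list[str]]) -> list[str]:
    """Flatten words into one phoneme list, adding a short silence after a
    word-final affricate when the next word starts with a consonant."""
    out: list[str] = []
    for idx, word in enumerate(words):
        out.extend(word)
        next_word = words[idx + 1] if idx + 1 < len(words) else []
        if word and next_word and word[-1] in AFFRICATES and next_word[0] not in VOWELS:
            out.append(SILENCE)
    return out
```

"image book" gets a gap between /dʒ/ and /b/. In "image of", the affricate flows straight into the vowel as before.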
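The number-expansion bug arose because one text word ("6402") became several spoken words, shifting every later index. One way to keep the two in step is to tag each spoken chunk with the index of the text word it came from. The sketch uses plain English words in place of IPA, and `demo_expand` is a hypothetical stand-in for eSpeak's output. TGSpeechBox's actual "chunk splitting on merged IPA" may differ.

```python
from typing import Callable

# Hypothetical expansions standing in for eSpeak's IPA output.
DEMO_EXPANSIONS = {"6402": "six thousand four hundred two"}


def demo_expand(word: str) -> str:
    return DEMO_EXPANSIONS.get(word, word)


def align_spoken_words(text_words: list[str],
                       expand: Callable[[str], str]) -> list[tuple[int, str]]:
    """Pair every spoken word with the index of the text word it came from.

    A number that expands into several spoken words yields several chunks that
    all point back to one text word, so later words keep their own indices.
    """
    aligned: list[tuple[int, str]] = []
    for idx, word in enumerate(text_words):
        for chunk in expand(word).split():
            aligned.append((idx, chunk))
    return aligned
```

A naive one-to-one pairing of text words with spoken words would match the third text word with "thousand". That is the kind of offset that misplaced stress on "volume".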
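The glide entry gives /w/ and /j/ a 30 ms duration floor at fast rates. A minimal sketch follows, assuming durations simply scale inversely with rate (rate > 1 is faster). That scaling model is an assumption, and the boundary-smoothing exemption is not modeled.

```python
GLIDES = {"w", "j"}
GLIDE_FLOOR_MS = 30.0  # floor stated in the release notes


def scale_duration(phoneme: str, base_ms: float, rate: float) -> float:
    """Shorten a phoneme for speech rate, never letting glides drop below the floor."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    scaled = base_ms / rate
    if phoneme in GLIDES:
        return max(scaled, GLIDE_FLOOR_MS)
    return scaled
```

At three times normal speed, a 60 ms /w/ would shrink to 20 ms without the floor. With it, the glide keeps 30 ms, while other phonemes still scale freely.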
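The uppercase AM entry rewrites "AM" as "A.M." so that eSpeak spells it out. In the variant below, the rewrite fires only after a digit. That restriction is an added assumption, meant to leave all-caps text like "I AM HERE" alone. A trailing period is absorbed so the result doesn't end in "A.M..".

```python
import re

# "AM" directly after a digit (optionally one space between), not part of a longer word.
AM_RE = re.compile(r"(?<=\d)\s?AM\b\.?")


def expand_am(text: str) -> str:
    return AM_RE.sub(" A.M.", text)
```

Both "6AM" and "6 AM" normalize to "6 A.M.".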

Language Pack Improvements

Tap /ɾ/ coarticulation: Research-backed formant targets from Cathcart (2012) and De Jong (2011) — F3 dip to 2200 Hz (alveolar place cue), F4 shift up 500 Hz, sharper amplitude notch. Micro-frames now coarticulate with surrounding vowels (onset blends 50% previous vowel, recovery blends 65/35 back). Duration floor prevents velar smearing at high rates.
GenAm vowel tuning round 2: GOAT, THOUGHT, and FORCE vowels moved toward Hillenbrand (1995) targets. FACE /e/ cf2 raised 1900→2000. FORCE oː cf1 lowered 540→500 for a more closed mouth shape. Added a pre-rhotic allophone rule for FORCE context.
en-gb diphthongs toward RP targets: GOAT, PRICE, and CHOICE retuned with proper RP onset and offglide qualities.
en-au LOT and STRUT loudness: These vowels were too loud relative to surrounding vowels; tamed by widening their bandwidths.
Post-nasal /d/ burst softening: Reduced pa5/pa6 and scaled frication amplitude in all English dialects. /d/ after nasals (e.g. "and", "around") no longer has an unnaturally strong burst.
Pre-rhotic cluster protection: Vowel+liquid clusters (e.g. /ɔɹl/ in "world") now get extra duration floor in rate compensation so the liquid isn't swallowed.
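The tap /ɾ/ entry says the onset micro-frame blends 50% of the previous vowel and the recovery frame blends 65/35 back. The following is a minimal sketch of that blending as per-formant linear interpolation. It assumes the 65% weight belongs to the following vowel, and that reading is a guess. The three-frame layout and all vowel formant values are hypothetical; only the 2200 Hz F3 dip comes from the notes.

```python
Formants = dict[str, float]


def blend(a: Formants, b: Formants, weight_a: float) -> Formants:
    """Per-formant linear mix: weight_a * a + (1 - weight_a) * b."""
    return {k: weight_a * a[k] + (1.0 - weight_a) * b[k] for k in a}


def tap_frames(prev_vowel: Formants, tap: Formants,
               next_vowel: Formants) -> list[Formants]:
    """Three micro-frames for a tap between two vowels."""
    onset = blend(prev_vowel, tap, 0.5)      # onset: 50% previous vowel, 50% tap target
    closure = dict(tap)                       # full tap target (e.g. F3 dipped to 2200 Hz)
    recovery = blend(next_vowel, tap, 0.65)  # assumed reading: 65% next vowel, 35% tap
    return [onset, closure, recovery]
```

For a previous vowel with F3 = 2600 Hz and a following vowel with F3 = 2500 Hz, the F3 track across the three frames runs roughly 2400 → 2200 → 2395 Hz. The tap dips toward its alveolar cue without jumping abruptly.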

Platform Improvements

Android: Phoneme editor with tabbed Packs|Phonemes UI, share pack button, dropdown settings. Alpha track version 322.
iOS: Phoneme editor with live preview, share pack button, dropdown settings, voice preview on empty request.
SAPI: JAWS fragment batching, runtime cache, bookmark fix, Win7 freeze fix. All shipped as updated beta5 installers and now included in beta6 proper.
NVDA: Separate build without YY-Thunks (46% smaller DLLs).

Note on Windows 7 SAPI

Windows 7 support has improved significantly thanks to @akse0435's thorough testing on issue #45. The SAPI engine loads and runs on Win7 with both NVDA and 64-bit SAPI hosts. However, 32-bit SAPI with JAWS 18 on Windows 7 still has issues: voices don't appear in the 32-bit SAPI list and cause system slowness. JAWS 2026 on Windows 11 works well. If you're on Windows 7 with JAWS, we recommend using the NVDA addon or 64-bit SAPI for now. Investigation continues.

Note on Android

Android minimum version remains Android 8.0 (Oreo, API 26) due to Material 3 UI requirements.

tgeczy/TGSpeechBox v-300b6 TG SpeechBox with phoneme editor, NVDA Addon, SAPI5, Linux, Android, iOS, Mac OS, version 3.0 public beta 6 on GitHub