tgeczy/TGSpeechBox v-300rc1 on GitHub

TGSpeechBox v3.0-rc1

Release Candidate 1. This concludes the 3.0 beta cycle. No new features will be added; the next few days focus on polish and bug reports from testers. If no major issues surface, this build becomes the candidate for the final 3.0 release.

Changed since beta 7 (62 commits)

Dictionary IPA Injection (the big one)

The dictionary system now supports direct IPA overrides. When a pronunciation dictionary entry has a to_ipa field set, the engine bypasses eSpeak's respelling rules entirely and injects the phoneme sequence directly into the synthesis output. This solves words that English spelling conventions cannot represent — /kn/ clusters (Knievel), /hw/ onsets (Huawei), and cross-language phoneme combinations.

Space-delimited phoneme keys: to_ipa uses spaces to separate phoneme keys. Each key is looked up in the phoneme definitions, enabling pack-specific phonemes like a_es (Spanish /a/) or ɾ_wf (word-final tap) to be inserted into any language's output.
IPA fields on all platforms: New optional "From IPA" and "To IPA" text fields in the pronunciation dictionary add/edit dialogs on Android, iOS, and Win32.
"Fill IPA from eSpeak" button: One tap to auto-generate IPA from the text fields. Only fills empty IPA fields — preserves user-typed content.
"Insert phoneme" picker: Per-field buttons open a scrollable list of all phoneme keys for the current language. Selected key is appended space-delimited.
Preview with IPA: The Preview button uses the IPA override directly when set, so you hear the actual pronunciation before saving.
Shipped dict entries using IPA: Huawei (h w ˈɑː w eɪ) and Knievel (k ᵊ n ˈiː v ə l) demonstrate the system.
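
As a rough illustration of how a space-delimited to_ipa value could be resolved, here is a minimal Python sketch. The key set, the resolve_to_ipa function, and the choice to split a leading stress mark off the key are all assumptions made for this example; the engine's real lookup is not shown in these notes.

```python
# Hypothetical set of phoneme keys. The real engine reads its keys from the
# language packs; these are placeholders taken from the examples above.
PHONEME_KEYS = {
    "h", "w", "ɑː", "eɪ", "k", "ᵊ", "n", "iː", "v", "ə", "l", "a_es", "ɾ_wf",
}

STRESS_MARKS = ("ˈ", "ˌ")  # IPA primary and secondary stress marks


def resolve_to_ipa(to_ipa, known_keys=PHONEME_KEYS):
    """Split a to_ipa string on whitespace and validate every key.

    Returns a list of (stress_mark, key) tuples. Treating a leading stress
    mark as separate from the key is an assumption of this sketch.
    """
    resolved = []
    for token in to_ipa.split():
        stress = ""
        if token.startswith(STRESS_MARKS):
            stress, token = token[0], token[1:]
        if token not in known_keys:
            raise KeyError(f"unknown phoneme key: {token!r}")
        resolved.append((stress, token))
    return resolved


if __name__ == "__main__":
    print(resolve_to_ipa("h w ˈɑː w eɪ"))
```

Rejecting unknown keys up front means a typo in a dictionary entry is caught when it is saved rather than when it is spoken; whether the shipped editors validate this way is not stated here.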

Exclude Categories

New "Exclude categories" option in the dictionary editor's More Options menu for Android and iOS (pronunciation type only). Checkboxes per category — uncheck to mask all entries in that category, check to re-include. Batch operation with single reload.
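
The masking behaviour can be pictured with a small Python sketch. The DictEntry type, its field names, and the category names are hypothetical; only the idea of filtering whole categories in one pass comes from the note above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictEntry:
    # Hypothetical shape of a pronunciation entry for this sketch only.
    word: str
    replacement: str
    category: str


def active_entries(entries, excluded_categories):
    """Return the entries whose category is not masked.

    All exclusions are applied in a single pass, mirroring the
    "batch operation with single reload" described above.
    """
    excluded = set(excluded_categories)
    return [entry for entry in entries if entry.category not in excluded]


if __name__ == "__main__":
    entries = [
        DictEntry("alpha", "al fa", "names"),
        DictEntry("beta", "bay ta", "brands"),
    ]
    print(active_entries(entries, {"brands"}))
```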

Rate Boost everywhere

SAPI: Rate Boost checkbox + Override Speech Rate slider in the settings app.
Linux: --rate-boost flag and TGSB_RATE_BOOST=1 environment variable. DSP time-stretch handles the excess above the 2.0x synthesis cap.
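
One way to picture the split between synthesis and time-stretch is the Python sketch below. The split_rate function and the assumption that the two stages multiply are illustrative only; the 2.0x cap is the one figure taken from these notes.

```python
SYNTHESIS_CAP = 2.0  # synthesis cap stated in the release notes


def split_rate(requested_rate, cap=SYNTHESIS_CAP):
    """Split a requested rate into (synthesis_rate, stretch_factor).

    Assumes the stages multiply: synthesis_rate * stretch_factor equals the
    requested rate, with synthesis never exceeding the cap.
    """
    if requested_rate <= 0:
        raise ValueError("rate must be positive")
    synthesis_rate = min(requested_rate, cap)
    stretch_factor = requested_rate / synthesis_rate
    return synthesis_rate, stretch_factor


if __name__ == "__main__":
    print(split_rate(1.5))  # (1.5, 1.0): below the cap, no stretch needed
    print(split_rate(3.0))  # (2.0, 1.5): the stretch covers the excess
```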

Linux text normalization

New --prepare-text mode in tgsbRender applies dictionary replacements and compound splitting before eSpeak phonemization. The tgsb-speak wrapper script runs this automatically, bringing dict support to the Linux speech-dispatcher pipeline.
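
A minimal Python sketch of this kind of text preparation follows. It is not the tgsbRender implementation: the prepare_text function, the lookup order (pronunciation before compound), the lowercase matching, and the sample entries are all assumptions.

```python
import re

WORD_RE = re.compile(r"\w+")


def prepare_text(text, pronunciation, compound):
    """Apply dictionary replacements word by word before phonemization.

    pronunciation and compound map lowercase words to replacement text.
    Checking pronunciation first is an assumption of this sketch.
    """
    def replace(match):
        word = match.group(0)
        key = word.lower()
        if key in pronunciation:
            return pronunciation[key]
        if key in compound:
            return compound[key]
        return word

    return WORD_RE.sub(replace, text)


if __name__ == "__main__":
    # Placeholder entries, not taken from the shipped dictionaries.
    pronunciation = {"gif": "jif"}
    compound = {"sunflower": "sun flower"}
    print(prepare_text("A sunflower gif", pronunciation, compound))
    # prints: A sun flower jif
```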

Spanish fixes

a_es distortion (issue #66): Widened F1/F2 bandwidths (cb1 116→140, cb2 76→90) to reduce resonant peaking at the higher formant frequencies Mateo tuned for authentic Spanish /a/.
dɾ cluster (issue #62): "Andrés" sounded like "Andés" — the /dɾ/ stop+tap cluster was missing a micro-schwa insertion rule. Added alongside the other stop+ɾ clusters.
Emoji support (issue #65): eSpeak dictionary lookup for emoji on Android.
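
The dɾ fix can be sketched as a simple insertion rule in Python. The stop inventory, the use of ᵊ as the micro-schwa key (borrowed from the Knievel entry), and the simplified transcription of "Andrés" are assumptions; the engine's actual allophone rules are not reproduced here.

```python
STOPS = {"p", "b", "t", "d", "k", "g"}  # assumed stop inventory
TAP = "ɾ"
MICRO_SCHWA = "ᵊ"  # assumed key for the micro-schwa


def insert_micro_schwa(phonemes):
    """Insert a micro-schwa between any stop and a following tap."""
    out = []
    for i, phoneme in enumerate(phonemes):
        out.append(phoneme)
        following = phonemes[i + 1] if i + 1 < len(phonemes) else None
        if phoneme in STOPS and following == TAP:
            out.append(MICRO_SCHWA)
    return out


if __name__ == "__main__":
    # Simplified transcription of "Andrés", for illustration only.
    print(insert_micro_schwa(["a", "n", "d", "ɾ", "e", "s"]))
    # prints: ['a', 'n', 'd', 'ᵊ', 'ɾ', 'e', 's']
```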

Other fixes

Stress dict entry for "Microsoft": The CMU-derived entry had the pattern 1 2 1 (two primary stresses), causing word splitting and vowel normalization failure in certain contexts. Fixed to 1 0 2.
Unicode punctuation in dict lookup: Curly quotes, em/en dashes, inverted Spanish punctuation, and ellipsis are now stripped properly during dictionary word matching.
NVDA expanded punctuation handling: IPA splice now walks text and IPA arrays in parallel, skipping punctuation-only words that NVDA expands (e.g. "quote", ".").
eSpeak language tag stripping: Win32 phoneme editor strips (en), (bg) language-switch tags from IPA output, including ZWJ-padded --ipa=3 variants.
Compound dict fix: Removed bad "nowhere → now here" compound entry.
Android accessibility: Fixed phantom unlabeled toggle in dict dialogs (Compose toggleable promoted above scroll container), merged checkbox+label in Exclude Dictionaries, scrollable dialog content.
iOS accessibility: VoiceOver double-tap on dict entries opens edit directly. Preview and Edit as custom actions.
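
For the Unicode punctuation fix, the matching step might look like the Python sketch below. The lookup_key function, the decision to strip only at word edges, and the lowercasing are assumptions; the character list follows the punctuation named in the note above.

```python
# Curly quotes, em/en dashes, inverted Spanish punctuation, and ellipsis.
UNICODE_PUNCT = "\u2018\u2019\u201c\u201d\u2014\u2013\u00bf\u00a1\u2026"
ASCII_PUNCT = "\"'.,;:!?()-"


def lookup_key(word):
    """Return the form of a word used for dictionary matching.

    Punctuation is removed only from the edges, so an apostrophe inside a
    word survives. Lowercasing is an assumption of this sketch.
    """
    return word.strip(UNICODE_PUNCT + ASCII_PUNCT).lower()


if __name__ == "__main__":
    print(lookup_key("\u201cTucson\u201d"))  # prints: tucson
    print(lookup_key("\u00bfAndr\u00e9s?"))  # prints: andrés
```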

New dictionary entries (en-us)

behemoth, Chantal, chitin, dectalk, fediverse, headaches, Huawei, Knievel, monster, Mueller, nowhere (compound removed), olivine, pentagon, Popeyes, Proulx, resurrection, Tucson, tulip, werewolf, winget.

What's in TGSpeechBox 3.0?

For those who haven't been following the beta cycle, here's what 3.0 brings compared to 2.x.

New Features

Pronunciation dictionary system: Full dictionary editor on Android, iOS, macOS, and Windows. Four dictionary types: pronunciation (text-to-text respelling + IPA overrides), stress (CMU-style patterns), compound (word splitting for correct stress), and character (letter-name overrides). Import/export TSV, search, per-language browsing, category management, exclude categories and dictionary types.
IPA injection: Dictionary entries can specify exact phoneme sequences that bypass eSpeak entirely, enabling pronunciations that spelling rules cannot produce.
Mobile platforms: Full-featured Android and iOS apps with TalkBack and VoiceOver accessibility. TTS service for system-wide use, standalone Speak tab, phoneme editor, dictionary editor, voice profiles, and engine settings.
SAPI engine: Windows SAPI5 TTS voice with settings app — sample rate, voicing tone, voice quality, pitch mode, rate boost, override speech rate. Single statically-linked DLL per architecture.
Rate Boost: Doubles speech rate using DSP time-stretch on all platforms (NVDA, SAPI, Android, iOS, Linux). Synthesis is capped at 2.0x; frame-advance handles the rest.
Override Speech Rate: Manual rate slider (0.3x–4.0x) that overrides the host/screen reader rate. Available on Android, iOS, and SAPI.
Voice profiles: Beth (female), Bobby (child), and custom profiles defined in phonemes.yaml. Available on all platforms.
Per-voice engine settings: Each voice profile remembers its own tuning independently.
Fujisaki pitch model: Phrase-level and accent-level pitch contours with clause-type overrides (question rise, exclamation boost, comma continuation). Declination, final rise/drop, and base pitch scaling.
26 languages: English (US, GB, AU, CA), German, French, Spanish (ES, MX), Italian, Portuguese (PT, BR), Dutch, Polish, Russian, Ukrainian, Czech, Slovak, Hungarian, Romanian, Croatian, Bulgarian, Swedish, Danish, Finnish, Turkish, Chinese.
Head Size slider: Vocal tract length control with graduated formant scaling.
Year splitting and date ordinals: "1995" → "nineteen ninety-five", "June 6" → "June 6th".
Lock Language: Screen reader language requests can be overridden to always use the Speak tab's selected language.
Linux speech-dispatcher integration: Generic module config with clause-aware wrapper script, rate boost, and text normalization.
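
The year splitting and date ordinal item in the list above can be illustrated with a short Python sketch. The function names and the set of handled cases are assumptions: years whose second half is below 10 (such as 1905 or 2000) are deliberately left unhandled because the engine's rules for them are not described here.

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def two_digits(n):
    """Spell out an integer from 0 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def split_year(text):
    """Return the spoken form of a four-digit year, or None if unhandled."""
    if len(text) != 4 or not (text.isascii() and text.isdigit()):
        return None
    first, second = int(text[:2]), int(text[2:])
    if first < 10 or second < 10:
        return None  # cases such as 0999, 1905, 2000 are out of scope here
    return f"{two_digits(first)} {two_digits(second)}"


def ordinal(day):
    """Append the English ordinal suffix to a day number."""
    if 10 <= day % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{day}{suffix}"


if __name__ == "__main__":
    print(split_year("1995"))     # prints: nineteen ninety-five
    print(f"June {ordinal(6)}")   # prints: June 6th
```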

Sound Quality

Diphthong collapse: Connected-speech diphthong system with formant interpolation, bandwidth sweeping, onset settle time, and rate-adaptive crossfades.
Boundary smoothing: Hybrid approach — stretch fade + aspiration bypass + formant lead. Voiceless fricative onset protection at high rates.
Soft-knee limiter: Replaces hard-knee limiter. Smooth gain reduction without audible pumping.
Sonorant-context vowel protection: Extra duration and amplitude for unstressed vowels between nasals, liquids, and semivowels.
Subglottal coupling: Pole-zero pair at 630/590 Hz gated by glottal phase for more natural voice quality.
Stop burst spectral shaping: Reduced affricate confusion, word-final burst rolloff, velar boost rules.
Spanish tuning: Mateo's authentic vowel formants, diphthong offglides, trill phoneme, dental approximant, and 20+ allophone rules.
Hungarian geminate consonants: Proper stop closure and fricative duration for geminates.
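
To show what distinguishes a soft knee from a hard one, here is the textbook static gain curve for a limiter (infinite ratio) with a quadratic knee, written in Python. This is a generic sketch, not the engine's limiter: the function name, the threshold, and the knee width are placeholder assumptions, and attack/release smoothing is left out.

```python
def soft_knee_limit_db(level_db, threshold_db=-6.0, knee_db=6.0):
    """Map an input level in dB to an output level in dB.

    Below the knee the signal passes unchanged, above it the output is held
    at the threshold, and inside the knee a quadratic blend joins the two
    segments so the slope changes gradually from 1 to 0.
    """
    over = level_db - threshold_db
    half_knee = knee_db / 2.0
    if over <= -half_knee:
        return level_db
    if over >= half_knee:
        return threshold_db
    return level_db - (over + half_knee) ** 2 / (2.0 * knee_db)


if __name__ == "__main__":
    for level in (-12.0, -6.0, 0.0):
        print(level, soft_knee_limit_db(level))
```

A hard-knee limiter would switch abruptly at the threshold; the quadratic segment eases gain reduction in over a range of levels instead.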

Architecture

ABI v5: Generic data query API for phonemes, dictionaries, and settings.
Pass pipeline: Modular acoustic processing — coarticulation, allophones, boundary smoothing, Fujisaki pitch, prominence, cluster timing, rate compensation, syllable marking, diphthong collapse, frame emission.
Frame emission unified: Single template replaces duplicated acoustic math.
DSP time-stretch: Frame-advance rate boosting at the speechPlayer level.
GitHub Actions CI: Automated Linux builds for x86_64 and aarch64.
Three static libs: espeak_phonemizer (GPL3), tgspeechbox_static (MIT DSP), tgsbFrontend_static (MIT frontend).
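
The pass pipeline idea from the list above can be sketched in a few lines of Python. The two passes, the token fields (ph, dur_ms, stressed), and the factors are toy assumptions and do not correspond to the engine's real passes or data structures.

```python
def lengthen_stressed(tokens, factor=1.2):
    """Toy pass: lengthen tokens flagged as stressed."""
    return [
        dict(t, dur_ms=t["dur_ms"] * factor) if t.get("stressed") else dict(t)
        for t in tokens
    ]


def apply_rate(tokens, rate=2.0):
    """Toy pass: shorten every token according to the speech rate."""
    return [dict(t, dur_ms=t["dur_ms"] / rate) for t in tokens]


def run_pipeline(tokens, passes):
    """Run each pass in order; every pass returns a new token list."""
    for acoustic_pass in passes:
        tokens = acoustic_pass(tokens)
    return tokens


if __name__ == "__main__":
    tokens = [
        {"ph": "a", "dur_ms": 100.0, "stressed": True},
        {"ph": "n", "dur_ms": 60.0},
    ]
    print(run_pipeline(tokens, [lengthen_stressed, apply_rate]))
```

Keeping each pass a plain function over the token list makes it easy to reorder, disable, or test passes in isolation, which is the usual motivation for a modular design like this.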

Join the test!

Join TestFlight (iOS/macOS)
Join the Android alpha (web)
Join the Android alpha (phone)
