github tgeczy/TGSpeechBox v-295
TG SpeechBox with phoneme editor, NVDA Addon, SAPI5, Speech Dispatcher module version 2.95


TGSpeechBox v2.95

A release that matures two core frontend passes, squashes a pile of phoneme editor bugs, and delivers tuning improvements across English, Polish, and Hungarian.

Did you know?

TGSpeechBox now has more than 270 tunable per-pack settings! They can be written as flat keys or as nested YAML, so indentation isn't the only way to structure a pack. That flexibility does make duplicate entries possible (the same setting written both ways), but I'll audit the packs for them. In general, the phoneme editor and the NVDA driver should write nested keys, while humans who dislike the complex dance of indents can use flat ones. Both approaches have their merits.
For now, all 270+ settings can be localized only in the NVDA driver. Giving the phoneme editor the same flexibility is on the radar.
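The duplicate audit can be illustrated with a minimal C++ sketch. It assumes every flat key is the nested path joined in camelCase, as with trajectoryLimit.maxHzPerMs.cf2 and trajectoryLimitMaxHzPerMsCf2. The helpers flattenKey and findDuplicateSettings are hypothetical and are not part of the TGSpeechBox codebase.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: "trajectoryLimit.maxHzPerMs.cf2" -> "trajectoryLimitMaxHzPerMsCf2".
// Assumes flat keys are always the nested path with each later segment capitalized.
std::string flattenKey(const std::string& dottedPath) {
    assert(!dottedPath.empty());
    std::string flat;
    bool capitalizeNext = false;
    for (char c : dottedPath) {
        if (c == '.') {
            capitalizeNext = true;  // drop the dot, capitalize the next character
            continue;
        }
        flat += capitalizeNext
                    ? static_cast<char>(std::toupper(static_cast<unsigned char>(c)))
                    : c;
        capitalizeNext = false;
    }
    return flat;
}

// Hypothetical audit: nested keys arrive as dotted paths, flat keys as-is.
// Any setting whose flat spelling shows up more than once is reported.
std::vector<std::string> findDuplicateSettings(const std::vector<std::string>& keys) {
    std::map<std::string, int> counts;
    for (const auto& key : keys) ++counts[flattenKey(key)];
    std::vector<std::string> duplicates;
    for (const auto& entry : counts)
        if (entry.second > 1) duplicates.push_back(entry.first);
    return duplicates;  // sorted, because std::map iterates in key order
}
```

If you pass in both spellings of the cf2 limit, the function reports trajectoryLimitMaxHzPerMsCf2 once. A real audit would also compare the two values, because duplicates can disagree.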

New & Changed Features

Rate Reduction → Rate Compensation

The old reduction pass only knew one trick: shorten schwas at high speed. At 3× rate, a 6ms stop burst becomes 2ms, a 45ms fricative becomes 15ms — everything squishes equally and you get muddy, unintelligible speech. Users called it "hard to tease words apart." The segments were technically alive in the token stream but acoustically dead — zombies.

v2.95 replaces this with rate compensation: a safety net that enforces minimum perceptual duration floors so no segment falls below the threshold where it can be identified. The philosophy flipped from "reduce everything uniformly" to "protect what matters at speed."

  • Per-class minimum durations live in a new rateCompensation: YAML block, under minimumDurations:, with configurable floors (in ms): vowelMs (25), fricativeMs (18), stopMs (4), nasalMs (18), liquidMs (15), affricateMs (12), semivowelMs (10), tapMs (4), trillMs (12), voicedConsonantMs (10). The hardcoded 18ms nasal floor that used to live in calculateTimes now lives here, where it belongs: configurable and per-language.
  • wordFinalBonusMs: Extra floor padding for word-final segments that carry morphological information (plural /s/, past tense /d/).
  • clusterProportionGuard and clusterMaxRatioShift: When one segment in a cluster hits its floor, neighboring segments get a proportional bump to prevent unnatural timing bulges. Without this, "texts" at high speed sounds like "te-X-ts" with a weird duration spike on whichever consonant hit its floor first.
  • floorSpeedScale: Tuning knob for how aggressively floors engage.
  • Word-final schwa reduction (a phonological rule, not a rate concern) is migrating to the allophones pass where it belongs. Rate-dependent schwa shortening gets absorbed as one of rate compensation's strategies.
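The following minimal C++ sketch shows how the floors, wordFinalBonusMs, and the cluster guard could fit together for one consonant cluster. The floor values are the ones listed above. Several details are assumptions rather than the shipped calculateTimes logic:

- the struct and function names;
- the zero default for wordFinalBonusMs;
- the rule that neighbors get the largest bump, capped at 1 + clusterMaxRatioShift.

floorSpeedScale is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

enum class SegClass { Vowel, Fricative, Stop, Nasal, Liquid, Affricate,
                      Semivowel, Tap, Trill, VoicedConsonant };

// Floors from the v2.95 notes (ms). The bonus default is a placeholder.
struct MinimumDurations {
    double vowelMs = 25, fricativeMs = 18, stopMs = 4, nasalMs = 18, liquidMs = 15,
           affricateMs = 12, semivowelMs = 10, tapMs = 4, trillMs = 12,
           voicedConsonantMs = 10;
    double wordFinalBonusMs = 0;  // hypothetical default
};

struct Segment {
    SegClass cls;
    double durMs;    // duration after speed scaling
    bool wordFinal;  // e.g. plural /s/, past-tense /d/
};

double floorFor(const MinimumDurations& m, const Segment& s) {
    double f = 0;
    switch (s.cls) {
        case SegClass::Vowel:           f = m.vowelMs; break;
        case SegClass::Fricative:       f = m.fricativeMs; break;
        case SegClass::Stop:            f = m.stopMs; break;
        case SegClass::Nasal:           f = m.nasalMs; break;
        case SegClass::Liquid:          f = m.liquidMs; break;
        case SegClass::Affricate:       f = m.affricateMs; break;
        case SegClass::Semivowel:       f = m.semivowelMs; break;
        case SegClass::Tap:             f = m.tapMs; break;
        case SegClass::Trill:           f = m.trillMs; break;
        case SegClass::VoicedConsonant: f = m.voicedConsonantMs; break;
    }
    return s.wordFinal ? f + m.wordFinalBonusMs : f;
}

// Raises segments that fall below their floor. With proportionGuard on,
// segments that did not hit a floor are scaled by the largest bump in the
// cluster, capped at 1 + maxRatioShift, so one consonant does not bulge alone.
// If nothing is below its floor, the cluster is returned unchanged.
void compensateCluster(std::vector<Segment>& cluster, const MinimumDurations& m,
                       bool proportionGuard, double maxRatioShift) {
    assert(maxRatioShift >= 0);
    std::vector<bool> floored(cluster.size(), false);
    double maxBump = 1.0;
    for (std::size_t i = 0; i < cluster.size(); ++i) {
        assert(cluster[i].durMs > 0);
        double f = floorFor(m, cluster[i]);
        if (cluster[i].durMs < f) {
            maxBump = std::max(maxBump, f / cluster[i].durMs);
            cluster[i].durMs = f;
            floored[i] = true;
        }
    }
    if (!proportionGuard) return;
    double bump = std::min(maxBump, 1.0 + maxRatioShift);
    for (std::size_t i = 0; i < cluster.size(); ++i)
        if (!floored[i]) cluster[i].durMs *= bump;
}
```

For example, take a nasal, a stop, and a fricative compressed to 30, 3, and 40 ms, with the ratio shift capped at 0.2. They come out as 36, 4, and 48 ms: the stop hits its 4 ms floor, and its neighbors get the capped 1.2× bump.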

The pass runs at PostTiming, after cluster_timing and prominence, so it sees the final post-shaping durations and intervenes only when speed compression pushes segments below the perceptual cliff. At normal speeds it does nothing, so your tuning is untouched.

Trajectory Limit Improvements

Trajectory limiting got both value retuning and architectural improvements in this cycle.

Value changes across languages:

  • English US: cf2 raised 18→24 Hz/ms (fixes over-smoothed diphthongs)
  • Polish and Hungarian: cf2 24, cf3 22, pf2 18, pf3 22 (tuned for cluster-heavy languages)
  • The full settings block now supports nested YAML (trajectoryLimit.maxHzPerMs.cf2) and flat keys (trajectoryLimitMaxHzPerMsCf2) with bitmask-based field selection via applyTo: [cf2, cf3, pf2, pf3]
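Here is a minimal C++ sketch of the applyTo: bitmask and the per-field check. It assumes the limiter lengthens a transition's fade until the slope fits under maxHzPerMs, rather than clipping the target. The bit layout, the Field enum, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical bit layout for applyTo:; the real enum may differ.
enum Field : std::uint32_t {
    kCf1 = 1u << 0, kCf2 = 1u << 1, kCf3 = 1u << 2,
    kPf1 = 1u << 3, kPf2 = 1u << 4, kPf3 = 1u << 5,
};

// Turns a list such as [cf2, cf3, pf2, pf3] into a bitmask.
// Unknown names are ignored here; a real loader would probably warn.
std::uint32_t parseApplyTo(const std::vector<std::string>& names) {
    struct Entry { const char* name; Field bit; };
    static const Entry table[] = {{"cf1", kCf1}, {"cf2", kCf2}, {"cf3", kCf3},
                                  {"pf1", kPf1}, {"pf2", kPf2}, {"pf3", kPf3}};
    std::uint32_t mask = 0;
    for (const auto& name : names)
        for (const auto& e : table)
            if (name == e.name) mask |= e.bit;
    return mask;
}

// Fade length (ms) needed so a jump of deltaHz moves no faster than maxHzPerMs.
// Fields outside the mask keep their original fade.
double limitedFadeMs(Field field, std::uint32_t mask, double deltaHz,
                     double fadeMs, double maxHzPerMs) {
    assert(fadeMs > 0 && maxHzPerMs > 0);
    if ((mask & field) == 0) return fadeMs;
    double neededMs = std::fabs(deltaHz) / maxHzPerMs;
    return neededMs > fadeMs ? neededMs : fadeMs;
}
```

Consider a cf2 limit of 24 Hz/ms. A 480 Hz F2 sweep that was given a 10 ms fade gets stretched to 20 ms. cf1 keeps its 10 ms fade because it is not in the mask.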

Architectural improvements planned for trajectory limit v2:

  • transF*Scale awareness: Boundary smoothing sets per-place transition speed scales (e.g. labial F2 moves slower than alveolar F2). Trajectory limit currently ignores these and recalculates from raw fadeMs. v2 will read transScale values so the effective rate becomes delta / (fadeMs × transScale) instead of delta / fadeMs.
  • Liquid/nasal separation: tokNeedsSharpTransition currently lumps liquids with nasals and exempts both from rate limiting. Nasals genuinely need sharp transitions for place perception, but liquids need the opposite: liquid dynamics creates beautiful onglide sub-segments, and trajectory limit then skips them entirely. The /ɹ/→vowel F3 swing of 700+ Hz fires uncapped.
  • cf1 coverage: F1 jumps of 300+ Hz between high and low vowels sound jarring. Proposed default ~15 Hz/ms for cf1.
  • Speed-aware window scaling: At high speeds, fades are already crushed by rate compensation's floors. Rather than extending fades that fight for the same time budget, loosen Hz/ms limits slightly at high speeds.
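The planned v2 rate formula and the liquid/nasal split can be sketched in C++ as follows. The Manner enum and every function name are hypothetical; tokNeedsSharpTransition is only mirrored, not reproduced. transScale is assumed to multiply the fade, so a larger scale means a slower, gentler transition.

```cpp
#include <cassert>
#include <cmath>

enum class Manner { Vowel, Nasal, Liquid, Stop, Fricative, Other };

// Current rule as described: nasals and liquids both skip rate limiting.
bool needsSharpTransitionV1(Manner m) {
    return m == Manner::Nasal || m == Manner::Liquid;
}

// Planned rule: only nasals stay exempt, so liquid onglides get limited too.
bool needsSharpTransitionV2(Manner m) { return m == Manner::Nasal; }

// Current slope: delta / fadeMs.
double rateV1(double deltaHz, double fadeMs) {
    assert(fadeMs > 0);
    return std::fabs(deltaHz) / fadeMs;
}

// Planned slope: delta / (fadeMs * transScale), honoring the per-place
// transition scale set by boundary smoothing.
double rateV2(double deltaHz, double fadeMs, double transScale) {
    assert(fadeMs > 0 && transScale > 0);
    return std::fabs(deltaHz) / (fadeMs * transScale);
}

// Would the v2 limiter intervene on this transition?
bool v2ShouldLimit(Manner m, double deltaHz, double fadeMs, double transScale,
                   double maxHzPerMs) {
    if (needsSharpTransitionV2(m)) return false;
    return rateV2(deltaHz, fadeMs, transScale) > maxHzPerMs;
}
```

For example, a 600 Hz F2 move over a 20 ms fade reads as 30 Hz/ms today. With a transScale of 1.5 it reads as 20 Hz/ms, which already fits under a 24 Hz/ms cap.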

Phoneme Editor Bug Fixes

The Spanish language pack PR #13 — the first real third-party use of the phoneme editor — exposed deep round-trip bugs in the YAML serializer. Code-level analysis of yaml_edit.cpp, yaml_min.h, and yaml_min.cpp traced nearly all the damage to two root causes: the yaml_min parser never populating keyOrder on map nodes, and the absence of a parseInlineMap function for flow-style YAML maps.
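Both root causes are easy to see in miniature. The C++ sketch below is a toy, not yaml_min: MapNode and trim are hypothetical stand-ins. This toy parseInlineMap only handles flat maps of scalars such as {cf1: 700, cf2: 1200}; it does not handle nested flow collections or quoted commas. The point is that insert records keyOrder, so orderedKeys() falls back to alphabetical order only when nothing was recorded.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy map node; the real yaml_min node type differs.
struct MapNode {
    std::map<std::string, std::string> values;
    std::vector<std::string> keyOrder;  // insertion order

    void insert(const std::string& key, const std::string& value) {
        if (values.find(key) == values.end()) keyOrder.push_back(key);
        values[key] = value;
    }

    // Source order when recorded, alphabetical otherwise (the old failure mode).
    std::vector<std::string> orderedKeys() const {
        if (!keyOrder.empty()) {
            assert(keyOrder.size() == values.size());
            return keyOrder;
        }
        std::vector<std::string> sorted;
        for (const auto& kv : values) sorted.push_back(kv.first);
        return sorted;
    }
};

std::string trim(const std::string& s) {
    std::size_t b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// Toy flow-map parser for "{key: value, ...}" with scalar values only.
bool parseInlineMap(const std::string& text, MapNode& out) {
    std::string t = trim(text);
    if (t.size() < 2 || t.front() != '{' || t.back() != '}') return false;
    std::string body = t.substr(1, t.size() - 2);
    std::size_t pos = 0;
    while (true) {
        std::size_t comma = body.find(',', pos);
        std::size_t len = (comma == std::string::npos) ? std::string::npos : comma - pos;
        std::string item = trim(body.substr(pos, len));
        if (!item.empty()) {
            std::size_t colon = item.find(':');
            if (colon == std::string::npos) return false;
            out.insert(trim(item.substr(0, colon)), trim(item.substr(colon + 1)));
        }
        if (comma == std::string::npos) break;
        pos = comma + 1;
    }
    return true;
}
```

Parsing {cf2: 1200, cf1: 700} keeps cf2 first. Without keyOrder, the same node comes back as cf1, cf2, which is exactly the reordering that broke surgical saves.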

All 9 bugs identified from the Spanish language pack PR #13 audit are now fixed:

  1. Surgical save for phonemes YAML (was CRITICAL): The YAML parser now populates key insertion order (keyOrder.push_back(key) at map insertion sites in yaml_min.cpp). Before this, orderedKeys() always fell back to sortedKeys() → alphabetical. The surgical save comparison always failed → every phoneme got rewritten. Now unchanged phonemes are kept verbatim — comments, formatting, and key ordering preserved.
  2. Surgical save for language YAML (was HIGH): LanguageYaml::save() now reads the original file, finds top-level key ranges, and patches only modified sections. Comments and formatting between sections survive round-trips. Falls back to full dump only for new files.
  3. Flow map round-trip fixed (was CRITICAL): Added parseInlineMap to yaml_min.cpp. Voice profile phonemeOverrides like {cf1: 700, cf2: 1200} no longer get stored as scalar strings and wrapped in quotes. Beth and Bobby voice profiles survive editor round-trips intact.
  4. Flow sequence style preserved (was HIGH): parseInlineSeq now sets flowStyle = true. Inline arrays like [1, 1, 1, 0.92] stay inline instead of expanding to multi-line block sequences.
  5. Identity replacement filter (was LOW): The editor skips no-op rules where from == to. The no-ops in the Spanish PR came from an older editor build.
  6. setSettings key order preservation (was MODERATE): The setter saves original key ordering up to 2 levels deep before clearing, then restores it after rebuild. New keys append at the end.
  7. Scope awareness with warning dialog (was CRITICAL): When a language pack is loaded and you modify a globally-shared phoneme, the editor now scans all language pack YAMLs to show the blast radius — which packs reference the phoneme in normalization rules, which already have overrides, and how many use the global definition directly. A Yes/No/Cancel dialog offers to create a language-specific clone (e.g. ɑ_es for Spanish) with an automatic normalization rule, apply globally, or cancel. This prevents the entire category of "user edits Spanish, breaks English."
  8. Flag change validation (was HIGH): Changing critical flags (_isVowel, _isVoiced, _isStop, _isNasal, _isSemivowel, _isLiquid, _isAffricate) now triggers a dedicated warning dialog listing every flag that changed and its old → new value, with a reminder that the change affects timing, microprosody, prominence, and syllable detection globally.
  9. Formant ordering validation (was HIGH): On save, the editor validates that cf1 < cf2 < cf3 < cf4 < cf5 < cf6 and pf1 < pf2 < pf3 < pf4 < pf5 < pf6 for every phoneme. Violations are listed with the exact values and the user can choose to save anyway or fix them first.
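The formant ordering check from item 9 can be sketched in C++ as follows. Phoneme and Violation are hypothetical types, not the editor's. The sketch collects every adjacent pair that breaks strict ordering, so all of them can be listed in one dialog.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical phoneme record holding the six cf and six pf formants.
struct Phoneme {
    std::string name;
    std::array<double, 6> cf;  // cf1..cf6
    std::array<double, 6> pf;  // pf1..pf6
};

struct Violation {
    std::string phoneme;
    std::string field;  // higher formant of the failing pair, e.g. "cf3"
    double lowerHz;     // value of the formant below it
    double upperHz;     // value of the failing formant
};

// Lists every adjacent pair where formant N is not strictly below formant N+1,
// in both the cf and pf banks.
std::vector<Violation> checkFormantOrder(const std::vector<Phoneme>& phonemes) {
    std::vector<Violation> out;
    for (const auto& p : phonemes) {
        const std::array<double, 6>* banks[] = {&p.cf, &p.pf};
        const char* prefixes[] = {"cf", "pf"};
        for (int b = 0; b < 2; ++b) {
            const auto& f = *banks[b];
            for (std::size_t i = 0; i + 1 < f.size(); ++i)
                if (!(f[i] < f[i + 1]))
                    out.push_back({p.name, prefixes[b] + std::to_string(i + 2),
                                   f[i], f[i + 1]});
        }
    }
    return out;
}
```

The sketch collects violations instead of stopping at the first one. That lets the dialog show every offending value before the user chooses to save anyway or fix them.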

These fixes make community contributions via the editor dramatically cleaner. Language pack PRs should now produce minimal, reviewable diffs.

/ɨ/ Vowel Formant Correction

  • cf1: 320 → 400 Hz, cf2: 1300 → 1600 Hz, cf3: 2300 → 2400 Hz.
  • Published acoustic research (Jassem, Wierzchowska, Gonet, Nimz) places Polish /ɨ/ at F1≈433, F2≈1757. Our old F2 of 1300 was 457 Hz too low, making the vowel sound like a backed central vowel instead of the near-front central quality native speakers expect. The new value of 1600 is a cross-language compromise that also improves Russian and Ukrainian /ɨ/.
  • Shoutout to Spacedog for identifying this issue through the phoneme editor — the instinct was right, and the research confirmed it.

Language & Voice Tuning

English (US)

  • Trajectory limit rebalanced: cf2 raised from 18 to 26 Hz/ms, fixing thick/drawn-out diphthongs in words like "waiting" and "way" where the F2 sweep was being over-smoothed (the "Shaggy voice" effect).
  • Pre-voiceless shortening gentler: Scale eased from 0.88 to 0.92, and the minimum floor raised from 25 to 35 ms. Vowels before voiceless stops (like the /oʊ/ in "quote") no longer get crushed to tiny stubs at fast speech rates. Rate compensation floors are now properly respected.
  • Phrase-final coda lengthening reduced: Coda scale pulled from 1.40 to 1.20 to eliminate over-lengthened fricatives at phrase boundaries (the hissy "ComboBox" effect on final /ks/).
  • Prominent vowel floor tuning: durationProminentFloorMs adjustable for better balance between vowel presence and natural rhythm.
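A tiny C++ sketch shows how the eased scale and the raised floor interact. preVoicelessShorten is a hypothetical helper. Leaving vowels that are already at or below the floor untouched is an assumption about the real microprosody pass.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper; parameter names echo the notes, not the real API.
// Shortens a vowel before a voiceless consonant by `scale`, but never below
// `floorMs`. Vowels already at or under the floor are returned unchanged.
double preVoicelessShorten(double vowelMs, double scale, double floorMs) {
    assert(scale > 0 && scale <= 1);
    if (vowelMs <= floorMs) return vowelMs;
    return std::max(vowelMs * scale, floorMs);
}
```

With the old settings (0.88, 25 ms), a 38 ms vowel drops to about 33.4 ms. With the new ones (0.92, 35 ms), it stops at 35 ms.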

Polish

  • Added phrase-final lengthening in nucleus-only mode, keeping heavy Polish consonant clusters like /stk/ and /ɲtɕ/ from getting stretched and smeared.
  • Added microprosody with pre-voiceless shortening disabled — Polish has no phonemic vowel length, so English-style duration cues sound foreign.
  • Added trajectory limiting and rate compensation with cluster proportion guard, protecting Polish's famously dense consonant clusters (up to 5 consonants word-initially) at fast speech rates.
  • Added question-scale shortening (0.88) matching Polish yes/no question prosody research.
  • All existing normalization rules preserved: retroflex sibilant mapping, affricate joining, nasal vowel decomposition (ɔ̃→ɔn, ɛ̃→ɛn), and vowel reduction blocking (ɐ→a).
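Here is the difference between nucleus-only and whole-syllable phrase-final lengthening, sketched in C++. The Seg struct, the FinalLengthening enum, and the function are hypothetical. The point is only that coda consonants keep their timing when lengthening is confined to the vowel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Seg {
    bool isVowel;
    double durMs;
};

enum class FinalLengthening { WholeSyllable, NucleusOnly };

// Stretches the phrase-final syllable, given as segment indices [start, end).
// In NucleusOnly mode only vowels are scaled, so a coda like /ɲtɕ/ keeps its timing.
void lengthenFinalSyllable(std::vector<Seg>& segs, std::size_t start, std::size_t end,
                           double scale, FinalLengthening mode) {
    assert(start <= end && end <= segs.size() && scale >= 1.0);
    for (std::size_t i = start; i < end; ++i)
        if (mode == FinalLengthening::WholeSyllable || segs[i].isVowel)
            segs[i].durMs *= scale;
}
```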

Hungarian

  • YAML structure fixes and setting corrections from the v2.90 cycle, ensuring all Hungarian-specific prosody and microprosody settings load correctly.

Spanish

Further tuning improvements to the Spanish language pack are in progress and will land in a future release. The Spanish PR is under review with contributor guidance on using phoneme overrides correctly within the language YAML.

Portuguese

Merged #14 by @thgcode: thank you for contributing to the language packs and giving the PT variants better vowel tuning. Feedback on this from other Portuguese speakers is appreciated.

What's Next

  • Coda scale split: separate phraseFinalLengtheningCodaStopScale and phraseFinalLengtheningCodaFricativeScale for class-aware phrase-final lengthening (stops get room for the burst, fricatives stop hissing).
  • Trajectory limit v2 implementation: transScale awareness, liquid/nasal separation, cf1 coverage, speed-aware windows (see architectural notes above).
  • Rate compensation sacrifice logic: when floors force the utterance longer than the speed target, intelligently steal time from silence gaps, function word vowels, and aspiration tails rather than accepting the overshoot.
  • Phoneme editor: language-context mode that auto-routes phoneme overrides to the correct language YAML without requiring the scope warning dialog (currently the dialog catches it, but the ideal UX is to default to language-specific saves).
  • Continued cross-language vowel formant audit against published acoustic research.
  • Spanish language pack finalization.
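The planned sacrifice logic could look roughly like this C++ sketch. Everything in it is hypothetical: the Donor struct, the per-donor minimum, and the assumption that donors are tried in the order listed above (silence gaps, then function-word vowels, then aspiration tails).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// A stretch of time that can give some back: a silence gap, a function-word
// vowel, or an aspiration tail. minMs is the shortest it may become.
struct Donor {
    double durMs;
    double minMs;
};

// Takes up to overshootMs back from donors, in the order given, never pushing
// a donor below its minimum. Returns the overshoot that could not be recovered.
double reclaimOvershoot(std::vector<Donor>& donors, double overshootMs) {
    assert(overshootMs >= 0);
    for (auto& d : donors) {
        if (overshootMs <= 0) break;
        double spare = std::max(0.0, d.durMs - d.minMs);
        double take = std::min(spare, overshootMs);
        d.durMs -= take;
        overshootMs -= take;
    }
    return overshootMs;
}
```

A nonzero return means the floors still win and the utterance runs slightly long. Today that happens for the whole overshoot.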
