github tgeczy/TGSpeechBox v-310b3
TG SpeechBox with phoneme editor, NVDA Addon, SAPI5, Linux, Android, iOS, Mac OS, version 310 beta 3

pre-release, 4 hours ago

TGSpeechBox v3.10 Beta 3 — fast-rate /ɡ/ intelligibility (DSP v9 FricationTilt)

Ships the Bug 1 fix from the #95 two-bug analysis: a rate-adaptive frication
spectral tilt that prevents /ɡ/ bursts from drifting into alveolar-tap
territory at fast speech rates. 29-Bloo's "Pegue → Pere" report on #95
was the diagnostic gift that isolated this; Tomi's ear-testing across
~100 rendered variants confirmed the precise spectral region and
magnitude that produces the fix.

The fix in one paragraph

At speeds > 1.2x normal, the /ɡ_es/ burst's high-frequency parallel
amplitudes (pa5 at 3750 Hz, pa6 at 4900 Hz) stay proportionally prominent
while the burst body time-compresses. That creates an upward spectral
tilt that native ears read as a tap [ɾ] or as [d]. A new fricationTiltDb FrameEx
field scales pa4/pa5/pa6 by a frequency-dependent amount (pivot 1500 Hz,
asymmetric rolloff above pivot only — preserves pa1/pa2/pa3 entirely).
The ipa_engine applies rate-modulated negative tilt during stop burst
emission: 0 dB at speed ≤1.2, -1 dB at 1.3, -5 dB at 1.7, -8 dB at 2.0.
Preserves the velar F3 signature that makes /ɡ/ sound like /ɡ/ — only
removes the high-frequency click that was causing the tap misperception.

Backed by

  • Kingston et al. 2008 (Journal of Phonetics) — perceptual integration
    of low-frequency energy across stop boundaries
  • Smits et al. 1996 (JASA) — burst dominates place-of-articulation
    perception in front-vowel contexts (why "pegue" was most vulnerable)
  • Hualde et al. 2011 (Laboratory Phonology) — velars intrinsically less
    differentiable than labials/coronals in Spanish

Changes in this beta

DSP engine (v9)

  • New FrameEx field fricationTiltDb (replaces unused caN0 field
    from v9's initial addition). Net-zero struct size change vs b201.
  • Consumer in formantGenerator.h parallel path: scales each pa_i by
    10^(tiltDb * max(0, pf_i - 1500Hz) / (20 * 3000Hz)). When tiltDb=0
    (the legacy/normal-speed case), fast-path returns 1.0 without pow().
  • Rate modulator in frame_emit.cpp burst emission: applies tilt only
    during burst + decay frames, restores to 0 before next phoneme.
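
The per-amplitude gain formula in the second bullet can be sketched as a small standalone function. This illustrates the stated formula only and is not the code in formantGenerator.h. The function name fricationTiltGain and the constant names are hypothetical; the 1500 Hz pivot and 3000 Hz span come from the formula above.

```cpp
#include <algorithm>
#include <cmath>

constexpr double kTiltPivotHz = 1500.0;  // no scaling at or below the pivot
constexpr double kTiltSpanHz = 3000.0;   // tiltDb is fully reached this far above the pivot

// Linear gain for one parallel amplitude whose formant sits at formantHz.
double fricationTiltGain(double tiltDb, double formantHz) {
    // Fast path: the legacy/normal-speed case skips pow() entirely.
    if (tiltDb == 0.0) return 1.0;
    // Asymmetric rolloff: only the part of the frequency above the pivot counts.
    const double excessHz = std::max(0.0, formantHz - kTiltPivotHz);
    return std::pow(10.0, tiltDb * excessHz / (20.0 * kTiltSpanHz));
}
```

With tiltDb = -8 (the 2.0x setting), this gives a gain of about 0.50 (-6 dB) for pa5 at 3750 Hz and about 0.35 (roughly -9 dB) for pa6 at 4900 Hz. A formant at 4500 Hz, exactly 3000 Hz above the pivot, is attenuated by the full 8 dB.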

Per-phoneme knobs

fricationTiltDb and closureGapMs (b2's closure decoupling override)
are now exposed in all three phoneme editors:

  • Win32 phoneme editor: both fields available in the modification
    dialog with descriptive labels.
  • Android phoneme editor: both fields in the phoneme-field list
    with proper slider ranges (tilt: -15..+15 dB, closure: 0..60 ms).
  • iOS phoneme editor: same.
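
For anyone editing pack values by hand rather than through the sliders, the ranges above can be enforced with a simple clamp. This is a sketch only: the PhonemeKnobs struct and clampToEditorRanges are hypothetical names, not part of any TGSpeechBox editor, and std::clamp requires C++17.

```cpp
#include <algorithm>

// Hypothetical container for the two per-phoneme knobs.
struct PhonemeKnobs {
    double fricationTiltDb;  // slider range: -15..+15 dB
    double closureGapMs;     // slider range: 0..60 ms
};

// Returns a copy with both knobs limited to the documented slider ranges.
PhonemeKnobs clampToEditorRanges(PhonemeKnobs k) {
    k.fricationTiltDb = std::clamp(k.fricationTiltDb, -15.0, 15.0);
    k.closureGapMs = std::clamp(k.closureGapMs, 0.0, 60.0);
    return k;
}
```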

Testers who want to tune these on a per-phoneme basis (e.g. experiment
with different fricationTiltDb baselines on /s/, /ʃ/) can now do so
directly from the editor on their platform of choice.

Cleanup

  • Removed unused caN0 FrameEx field (DSP v9 infrastructure from
    v3.10b1.1 that was never consumed by any shipped phoneme, and had a
    documented FIR HF-boost gotcha). The /l_es/ lateral antiresonance
    work used the existing caNP path instead. This is the cleanest moment
    to prune — swapped in-place with fricationTiltDb, no user-visible
    behavior change since no phoneme ever set caN0 to non-zero.

What this beta does NOT address

  • Bug 2 (cluster /l/+/ɣ/ tap percept): improved from b201, not
    fully eliminated. Different root cause (schwa+closure temporal
    template mimicking tap articulation). Not scoped for b3; may
    improve incidentally from the tilt work, will revisit in b4 if
    still perceptible after community testing.
  • dialogo /o-ɡ-o/ at normal rate: inherent phonetic difficulty
    per Kingston 2008 (low-frequency energy integration across stop
    boundaries with low-F1 vowels flanking). Will likely improve at
    fast rates from this fix; normal-rate remains hard.
  • Other items on the 3.10 roadmap (gender-aware number expansion #90,
    currency #83, Android engine-tab #97, emoji translations #96) are
    not in this beta.

Please ear-test

Native Spanish speakers, especially @gregodejesus2, @29-Bloo,
@rmcpantoja, @yaresDg, @dgomez42 — test on Windows (NVDA or SAPI),
Android (APK), or Linux (tarball when CI completes).

Listen especially at fast speech rates:

  • Pegue, fuego, negar — /e-ɡ-e/ and /e-ɡ-a/ contexts that were
    collapsing into tap-like sounds
  • Lago, pagar, amigo — baseline intervocalic /ɣ/ that should
    stay stable or improve slightly
  • Algo, salga, algunas — cluster contexts (Bug 2); observe if
    incidental improvement happens, even though not directly targeted

At normal speeds everything should sound the same or better than b201
— there should be zero regression at speed ≤1.2.

Per-phoneme tuning invitation

If you want to experiment with different fricationTiltDb values on a
specific phoneme, you can now do so directly in the phoneme editor
(Win32/Android/iOS). Share your findings on #95; we'd genuinely like
to turn your tuning intuitions into pack contributions.

Testing

  • All C++ unit tests (doctest) pass
  • Zero compile warnings in all three build configurations (MinSizeRel)
  • Rate-adaptive tilt renders confirmed byte-different from legacy at
    speed > 1.2, byte-identical at speed ≤1.2 (no regression at
    normal/slow rates)

If something sounds wrong

Post on #95 with the affected word(s) and the speech rate you were
using. We'll tune and re-release. Hobby pace as always.

— Tamas + Claudeo (Opus 4.7)

Links

Join the Test

Want to help test before the full v3.10 release?
