TGSpeechBox v3.10 Beta 3 — fast-rate /ɡ/ intelligibility (DSP v9 FricationTilt)
Ships the Bug 1 fix from the #95 two-bug analysis: a rate-adaptive frication
spectral tilt that prevents /ɡ/ bursts from drifting into alveolar-tap
territory at fast speech rates. 29-Bloo's "Pegue → Pere" report on #95
was the diagnostic gift that isolated this; Tomi's ear-testing across
~100 rendered variants confirmed the precise spectral region and
magnitude that produce the fix.
The fix in one paragraph
At speeds > 1.2x normal, the /ɡ_es/ burst's high-frequency parallel
amplitudes (pa5 at 3750 Hz, pa6 at 4900 Hz) stay proportionally prominent
while the burst body time-compresses. That creates an upward spectral
tilt that native ears read as a tap [ɾ] or [d]. A new fricationTiltDb FrameEx
field scales pa4/pa5/pa6 by a frequency-dependent amount (pivot 1500 Hz,
asymmetric rolloff above the pivot only, so pa1/pa2/pa3 are preserved entirely).
The ipa_engine applies rate-modulated negative tilt during stop burst
emission: 0 dB at speed ≤1.2, -1 dB at 1.3, -5 dB at 1.7, -8 dB at 2.0.
Preserves the velar F3 signature that makes /ɡ/ sound like /ɡ/ — only
removes the high-frequency click that was causing the tap misperception.
Backed by
- Kingston et al. 2008 (Journal of Phonetics): perceptual integration
  of low-frequency energy across stop boundaries
- Smits et al. 1996 (JASA): burst dominates place-of-articulation
  perception in front-vowel contexts (why "pegue" was most vulnerable)
- Hualde et al. 2011 (Laboratory Phonology): velars intrinsically less
  differentiable than labials/coronals in Spanish
Changes in this beta
DSP engine (v9)
- New FrameEx field fricationTiltDb (replaces the unused caN0 field
  from v9's initial addition). Net-zero struct size change vs b201.
- Consumer in the formantGenerator.h parallel path: scales each pa_i by
  10^(tiltDb * max(0, pf_i - 1500Hz) / (20 * 3000Hz)). When tiltDb=0
  (the legacy/normal-speed case), a fast path returns 1.0 without pow().
- Rate modulator in frame_emit.cpp burst emission: applies tilt only
  during burst + decay frames, restores to 0 before the next phoneme.
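The per-formant gain formula in the list above can be written out as a short function. This is only a sketch: the function name and constant names are hypothetical, and the shipped consumer in formantGenerator.h may be structured differently.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical free function illustrating the quoted formula:
//   gain = 10^(tiltDb * max(0, pf_i - 1500 Hz) / (20 * 3000 Hz))
// tiltDb    : frication tilt in dB (negative values attenuate highs)
// formantHz : centre frequency pf_i of the parallel formant being scaled
inline double parallelTiltGain(double tiltDb, double formantHz) {
    constexpr double kPivotHz = 1500.0;  // no change at or below the pivot
    constexpr double kSpanHz  = 3000.0;  // full tiltDb is reached 3000 Hz above it

    if (tiltDb == 0.0) {
        return 1.0;  // fast path: skip pow() in the legacy/normal-speed case
    }
    const double excessHz = std::max(0.0, formantHz - kPivotHz);
    return std::pow(10.0, tiltDb * excessHz / (20.0 * kSpanHz));
}
```

With tiltDb = -8 (the 2.0x setting), this gives a linear gain of about 0.50 (-6 dB) for pa5 at 3750 Hz and about 0.35 (roughly -9 dB) for pa6 at 4900 Hz, while any formant at or below 1500 Hz keeps a gain of exactly 1.0.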
Per-phoneme knobs
fricationTiltDb and closureGapMs (b2's closure decoupling override)
are now exposed in all three phoneme editors:
- Win32 phoneme editor: both fields available in the modification
  dialog with descriptive labels.
- Android phoneme editor: both fields in the phoneme-field list
  with proper slider ranges (tilt: -15..+15 dB, closure: 0..60 ms).
- iOS phoneme editor: same.
Testers who want to tune these on a per-phoneme basis (e.g. experiment
with different fricationTiltDb baselines on /s/, /ʃ/) can now do so
directly from the editor on their platform of choice.
Cleanup
- Removed the unused caN0 FrameEx field (DSP v9 infrastructure from
  v3.10b1.1 that was never consumed by any shipped phoneme, and had a
  documented FIR HF-boost gotcha). The /l_es/ lateral antiresonance
  work used the existing caNP path instead. This is the cleanest moment
  to prune: swapped in-place with fricationTiltDb, no user-visible
  behavior change since no phoneme ever set caN0 to non-zero.
What this beta does NOT address
- Bug 2 (cluster /l/+/ɣ/ tap percept): improved from b201, not
  fully eliminated. Different root cause (schwa+closure temporal
  template mimicking tap articulation). Not scoped for b3; may
  improve incidentally from the tilt work, will revisit in b4 if
  still perceptible after community testing.
- dialogo /o-ɡ-o/ at normal rate: inherent phonetic difficulty
  per Kingston 2008 (low-frequency energy integration across stop
  boundaries with low-F1 vowels flanking). Will likely improve at
  fast rates from this fix; normal-rate remains hard.
- Other items on the 3.10 roadmap (gender-aware number expansion #90,
  currency #83, Android engine-tab #97, emoji translations #96) are
  not in this beta.
Please ear-test
Native Spanish speakers, especially @gregodejesus2, @29-Bloo,
@rmcpantoja, @yaresDg, @dgomez42 — test on Windows (NVDA or SAPI),
Android (APK), or Linux (tarball when CI completes).
Listen especially at fast speech rates:
- Pegue, fuego, negar: /e-ɡ-e/ and /e-ɡ-a/ contexts that were
  collapsing into tap-like sounds
- Lago, pagar, amigo: baseline intervocalic /ɣ/ that should
  stay stable or improve slightly
- Algo, salga, algunas: cluster contexts (Bug 2); observe if
  incidental improvement happens, even though not directly targeted
At normal speeds everything should sound the same or better than b201
— there should be zero regression at speed ≤1.2.
Per-phoneme tuning invitation
If you want to experiment with different fricationTiltDb values on a
specific phoneme, you can now do so directly in the phoneme editor
(Win32/Android/iOS). Share your findings on #95 — we'd genuinely
value your tuning intuitions becoming pack contributions.
Testing
- All C++ unit tests (doctest) pass
- Zero compile warnings in all three build configurations (MinSizeRel)
- Rate-adaptive tilt renders confirmed byte-different from legacy at
speed > 1.2, byte-identical at speed ≤1.2 (no regression at
normal/slow rates)
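For anyone wanting to repeat the byte-identity check on their own renders, a comparison helper could look roughly like the sketch below. The function name is hypothetical, and it assumes the renders are held in memory as 16-bit PCM sample buffers; the project's actual test harness is not shown in these notes.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical test helper: true when two rendered buffers match byte for byte.
// Assumes both renders are 16-bit PCM samples already loaded into memory.
inline bool rendersByteIdentical(const std::vector<std::int16_t>& a,
                                 const std::vector<std::int16_t>& b) {
    if (a.size() != b.size()) {
        return false;  // different lengths can never be byte-identical
    }
    if (a.empty()) {
        return true;   // avoid handing a possibly-null data() to memcmp
    }
    return std::memcmp(a.data(), b.data(),
                       a.size() * sizeof(std::int16_t)) == 0;
}
```

The expectation stated above would then be that a legacy render and a b3 render of the same text compare equal at speed ≤1.2 and unequal at speed > 1.2.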
If something sounds wrong
Post on #95 with the affected word(s) and the speech rate you were
using. We'll tune and re-release. Hobby pace as always.
— Tamas + Claudeo (Opus 4.7)
Links
- Android (Google Play)
- iOS / macOS (App Store)
- GitHub
- License: MIT (sd_tgsb module: GPL-3.0)
Join the Test
Want to help test before the full v3.10 release?