TGSpeechBox v3.10 Beta 2 — reverted b101 /l/ experiment, shipping correct /ɣ/ approximant tuning
This beta replaces the withdrawn v3.10 Beta 1.1 (b101), which introduced a
Spanish /l/ anti-resonance that multiple native-speaker testers reported as
catastrophic (hola→hoda, "La le li lo lu" → "ga gue gui go gu", /l/ losing
its lateral character entirely). You were right: that approach was the
wrong tool for the job. We are rolling it back and shipping a corrected
fix based on a deeper read of the acoustic-phonetics literature.
What this beta changes vs b1
Reverted: the /l_es/ lateral anti-resonance from b101 (the whole
caNP + cfN0=1700 path) — back to pre-b101 values. /l/ should now sound
exactly as it did in b1, which testers confirmed was working.
Fixed: the underlying /ɣ/↔/l/ perceptual collapse, approached from a
different angle. We now treat Spanish intervocalic /ɣ/ as what it actually
is acoustically: the voiced velar APPROXIMANT [ɣ̞], not a fricative. Diego
Gomez's original #84 diagnosis pointed this out, but we initially went the
wrong direction (sharpening /l/ instead of softening /ɣ/). This beta
corrects course.
The acoustic theory
The confusion in words like "entregado" stems from our /ɣ/ being too
formant-like. Per Martínez-Celdrán and Mackenzie, Spanish intervocalic /ɣ/
is phonetically [ɣ̞], an approximant with near-zero turbulent frication.
The lenition cue is the intensity difference (IntDiff) relative to the
flanking vowels (Kingston 2008), not frication noise. So we lower F2 into
the actual approximant range, drop the frication to near zero, and drop
the voice amplitude to carry the IntDiff cue.
Parameter changes on /ɣ_es/
cf2: 1450 → 1250 (approximant range; Fant 1960 [x]=1050)
pf2: 1400 → 1250 (match cascade)
fricationAmplitude: 0.20 → 0.08 (near-zero turbulence)
voiceAmplitude: 0.82 → 0.70 (intensity-based lenition cue)
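For a rough sense of scale, the two amplitude changes above can be expressed in decibels. This is only a sketch: it assumes fricationAmplitude and voiceAmplitude act as linear gains, which this note does not confirm, and `db_change` is a hypothetical helper, not part of TGSpeechBox.

```python
import math


def db_change(old_gain, new_gain):
    """Level change in dB, ASSUMING both values are linear amplitude gains."""
    return 20.0 * math.log10(new_gain / old_gain)


# Old and new /ɣ_es/ values taken from the parameter list above.
changes = {
    "fricationAmplitude": (0.20, 0.08),
    "voiceAmplitude": (0.82, 0.70),
}

for name, (old, new) in changes.items():
    print(f"{name}: {old:.2f} -> {new:.2f} ({db_change(old, new):+.1f} dB)")
```

Under that assumption, the frication drops by roughly 8 dB and the voicing by a little over 1 dB.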
Acoustic confirmation via LPC
A new regression test measures the F2 gap between /ɣ/ and /l/ in minimal-
pair word contexts at the 20 ms acoustic-invariance window (Blumstein &
Stevens 1979 — where listeners form categorical consonant judgments):
/ɣ/ F2 = 1161 Hz (canonical approximant range)
/l/ F2 = 1608 Hz (matches Kirkham et al. 2019 Spanish /l/ ≈1583 Hz)
ΔF2 = 446 Hz (about 4.5× the perceptual JND of 100 Hz)
Before b2, that same measurement showed only a 56 Hz gap — below perceptual
threshold, which is exactly why native testers couldn't distinguish the two.
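For readers who want to see how an LPC-based F2 measurement of this kind can work, here is a minimal sketch. It is not the project's regression test. It synthesizes a 20 ms two-formant frame and reads F2 off the LPC spectrum, using autocorrelation-method LPC with Levinson-Durbin. The 16 kHz sample rate, the 400 Hz F1, the 80 Hz bandwidths, and every function name are assumptions made for illustration. There is no windowing or pre-emphasis, and the input is a synthetic impulse response rather than engine output.

```python
import cmath
import math


def resonator(freq_hz, bw_hz, fs):
    """Denominator [1, a1, a2] of a two-pole resonator."""
    r = math.exp(-math.pi * bw_hz / fs)
    return [1.0, -2.0 * r * math.cos(2.0 * math.pi * freq_hz / fs), r * r]


def poly_mul(p, q):
    """Multiply two polynomials in z^-1 (cascades two filters)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def impulse_response(a, n):
    """First n samples of the all-pole filter 1/A(z)."""
    y = []
    for t in range(n):
        v = 1.0 if t == 0 else 0.0
        for k in range(1, min(t, len(a) - 1) + 1):
            v -= a[k] * y[t - k]
        y.append(v)
    return y


def lpc(x, order):
    """Autocorrelation-method LPC via Levinson-Durbin; returns [1, a1..ap]."""
    r = [sum(x[i] * x[i + lag] for i in range(len(x) - lag))
         for lag in range(order + 1)]
    a, err = [1.0] + [0.0] * order, r[0]
    for i in range(1, order + 1):
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def spectral_peaks(a, fs, step_hz=5.0):
    """Frequencies (Hz) where 1/|A(e^jw)| has a local maximum on a grid."""
    n_bins = int(fs / 2 / step_hz)
    mags = []
    for b in range(n_bins + 1):
        w = 2.0 * math.pi * b * step_hz / fs
        mags.append(1.0 / abs(sum(c * cmath.exp(-1j * w * k)
                                  for k, c in enumerate(a))))
    return [b * step_hz for b in range(1, n_bins)
            if mags[b - 1] < mags[b] >= mags[b + 1]]


def estimate_f2(f2_hz, fs=16000, f1_hz=400.0, bw_hz=80.0):
    """Synthesize a 20 ms two-formant frame and return the second LPC peak."""
    a_true = poly_mul(resonator(f1_hz, bw_hz, fs), resonator(f2_hz, bw_hz, fs))
    frame = impulse_response(a_true, int(0.020 * fs))
    return spectral_peaks(lpc(frame, order=4), fs)[1]


f2_gamma = estimate_f2(1161.0)  # /ɣ/ F2 reported above
f2_l = estimate_f2(1608.0)      # /l/ F2 reported above
print(f"estimated gap: {f2_l - f2_gamma:.0f} Hz")
```

Peak-picking on a 5 Hz grid keeps the sketch dependency-free. A real measurement on synthesized speech would more likely window the frame and solve for the roots of the LPC polynomial.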
Please ear-test
@gregodejesus2, @rmcpantoja, @yaresDg, @dgomez42, @29-Bloo — please test
on Windows (NVDA or SAPI), Android (APK), or Linux (tarball):
Test words:
- entregado, diálogo, lugar, agua, laguna, jugar, código (/ɣ/ examples
that were previously heard as /l/-like or too weak)
- hola, lunes, alga, olvido, final, almacén (/l/ should be back to how
it sounded in b1)
- siguiente (a front-vowel /ɣ/ context; we stayed above the historic
cf2=1200 /u/-coloring floor)
Listen for:
- Is /ɣ/ now audibly different from /l/ in connected speech?
- Does /l/ sound like natural Spanish /l/ again (b1-style)?
- Any "/u/-coloring" on /ɣ/ before front vowels like /i/ and /e/?
- Any other regressions vs b1?
What's NOT in this beta
Scheduled for later v3.10 beta builds:
- fricationTiltDb FrameEx field (coming in b3): will enable dialectal /s/
brightness difference between Mexican (laminal, brighter) and Castilian
(apical, darker). Addresses #74 and #81.
- endCb1/2/3 FrameEx fields (coming in b4): per-phoneme bandwidth
evolution within a phone. Enables narrower /l/ B2 at steady state,
wider at boundaries (Stevens 1998).
- Echo at slow rates (#98)
- Currency text processing (#83)
- Android Engine-tab language selector (#97)
Testing
- 44 C++ unit tests (doctest) + all Python tests pass
- New regression test locks in the /ɣ/↔/l/ F2 separation invariant
- No collateral damage to previous passing tests
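As an illustration of what locking in the separation invariant can look like, here is a small sketch in plain Python. The threshold, the function names, and the use of the 100 Hz JND as the floor are all assumptions. This note does not state the actual test's threshold or structure.

```python
JND_HZ = 100.0  # perceptual JND quoted in this note


def separation_ok(gap_hz, min_gap_hz=JND_HZ):
    """True if the /ɣ/ to /l/ F2 gap clears the (hypothetical) floor."""
    return gap_hz >= min_gap_hz


def test_b2_gap_passes():
    # F2 values reported in this note for b2.
    assert separation_ok(1608.0 - 1161.0)


def test_pre_b2_gap_fails():
    # The 56 Hz gap measured before b2 should trip the check.
    assert not separation_ok(56.0)


test_b2_gap_passes()
test_pre_b2_gap_fails()
```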
If something sounds wrong
Post on #95 with the word(s) affected and what you hear vs what you'd
expect. A scientific approach beats guessing: we can often turn your
reports into measurable regression tests.
— Tamas + Claudeo (Opus 4.7)