NV Speech Player 2.01 Release Notes
This release focuses on stability, correctness, and cross-language improvements. The primary goal was to address accumulated issues in the synthesis pipeline and bring several under-maintained language packs up to a proper standard.
Synthesis Engine Fixes
Boundary Smoothing Pass
The boundary smoothing pass had no audible effect because duplicate code in ipa_engine.cpp applied smoothing before the pass could run. This has been corrected:
- Removed duplicate smoothing logic from ipa_engine.cpp
- Increased fade values to 18-22 ms (previously at or below the 10 ms default threshold)
- Added 10 new transition types beyond the original 3, including fricative-vowel, vowel-nasal, nasal-vowel, liquid transitions, and consonant cluster handling
- Fade duration is now capped at 50% of token duration to prevent artifacts on short segments
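The 50% cap can be sketched as follows. This is a minimal illustration, not the code in the release: the function name `capFadeMs` and its parameters are hypothetical, and the only behavior taken from the notes is that a fade never exceeds half of the token's duration.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (name and signature are assumptions, not from the
// NV Speech Player source). Returns the fade length actually applied.
double capFadeMs(double requestedFadeMs, double tokenDurationMs) {
    assert(requestedFadeMs >= 0.0 && tokenDurationMs >= 0.0);
    // The release notes state the cap as 50% of the token duration.
    const double maxFadeMs = 0.5 * tokenDurationMs;
    return std::min(requestedFadeMs, maxFadeMs);
}
```

Under this sketch, a 20 ms fade on a 100 ms token is left unchanged, while the same fade on a 30 ms token is shortened to 15 ms.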
Trajectory Limiting
Fixed an issue where the trajectory limiting code was incorrectly clamping formants on semivowels and liquids. The word "thirty" was producing an American R sound due to F2 trajectories passing through ~1300 Hz when transitioning from /w/. Both trajectory_limit.cpp and ipa_engine.cpp now skip clamping for semivowels and liquids.
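One way to picture the fix is a per-frame step limiter that is bypassed when either side of the transition is a semivowel or liquid. The sketch below is an assumption about the general shape of such a limiter; `PhonemeClass`, `skipsTrajectoryLimit`, and `limitFormantStep` are hypothetical names, and the actual logic in trajectory_limit.cpp may differ.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical phoneme classification, for illustration only.
enum class PhonemeClass { Vowel, Semivowel, Liquid, Nasal, Stop, Fricative };

// Semivowels and liquids rely on large, fast formant movements,
// so they are exempt from limiting.
bool skipsTrajectoryLimit(PhonemeClass c) {
    return c == PhonemeClass::Semivowel || c == PhonemeClass::Liquid;
}

// Moves a formant from prevHz toward targetHz, changing it by at most
// maxStepHz unless the transition involves an exempt class.
double limitFormantStep(double prevHz, double targetHz, double maxStepHz,
                        PhonemeClass from, PhonemeClass to) {
    assert(maxStepHz >= 0.0);
    if (skipsTrajectoryLimit(from) || skipsTrajectoryLimit(to)) {
        return targetHz;  // no clamping: follow the target directly
    }
    const double delta = targetHz - prevHz;
    const double limited = std::max(-maxStepHz, std::min(delta, maxStepHz));
    return prevHz + limited;
}
```

With a 100 Hz step limit, a stop-to-vowel move from 700 Hz toward 1300 Hz advances only to 800 Hz in one frame, whereas the same move out of a semivowel reaches 1300 Hz immediately.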
Length Contrast (Geminates)
Corrected geminate consonant handling in length_contrast.cpp. The pass now properly identifies geminates in all positions and applies closure scaling consistently.
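As a rough illustration of position-independent geminate handling, the sketch below treats two adjacent identical consonant tokens as a geminate and scales the closure of the first one. The `Token` struct, the detection rule, and `scaleGeminateClosures` are assumptions made for this example; the notes do not describe how length_contrast.cpp represents tokens or detects geminates.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token layout, for illustration only.
struct Token {
    std::string phoneme;
    bool isConsonant;
    double closureMs;
};

// Scans the whole sequence (initial, medial, and final positions alike),
// scales the closure of the first token in each identical-consonant pair,
// and returns the number of geminates found.
int scaleGeminateClosures(std::vector<Token>& tokens, double closureScale) {
    assert(closureScale > 0.0);
    int found = 0;
    for (std::size_t i = 0; i + 1 < tokens.size(); ++i) {
        Token& cur = tokens[i];
        const Token& next = tokens[i + 1];
        if (cur.isConsonant && next.isConsonant && cur.phoneme == next.phoneme) {
            cur.closureMs *= closureScale;
            ++found;
            ++i;  // skip the second half so each pair is counted once
        }
    }
    return found;
}
```

The scale factor is passed in because the notes describe it as a per-language setting.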
Phoneme Tuning
M (Bilabial Nasal)
Adjusted formant values based on acoustic phonetics research. The antiformant (cfN0) is now set to 950 Hz to properly cancel F2 energy, eliminating the overly "stuffy" quality in words like "same" and "memory".
P (Voiceless Bilabial Stop)
Reduced frication amplitude and adjusted formant values for a flatter, more diffuse spectrum. The F2 locus is now at 850 Hz, appropriate for bilabial consonants. This addresses the excessive "puffiness" reported in words like "quote" and "paper".
K (Voiceless Velar Stop)
Reduced frication amplitude and parallel formant values to soften the release burst.
Language Pack Improvements
Danish
Resolved a significant intelligibility issue. eSpeak-ng outputs glottal stops for Danish stød, which interrupted vowels and made speech difficult to understand. All glottal stop variants (including the ? mnemonic) are now stripped in preprocessing.
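The stripping step could look roughly like the sketch below. It assumes the phoneme stream is a UTF-8 string in which the glottal stop appears either as the IPA character ʔ (U+0294) or as the ASCII `?` mnemonic; the function name `stripGlottalStops` is hypothetical, and the real preprocessing may handle further variants.

```cpp
#include <cstddef>
#include <string>

// Hypothetical preprocessing helper (not taken from the project source).
// Removes every glottal stop marker from a UTF-8 phoneme string.
std::string stripGlottalStops(const std::string& phonemes) {
    static const std::string ipaGlottal = "\xCA\x94";  // UTF-8 bytes of U+0294
    std::string out;
    out.reserve(phonemes.size());
    for (std::size_t i = 0; i < phonemes.size();) {
        if (phonemes.compare(i, ipaGlottal.size(), ipaGlottal) == 0) {
            i += ipaGlottal.size();  // drop the two-byte IPA glottal stop
        } else if (phonemes[i] == '?') {
            ++i;  // drop the ASCII mnemonic
        } else {
            out.push_back(phonemes[i]);
            ++i;
        }
    }
    return out;
}
```

All other bytes, including multi-byte IPA symbols, pass through unchanged.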
Swedish
Added proper quantity sensitivity settings. Swedish requires stressed syllables to be heavy (long vowel or short vowel plus geminate consonant). Configured geminate closure scaling and pre-geminate vowel shortening. Added mappings for retroflex consonants from r+dental clusters and the sj-sound.
Dutch
Added tense/lax vowel contrast settings and configured the labiodental approximant mapping. Vowel length values are conservative to avoid over-elongation.
German
Enabled glottal reinforcement for vowel-initial stressed syllables (Knacklaut). Configured uvular R normalization and appropriate stress settings.
Polish
Minimal changes to preserve the existing retroflex sibilant behavior. Added single-word tuning and boundary smoothing for screen reader navigation.
Russian, Ukrainian, Finnish, Bulgarian
Applied lessons from the Hungarian geminate work. Finnish now uses a 2.2x geminate closure scale reflecting the language's strong length contrasts. Ukrainian adds geminate support for consonants resulting from assimilation. Russian and Bulgarian receive stronger stress settings appropriate for their vowel reduction patterns.
Build System
Added voice_profile.cpp to Makefile.linux. The file was previously missing from the makefile, which caused build failures on Linux.
Known Issues
Danish stød is currently stripped entirely rather than rendered as creaky voice. A proper implementation would require synthesizer-level support for laryngealization, which we are exploring for a future release.