NV Speech Player 2.01 Release Notes

This release focuses on stability, correctness, and cross-language improvements. The primary goal was to address accumulated issues in the synthesis pipeline and bring several under-maintained language packs up to a proper standard.

Synthesis Engine Fixes

Boundary Smoothing Pass

The boundary smoothing pass had no audible effect because duplicate code in ipa_engine.cpp applied smoothing before the pass could run. This has been corrected:

Removed duplicate smoothing logic from ipa_engine.cpp
Increased fade values to 18-22 ms (previously at or below the 10 ms default threshold)
Added 10 new transition types beyond the original 3, including fricative-vowel, vowel-nasal, nasal-vowel, liquid transitions, and consonant cluster handling
Fade duration is now capped at 50% of token duration to prevent artifacts on short segments
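
The 50% cap on fade duration can be sketched as follows. This is a minimal illustration, not the code from the smoothing pass: the function name cappedFadeMs and its parameters are hypothetical, and durations are assumed to be in milliseconds.

```cpp
#include <algorithm>

// Hypothetical helper; the name and signature are illustrative and are not
// taken from the NV Speech Player sources.
// Returns the fade to apply, never longer than half the token duration.
inline double cappedFadeMs(double requestedFadeMs, double tokenDurationMs) {
    const double maxFadeMs = 0.5 * tokenDurationMs;  // 50% cap from the notes
    return std::max(0.0, std::min(requestedFadeMs, maxFadeMs));
}
```

With this sketch, a 20 ms fade on a 100 ms token is left unchanged, while the same fade on a 30 ms token is reduced to 15 ms.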

Trajectory Limiting

Fixed an issue where the trajectory limiting code was incorrectly clamping formants on semivowels and liquids. The word "thirty" was producing an American R sound due to F2 trajectories passing through ~1300 Hz when transitioning from /w/. Both trajectory_limit.cpp and ipa_engine.cpp now skip clamping for semivowels and liquids.
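
A rough sketch of the skip logic is shown below. It assumes the limiter caps how far a formant may move in one step; the actual criterion used in trajectory_limit.cpp is not described in these notes. The PhonemeClass enum and both function names are hypothetical.

```cpp
#include <algorithm>

// Hypothetical phoneme classes, for illustration only.
enum class PhonemeClass { Vowel, Semivowel, Liquid, Nasal, Stop, Fricative };

// Semivowels and liquids are exempt from trajectory limiting.
inline bool skipsTrajectoryLimit(PhonemeClass c) {
    return c == PhonemeClass::Semivowel || c == PhonemeClass::Liquid;
}

// Assumed limiter: moves from prevHz toward targetHz by at most maxStepHz,
// unless either neighbouring phoneme is a semivowel or liquid.
inline double limitFormantStep(double prevHz, double targetHz, double maxStepHz,
                               PhonemeClass prevClass, PhonemeClass nextClass) {
    if (skipsTrajectoryLimit(prevClass) || skipsTrajectoryLimit(nextClass)) {
        return targetHz;  // no clamping: keep the natural glide
    }
    const double delta = targetHz - prevHz;
    const double limited = std::max(-maxStepHz, std::min(delta, maxStepHz));
    return prevHz + limited;
}
```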

Length Contrast (Geminates)

Corrected geminate consonant handling in length_contrast.cpp. The pass now properly identifies geminates in all positions and applies closure scaling consistently.

Phoneme Tuning

M (Bilabial Nasal)

Adjusted formant values based on acoustic phonetics research. The antiformant (cfN0) is now set to 950 Hz to properly cancel F2 energy, eliminating the overly "stuffy" quality in words like "same" and "memory".

P (Voiceless Bilabial Stop)

Reduced frication amplitude and adjusted formant values for a flatter, more diffuse spectrum. The F2 locus is now at 850 Hz, appropriate for bilabial consonants. This addresses the excessive "puffiness" reported in words like "quote" and "paper".

K (Voiceless Velar Stop)

Reduced frication amplitude and parallel formant values to soften the release burst.

Language Pack Improvements

Danish

Resolved a significant intelligibility issue. eSpeak-ng outputs glottal stops for Danish stød, which interrupted vowels and made speech difficult to understand. All glottal stop variants (including the ? mnemonic) are now stripped in preprocessing.
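
The stripping step might look roughly like the sketch below. It assumes the phoneme string is UTF-8 and that the variants to remove are the IPA glottal stop (U+0294) and the ASCII ? mnemonic; the function name stripGlottalStops is hypothetical, and the real preprocessing may cover more variants.

```cpp
#include <string>

// Hypothetical preprocessing helper, not the function used by the Danish pack.
// Removes the IPA glottal stop U+0294 (UTF-8 bytes 0xCA 0x94) and ASCII '?'.
inline std::string stripGlottalStops(const std::string& phonemes) {
    std::string out;
    out.reserve(phonemes.size());
    for (std::size_t i = 0; i < phonemes.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(phonemes[i]);
        if (c == '?') {
            continue;  // ASCII mnemonic for the glottal stop
        }
        if (c == 0xCA && i + 1 < phonemes.size() &&
            static_cast<unsigned char>(phonemes[i + 1]) == 0x94) {
            ++i;       // skip the second byte of U+0294 as well
            continue;
        }
        out.push_back(phonemes[i]);
    }
    return out;
}
```

All other bytes, including multi-byte IPA symbols, are copied through unchanged.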

Swedish

Added proper quantity sensitivity settings. Swedish requires stressed syllables to be heavy (long vowel or short vowel plus geminate consonant). Configured geminate closure scaling and pre-geminate vowel shortening. Added mappings for retroflex consonants from r+dental clusters and the sj-sound.

Dutch

Added tense/lax vowel contrast settings and configured the labiodental approximant mapping. Vowel length values are conservative to avoid over-elongation.

German

Enabled glottal reinforcement for vowel-initial stressed syllables (Knacklaut). Configured uvular R normalization and appropriate stress settings.

Polish

Minimal changes to preserve the existing retroflex sibilant behavior. Added single-word tuning and boundary smoothing for screen reader navigation.

Russian, Ukrainian, Finnish, Bulgarian

Applied lessons learned from the Hungarian geminate work. Finnish now uses a 2.2x geminate closure scale reflecting the language's strong length contrasts. Ukrainian adds geminate support for consonants resulting from assimilation. Russian and Bulgarian receive stronger stress settings appropriate for their vowel reduction patterns.

Build System

Added voice_profile.cpp to Makefile.linux. The file was previously missing from the makefile, which caused build failures on Linux.

Known Issues

Danish stød is currently stripped entirely rather than rendered as creaky voice. A proper implementation would require synthesizer-level support for laryngealization. We are exploring this for a future release.

tgeczy/TGSpeechBox v-201 NVSpeech Player with phoneme editor and NVDA Addon, Speech Dispatcher module version 2.01 on GitHub