NV Speech Player Version 2.65 Release Notes
This is a major release introducing the Fujisaki prosodic pitch model, a complete coarticulation overhaul, and numerous phoneme and language improvements. Setting breathiness to 100 now actually produces a breathy voice.
Highlights
- Fujisaki Pitch Model - Natural prosodic contours with per-language tuning
- MITalk Coarticulation Model - Smoother consonant-to-vowel transitions
- EndCF/EndPF Ramping - Dynamic formant ramping for natural coarticulation
- Hungarian Palatal Stops - Properly tuned gy (ɟ) and ty (c) with off-glide support
- Velar Pinch - Correct k/g formant patterns distinguishing words like "script" vs "stripped"
New Features
Fujisaki Pitch Model
A complete prosodic pitch contour system has been implemented, replacing the basic pitch interpolation with a linguistically-motivated model:
- Phrase commands for utterance-level declination patterns
- Accent commands for stressed syllable prominence
- Per-language tuning via pitchMode in language pack YAML
- IIR filter-based implementation for smooth exponential pitch contours
- Configurable parameters: phrase amplitude (Ap), accent amplitude (Aa), and time constants
This enables natural-sounding prosody that can be tuned from eSpeak's characteristic sing-songy style to flatter Eloquence-like delivery.
Special thanks to Rommix for the IIR filter implementation idea and extensive work on the Fujisaki code. This pitch model wouldn't exist in SpeechPlayer without his contributions!
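For readers unfamiliar with the model, the following C++ sketch shows one way phrase and accent commands can be turned into an F0 contour with IIR filters. It assumes the textbook Fujisaki formulation (ln F0 = ln Fb + phrase component + accent component) and approximates both control mechanisms with a critically damped second-order filter built from two identical one-pole stages. The accent ceiling (gamma) is omitted. The types and function names (PhraseCommand, AccentCommand, CriticallyDampedFilter, fujisakiContour) are hypothetical and do not reflect the actual code in speechWaveGenerator.cpp.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical command types for illustration; not SpeechPlayer's own structures.
struct PhraseCommand {
    double t0;  // onset time in seconds
    double ap;  // phrase amplitude (Ap)
};

struct AccentCommand {
    double t1;  // accent onset in seconds
    double t2;  // accent offset in seconds
    double aa;  // accent amplitude (Aa)
};

// Two identical one-pole low-pass stages in series form a critically damped
// second-order IIR filter (unity DC gain). Its impulse response approximates
// rate^2 * t * exp(-rate * t), the phrase control shape, and its step response
// approximates 1 - (1 + rate * t) * exp(-rate * t), the accent control shape.
class CriticallyDampedFilter {
public:
    CriticallyDampedFilter(double rate, double fs) : a_(std::exp(-rate / fs)) {}
    double process(double x) {
        s1_ = a_ * s1_ + (1.0 - a_) * x;
        s2_ = a_ * s2_ + (1.0 - a_) * s1_;
        return s2_;
    }

private:
    double a_;          // per-sample pole
    double s1_ = 0.0;   // first stage state
    double s2_ = 0.0;   // second stage state
};

// Returns one F0 value in Hz per sample: ln F0 = ln Fb + phrase + accent.
// alpha and beta are the phrase and accent time constants in 1/s.
std::vector<double> fujisakiContour(double fb, double alpha, double beta,
                                    double fs, double durationSec,
                                    const std::vector<PhraseCommand>& phrases,
                                    const std::vector<AccentCommand>& accents) {
    assert(fb > 0.0 && fs > 0.0);
    const std::size_t n = static_cast<std::size_t>(durationSec * fs);
    CriticallyDampedFilter phraseFilter(alpha, fs);
    CriticallyDampedFilter accentFilter(beta, fs);
    std::vector<double> f0(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double t = static_cast<double>(i) / fs;
        // A phrase command is an impulse of area Ap: height Ap * fs for one sample.
        double phraseIn = 0.0;
        for (const auto& p : phrases) {
            if (static_cast<std::size_t>(std::lround(p.t0 * fs)) == i) phraseIn += p.ap * fs;
        }
        // An accent command is a rectangular pulse of height Aa from t1 to t2.
        double accentIn = 0.0;
        for (const auto& a : accents) {
            if (t >= a.t1 && t < a.t2) accentIn += a.aa;
        }
        const double lnF0 = std::log(fb) + phraseFilter.process(phraseIn) +
                            accentFilter.process(accentIn);
        f0[i] = std::exp(lnF0);
    }
    return f0;
}
```

For example, fujisakiContour(100.0, 3.0, 20.0, 1000.0, 5.0, {{0.0, 0.5}}, {}) rises from 100 Hz to a peak of roughly 170 Hz about 1/alpha seconds after the phrase command and then declines back toward the base frequency, which is the declination shape phrase commands contribute. Because the filters are linear, one filter per component is enough no matter how many commands are issued.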
MITalk Coarticulation Model
Replaced the old coarticulation approach with the MITalk 0.42 locus rule:
- Vowel START formants shift toward preceding consonant locus values
- Smooth ramping to canonical vowel targets within vowel duration
- Graduated coarticulation strength based on consonant distance
- Configurable F2 locus values per place of articulation:
  - Labial: 800 Hz
  - Alveolar: 1800 Hz
  - Velar: 2200 Hz (with velar pinch for front vowels)
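As an illustration only, the C++ sketch below applies a locus rule to a vowel's F2 using the locus values listed above. It assumes that "0.42" is the fraction k in onset = locus + k * (target - locus), and it uses a linear ramp where the engine describes a smooth one. Place, vowelOnsetF2, vowelF2Track, strength and rampFraction are hypothetical names; strength stands in for the distance-graded coarticulation strength, and the velar pinch is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Place of articulation of the consonant preceding the vowel.
enum class Place { Labial, Alveolar, Velar };

// F2 locus values from these release notes (velar pinch not modeled here).
double f2Locus(Place place) {
    switch (place) {
        case Place::Labial:   return 800.0;
        case Place::Alveolar: return 1800.0;
        case Place::Velar:    return 2200.0;
    }
    return 1800.0;  // unreachable; silences missing-return warnings
}

// Vowel-onset F2 under an assumed locus rule: onset = locus + k * (target - locus).
// strength (0..1) blends between no coarticulation (0) and the full rule (1).
double vowelOnsetF2(double targetF2, Place place, double strength, double k = 0.42) {
    assert(strength >= 0.0 && strength <= 1.0);
    const double locus = f2Locus(place);
    const double fullOnset = locus + k * (targetF2 - locus);
    return targetF2 + strength * (fullOnset - targetF2);
}

// Per-frame F2 track for one vowel: start at the shifted onset, move linearly
// to the canonical target over the first rampFraction of the frames, then hold.
std::vector<double> vowelF2Track(double targetF2, Place place, double strength,
                                 int frames, double rampFraction = 0.5) {
    assert(frames > 0 && rampFraction > 0.0);
    const double onset = vowelOnsetF2(targetF2, place, strength);
    const int rampFrames =
        std::max(1, static_cast<int>(std::lround(frames * rampFraction)));
    std::vector<double> track(static_cast<std::size_t>(frames), targetF2);
    for (int i = 0; i < frames && i < rampFrames; ++i) {
        const double progress = static_cast<double>(i) / rampFrames;
        track[static_cast<std::size_t>(i)] = onset + (targetF2 - onset) * progress;
    }
    return track;
}
```

Under these assumptions, a vowel with a 1200 Hz F2 target after a labial starts at 800 + 0.42 * (1200 - 800) = 968 Hz and reaches 1200 Hz halfway through the vowel, while the same vowel after an alveolar starts closer to 1800 Hz and falls toward its target.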
EndCF1-3 and EndPF1-3 Ramping
New formant endpoint ramping for natural coarticulation:
- EndCF1, EndCF2, EndCF3 - Target frequencies for cascade formants 1-3 at the end of the ramp
- EndPF1, EndPF2, EndPF3 - Target frequencies for parallel formants 1-3
- Dynamic Q-capping prevents resonance artifacts during formant sweeps
- Exponential smoothing with a time constant of roughly 10-15 ms
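A minimal C++ sketch of the ramping idea follows. It assumes a per-sample one-pole smoother whose time constant falls in the 10-15 ms range given above, and it assumes Q-capping means enforcing a minimum bandwidth of f / maxQ while a formant moves. FormantRamp and maxQ are illustrative assumptions; the actual logic in speechWaveGenerator.cpp may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// One formant ramping from its current frequency toward an End target
// (for example cf2 toward endCf2). All names here are hypothetical.
class FormantRamp {
public:
    FormantRamp(double startHz, double fs, double tauSec, double maxQ)
        : freq_(startHz), coeff_(std::exp(-1.0 / (tauSec * fs))), maxQ_(maxQ) {
        assert(fs > 0.0 && tauSec > 0.0 && maxQ > 0.0);
    }

    // Advances one sample toward targetHz (exponential smoothing) and
    // returns the new frequency.
    double step(double targetHz) {
        freq_ = targetHz + coeff_ * (freq_ - targetHz);
        return freq_;
    }

    // Widens the requested bandwidth when needed so that Q = f / bw never
    // exceeds maxQ at the current frequency.
    double cappedBandwidth(double requestedBwHz) const {
        return std::max(requestedBwHz, freq_ / maxQ_);
    }

    double frequency() const { return freq_; }

private:
    double freq_;   // current formant frequency in Hz
    double coeff_;  // per-sample smoothing coefficient exp(-1 / (tau * fs))
    double maxQ_;   // assumed upper bound on resonance Q
};
```

With a 12 ms time constant the formant covers about 63% of the distance to its End target in 12 ms and about 95% in 36 ms, so most of the transition completes within a few tens of milliseconds while the bandwidth floor keeps narrow resonances from ringing as the frequency moves.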
Stop Closure Improvements
Stop consonant closures use a new approach, with improved burst characteristics and a velar pinch for accurate k/g distinction.
Language-Specific Fixes
Hungarian
- Palatal stops fixed: gy (ɟ) and ty (c) now have proper formants
  - ɟ: fricAmp=0.28, F2=2200 Hz with tuned bandwidth
  - c: fricAmp=0.92 for a strong voiceless burst
- j off-glide rules in hu.yaml: palatals before vowels get a reinforcing j glide (kutya → kucja, gyerek → ɟjerek)
- Vowel tuning:
  - é: cb1=110, cb2=115, voiceAmp=0.82 (less piercing)
  - á: F1=680 Hz, F2=1670 Hz (based on research data)
  - ᴒ (short a): cb1=190, cb2=140, cf1=680, voiceAmp=0.80
  - ö/ő/ø: F2 raised 1400→1550 Hz for smoother palatal transitions (György)
- Nasal distinction: n (F2=1550 Hz) vs ɲ (F2=2500 Hz) now clearly separated
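The off-glide rule can be pictured as a simple rewrite over IPA text. The C++ sketch below is not the hu.yaml rule syntax. It assumes IPA input in which c and ɟ are the palatal stops, uses a small hypothetical vowel set, and inserts j whenever a palatal stop is directly followed by a vowel. UTF-32 strings keep each IPA symbol a single code unit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical vowel set for the sketch: a e i o u y ø ɒ ɛ.
bool isVowel(char32_t ch) {
    const std::u32string vowels = U"aeiouy\u00F8\u0252\u025B";
    return vowels.find(ch) != std::u32string::npos;
}

// Inserts a reinforcing j after c or ɟ (U+025F) when the next symbol is a vowel.
std::u32string addPalatalOffGlide(const std::u32string& ipa) {
    std::u32string out;
    out.reserve(ipa.size() + ipa.size() / 2);
    for (std::size_t i = 0; i < ipa.size(); ++i) {
        out.push_back(ipa[i]);
        const bool palatalStop = ipa[i] == U'c' || ipa[i] == U'\u025F';
        if (palatalStop && i + 1 < ipa.size() && isVowel(ipa[i + 1])) {
            out.push_back(U'j');
        }
    }
    return out;
}
```

For example, addPalatalOffGlide(U"kuca") returns U"kucja", mirroring the kutya example above, while a palatal stop at the end of a word or before a consonant is left unchanged.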
English
- Velar pinch implemented: k/g now use F2=1800 Hz, F3=2400 Hz (600 Hz gap)
- Fixes confusion between words like "script" and "stripped"
- Adjusted burst spectrum for compact mid-frequency character
- Fixed /g/ intervocalic burst balance
British English
- PALM/START vowel (ɑ) fixed - no longer sounds Australian
- New UK-specific phoneme (ᵅ) with proper back vowel F2 ~1200 Hz
Deprecated Parameters
The following parameters are still parsed for backward compatibility but are no longer functional in the new coarticulation system. They can be safely removed from language pack YAML files:
- coarticulationTransitionExtent - replaced by endCf1-3 ramping within vowel duration
- coarticulationFadeIntoConsonants - no longer used; the new model modifies vowel START formants instead
- coarticulationWordInitialFadeScale - no longer used
These parameters will continue to load without error but have no effect on synthesis.
Files Changed
- speechWaveGenerator.cpp/.h - Fujisaki pitch implementation, EndCF/EndPF ramping
- ipa_engine.cpp - MITalk coarticulation, pitch mode selection
- pack.h/pack.cpp - Fujisaki and coarticulation parameters
- phonemes.yaml - Velar stops, Hungarian palatals and vowels
- hu.yaml - Palatal off-glide rules
- en-gb.yaml - UK vowel routing
Acknowledgments
- Rommix - Fujisaki model implementation, IIR filter design, and extensive pitch contour tuning. The natural prosody in this release is thanks to his work! He also helped shape the coarticulation ideas and weigh which approach works better.
Upgrading
This release maintains backward compatibility with existing language packs. New features are opt-in via YAML configuration. The Fujisaki pitch model can be enabled per-language with:
pitchMode: "fujisaki"