NV Speech Player Version 2.65 Release Notes

This is a major release introducing the Fujisaki prosodic pitch model, a complete coarticulation overhaul, and numerous phoneme and language improvements. Breathiness, when set to 100, actually gets breathy now.

Highlights

Fujisaki Pitch Model - Natural prosodic contours with per-language tuning
MITalk Coarticulation Model - Smoother consonant-to-vowel transitions
EndCF/EndPF Ramping - Dynamic formant ramping for natural coarticulation
Hungarian Palatal Stops - Properly tuned gy (ɟ) and ty (c) with off-glide support
Velar Pinch - Correct k/g formant patterns distinguishing words like "script" vs "stripped"

New Features

Fujisaki Pitch Model

A complete prosodic pitch contour system has been implemented, replacing the basic pitch interpolation with a linguistically motivated model:

Phrase commands for utterance-level declination patterns
Accent commands for stressed syllable prominence
Per-language tuning via pitchMode in language pack YAML
IIR filter-based implementation for smooth exponential pitch contours
Configurable parameters: phrase amplitude (Ap), accent amplitude (Aa), and time constants
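
For orientation, the textbook form of the Fujisaki model (phrase commands plus accent commands added in the log-F0 domain) can be sketched in Python as below. This is an illustrative sketch, not the SpeechPlayer code: the function names, the default time constants alpha and beta, the ceiling gamma, and the command tuples are all assumptions made for the example.

```python
import math

def phrase_response(t, alpha=2.0):
    """Impulse response of the phrase component: alpha^2 * t * exp(-alpha*t)."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)

def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent component, capped at the ceiling gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def fujisaki_f0(t, base_hz, phrases, accents, alpha=2.0, beta=20.0):
    """Return F0 in Hz at time t (seconds).

    phrases: list of (onset_s, Ap) phrase commands (hypothetical layout)
    accents: list of (onset_s, offset_s, Aa) accent commands (hypothetical layout)
    """
    log_f0 = math.log(base_hz)
    for onset, ap in phrases:
        log_f0 += ap * phrase_response(t - onset, alpha)
    for onset, offset, aa in accents:
        log_f0 += aa * (accent_response(t - onset, beta)
                        - accent_response(t - offset, beta))
    return math.exp(log_f0)
```

In this sketch, a larger Ap gives a stronger utterance-level arc that decays over the phrase, while a larger Aa gives more prominent bumps on stressed syllables.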

This enables natural-sounding prosody that can be tuned from eSpeak's characteristic sing-songy style to flatter Eloquence-like delivery.
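
A smooth rise followed by an exponential decay, as in the phrase component, is one thing a small IIR filter can produce. The sketch below cascades two identical one-pole low-pass filters, which approximates a critically damped second-order response. It is a hedged illustration only: the class and function names, the sample rate, and the rate constant alpha are assumptions, and the actual filter design in SpeechPlayer may differ.

```python
import math

class OnePole:
    """One-pole low-pass: y[n] = a*y[n-1] + (1-a)*x[n]."""
    def __init__(self, rate_per_s, sample_rate):
        self.a = math.exp(-rate_per_s / sample_rate)
        self.y = 0.0

    def step(self, x):
        self.y = self.a * self.y + (1.0 - self.a) * x
        return self.y

def phrase_contour_iir(commands, alpha=2.0, sample_rate=1000.0):
    """Filter a per-sample list of phrase command amplitudes (mostly zeros).

    Each nonzero sample acts as an impulse; it is scaled by the sample
    rate so the output approximates amplitude * alpha^2 * t * exp(-alpha*t).
    """
    first = OnePole(alpha, sample_rate)
    second = OnePole(alpha, sample_rate)
    return [second.step(first.step(x * sample_rate)) for x in commands]

# One unit phrase command at t = 0, followed by one second of silence.
contour = phrase_contour_iir([1.0] + [0.0] * 999)
```

The appeal of this form is cost: each output sample needs only two multiply-adds per filter stage, with no exponentials evaluated inside the loop.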

Special thanks to Rommix for the IIR filter implementation idea and extensive work on the Fujisaki code. This pitch model wouldn't exist in SpeechPlayer without his contributions!

MITalk Coarticulation Model

Replaced the old coarticulation approach with the MITalk 0.42 locus rule:

Vowel START formants shift toward preceding consonant locus values
Smooth ramping to canonical vowel targets within vowel duration
Graduated coarticulation strength based on consonant distance
Configurable F2 locus values per place of articulation:
- Labial: 800 Hz
- Alveolar: 1800 Hz
- Velar: 2200 Hz (with velar pinch for front vowels)
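
The locus idea above can be sketched as follows. The F2 locus values are the ones listed; everything else is an assumption for illustration: the function names, the reading of 0.42 as the locus-equation coefficient k (onset = locus + k * (target - locus)), the linear ramp, and the strength parameter standing in for graduated coarticulation. The velar pinch adjustment for front vowels is not modelled here.

```python
# F2 locus per place of articulation, in Hz (values from the list above).
F2_LOCUS_HZ = {"labial": 800.0, "alveolar": 1800.0, "velar": 2200.0}

def vowel_onset_f2(place, vowel_f2, k=0.42, strength=1.0):
    """F2 at the start of a vowel following a consonant at `place`.

    strength=1.0 applies the full locus shift; strength=0.0 leaves the
    vowel at its canonical target (e.g. a distant consonant).
    """
    locus = F2_LOCUS_HZ[place]
    full_onset = locus + k * (vowel_f2 - locus)
    return vowel_f2 + strength * (full_onset - vowel_f2)

def f2_track(place, vowel_f2, steps, k=0.42, strength=1.0):
    """Ramp F2 from the shifted onset to the canonical target (steps >= 2)."""
    start = vowel_onset_f2(place, vowel_f2, k, strength)
    return [start + (vowel_f2 - start) * i / (steps - 1) for i in range(steps)]

# A vowel with canonical F2 = 1800 Hz after a labial consonant.
track = f2_track("labial", 1800.0, steps=5)
```

With these numbers the vowel starts well below its canonical F2 after a labial and climbs to the target by the end of the ramp.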

EndCF1-3 and EndPF1-3 Ramping

New formant endpoint ramping for natural coarticulation:

EndCF1, EndCF2, EndCF3 - End-of-ramp target frequencies for cascade formants 1-3
EndPF1, EndPF2, EndPF3 - End-of-ramp target frequencies for parallel formants 1-3
Dynamic Q-capping prevents resonance artifacts during formant sweeps
Exponential smoothing with ~10-15 ms time constant
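
A minimal sketch of the two mechanisms named above, exponential smoothing toward an end target and a cap on resonance Q, might look like this. The function names, the 5 ms frame step, the 12 ms time constant (picked from the stated 10-15 ms range), and the max_q value are illustrative assumptions, as is implementing the cap by widening the bandwidth.

```python
import math

def smooth_toward(current, target, dt_s, tau_s=0.012):
    """Move `current` toward `target` with exponential time constant tau_s."""
    coeff = 1.0 - math.exp(-dt_s / tau_s)
    return current + coeff * (target - current)

def cap_q(freq_hz, bandwidth_hz, max_q=10.0):
    """Widen the bandwidth if needed so that Q = freq/bandwidth <= max_q."""
    return max(bandwidth_hz, freq_hz / max_q)

def ramp_formant(start_hz, end_hz, frames, frame_s=0.005, tau_s=0.012):
    """Return the smoothed formant frequency for each successive frame."""
    track, value = [], start_hz
    for _ in range(frames):
        value = smooth_toward(value, end_hz, frame_s, tau_s)
        track.append(value)
    return track

# Sweep a formant from 800 Hz toward an 1800 Hz end target over 40 frames.
sweep = ramp_formant(800.0, 1800.0, frames=40)
```

Capping Q during a sweep keeps a narrow resonance from ringing as its center frequency moves, which is the kind of artifact the Q-capping item refers to.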

Stop Closure Improvements

New approach to stop consonant closures with improved burst characteristics, plus a velar pinch implementation that keeps k/g distinct from other stops.

Language-Specific Fixes

Hungarian

Palatal stops fixed: gy (ɟ) and ty (c) now have proper formants
- ɟ: fricAmp=0.28, F2=2200 Hz with tuned bandwidth
- c: fricAmp=0.92 for strong voiceless burst
j off-glide rules in hu.yaml: palatals before vowels get reinforcing glide (kutya → kucja, gyerek → ɟjerek)
Vowel tuning:
- é: cb1=110, cb2=115, voiceAmp=0.82 (less piercing)
- á: F1=680 Hz, F2=1670 Hz (based on research data)
- ᴒ (short a): cb1=190, cb2=140, cf1=680, voiceAmp=0.80
- ö/ő/ø: F2 raised 1400→1550 Hz for smoother palatal transitions (György)
Nasal distinction: n (F2=1550 Hz) vs ɲ (F2=2500 Hz) now clearly separated
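
The j off-glide rule described above (kutya → kucja, gyerek → ɟjerek) can be illustrated with a short sketch. This is not the rule syntax used in hu.yaml; the function name, the list-of-phonemes input, and the vowel inventory are assumptions for the example.

```python
PALATAL_STOPS = {"c", "ɟ"}
# Hypothetical vowel inventory, for illustration only.
VOWELS = set("aeiouyøɒɛɔᴒ") | {"aː", "eː", "iː", "oː", "uː", "øː", "yː"}

def add_palatal_offglide(phonemes):
    """Insert 'j' after a palatal stop when the next phoneme is a vowel."""
    out = []
    for i, phoneme in enumerate(phonemes):
        out.append(phoneme)
        following = phonemes[i + 1] if i + 1 < len(phonemes) else None
        if phoneme in PALATAL_STOPS and following in VOWELS:
            out.append("j")
    return out

kutya = add_palatal_offglide(["k", "u", "c", "a"])
gyerek = add_palatal_offglide(["ɟ", "e", "r", "e", "k"])
```

A palatal stop at the end of a word or before a consonant is left unchanged, since the rule only applies before vowels.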

English

Velar pinch implemented: k/g now use F2=1800 Hz, F3=2400 Hz (600 Hz gap)
Fixes confusion between words like "script" and "stripped"
Adjusted burst spectrum for compact mid-frequency character
Fixed /g/ intervocalic burst balance

British English

PALM/START vowel (ɑ) fixed - no longer sounds Australian
New UK-specific phoneme (ᵅ) with proper back vowel F2 ~1200 Hz

Deprecated Parameters

The following parameters are still parsed for backward compatibility but are no longer functional in the new coarticulation system. They can be safely removed from language pack YAML files:
coarticulationTransitionExtent - replaced by endCf1-3 ramping within vowel duration
coarticulationFadeIntoConsonants - no longer used; new model modifies vowel START formants instead
coarticulationWordInitialFadeScale - no longer used
These parameters will continue to load without error but have no effect on synthesis.
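
To find leftover deprecated keys in a language pack, a simple line scan is enough. The sketch below uses only the standard library and matches key names at the start of a line; the function name and the sample YAML layout (including the nesting under a settings key) are hypothetical.

```python
import re

DEPRECATED_KEYS = {
    "coarticulationTransitionExtent",
    "coarticulationFadeIntoConsonants",
    "coarticulationWordInitialFadeScale",
}

KEY_PATTERN = re.compile(r"\s*([A-Za-z0-9_]+)\s*:")

def find_deprecated_keys(yaml_text):
    """Return (line_number, key) pairs for deprecated keys in the text."""
    hits = []
    for line_number, line in enumerate(yaml_text.splitlines(), start=1):
        match = KEY_PATTERN.match(line)
        if match and match.group(1) in DEPRECATED_KEYS:
            hits.append((line_number, match.group(1)))
    return hits

# Hypothetical language pack fragment.
sample = """settings:
  pitchMode: "fujisaki"
  coarticulationTransitionExtent: 0.5
  coarticulationWordInitialFadeScale: 0.3
"""
found = find_deprecated_keys(sample)
```

To check a real pack, read the file as UTF-8 text and pass its contents to the function; lines it reports can be deleted without changing synthesis.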

Files Changed

speechWaveGenerator.cpp/.h - Fujisaki pitch implementation, EndCF/EndPF ramping
ipa_engine.cpp - MITalk coarticulation, pitch mode selection
pack.h/pack.cpp - Fujisaki and coarticulation parameters
phonemes.yaml - Velar stops, Hungarian palatals and vowels
hu.yaml - Palatal off-glide rules
en-gb.yaml - UK vowel routing

Acknowledgments

Rommix - Fujisaki model implementation, IIR filter design, and extensive pitch contour tuning. The natural prosody in this release is thanks to his work! He also contributed coarticulation ideas and helped work out which approach was better.

Upgrading

This release maintains backward compatibility with existing language packs. New features are opt-in via YAML configuration. The Fujisaki pitch model can be enabled per-language with:

pitchMode: "fujisaki"
