github tgeczy/TGSpeechBox v-265
NVSpeech Player with phoneme editor and NVDA Addon, Speech Dispatcher module version 2.65

one month ago

NV Speech Player Version 2.65 Release Notes

This is a major release introducing the Fujisaki prosodic pitch model, a complete coarticulation overhaul, and numerous phoneme and language improvements. Breathiness, when set to 100, actually gets breathy now.

Highlights

  • Fujisaki Pitch Model - Natural prosodic contours with per-language tuning
  • MITalk Coarticulation Model - Smoother consonant-to-vowel transitions
  • EndCF/EndPF Ramping - Dynamic formant ramping for natural coarticulation
  • Hungarian Palatal Stops - Properly tuned gy (ɟ) and ty (c) with off-glide support
  • Velar Pinch - Correct k/g formant patterns distinguishing words like "script" vs "stripped"

New Features

Fujisaki Pitch Model

A complete prosodic pitch contour system has been implemented, replacing the basic pitch interpolation with a linguistically motivated model:

  • Phrase commands for utterance-level declination patterns
  • Accent commands for stressed syllable prominence
  • Per-language tuning via pitchMode in language pack YAML
  • IIR filter-based implementation for smooth exponential pitch contours
  • Configurable parameters: phrase amplitude (Ap), accent amplitude (Aa), and time constants

This enables natural-sounding prosody that can be tuned from eSpeak's characteristic sing-songy style to flatter Eloquence-like delivery.
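
As a rough illustration of how phrase and accent commands combine, here is a minimal sketch of the textbook Fujisaki formulation in Python. It is not the code from speechWaveGenerator.cpp: the function names, the closed-form evaluation, and the default constants (alpha, beta, gamma) are assumptions chosen for the example. The two response functions are those of critically damped second-order filters, which is why an IIR implementation is a natural fit.

```python
import math

def phrase_response(t, alpha=2.0):
    """Phrase control impulse response: alpha^2 * t * exp(-alpha * t) for t >= 0."""
    if t < 0.0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)

def accent_response(t, beta=20.0, gamma=0.9):
    """Accent control step response, clipped at the ceiling gamma."""
    if t < 0.0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def fujisaki_f0(t, base_hz, phrases, accents, alpha=2.0, beta=20.0, gamma=0.9):
    """Return F0 in Hz at time t (seconds).

    phrases: list of (onset_time, Ap) phrase commands
    accents: list of (onset_time, offset_time, Aa) accent commands
    All defaults are illustrative, not values taken from this release.
    """
    log_f0 = math.log(base_hz)
    # Each phrase command adds a slowly decaying bump (utterance-level declination).
    for t0, ap in phrases:
        log_f0 += ap * phrase_response(t - t0, alpha)
    # Each accent command adds a local rise between its onset and offset.
    for t1, t2, aa in accents:
        log_f0 += aa * (accent_response(t - t1, beta, gamma)
                        - accent_response(t - t2, beta, gamma))
    return math.exp(log_f0)
```

Because the components add in the log-frequency domain, lowering Aa flattens accents without touching the overall declination set by Ap, which is one plausible way to move between a sing-songy and a flatter delivery.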

Special thanks to Rommix for the IIR filter implementation idea and extensive work on the Fujisaki code. This pitch model wouldn't exist in SpeechPlayer without his contributions!

MITalk Coarticulation Model

Replaced the old coarticulation approach with the MITalk 0.42 locus rule:

  • Vowel START formants shift toward preceding consonant locus values
  • Smooth ramping to canonical vowel targets within vowel duration
  • Graduated coarticulation strength based on consonant distance
  • Configurable F2 locus values per place of articulation:
    • Labial: 800 Hz
    • Alveolar: 1800 Hz
    • Velar: 2200 Hz (with velar pinch for front vowels)
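
The locus idea above can be sketched in a few lines of Python. This is an assumption-laden illustration, not the code in ipa_engine.cpp: the `strength` parameter, the linear ramp, and the function names are hypothetical, and only the three F2 locus values come from the list above.

```python
# F2 locus values per place of articulation, as listed in these notes.
F2_LOCUS_HZ = {"labial": 800.0, "alveolar": 1800.0, "velar": 2200.0}

def vowel_start_f2(target_f2_hz, place, strength=0.5):
    """Shift the vowel's starting F2 toward the preceding consonant's locus.

    strength=0 leaves the target unchanged; strength=1 starts exactly at the locus.
    (strength is a hypothetical knob standing in for graduated coarticulation.)
    """
    locus = F2_LOCUS_HZ[place]
    return target_f2_hz + strength * (locus - target_f2_hz)

def f2_trajectory(target_f2_hz, place, n_frames, strength=0.5):
    """Ramp linearly from the shifted start to the canonical target over the vowel."""
    if n_frames <= 1:
        return [target_f2_hz] * n_frames
    start = vowel_start_f2(target_f2_hz, place, strength)
    step = (target_f2_hz - start) / (n_frames - 1)
    return [start + step * i for i in range(n_frames)]
```

For a back vowel with a target F2 of 1200 Hz after an alveolar, `vowel_start_f2(1200.0, "alveolar")` gives 1500 Hz, halfway to the 1800 Hz locus, and the trajectory then returns to 1200 Hz by the end of the vowel.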

EndCF1-3 and EndPF1-3 Ramping

New formant endpoint ramping for natural coarticulation:

  • EndCF1, EndCF2, EndCF3 - Target center frequencies for formant ramping
  • EndPF1, EndPF2, EndPF3 - Target peak frequencies
  • Dynamic Q-capping prevents resonance artifacts during formant sweeps
  • Exponential smoothing with ~10-15ms time constant
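
To make the last two bullets concrete, the sketch below shows one common way to do exponential smoothing with a fixed time constant and to cap a resonator's Q by widening its bandwidth. How the release actually implements either step is not stated here, so the function names, the 12 ms default, the frame size, and `max_q` are all assumptions.

```python
import math

def smoothing_coeff(frame_ms, tau_ms=12.0):
    """Per-frame coefficient of a one-pole smoother with time constant tau_ms."""
    return 1.0 - math.exp(-frame_ms / tau_ms)

def ramp_formant(start_hz, end_hz, n_frames, frame_ms=1.0, tau_ms=12.0):
    """Exponentially approach end_hz from start_hz, returning one value per frame."""
    a = smoothing_coeff(frame_ms, tau_ms)
    out = []
    f = start_hz
    for _ in range(n_frames):
        f += a * (end_hz - f)  # move a fixed fraction of the remaining distance
        out.append(f)
    return out

def cap_q(freq_hz, bandwidth_hz, max_q=10.0):
    """Return a bandwidth wide enough that Q = freq / bandwidth stays <= max_q."""
    return max(bandwidth_hz, freq_hz / max_q)
```

With these defaults, about 63% of the distance to the endpoint is covered after one time constant (12 frames of 1 ms). Capping Q per frame keeps a formant from ringing when its center frequency sweeps upward while its bandwidth stays narrow.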

Stop Closure Improvements

Stop consonant closures use a new approach with improved burst characteristics, and the velar pinch implementation gives k and g the correct formant patterns.

Language-Specific Fixes

Hungarian

  • Palatal stops fixed: gy (ɟ) and ty (c) now have proper formants
    • ɟ: fricAmp=0.28, F2=2200 Hz with tuned bandwidth
    • c: fricAmp=0.92 for strong voiceless burst
  • j off-glide rules in hu.yaml: palatals before vowels get reinforcing glide (kutya → kucja, gyerek → ɟjerek)
  • Vowel tuning:
    • é: cb1=110, cb2=115, voiceAmp=0.82 (less piercing)
    • á: F1=680 Hz, F2=1670 Hz (based on research data)
    • ᴒ (short a): cb1=190, cb2=140, cf1=680, voiceAmp=0.80
    • ö/ő/ø: F2 raised 1400→1550 Hz for smoother palatal transitions (György)
  • Nasal distinction: n (F2=1550 Hz) vs ɲ (F2=2500 Hz) now clearly separated
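
The j off-glide rule listed above (palatal stop before a vowel gets a reinforcing glide) can be pictured as a simple pass over a phoneme list. This Python sketch is only an illustration: the actual rule lives in hu.yaml in the pack's own rule syntax, and the vowel set and function name below are hypothetical.

```python
PALATAL_STOPS = {"c", "ɟ"}
# Hypothetical vowel inventory for the demo; the real pack defines its own classes.
VOWELS = set("aɒᴒeɛioøuy")

def add_palatal_offglide(phonemes):
    """Insert "j" after a palatal stop whenever the next phoneme starts with a vowel."""
    out = []
    for i, ph in enumerate(phonemes):
        out.append(ph)
        nxt = phonemes[i + 1] if i + 1 < len(phonemes) else None
        # nxt[:1] lets long vowels written as e.g. "aː" match on their first symbol.
        if ph in PALATAL_STOPS and nxt is not None and nxt[:1] in VOWELS:
            out.append("j")
    return out
```

For example, `add_palatal_offglide(["k", "u", "c", "ɒ"])` returns `["k", "u", "c", "j", "ɒ"]`, while a word-final palatal stop is left unchanged.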

English

  • Velar pinch implemented: k/g now use F2=1800 Hz, F3=2400 Hz (600 Hz gap)
  • Fixes confusion between words like "script" and "stripped"
  • Adjusted burst spectrum for compact mid-frequency character
  • Fixed /g/ intervocalic burst balance

British English

  • PALM/START vowel (ɑ) fixed - no longer sounds Australian
  • New UK-specific phoneme (ᵅ) with proper back vowel F2 ~1200 Hz

Deprecated Parameters

The following parameters are still parsed for backward compatibility but are no longer functional in the new coarticulation system. They can be safely removed from language pack YAML files:

  • coarticulationTransitionExtent - replaced by endCf1-3 ramping within vowel duration
  • coarticulationFadeIntoConsonants - no longer used; new model modifies vowel START formants instead
  • coarticulationWordInitialFadeScale - no longer used

These parameters will continue to load without error but have no effect on synthesis.
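
If you want to clean up a language pack, a small text filter is enough. The Python sketch below assumes each deprecated setting sits on its own line as a plain `key: value` entry; it does not parse YAML, so multi-line values or unusual layouts would need manual attention. The function name is hypothetical.

```python
DEPRECATED_KEYS = (
    "coarticulationTransitionExtent",
    "coarticulationFadeIntoConsonants",
    "coarticulationWordInitialFadeScale",
)

def strip_deprecated(yaml_text):
    """Drop lines whose key is one of the deprecated coarticulation parameters."""
    kept = []
    for line in yaml_text.splitlines():
        # Text before the first colon, ignoring indentation, is treated as the key.
        key = line.strip().split(":", 1)[0]
        if key in DEPRECATED_KEYS:
            continue
        kept.append(line)
    result = "\n".join(kept)
    # Preserve a trailing newline if the input had one and anything was kept.
    if kept and yaml_text.endswith("\n"):
        result += "\n"
    return result
```

Commented-out lines and every other key pass through untouched, so the filter can be run repeatedly without further changes.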

Files Changed

  • speechWaveGenerator.cpp/.h - Fujisaki pitch implementation, EndCF/EndPF ramping
  • ipa_engine.cpp - MITalk coarticulation, pitch mode selection
  • pack.h/pack.cpp - Fujisaki and coarticulation parameters
  • phonemes.yaml - Velar stops, Hungarian palatals and vowels
  • hu.yaml - Palatal off-glide rules
  • en-gb.yaml - UK vowel routing

Acknowledgments

  • Rommix - Fujisaki model implementation, IIR filter design, and extensive pitch contour tuning. The natural prosody in this release is thanks to his work! He also helped with coarticulation ideas and with working out which approach was better.

Upgrading

This release maintains backward compatibility with existing language packs. New features are opt-in via YAML configuration. The Fujisaki pitch model can be enabled per-language with:

pitchMode: "fujisaki"
