github tgeczy/TGSpeechBox v-200
NVSpeech Player with phoneme editor and NVDA Addon, Speech Dispatcher module version 2.0

one month ago

NV Speech Player Release Notes 2.0

We're calling this 2.0 because another voice change has occurred. People may or may not like it, but they sure can tweak it!

Highlights

  • Hybrid glottal source - Dramatically improved audio quality at 11025 Hz sample rate
  • VoicingTone v2 API - 3 new DSP parameters with backward-compatible versioning
  • Phoneme Editor voicing controls - Full UI for all 10 voicing parameters with per-voice storage
  • NVDA addon voicing sliders - New sliders in synth settings for voice quality adjustment

DSP Engine (speechPlayer.dll / libspeechPlayer.so)

Hybrid Glottal Source

The synthesizer now uses a sample-rate adaptive glottal waveform that selects the optimal pulse shape based on output rate:

| Sample Rate | Waveform | Character |
| --- | --- | --- |
| 11025 Hz | Symmetric cosine | Fuller sound, no aliasing artifacts |
| 16000+ Hz | LF-inspired asymmetric | Rich harmonics, Eloquence-like buzz |

Why this matters: The previous LF-inspired pulse generated strong high-frequency harmonics that sounded great at 16 kHz but caused harsh aliasing artifacts at 11025 Hz (common for accessibility/low-bandwidth use). The hybrid approach gives you the best of both worlds.
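As a rough illustration of the rate-based selection described above, here is a minimal C sketch. The enum, the function name `pick_glottal_shape`, and the 16000 Hz cutoff constant are assumptions made for this example; the notes only name the two waveforms and the rates 11025 Hz and 16000+ Hz, so the handling of rates in between is a guess, not the engine's actual logic.

```c
#include <assert.h>

/* Pulse shapes named in the release notes; this enum is hypothetical. */
typedef enum {
    GLOTTAL_SYMMETRIC_COSINE, /* fuller sound, avoids aliasing at low rates */
    GLOTTAL_LF_ASYMMETRIC     /* LF-inspired, rich high-frequency harmonics */
} glottal_shape_t;

/* Assumed cutoff: the notes list only 11025 Hz and "16000+ Hz", so any
 * rate below 16000 Hz is treated as a low-rate case here. */
#define LF_MIN_SAMPLE_RATE 16000

/* Choose a pulse shape from the output sample rate in Hz. */
glottal_shape_t pick_glottal_shape(int sampleRate)
{
    assert(sampleRate > 0);
    return (sampleRate >= LF_MIN_SAMPLE_RATE) ? GLOTTAL_LF_ASYMMETRIC
                                              : GLOTTAL_SYMMETRIC_COSINE;
}
```

With this sketch, `pick_glottal_shape(11025)` selects the symmetric cosine pulse and `pick_glottal_shape(16000)` selects the LF-inspired one.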

VoicingTone v2 Struct

The VoicingTone API now supports 10 parameters (up from 7) with a version detection header for backward compatibility.

New Parameters

| Parameter | Description | Range | Default |
| --- | --- | --- | --- |
| noiseGlottalModDepth | Klatt-style amplitude modulation of noise sources (aspiration/frication) synced to glottal cycle | 0.0–1.0 | 0.0 |
| pitchSyncF1DeltaHz | F1 frequency shift during glottal open phase | -60 to +60 Hz | 0.0 |
| pitchSyncB1DeltaHz | B1 bandwidth widening during glottal open phase | -50 to +50 Hz | 0.0 |

The pitch-sync parameters model the acoustic coupling between glottal source and vocal tract, enabling Eloquence-like "buzzy clarity" effects.

Version Detection Header

The v2 struct includes a header for safe negotiation between callers and DLL:

typedef struct {
    uint32_t magic;         // 0x32544F56 ("VOT2")
    uint32_t structSize;    // sizeof(speechPlayer_voicingTone_t)
    uint32_t structVersion; // 2
    uint32_t dspVersion;    // 4
    // ... 10 double parameters ...
} speechPlayer_voicingTone_t;

Backward compatibility:

  • Old callers passing v1 struct (7 doubles, no header) → DLL detects missing magic, uses legacy layout
  • New callers with old DLL → DLL ignores unknown fields, uses what it understands
  • New speechPlayer_getDspVersion() export for runtime feature detection
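The magic-based detection in the first bullet could look roughly like the sketch below. It is an illustration under stated assumptions, not the DLL's code: the struct name `voicing_tone_v2_t`, the `params[10]` placeholder array (the real field names are elided in these notes), and both helper functions are hypothetical. Only the header fields and their values come from the struct shown above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VOICING_TONE_MAGIC 0x32544F56u /* bytes "VOT2" when stored little-endian */

/* Hypothetical mirror of the v2 layout. */
typedef struct {
    uint32_t magic;
    uint32_t structSize;
    uint32_t structVersion;
    uint32_t dspVersion;
    double   params[10]; /* placeholder for the 10 double parameters */
} voicing_tone_v2_t;

typedef enum { LAYOUT_V1_LEGACY, LAYOUT_V2 } voicing_layout_t;

/* Fill in the header the way the notes describe it; parameters start at zero. */
void voicing_tone_v2_init(voicing_tone_v2_t *tone)
{
    assert(tone != NULL);
    memset(tone, 0, sizeof *tone);
    tone->magic = VOICING_TONE_MAGIC;
    tone->structSize = (uint32_t)sizeof *tone;
    tone->structVersion = 2;
    tone->dspVersion = 4;
}

/* Read the first four bytes and compare them with the magic value.
 * A v1 caller passes 7 bare doubles, so those bytes belong to a double
 * and are very unlikely to match. */
voicing_layout_t detect_voicing_layout(const void *tone)
{
    uint32_t magic;
    assert(tone != NULL);
    memcpy(&magic, tone, sizeof magic); /* safe read regardless of pointer type */
    return (magic == VOICING_TONE_MAGIC) ? LAYOUT_V2 : LAYOUT_V1_LEGACY;
}
```

A stricter check might also compare `structSize` and `structVersion` before trusting the rest of the struct; whether the real DLL does so is not stated here.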

NVDA Addon

New Voicing Tone Sliders

The synthesizer settings ring now includes 4 voicing parameter sliders:

| Slider | Range | Default | Effect |
| --- | --- | --- | --- |
| Voiced Tilt | 0–100 | 50 | Spectral slope (±24 dB/oct) |
| Noise Glottal Mod | 0–100 | 0 | Noise pulsing with voicing |
| Pitch-Sync F1 Delta | 0–100 | 50 | F1 modulation depth |
| Pitch-Sync B1 Delta | 0–100 | 50 | B1 modulation depth |

Settings are saved per-voice in the NVDA configuration.
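The slider defaults line up with the parameter defaults if a 0–100 slider is mapped linearly onto each parameter's range: 50 on a -60 to +60 Hz range lands on 0 Hz, and 0 on a 0.0–1.0 range lands on 0.0. The sketch below assumes that linear mapping; the function names are hypothetical and the addon's actual conversion may differ.

```c
#include <assert.h>

/* Map a 0..100 slider position linearly onto [minVal, maxVal]. */
double slider_to_param(int slider, double minVal, double maxVal)
{
    assert(slider >= 0 && slider <= 100);
    return minVal + (slider / 100.0) * (maxVal - minVal);
}

/* Inverse mapping: clamp the value into range, then round to the nearest
 * slider step. */
int param_to_slider(double value, double minVal, double maxVal)
{
    double t;
    assert(maxVal > minVal);
    t = (value - minVal) / (maxVal - minVal);
    if (t < 0.0) t = 0.0;
    if (t > 1.0) t = 1.0;
    return (int)(t * 100.0 + 0.5);
}
```

For example, `slider_to_param(50, -60.0, 60.0)` gives 0.0 (no F1 shift) and `param_to_slider(0.0, 0.0, 1.0)` gives 0, matching the defaults in the table.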

Driver Changes

  • Updated speechPlayer.py with VoicingTone v2 ctypes struct
  • The new NVDA driver will not sound right with VoicingTone v1 DLLs, but it will not crash either.

Phoneme Editor (nvspPhonemeEditorWin32)

Voicing Parameters UI

The Speech Settings dialog now includes a Voicing Parameters section with:

  • Parameter listbox - All 10 voicing parameters
  • Value slider - 0–100 range mapped to each parameter's actual range
  • Reset button - Reset selected parameter to default
  • Reset All button - Reset all voicing parameters to defaults

Per-Voice Storage

Voicing parameters are saved separately for each voice:

  • Python preset voices (Adam, Benjamin, etc.) → Saved to INI file under [voice_Adam], [voice_Benjamin], etc.
  • YAML profile voices → Stored in the profile's voicingTone: section

Switching voices in the dropdown automatically loads that voice's saved voicing parameters.

3-Tier DLL Fallback

The editor gracefully handles different DLL versions:

| DLL Exports | Detection | Behavior |
| --- | --- | --- |
| setVoicingTone + getDspVersion | V2 | Send header + 10 params |
| setVoicingTone only | V1 | Send 7 params, no header |
| Neither | None | Skip VoicingTone, speech still works |
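The decision in the table can be reduced to a small pure function once the caller knows which exports were found (for instance after symbol lookups at load time). This is a sketch only: the enum, the function names, and the boolean-flag interface are assumptions for illustration. The case where getDspVersion exists without setVoicingTone is not covered by the table and is treated as "None" here.

```c
#include <assert.h>

typedef enum { VT_NONE, VT_V1, VT_V2 } voicing_tier_t;

/* Pick a tier from which exports the loaded DLL provides.
 * Each flag is nonzero when the corresponding symbol was found. */
voicing_tier_t choose_voicing_tier(int hasSetVoicingTone, int hasGetDspVersion)
{
    if (!hasSetVoicingTone)
        return VT_NONE; /* nothing to call; speech still works */
    return hasGetDspVersion ? VT_V2 : VT_V1;
}

/* Number of double parameters to send for a tier, per the table above. */
int voicing_param_count(voicing_tier_t tier)
{
    switch (tier) {
    case VT_V2:   return 10; /* sent after the version header */
    case VT_V1:   return 7;  /* legacy layout, no header */
    case VT_NONE: return 0;
    }
    assert(!"unknown tier");
    return 0;
}
```

Keeping the decision separate from the symbol lookup makes it easy to test without loading a real DLL.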

Bug Fixes

  • Fixed crash when opening Voice Profiles dialog (missing function alias, struct field mismatches)
  • Fixed type conversion errors in profile editor YAML parsing

Linux Support

New Build Artifacts

  • libspeechPlayer.so - DSP engine shared library (x86_64)
  • libnvspFrontend.so - IPA-to-frames frontend shared library (x86_64)

Built with GCC in C++17 mode, using -O2 -fPIC.

