A fast follow-up to 0.4.0 focused on making the new engines actually load in the production binary — plus generation cancellation, Linux system-audio capture, and the repo's first PR-time type check. Five first-time contributors shipped in this release.
0.4.0 introduced three new TTS engines, but the frozen PyInstaller binary tripped over several Python-ecosystem quirks that don't show up in the dev venv: transformers opening .py sources at runtime, scipy.stats._distn_infrastructure hitting a frozen-importer NameError, and chatterbox-multilingual failing to find its Chinese segmenter dictionary. This release patches all of those in one sweep.
Frozen-Binary Reliability (#438)
- Kokoro now bundles `.py` sources alongside `.pyc` via `--collect-all kokoro` so `transformers`' `_can_set_attn_implementation` regex scan can read them — previously `FileNotFoundError: kokoro/modules.py` killed Kokoro loading in production builds
- Chatterbox Multilingual now bundles `spacy_pkuseg/dicts/default.pkl` and the package's native `.so` extensions via `--collect-all spacy_pkuseg` — previously the Chinese word segmenter crashed with `FileNotFoundError` on first load
- scipy.stats._distn_infrastructure — new runtime hook source-patches the trailing `del obj` (which raises `NameError` under PyInstaller's frozen importer because the preceding list comprehension evaluates empty) to `globals().pop('obj', None)`, unblocking `librosa` → `scipy.signal` → `scipy.stats` for every TTS engine that depends on librosa
- transformers.masking_utils — same runtime hook forces `_is_torch_greater_or_equal_than_2_6 = False` so the older `sdpa_mask_older_torch` path is selected; the 2.6+ path uses `TransformGetItemToIndex()`, a real `torch._dynamo` graph transform our permissive stub can't reproduce
- torch._dynamo — no-op stub replaces the real module before `transformers` imports it, preventing the `torch._numpy._ufuncs` import crash (`NameError: name 'name' is not defined`) that blocked Kokoro and every engine pulling in `flex_attention`
- `.spec` paths are now repo-relative instead of absolute, so the generated spec is portable across machines and CI
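
For readers curious about the `del obj` fix, here is a minimal sketch of that kind of source patch. It is not the project's actual runtime hook: `patch_trailing_del` and `toy_source` are hypothetical names, and the toy module only mimics the failure mode (a loop variable that is never bound because the iterable is empty).

```python
import re

# Hypothetical helper, not the project's hook: matches a module-level
# `del obj` statement that sits on a line of its own.
_TRAILING_DEL = re.compile(r"^del obj[ \t]*$", re.MULTILINE)


def patch_trailing_del(source: str) -> str:
    """Swap `del obj` for a form that tolerates `obj` never being bound."""
    return _TRAILING_DEL.sub("globals().pop('obj', None)", source)


# Toy module body: the list comprehension is empty, so the loop never binds
# `obj`, and the trailing `del obj` raises NameError when executed.
toy_source = (
    "for obj in [n for n in []]:\n"
    "    pass\n"
    "del obj\n"
)

if __name__ == "__main__":
    try:
        exec(toy_source, {})
    except NameError as exc:
        print("unpatched:", exc)
    exec(patch_trailing_del(toy_source), {})
    print("patched source runs cleanly")
```

The patched statement is a no-op when `obj` is missing and still removes it when it exists, so the module's behaviour is unchanged outside the frozen build.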
Generation
- Cancel queued or running generations (#444) — new `/generate/{id}/cancel` endpoint and a Stop button on the history row while generating. The serial queue now tracks per-ID state (queued / running / cancelled) so queued jobs are skipped before the worker picks them up and running jobs are `.cancel()`-ed mid-flight; `run_generation` catches `CancelledError` and marks the row `failed` with a "cancelled" error.
- Legacy `data/` path prefix resolution (#440) — generations stored with the old `data/` prefix under pre-0.4 installs now resolve correctly after the storage root moved, fixing 404s for historical audio.
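
The cancellation behaviour above can be illustrated with a small `asyncio` sketch. This is an assumption-laden illustration, not the backend's code: `SerialQueue`, `State`, and `demo` are hypothetical names, and the "failed / cancelled" bookkeeping is done in the worker here rather than in `run_generation`.

```python
import asyncio
from enum import Enum


class State(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    CANCELLED = "cancelled"


class SerialQueue:
    """Hypothetical one-at-a-time job queue with per-ID cancellation."""

    def __init__(self):
        self._queue = asyncio.Queue()
        self._state = {}   # gen_id -> State
        self._tasks = {}   # gen_id -> asyncio.Task, only while running
        self.results = {}  # gen_id -> (status, error)

    async def submit(self, gen_id, coro_fn):
        self._state[gen_id] = State.QUEUED
        await self._queue.put((gen_id, coro_fn))

    def cancel(self, gen_id):
        state = self._state.get(gen_id)
        if state is State.QUEUED:
            self._state[gen_id] = State.CANCELLED  # worker will skip it
            return True
        if state is State.RUNNING:
            self._state[gen_id] = State.CANCELLED
            self._tasks[gen_id].cancel()           # interrupt mid-flight
            return True
        return False

    async def worker(self):
        while True:
            gen_id, coro_fn = await self._queue.get()
            try:
                if self._state.get(gen_id) is State.CANCELLED:
                    self.results[gen_id] = ("failed", "cancelled")
                    continue
                self._state[gen_id] = State.RUNNING
                task = asyncio.ensure_future(coro_fn())
                self._tasks[gen_id] = task
                try:
                    await task
                    self.results[gen_id] = ("completed", None)
                except asyncio.CancelledError:
                    if self._state.get(gen_id) is not State.CANCELLED:
                        raise  # the worker itself is being shut down
                    self.results[gen_id] = ("failed", "cancelled")
                finally:
                    self._tasks.pop(gen_id, None)
            finally:
                self._queue.task_done()


async def demo():
    q = SerialQueue()
    worker = asyncio.ensure_future(q.worker())

    async def slow():
        await asyncio.sleep(10)

    async def fast():
        await asyncio.sleep(0)

    await q.submit("a", slow)
    await q.submit("b", fast)
    await q.submit("c", fast)
    await asyncio.sleep(0.05)  # give the worker time to start "a"
    q.cancel("b")              # still queued: skipped when dequeued
    q.cancel("a")              # running: its task is cancelled
    await q._queue.join()
    worker.cancel()
    await asyncio.gather(worker, return_exceptions=True)
    return q.results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Tracking state per ID is what lets a single cancel call handle both cases: a queued job only needs a flag flipped, while a running job needs its task interrupted.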
Model Migration
- Migration dialog no longer hangs when the cache is empty (#439) — the backend now emits a completion SSE event even when zero models are moved.
- Storage-change flow surfaces a toast when there's nothing to migrate (#433) instead of proceeding with a no-op move and restarting the server.
- Deleting all generations from a voice profile now deletes the associated version files and DB rows too (#447) — previously orphaned versions accumulated in storage.
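
The empty-cache fix (#439) comes down to emitting the final event unconditionally. A minimal sketch, assuming a generator-based SSE stream; the event names `progress` and `complete`, the payload fields, and the function names are all hypothetical rather than taken from the backend.

```python
import json
from typing import Iterable, Iterator


def sse(event: str, payload: dict) -> str:
    """Format one Server-Sent Events frame: event name plus JSON data."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


def migration_events(models: Iterable[str]) -> Iterator[str]:
    """Yield one progress frame per model, then always a completion frame."""
    moved = 0
    for name in models:
        # A real implementation would move the model files here.
        moved += 1
        yield sse("progress", {"model": name, "moved": moved})
    # Emitted outside the loop, so an empty cache still ends the stream
    # with an event the dialog can react to.
    yield sse("complete", {"moved": moved})


if __name__ == "__main__":
    for frame in migration_events([]):
        print(frame, end="")
```

If the completion frame were only produced as a side effect of moving the last model, a client waiting on it would never see it when there is nothing to move.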
Platform
- Linux system audio capture (#457) — `cpal`'s ALSA backend doesn't expose PulseAudio/PipeWire monitor sources by name, so the previous device-name search never matched and silently fell back to the microphone. Detection now uses `pactl get-default-sink` + `pactl list short sources` and routes via `PULSE_SOURCE`, with the name-based search retained as a fallback when `pactl` is absent.
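
A rough sketch of the `pactl`-based detection follows, written in Python for illustration only (the app's capture path goes through `cpal`, so the real logic lives elsewhere). It assumes the usual PulseAudio convention that a sink's monitor source is named `<sink>.monitor` and that `pactl list short sources` prints tab-separated columns with the source name second; the function names are hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def pick_monitor_source(default_sink: str, short_sources: str) -> Optional[str]:
    """Return the default sink's monitor source if it is listed, else None."""
    wanted = f"{default_sink.strip()}.monitor"  # assumed naming convention
    for line in short_sources.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2 and fields[1] == wanted:
            return wanted
    return None


def _run(args) -> str:
    done = subprocess.run(args, capture_output=True, text=True, check=True)
    return done.stdout


def detect_monitor_source() -> Optional[str]:
    """Query pactl; None tells the caller to use the name-based fallback."""
    if shutil.which("pactl") is None:
        return None
    try:
        sink = _run(["pactl", "get-default-sink"])
        sources = _run(["pactl", "list", "short", "sources"])
    except (OSError, subprocess.CalledProcessError):
        return None
    return pick_monitor_source(sink, sources)


if __name__ == "__main__":
    source = detect_monitor_source()
    if source is not None:
        # A capture process started with this variable set would record
        # from the monitor source instead of the microphone.
        print(f"PULSE_SOURCE={source}")
    else:
        print("pactl unavailable or no monitor found; use name-based search")
```

Keeping the parsing in a pure function makes it testable without a running audio server.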
Frontend CI
- First PR-time quality gate (#418) — new `.github/workflows/ci.yml` runs `bun run typecheck` + `bun run build:web` on every PR. Fixed pre-existing type issues that were being suppressed with `@ts-expect-error`, cleaned up a dep-array typo (`[platform.metadata.isTauricheckOnMountcheckForUpdates]`) in `useAutoUpdater`, and removed 100+ lines of dead `ModelItem` code from `ModelManagement.tsx`.
- Follow-up: widened `apiClient.migrateModels()` return type to include `moved` and `errors` so the storage-change handler typechecks against the real backend response (#470).
Docs
- Clarified in the Quick Start + README that paralinguistic tags (`[laugh]`, `[sigh]`) only work with Chatterbox Turbo; other engines read them as literal text (#450).
New Contributors
- @Bortlesboat — generation cancellation (#444)
- @gaojulong — migration dialog hang fix (#439)
- @fuleinist — migration no-op toast (#433)
- @erionjuniordeandrade-a11y — frontend CI + type hardening (#418)
- @estefrac — Linux pactl system-audio capture (#457)