DrewThomasson/ebook2audiobook 25.2.18 on GitHub

CHANGELOG

version 25.2.18:

version structure is now based on YEAR.MONTH.PATCH_NUMBER
admin privileges are no longer needed on Windows to install ebook2audiobook packages (replaced chocolatey with scoop)
added MPS processor
added custom models dropdown list
added voices dropdown list and play button to listen to each of them
added voice extractor for uploaded voices (separates vocals from background sounds and music)
added delete button for voices, custom models and audiobooks list
added builtin voices to the voices list; they can be used with all TTS models
added "--output_dir" for custom output folder in headless mode
added directory options for ebook upload batch files in gradio/gui mode
added new output audio formats: ['m4b', 'm4a', 'mp4', 'webm', 'mov', 'mp3', 'flac', 'wav', 'ogg', 'aac'].
More can be added on demand.
added cancellation of a running conversion via the ebook upload gradio component (when the "X" is clicked)
new global config settings:
tmp_expire: how long an inactive session is kept before cleanup, in days
max_custom_model: max custom models on list (by session id)
max_custom_voices: max custom voices on list (by session id)
tts_default_settings: fine tuned XTTS default parameters
(refer to ./lib/conf.py for all new configuration settings)
gradio GUI settings are now saved and restored on refresh and browser exit
resume conversion in headless and gradio GUI mode when the client page/connection is lost or reloaded
(however the user must restart the process manually with the same session id)
Math symbols and numbers are now converted to phonemes on all TTS engines
(languages not covered are pronounced with the default_language_code set in ./lib/conf.py.
PRs are welcome to fix missing translations)
audio filtering, normalization and enhancement of all uploaded voices and the final audiobook
for the best sound presence and clarity.
fixed custom model upload
fixed missing pages in conversion
fixed modules and libraries missing during the installation (regex, mecab, etc.)
various gradio design improvements
optimized multi language sentence splitting to minimize hallucinations and unnatural pauses
numbers and math symbols are now spoken for fairseq and XTTSv2
the TTS model is now loaded once in the script and for all users using the same model
added coqui-tts built-in voices for all TTS engines and as standard in all languages
added new modal alerts for info, error, exception and warnings
removed docker_utils, which was a docker image with ffmpeg and calibre only
removed fine tuned parameters as they made results worse rather than better
optimized sentence splitting
Many more fixes and new features than can be remembered here... see for yourself ;)
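
The YEAR.MONTH.PATCH_NUMBER scheme introduced in this release compares correctly only when each field is treated as a number: as plain strings, "25.10.1" would sort before "25.2.18". A minimal sketch of parsing it follows; the names ReleaseVersion and parse_version are hypothetical and are not part of ebook2audiobook.

```python
from typing import NamedTuple


class ReleaseVersion(NamedTuple):
    """Hypothetical container for a YEAR.MONTH.PATCH_NUMBER version."""
    year: int
    month: int
    patch: int


def parse_version(text: str) -> ReleaseVersion:
    """Parse 'YEAR.MONTH.PATCH_NUMBER', tolerating a leading 'v' or 'V'."""
    parts = text.strip().lstrip("vV").split(".")
    if len(parts) != 3:
        raise ValueError(f"expected YEAR.MONTH.PATCH_NUMBER, got {text!r}")
    year, month, patch = (int(p) for p in parts)
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {text!r}")
    return ReleaseVersion(year, month, patch)
```

Because ReleaseVersion is a tuple of integers, two parsed versions can be ordered directly with the usual comparison operators.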

Currently in development:

add terminal output console to gradio/gui
implement more TTS engines (list not decided yet)
apprise notification
implement chapter summarizing to create background music and sounds
implement indices in the metadata for each sentence in the final file
to eventually improve the pronunciation and replace it with the new sentence.
add built-in voice list of xttsv2
add Czech, Croatian and others with cv/vits
add music interlude between chapters
add chapter names (if chapters are well detected) in place of numbers in the final metadata
split the output into multiple files if > 12 hours # chapters as final
installation of the right torch and cuda version if GPU available so deepspeed can be used
automatic user crash bug report by email via a URL request
create a legends.py file for all gradio/gui legends to manage multilanguage
mark each sentence number in the metadata with the timecode so
the user is able to re-convert one sentence before exporting the audiobook
(this requires not deleting the ebook temp folder)
use "websocat" in "cmd.exe" and "bash/zsh" script to connect in headless mode via gradio and avoid tts load at each command
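
One of the planned items above is splitting the output into multiple files when it exceeds 12 hours. A rough sketch of how such a split could work is shown here, assuming that cuts only happen on chapter boundaries and that chapter durations are already known in seconds. The function split_into_parts and the constant MAX_PART_SECONDS are hypothetical, not taken from the project.

```python
MAX_PART_SECONDS = 12 * 60 * 60  # the 12-hour limit mentioned in the list above


def split_into_parts(chapter_seconds, limit=MAX_PART_SECONDS):
    """Group chapter indices into consecutive parts no longer than `limit`.

    A single chapter longer than `limit` is kept whole in its own part,
    since this sketch never cuts inside a chapter.
    """
    parts, current, total = [], [], 0.0
    for index, seconds in enumerate(chapter_seconds):
        # Start a new part when adding this chapter would exceed the limit.
        if current and total + seconds > limit:
            parts.append(current)
            current, total = [], 0.0
        current.append(index)
        total += seconds
    if current:
        parts.append(current)
    return parts
```

With three 5-hour chapters, the first two fit in one part and the third starts a second part.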
