CHANGELOG
version 25.2.0:
- version numbering now follows YEAR.MONTH.PATCH_NUMBER
- admin privileges are no longer needed on Windows to install ebook2audiobook packages (chocolatey replaced by scoop)
- added MPS processor support
- added custom models dropdown list
- added voices dropdown list with a play button to listen to each voice
- added voice extractor for uploaded voices (separates vocals from background and music)
- added delete button for the voices, custom models and audiobooks lists
- added builtin voices to the voices list; they can be used with all TTS models
- added "--output_dir" for a custom output folder in headless mode
- added directory options for batch ebook upload in gradio/gui mode
- added new output audio formats: ['m4b', 'm4a', 'mp4', 'webm', 'mov', 'mp3', 'flac', 'wav', 'ogg', 'aac'].
  More can be added on request.
- added cancellation of a running conversion via the ebook upload gradio component (when the "X" is clicked)
- new global config settings:
  tmp_expire: number of days an inactive session is kept before cleanup
  max_custom_model: maximum number of custom models in the list (per session id)
  max_custom_voices: maximum number of custom voices in the list (per session id)
  tts_default_settings: fine-tuned XTTS default parameters
  (refer to ./lib/conf.py for all new configuration settings)
- gradio GUI settings are now saved and restored on refresh and browser exit
- conversion can now be resumed in headless and gradio GUI mode when the client page or connection is lost or reloaded
  (however, the user must restart the process manually with the same session id)
- math symbols and numbers are now converted to phonemes on all TTS engines
  (languages not covered are pronounced with the default_language_code set in ./lib/conf.py;
  PRs are welcome to fix missing translations)
- audio filtering, normalization and enhancement of all uploaded voices and of the final audiobook
  for the best sound presence and clarity
- fixed custom model upload
- fixed missing pages in conversion
- fixed modules and libraries missing during installation (regex, mecab, etc.)
- various gradio design improvements
- optimized multi-language sentence splitting to minimize hallucinations and unnatural pauses
- numbers and math symbols are now spoken for fairseq and XTTSv2
- the TTS model is now loaded once in the script and shared by all users using the same model
- added coqui-tts built-in voices for all TTS engines, as standard in all languages
- added new modal alerts for info, error, exception and warnings
- removed docker_utils, which was a docker image with ffmpeg and calibre only
- many more fixes and new features that are not all listed here... see for yourself ;)
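
As an illustration of the YEAR.MONTH.PATCH_NUMBER scheme introduced in this release, here is a minimal sketch of how such a version string could be parsed and compared. The names `Version` and `parse_version` are hypothetical and are not part of ebook2audiobook; the month range check is an assumption based on the scheme's name.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """Hypothetical container for a YEAR.MONTH.PATCH_NUMBER version."""
    year: int
    month: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse a string such as '25.2.0' into a Version tuple."""
    parts = text.strip().split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected YEAR.MONTH.PATCH_NUMBER, got {text!r}")
    year, month, patch = (int(p) for p in parts)
    # Assumption: MONTH is a calendar month, so only 1..12 is accepted.
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {text!r}")
    return Version(year, month, patch)


# Tuples compare field by field, so the ordering is numeric rather than textual.
current = parse_version("25.2.0")
later = parse_version("25.10.1")
is_newer = later > current
```

Comparing the parsed tuples avoids the pitfall of comparing raw strings, where "25.10.1" would sort before "25.2.0".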
Currently in development:
- add terminal output console to gradio/gui
- implement more TTS engines (list not decided yet)
- apprise notification
- implement chapter summarizing to create background music and sounds
- implement indices in the metadata for each sentence in the final file
  to eventually improve the pronunciation and replace it with the new sentence
- add built-in voice list of xttsv2
- add Czech, Croatian and others with cv/vits
- add music interlude between chapters
- use chapter names (if chapters are well detected) in place of numbers in the final metadata
- split the output into multiple files if longer than 12 hours # chapters as final
- install the right torch and cuda version if a GPU is available so deepspeed can be used
- automatic user crash bug report by email via a URL request
- create a legends.py file for all gradio/gui legends to manage multilanguage
- mark each sentence number in the metadata with the timecode so
  the user can re-convert a single sentence before exporting the audiobook
  (this requires not deleting the ebook temp folder)
- use "websocat" in the "cmd.exe" and "bash/zsh" scripts to connect in headless mode via gradio and avoid loading the tts at each command