🚀 Audiobook Creator v1.4 Release 🎧
This release adds support for Orpheus TTS, which brings higher-quality audio, more expressive speech, and the ability to add emotion tags using an LLM. Audio generation with Orpheus is handled by my dedicated Orpheus TTS FastAPI Server repository.
The Orpheus TTS FastAPI server is a high-performance, FastAPI-based server that exposes OpenAI-compatible Text-to-Speech (TTS) endpoints using the Orpheus TTS model. It uses the original orpheus-speech Python package with a vLLM backend, loading the model in bfloat16 by default (with float16/float32 options). Higher-precision formats require more VRAM but eliminate the audio quality issues and artifacts commonly found in quantized models or alternative inference engines. The server processes chunks asynchronously in parallel for significantly faster audio generation, and it adds features to address audio quality issues commonly encountered when generating audio with Orpheus (a request example follows the list below):
- Intelligent Retry Logic: Automatic retry on audio decoding errors for improved reliability
- Token Repetition Detection: Prevents infinite audio loops with adaptive pattern detection and automatic retry with adjusted parameters
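Because the server is OpenAI-compatible, any OpenAI-style TTS client should be able to talk to it. Here is a minimal sketch using curl; the host and port (localhost:8000), the /v1/audio/speech path, and the model/voice names are assumptions based on typical OpenAI-compatible setups and may differ from your configuration, so check the Orpheus TTS FastAPI Server README for the exact values.

```bash
# Hypothetical request to the Orpheus TTS FastAPI server.
# Host, port, model name and voice name are placeholders -- adjust to your setup.
curl http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
        "model": "orpheus",
        "input": "Audiobook Creator now supports Orpheus TTS <sigh> finally!",
        "voice": "tara"
      }' \
  --output sample.wav
```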
Instructions for new and existing users:
- Since this release introduces several changes to support Orpheus, it's highly recommended to go through the full README.md and set up the application again.
- The release also introduces changes to the .env file, so update your .env with the new variables from .env.sample (see the snippet below).
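As a rough sketch of the .env update step (the actual variable names live in .env.sample, so treat this only as a workflow illustration):

```bash
# Back up your current configuration before updating.
cp .env .env.backup

# Compare your existing .env against the updated template to spot newly added variables.
diff .env .env.sample

# Copy any new variables from .env.sample into .env, then fill in your own values.
```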
🐳 Docker Image:
You can pull the latest image with (choose the CPU or CUDA GPU variant):
docker pull ghcr.io/prakharsr/audiobook_creator_cpu:v1.4

docker pull ghcr.io/prakharsr/audiobook_creator_gpu:v1.4

For complete instructions on how to run, go to the Get Started section.
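For a quick start, a run invocation will look roughly like the sketch below; the published port (7860 here) and the use of --env-file and --gpus all are assumptions, so confirm the actual values in the Get Started section.

```bash
# Hypothetical run command for the GPU image -- the exposed port and
# required flags are assumptions; see the Get Started section for the
# values the project actually uses.
docker run -d \
  --name audiobook_creator \
  --gpus all \
  --env-file .env \
  -p 7860:7860 \
  ghcr.io/prakharsr/audiobook_creator_gpu:v1.4
```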
Full Changelog: v1.3...v1.4