🐛 Bug Fixes
Model Download Timeout Issues
Fixed critical issue where model downloads would fail with "Failed to fetch" errors on Windows:
- Root Cause: Multi-GB model downloads exceeded the HTTP request timeout (30-60s), so the frontend showed errors even though the downloads were still running in the background
- Solution: Refactored download endpoints to return immediately and continue downloads in background
  - `/models/download` endpoint now returns instantly, with the download starting in the background
  - `/generate` and `/transcribe` endpoints now auto-start model downloads when needed
  - Returns 202 Accepted status with download progress information for better UX
- Frontend can track download progress via SSE endpoint and retry when complete
Cross-Platform Cache Path Issues
- Fixed hardcoded `~/.cache/huggingface/hub` paths that don't work on Windows
- All cache paths now use `hf_constants.HF_HUB_CACHE` for proper cross-platform support
  - Windows: Uses `%USERPROFILE%\.cache\huggingface\hub` or `%LOCALAPPDATA%`
  - macOS/Linux: Uses `~/.cache/huggingface/hub`
- Ensures HuggingFace cache directory exists on startup (defensive fix)
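A startup check along these lines might look like the sketch below. `resolve_hub_cache` and its `override` parameter are hypothetical names introduced for illustration, and the fallback path used when `huggingface_hub` is not installed is an assumption of this sketch rather than the project's behavior.

```python
from pathlib import Path


def resolve_hub_cache(override=None):
    """Return the Hub cache directory, creating it if it does not exist."""
    if override is not None:
        cache_dir = Path(override)
    else:
        try:
            # Constant provided by huggingface_hub; avoids hardcoding "~/.cache/...".
            from huggingface_hub import constants as hf_constants

            cache_dir = Path(hf_constants.HF_HUB_CACHE)
        except (ImportError, AttributeError):
            # Fallback (assumption for this sketch): the conventional default.
            cache_dir = Path.home() / ".cache" / "huggingface" / "hub"
    # Defensive: make sure the directory exists before anything reads from it.
    cache_dir.mkdir(parents=True, exist_ok=True)
    return cache_dir
```

Calling `resolve_hub_cache()` once at server startup would cover the "ensure the directory exists" step; passing `override` is only there to make the helper easy to exercise against a temporary directory.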
✨ Features
Windows Process Management
- Added `/shutdown` endpoint for graceful server shutdown on Windows
- Improved process lifecycle management for bundled server binary
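These notes do not say which web framework the server uses, so the following is only a framework-neutral sketch of a graceful `/shutdown` endpoint built on Python's standard `http.server`. `ShutdownHandler` and `run_server` are hypothetical names; the real endpoint may differ in method, payload, and cleanup steps.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class ShutdownHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/shutdown":
            self.send_error(404)
            return
        body = b'{"status": "shutting down"}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
        # shutdown() blocks until serve_forever() returns, so it must run on
        # a different thread than the one handling this request.
        threading.Thread(target=self.server.shutdown, daemon=True).start()

    def log_message(self, *args):
        pass  # keep the sketch quiet


def run_server(port=0):
    """Start the server on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), ShutdownHandler)
    thread = threading.Thread(target=server.serve_forever)
    thread.start()
    return server, thread
```

Responding before stopping lets the caller (for example, a desktop shell that owns the bundled binary) confirm the request was received, then wait for the process to exit instead of killing it.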
GPU Detection Improvements
- Added `gpu_type` field to health check response
- Now shows specific GPU type: "CUDA (GPU Name)", "MPS (Apple Silicon)", or None
- Fixes UI showing "GPU: Not Available" when MPS/CUDA is actually detected
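One way the `gpu_type` value could be derived is sketched below. It assumes the backend uses PyTorch for device detection, which these notes do not state; `describe_gpu`, `detect_gpu_type`, and `health_payload` are hypothetical helpers, and the `status` key is illustrative only.

```python
def describe_gpu(cuda_name=None, mps_available=False):
    """Format the gpu_type string in the shape described in these notes."""
    if cuda_name:
        return f"CUDA ({cuda_name})"
    if mps_available:
        return "MPS (Apple Silicon)"
    return None


def detect_gpu_type():
    """Probe for a GPU via PyTorch (assumption); None if nothing is found."""
    try:
        import torch
    except ImportError:
        return None
    if torch.cuda.is_available():
        return describe_gpu(cuda_name=torch.cuda.get_device_name(0))
    # Older PyTorch builds may not ship the MPS backend module at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return describe_gpu(mps_available=True)
    return None


def health_payload():
    """Hypothetical health check body carrying the gpu_type field."""
    return {"status": "healthy", "gpu_type": detect_gpu_type()}
```

Keeping the string formatting in `describe_gpu` separate from the hardware probe makes the three documented outcomes easy to check without a GPU present.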