VisoMaster Fusion 2.0.0 — Release Notes
What's New
This release represents a major overhaul of VisoMaster Fusion, bringing new features, significant performance improvements, extensive bug fixes, and a more polished UI across the board.
Thanks to all contributors!
New Features
Face Re-Aging
A new Face Re-Aging processor lets you transform the apparent age of a face in real time. The target age can be set from 0 to 100 using a dedicated slider. The underlying ONNX model runs with device-aware FP16 inference and includes normalization guards and a CPU fallback for compatibility.
Auto Mouth Expression (Mouth Action Detector)
The previous mouth-open toggle has been replaced with a full Mouth Action Detector, a TensorFlow-based model that automatically detects mouth movement states and applies hysteresis to prevent rapid oscillation. It supports both 68-point and 203-point landmark formats, includes occlusion timeout handling and EMA decay, and is fully thread-safe.
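As a rough illustration of how EMA smoothing and hysteresis work together, here is a minimal sketch. It is not the detector's actual code: the class name, thresholds, and smoothing weight are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class MouthStateSketch:
    """Hypothetical open/closed state machine (not the shipped detector)."""

    open_threshold: float = 0.6   # smoothed score must rise above this to open
    close_threshold: float = 0.4  # and fall below this to close again
    alpha: float = 0.5            # EMA weight given to the newest sample
    smoothed: float = 0.0
    is_open: bool = False

    def update(self, score: float) -> bool:
        # The exponential moving average damps single-frame spikes.
        self.smoothed = self.alpha * score + (1.0 - self.alpha) * self.smoothed
        # Two thresholds: the state flips only after crossing the far one,
        # so scores hovering between them cannot cause flicker.
        if not self.is_open and self.smoothed > self.open_threshold:
            self.is_open = True
        elif self.is_open and self.smoothed < self.close_threshold:
            self.is_open = False
        return self.is_open


detector = MouthStateSketch()
states = [detector.update(s) for s in (0.9, 0.9, 0.5, 0.5, 0.1, 0.1)]
```

With these assumed values the two mid-range samples (0.5) leave the state open, because the smoothed score never drops below the lower threshold.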
ByteTrack Multi-Face Tracking
Integrated ByteTrack, a multi-object tracking algorithm that uses Kalman filter-based motion prediction, to improve the temporal consistency of face detection across frames.
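To give a feel for the predict/update cycle behind Kalman-based motion prediction, the sketch below filters a single scalar with a constant-position model. This is a deliberate simplification for illustration: ByteTrack tracks bounding boxes with a richer state, and the class name and noise values here are assumptions.

```python
class ScalarKalmanSketch:
    """1-D Kalman filter with a constant-position model (illustrative only)."""

    def __init__(self, initial: float, variance: float = 1.0,
                 process_noise: float = 0.01, measurement_noise: float = 0.25) -> None:
        self.x = initial            # current state estimate
        self.p = variance           # uncertainty of the estimate
        self.q = process_noise      # how much uncertainty grows per step
        self.r = measurement_noise  # how noisy each measurement is

    def predict(self) -> float:
        # Constant-position model: the estimate stays, uncertainty grows.
        self.p += self.q
        return self.x

    def update(self, measurement: float) -> float:
        # The gain weighs the measurement against the current estimate.
        gain = self.p / (self.p + self.r)
        self.x += gain * (measurement - self.x)
        self.p *= 1.0 - gain
        return self.x


kf = ScalarKalmanSketch(initial=0.0)
estimates = []
for z in (10.0, 10.0, 10.0):
    kf.predict()
    estimates.append(kf.update(z))
```

Each update pulls the estimate closer to the repeated measurement without jumping straight to it, which is the smoothing behavior that helps keep tracks stable between frames.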
VR180 Tiled Face Detection
Added a new _detect_faces_vr_tiled() pipeline that splits the equirectangular frame into up to 24 perspective crops before detection, dramatically improving detection of faces at the edges or in challenging VR180 layouts. New settings:
- VR180TileDetectionToggle (default: on)
- VR180MaxFOVSlider (60–150°, default: 120°)
- VR180CropResolutionSelection (512 / 768 / 1024, default: 512)
- VR180EyeModeSelection: Both Eyes / Single Eye
Theatre / Fullscreen Mode
Added a dedicated Theatre Mode with optional true fullscreen support, persistent state snapshots, and smooth transitions. Controlled via TheatreModeUsesFullscreenToggle.
Output Location Controls
New output settings section:
- Output to Target Location — saves output next to the source media file
- Cluster Output by Source — organizes output into subfolders by source name
Global Input Resize
Added GlobalInputResizeToggle and GlobalInputResizeSizeSelection (540p–2160p) to rescale input before processing, enabling better throughput on high-resolution sources.
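For intuition, rescaling to a target height while preserving aspect ratio can be sketched as follows. The function name and the even-width rounding are assumptions made for this example, not the application's actual resize logic.

```python
from typing import Tuple


def scaled_size(width: int, height: int, target_height: int) -> Tuple[int, int]:
    """Return (width, height) scaled to target_height, keeping aspect ratio."""
    if width <= 0 or height <= 0 or target_height <= 0:
        raise ValueError("dimensions must be positive")
    scale = target_height / height
    # Round the width to an even number (an assumption of this sketch;
    # common 4:2:0 video pixel formats expect even dimensions).
    new_width = max(2, int(round(width * scale / 2)) * 2)
    return new_width, target_height


uhd_to_1080p = scaled_size(3840, 2160, 1080)
fhd_to_540p = scaled_size(1920, 1080, 540)
```

Halving each dimension cuts the pixel count to a quarter, which is where the throughput gain on high-resolution sources comes from.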
New UI Themes
Three new QSS themes:
- Dark Blue — cool dark interface with blue accents
- OLED Black — true black for OLED displays
- Windows 11 Dark — matches Windows 11 system dark aesthetic
Collapsible Parameter Sections
Parameter panels now support persistent collapsible sections, reducing clutter and keeping the workspace focused.
Bug Fixes
Video Playback & Seeking
- Fixed seeking lag; added FrameWorkerDelayDecimalSlider (0–3s) to prevent GPU overload during seeks
- Fixed playback buffering issues and memory bloat from unbounded display frame queues
- Fixed stale preview frame when switching between target videos
Audio / Video Sync
- Fixed skipped-frame audio rebuild with a codec-agnostic approach
- Fixed audio segmenting in default-style recording mode
- Fixed audio merge cleanup and timing issues
- Made skipped-frame audio rebuild fully sync-aware
VR180 Pipeline
- Added torch.clamp() around the asin() argument, which prevents NaN when coordinates fall outside the valid FOV
- Changed out-of-FOV grid coordinates from 2.0 → 1.0
- Changed padding mode from zeros → border to eliminate color fringing at equirectangular edges
- Fixed align_corners from False → True for correct grid sampling
- Scaled feather_radius dynamically based on _mask_region_side
- Added single-eye detection and landmark fallback
- Added warning log for silently dropped faces
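The clamp fix in the list above guards against a common floating-point pitfall: asin is defined only on [-1, 1], and rounding error can push a value slightly outside that range. torch.asin returns NaN in that case, while Python's math.asin raises ValueError. The sketch below shows the same idea with the standard library only; the function name is an assumption, and the real pipeline operates on tensors.

```python
import math


def safe_asin(value: float) -> float:
    """Clamp to [-1, 1] before asin so rounding error cannot leave the domain."""
    clamped = max(-1.0, min(1.0, value))
    return math.asin(clamped)


# A value that drifted slightly past 1.0 through floating-point error.
drifted = 1.0 + 1e-9
latitude = safe_asin(drifted)
```

Without the clamp, `math.asin(drifted)` would raise a domain error; with it, the result saturates at π/2.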
Face Detection & Landmarks
- Fixed detector input size reading (now uses model-declared input shape instead of hardcoded values)
- Fixed KPS resize and tracking regressions
- Separated 203-point KPS model; updated input resize for better upscale results
- Added _is_kps_valid() guard to skip invalid landmark sets
- Added guard on kps_5_adj.shape[0] >= 5
- Fixed grayscale channel check (channel == 2 → channel == 1)
- Fixed auto-rotation landmark refinement for upside-down faces
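A landmark validity guard of the kind listed above might look roughly like this. This is a hypothetical stand-in written with plain Python sequences; the actual _is_kps_valid() implementation and its criteria are not shown in these notes.

```python
import math
from typing import Optional, Sequence


def is_kps_valid_sketch(kps: Optional[Sequence[Sequence[float]]],
                        min_points: int = 5) -> bool:
    """Hypothetical check: enough points, each an (x, y) pair of finite numbers."""
    if kps is None or len(kps) < min_points:
        return False
    return all(
        len(point) == 2 and all(math.isfinite(c) for c in point)
        for point in kps
    )


good = [(10.0, 20.0), (30.0, 20.0), (20.0, 30.0), (12.0, 40.0), (28.0, 40.0)]
bad = good[:4] + [(float("nan"), 40.0)]
```

Rejecting a bad landmark set early is cheaper than letting NaN coordinates propagate into alignment and warping.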
Issue Scan System
- Stabilized issue scan processor state and worker startup/shutdown lifecycle
- Fixed issue scan UI restore behavior and range normalization
- Blocked structural UI mutations during active scans
- Kept issue scan progress tooltip static during scans
- Added stacktrace logging to issue scanner errors
- Reduced dense landmark smoothing warning spam
- Wired issue scan UI to marker-resolved runtime state
Expression Restorer & Re-Aging
- Fixed ByteTrack + Expression Restorer interaction
- Fixed re-aging edge cases and dtype/scale normalization at entry point
- DFM output scaling corrected (scaled ×255)
- Fixed DFM FW-QUAL-08 threshold (1.0 → 30.0)
Numerical / Safety Guards
- Added NaN/Inf guards after every from_estimate() call
- Added None/NaN guard for GhostFace M similarity transform matrix
- Fixed memory total > 0 guard against division by zero in VRAM reporting
- Removed tautology in preset application (face_id == face_id)
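The guards in this section follow a common pattern: validate a value before using it and fall back to something safe. The sketch below illustrates two of them with standard-library Python. All names are hypothetical, and the identity fallback is an assumption of the example; the application may handle an invalid matrix differently, for instance by skipping the face.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]
IDENTITY_2X3: Matrix = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]


def usable_transform(matrix: Optional[Matrix]) -> Matrix:
    """Return matrix if every entry is finite, otherwise an identity fallback."""
    if matrix is None:
        return IDENTITY_2X3
    if all(math.isfinite(v) for row in matrix for v in row):
        return matrix
    return IDENTITY_2X3


def vram_used_percent(used_bytes: int, total_bytes: int) -> float:
    """Guard the division so a zero total reports 0% instead of raising."""
    if total_bytes <= 0:
        return 0.0
    return 100.0 * used_bytes / total_bytes


broken = [[1.0, float("inf"), 0.0], [0.0, 1.0, 0.0]]
fallback = usable_transform(broken)
```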
UI Fixes
- Fixed "Open file location" for both Target and Input panels
- Fixed eye blend normalize behavior and toggle UX
- Fixed webcam restore and denoiser row visibility
- Fixed fullscreen workspace restore and theatre state snapshots
- Fixed GFPGAN model load issues
Performance Improvements
Frame Worker
- v2.Resize caching: Resize objects cached by (h, w, interpolation, antialias) — eliminates repeated allocations per frame
- Border mask skip: Border mask computation skipped when BordermaskEnableToggle is off
- Lazy snapshot clone: Restoration-1 before snapshot clone made lazy
- PerspectiveConverter skip: Skipped when no VR crops exist
- FFT/pooling skip: analyze_image FFT/pooling skipped when debug mode is off
- Deepcopy reduction: Replaced deepcopy with shallow copy in single-frame mode
- Reuse: FrameEnhancers and FrameEdits reused from models_processor instead of being recreated per worker
Cache Optimization
- _transform_cache converted to OrderedDict LRU (max 32)
- _resize_cache converted to OrderedDict LRU (max 16)
- _ddim_schedule_cache converted to OrderedDict LRU (max 20)
- MAX_GABOR_CACHE reduced from 32 → 16
- EquirectangularConverter cached at FrameWorker level
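An OrderedDict-based LRU cache bounds memory by evicting the least recently used entry once a size limit is reached. The following is a minimal sketch of that technique, not the project's cache code; the class name, the small limit, and the string placeholders standing in for cached objects are assumptions.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class LRUCacheSketch:
    """Bounded cache that evicts the least recently used entry."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get_or_create(self, key: Hashable, factory: Callable[[], Any]) -> Any:
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        value = factory()
        self._items[key] = value
        if len(self._items) > self.max_size:
            self._items.popitem(last=False)  # drop the oldest entry
        return value

    def __contains__(self, key: Hashable) -> bool:
        return key in self._items

    def __len__(self) -> int:
        return len(self._items)


cache = LRUCacheSketch(max_size=2)
key_a = (720, 1280, "bilinear", True)   # (h, w, interpolation, antialias)
key_b = (1080, 1920, "bilinear", True)
key_c = (2160, 3840, "bilinear", True)
cache.get_or_create(key_a, lambda: "resize-a")
cache.get_or_create(key_b, lambda: "resize-b")
cache.get_or_create(key_a, lambda: "unused")    # hit: key_a becomes most recent
cache.get_or_create(key_c, lambda: "resize-c")  # over the limit: key_b is evicted
```

Compared with an unbounded dict, this keeps hot entries available while capping how many objects stay alive at once.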
Memory Management
- Added torch.cuda.empty_cache() to DFM FIFO eviction
- Added gc.collect() + torch.cuda.empty_cache() after thread cleanup
- Cleared _seek_cached_frame on stop_processing()
- Cleared KV maps and embeddings before deleteLater() in panel removal
- Added stale-entry pruning to track_history
- VR crop tensors deleted immediately after stitching
- Trimmed _color_stats_ema to active faces only
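Stale-entry pruning, as mentioned for track_history above, can be sketched as dropping any track not seen within a maximum age. The data layout (track ID mapped to last-seen frame index), the function name, and the age limit are assumptions for illustration only.

```python
from typing import Dict


def prune_stale_tracks(last_seen: Dict[int, int], current_frame: int,
                       max_age: int) -> Dict[int, int]:
    """Keep only tracks seen within max_age frames of current_frame."""
    return {
        track_id: frame
        for track_id, frame in last_seen.items()
        if current_frame - frame <= max_age
    }


history = {1: 95, 2: 40, 3: 100}
active = prune_stale_tracks(history, current_frame=100, max_age=30)
```

Running a prune like this periodically keeps per-track state from growing for the whole length of a long video.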
UI / UX Improvements
- Elapsed time shown beside seek controls
- Folder shortcuts and standard directory icons in file dialogs
- Shared face thumbnail size toggle with smaller default
- Target media thumbnail improvements and filter control refinements
- Input faces path controls aligned with target media layout
- Mouse wheel support for parameter widgets (opt-in)
- Embedding overwrite confirmation before saving over an existing file
- Recording stop confirmation toggle (ConfirmBeforeStoppingRecordingToggle)
- Clear-all context actions added to media face and embedding panels
- Label target faces and polished parameter action buttons
- Collapsed hidden parameter rows to reduce UI noise
- Removed deprecated mask and settings controls
New / Updated Settings
| Setting | Description |
|---|---|
| GlobalInputResizeToggle | Rescale input before processing |
| GlobalInputResizeSizeSelection | Target resolution (540p–2160p) |
| FrameWorkerDelayDecimalSlider | Frame worker delay 0–3s |
| FrameSkipStepSlider | Frame skip count 1–300 |
| VR180EyeModeSelection | Both Eyes / Single Eye |
| VR180TileDetectionToggle | VR tiled face detection |
| VR180MaxFOVSlider | FOV range 60–150° |
| VR180CropResolutionSelection | Crop resolution 512/768/1024 |
| OutputToTargetLocationToggle | Save output next to source media |
| ClusterOutputBySourceToggle | Cluster output by source name |
| TheatreModeUsesFullscreenToggle | Theatre mode uses fullscreen |
| ConfirmBeforeStoppingRecordingToggle | Confirm before stopping recording |
Removed / Deprecated
- CommandLineDebugEnableToggle removed
- Deprecated mask and settings controls removed from UI
- LivePortrait TensorRT model list removed (lazy build used instead)
- Auto mouth expression updated from mouth-open detection to full mouth-action model
- Custom provider kernel branch separated into standalone test branch (test-custom-kernels)
Documentation
- Added Quick Start Guide (docs/quickstart.md)
- Added User Manual (docs/user_manual.md)
- Updated README with feature descriptions and installation guidance
- Added job manager workflow documentation
Testing
27+ new test files added covering:
- Face re-aging dtype handling and logic
- Mouth openness state machine
- Frame worker quality fixes (DFM scale, keypoints guard)
- VR180 pipeline (unit + integration)
- Audio rebuild and sync
- Issue scan progress and UI lifecycle
- Job manager and save/load actions
- Panel clear-all reset helpers