VisoMaster Fusion 2.0.0 — Release Notes

What's New

This release represents a major overhaul of VisoMaster Fusion, bringing new features, significant performance improvements, extensive bug fixes, and a more polished UI across the board.
Thanks to all contributors!

New Features

Face Re-Aging

A new Face Re-Aging processor lets you transform the apparent age of a face in real time. The target age is adjustable from 0 to 100 with a dedicated slider. The underlying ONNX model runs with device-aware FP16 inference and includes normalization guards and a CPU fallback for compatibility.

Auto Mouth Expression (Mouth Action Detector)

Replaced the previous mouth-open toggle with a full Mouth Action Detector: a TensorFlow-based model that automatically detects mouth movement states, using hysteresis to prevent rapid oscillation. It supports both 68-point and 203-point landmark formats, includes occlusion timeout handling and EMA decay, and is fully thread-safe.
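
To illustrate how hysteresis combined with EMA smoothing prevents rapid oscillation, here is a minimal sketch. The class name, thresholds, and smoothing factor are hypothetical and are not taken from the VisoMaster Fusion code; occlusion timeouts and thread safety are left out.

```python
class MouthStateTracker:
    """Hypothetical open/closed tracker; all values are illustrative."""

    def __init__(self, open_thresh=0.6, close_thresh=0.4, alpha=0.5):
        if close_thresh >= open_thresh:
            raise ValueError("close_thresh must be below open_thresh")
        self.open_thresh = open_thresh
        self.close_thresh = close_thresh
        self.alpha = alpha      # EMA weight given to the newest score
        self.ema = None         # smoothed mouth-openness score
        self.is_open = False

    def update(self, score):
        # Smooth the raw per-frame score with an exponential moving average.
        if self.ema is None:
            self.ema = score
        else:
            self.ema = self.alpha * score + (1.0 - self.alpha) * self.ema

        # Hysteresis: opening and closing use different thresholds, so a
        # score hovering between them never flips the state.
        if not self.is_open and self.ema >= self.open_thresh:
            self.is_open = True
        elif self.is_open and self.ema <= self.close_thresh:
            self.is_open = False
        return self.is_open
```

The gap between the two thresholds is the design choice that matters: a wider gap gives a steadier state at the cost of slower reaction.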

ByteTrack Multi-Face Tracking

Integrated ByteTrack — a state-of-the-art multi-object tracking algorithm with Kalman filter-based motion prediction — to greatly improve temporal consistency in face detection across frames.
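
For readers unfamiliar with ByteTrack, its central idea is a two-stage association: tracks are matched against high-confidence detections first, and the tracks still unmatched are then matched against low-confidence detections. The sketch below shows that idea only. It substitutes greedy IoU matching for the optimal assignment normally used and assumes track boxes have already been motion-predicted; the function names and thresholds are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks, boxes, min_iou):
    """Greedy stand-in for optimal assignment: best IoU pairs first."""
    candidates = sorted(
        ((iou(tbox, box), tid, i)
         for tid, tbox in tracks.items()
         for i, box in enumerate(boxes)),
        reverse=True,
    )
    matches, used = {}, set()
    for score, tid, i in candidates:
        if score < min_iou:
            break
        if tid in matches or i in used:
            continue
        matches[tid] = i
        used.add(i)
    return matches


def associate(tracks, detections, high=0.5, min_iou=0.3):
    """tracks: {id: box}; detections: [(box, score)].

    Returns {track_id: detection_index}.
    """
    high_idx = [i for i, (_, s) in enumerate(detections) if s >= high]
    low_idx = [i for i, (_, s) in enumerate(detections) if s < high]

    # Stage 1: match every track against high-confidence detections.
    first = greedy_match(tracks, [detections[i][0] for i in high_idx], min_iou)
    result = {tid: high_idx[i] for tid, i in first.items()}

    # Stage 2: give leftover tracks a chance with low-confidence detections.
    leftover = {tid: box for tid, box in tracks.items() if tid not in result}
    second = greedy_match(leftover, [detections[i][0] for i in low_idx], min_iou)
    result.update({tid: low_idx[i] for tid, i in second.items()})
    return result
```

The second stage is what keeps a face tracked through a partly occluded or blurred frame, where the detector score drops but the box still overlaps the predicted position.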

VR180 Tiled Face Detection

Added a new _detect_faces_vr_tiled() pipeline that splits the equirectangular frame into up to 24 perspective crops before detection, dramatically improving detection of faces at the edges or in challenging VR180 layouts. New settings:

VR180TileDetectionToggle (default: on)
VR180MaxFOVSlider (60–150°, default: 120°)
VR180CropResolutionSelection (512 / 768 / 1024, default: 512)
VR180EyeModeSelection — Both Eyes / Single Eye
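
As a quick reference, the documented ranges and defaults for these settings can be expressed as a small validated structure. This is only a sketch: the dataclass and its validate() method are hypothetical, and the default for VR180EyeModeSelection is an assumption because the notes do not state it.

```python
from dataclasses import dataclass


@dataclass
class VR180Settings:
    """Hypothetical container mirroring the settings listed above."""
    VR180TileDetectionToggle: bool = True       # default: on
    VR180MaxFOVSlider: int = 120                # degrees, allowed 60 to 150
    VR180CropResolutionSelection: int = 512     # 512, 768 or 1024
    VR180EyeModeSelection: str = "Both Eyes"    # assumed default

    def validate(self):
        if not 60 <= self.VR180MaxFOVSlider <= 150:
            raise ValueError("VR180MaxFOVSlider must be within 60-150")
        if self.VR180CropResolutionSelection not in (512, 768, 1024):
            raise ValueError("VR180CropResolutionSelection must be 512, 768 or 1024")
        if self.VR180EyeModeSelection not in ("Both Eyes", "Single Eye"):
            raise ValueError("VR180EyeModeSelection must be 'Both Eyes' or 'Single Eye'")
```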

Theatre / Fullscreen Mode

Added a dedicated Theatre Mode with optional true fullscreen support, persistent state snapshots, and smooth transitions. Controlled via TheatreModeUsesFullscreenToggle.

Output Location Controls

New output settings section:

Output to Target Location — saves output next to the source media file
Cluster Output by Source — organizes output into subfolders by source name
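
The sketch below shows one way the two options could combine when choosing an output folder. The function and parameter names are hypothetical, and it assumes that "source name" means the media file name without its extension; the application's actual logic may differ.

```python
from pathlib import Path


def resolve_output_dir(source_path, default_output_dir,
                       output_to_target_location=False,
                       cluster_by_source=False):
    """Hypothetical helper returning the folder an output file is written to."""
    source = Path(source_path)
    # "Output to Target Location": write next to the source media file.
    base = source.parent if output_to_target_location else Path(default_output_dir)
    # "Cluster Output by Source": add a subfolder named after the source.
    if cluster_by_source:
        base = base / source.stem
    return base
```

For example, with both options enabled, a file at /media/clips/a.mp4 would resolve to /media/clips/a under these assumptions.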

Global Input Resize

Added GlobalInputResizeToggle and GlobalInputResizeSizeSelection (540p–2160p) to rescale input before processing, enabling better throughput on high-resolution sources.
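
As an illustration of the size calculation involved, the following sketch scales a frame to a target height while preserving aspect ratio. It assumes that values such as 540p and 2160p refer to frame height and that the width is rounded to an even number; neither detail is confirmed by these notes, and the function name is hypothetical.

```python
def scaled_size(width, height, target_height):
    """Return (new_width, new_height) for an aspect-preserving resize."""
    if height == target_height:
        return width, height
    scale = target_height / height
    # Round the width to the nearest even value; many video encoders
    # require even frame dimensions.
    new_width = max(2, round(width * scale / 2) * 2)
    return new_width, target_height
```

Halving a 2160p source to 1080p cuts the pixel count per frame to a quarter, which is where the throughput gain comes from.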

New UI Themes

Three new QSS themes:

Dark Blue — cool dark interface with blue accents
OLED Black — true black for OLED displays
Windows 11 Dark — matches Windows 11 system dark aesthetic

Collapsible Parameter Sections

Parameter panels now support persistent collapsible sections, reducing clutter and keeping the workspace focused.

Bug Fixes

Video Playback & Seeking

Fixed seeking lag; added FrameWorkerDelayDecimalSlider (0–3s) to prevent GPU overload during seeks
Fixed playback buffering issues and memory bloat from unbounded display frame queues
Fixed stale preview frame when switching between target videos

Audio / Video Sync

Fixed skipped-frame audio rebuild with a codec-agnostic approach
Fixed audio segmenting in default-style recording mode
Fixed audio merge cleanup and timing issues
Made skipped-frame audio rebuild fully sync-aware

VR180 Pipeline

Added torch.clamp() around asin() argument — prevents NaN when coordinates fall outside valid FOV
Changed out-of-FOV grid coordinates from 2.0 → 1.0
Changed padding mode from zeros → border to eliminate color fringing at equirectangular edges
Fixed align_corners from False → True for correct grid sampling
Scaled feather_radius dynamically based on _mask_region_side
Added single-eye detection and landmark fallback
Added warning log for silently dropped faces
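
The clamp fix in the first item guards against a common floating-point problem: rounding can push a value that should lie in [-1, 1] slightly outside that range, and asin is undefined there. The standard-library sketch below shows the same idea. Note that math.asin raises ValueError on out-of-range input, whereas torch.asin returns NaN; the helper name is hypothetical.

```python
import math


def safe_asin(x):
    """asin with its argument clamped to the valid domain [-1, 1]."""
    return math.asin(max(-1.0, min(1.0, x)))
```

Clamping maps slightly out-of-range inputs to the nearest valid angle (plus or minus 90 degrees) instead of letting an invalid value propagate into the sampling grid.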

Face Detection & Landmarks

Fixed detector input size reading (now uses model-declared input shape instead of hardcoded values)
Fixed KPS resize and tracking regressions
Separated 203-point KPS model; updated input resize for better upscale results
Added _is_kps_valid() guard to skip invalid landmark sets
Added guard on kps_5_adj.shape[0] >= 5
Fixed grayscale channel check (channel == 2 → channel == 1)
Fixed auto-rotation landmark refinement for upside-down faces
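
A landmark validity guard of the kind described above might look like the following. This is a sketch under assumptions: the real _is_kps_valid() is not shown in these notes and may check different conditions. It treats keypoints as a sequence of (x, y) pairs and requires at least five finite points, mirroring the kps_5_adj.shape[0] >= 5 guard.

```python
import math


def is_kps_valid(kps, min_points=5):
    """Hypothetical guard: enough points, each a finite (x, y) pair."""
    if kps is None or len(kps) < min_points:
        return False
    for point in kps:
        if len(point) != 2:
            return False
        if not all(math.isfinite(float(c)) for c in point):
            return False
    return True
```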

Issue Scan System

Stabilized issue scan processor state and worker startup/shutdown lifecycle
Fixed issue scan UI restore behavior and range normalization
Blocked structural UI mutations during active scans
Kept issue scan progress tooltip static during scans
Added stacktrace logging to issue scanner errors
Reduced dense landmark smoothing warning spam
Wired issue scan UI to marker-resolved runtime state

Expression Restorer & Re-Aging

Fixed ByteTrack + Expression Restorer interaction
Fixed re-aging edge cases and dtype/scale normalization at entry point
DFM output scaling corrected (scaled ×255)
Fixed DFM FW-QUAL-08 threshold (1.0 → 30.0)

Numerical / Safety Guards

Added NaN/Inf guards after every from_estimate() call
Added None/NaN guard for GhostFace M similarity transform matrix
Fixed division by zero in VRAM reporting by guarding on memory total > 0
Removed tautology in preset application (face_id == face_id)
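
The guards in this section follow a simple pattern: check the value before using it and fall back to a safe result. A minimal sketch, using hypothetical helper names and plain Python lists in place of tensors:

```python
import math


def is_usable_transform(matrix):
    """Reject a missing matrix or one containing NaN/Inf entries."""
    if matrix is None:
        return False
    return all(math.isfinite(v) for row in matrix for v in row)


def vram_used_fraction(used_bytes, total_bytes):
    """Return used/total, or 0.0 when the reported total is not positive."""
    if total_bytes <= 0:
        return 0.0
    return used_bytes / total_bytes
```

A caller would skip the affected face or frame when is_usable_transform() returns False rather than pass a corrupt matrix further down the pipeline.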

UI Fixes

Fixed "Open file location" for both Target and Input panels
Fixed eye blend normalize behavior and toggle UX
Fixed webcam restore and denoiser row visibility
Fixed fullscreen workspace restore and theatre state snapshots
Fixed GFPGAN model load issues

Performance Improvements

Frame Worker

v2.Resize caching: Resize objects cached by (h, w, interpolation, antialias) — eliminates repeated allocations per frame
Border mask skip: Border mask computation skipped when BordermaskEnableToggle is off
Lazy snapshot clone: Restoration-1 before snapshot clone made lazy
PerspectiveConverter skip: Skipped when no VR crops exist
FFT/pooling skip: analyze_image FFT/pooling skipped when debug mode is off
Deepcopy reduction: Replaced deepcopy with shallow copy in single-frame mode
Reuse: FrameEnhancers and FrameEdits reused from models_processor instead of being recreated per worker

Cache Optimization

_transform_cache converted to OrderedDict LRU (max 32)
_resize_cache converted to OrderedDict LRU (max 16)
_ddim_schedule_cache converted to OrderedDict LRU (max 20)
MAX_GABOR_CACHE reduced from 32 → 16
EquirectangularConverter cached at FrameWorker level
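
For reference, a bounded LRU cache built on OrderedDict typically looks like the sketch below. The class and method names are hypothetical; only the pattern (move to the end on a hit, evict the oldest entry on overflow) is implied by the notes.

```python
from collections import OrderedDict


class LRUCache:
    """Small least-recently-used cache with a fixed maximum size."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._data = OrderedDict()

    def get_or_create(self, key, factory):
        if key in self._data:
            # Cache hit: mark the entry as most recently used.
            self._data.move_to_end(key)
            return self._data[key]
        value = factory()
        self._data[key] = value
        if len(self._data) > self.max_size:
            # Evict the least recently used entry (front of the dict).
            self._data.popitem(last=False)
        return value
```

With a tuple key such as (h, w, interpolation, antialias), the same structure covers the Resize caching described under Frame Worker.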

Memory Management

Added torch.cuda.empty_cache() to DFM FIFO eviction
Added gc.collect() + torch.cuda.empty_cache() after thread cleanup
Cleared _seek_cached_frame on stop_processing()
Cleared KV maps and embeddings before deleteLater() in panel removal
Added stale-entry pruning to track_history
VR crop tensors deleted immediately after stitching
Trimmed _color_stats_ema to active faces only
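
Stale-entry pruning, as mentioned for track_history, can be sketched as follows. The dictionary layout (a last_seen frame index per track) and the max_age value are assumptions made for illustration, not the project's actual data structure.

```python
def prune_track_history(track_history, current_frame, max_age=30):
    """Remove tracks not seen within max_age frames; return removed ids."""
    stale = [
        track_id
        for track_id, entry in track_history.items()
        if current_frame - entry["last_seen"] > max_age
    ]
    # Delete in a second pass so the dict is not mutated while iterating.
    for track_id in stale:
        del track_history[track_id]
    return stale
```

Without pruning of this kind, per-track state grows for the whole length of a session, even though most tracks are short-lived.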

UI / UX Improvements

Elapsed time shown beside seek controls
Folder shortcuts and standard directory icons in file dialogs
Shared face thumbnail size toggle with smaller default
Target media thumbnail improvements and filter control refinements
Input faces path controls aligned with target media layout
Mouse wheel support for parameter widgets (opt-in)
Embedding overwrite confirmation before saving over an existing file
Recording stop confirmation toggle (ConfirmBeforeStoppingRecordingToggle)
Clear-all context actions added to media face and embedding panels
Labeled target faces and polished parameter action buttons
Collapsed hidden parameter rows to reduce UI noise
Removed deprecated mask and settings controls

New / Updated Settings

Setting	Description
`GlobalInputResizeToggle`	Rescale input before processing
`GlobalInputResizeSizeSelection`	Target resolution (540p–2160p)
`FrameWorkerDelayDecimalSlider`	Frame worker delay 0–3s
`FrameSkipStepSlider`	Frame skip count 1–300
`VR180EyeModeSelection`	Both Eyes / Single Eye
`VR180TileDetectionToggle`	VR tiled face detection
`VR180MaxFOVSlider`	FOV range 60–150°
`VR180CropResolutionSelection`	Crop resolution 512/768/1024
`OutputToTargetLocationToggle`	Save output next to source media
`ClusterOutputBySourceToggle`	Cluster output by source name
`TheatreModeUsesFullscreenToggle`	Theatre mode uses fullscreen
`ConfirmBeforeStoppingRecordingToggle`	Confirm before stopping recording

Removed / Deprecated

CommandLineDebugEnableToggle removed
Deprecated mask and settings controls removed from UI
LivePortrait TensorRT model list removed (lazy build used instead)
Auto mouth expression updated from mouth-open detection to full mouth-action model
Custom provider kernel branch separated into standalone test branch (test-custom-kernels)

Documentation

Added Quick Start Guide (docs/quickstart.md)
Added User Manual (docs/user_manual.md)
Updated README with feature descriptions and installation guidance
Added job manager workflow documentation

Testing

27+ new test files added covering:

Face re-aging dtype handling and logic
Mouth openness state machine
Frame worker quality fixes (DFM scale, keypoints guard)
VR180 pipeline (unit + integration)
Audio rebuild and sync
Issue scan progress and UI lifecycle
Job manager and save/load actions
Panel clear-all reset helpers
