github VisoMasterFusion/VisoMaster-Fusion v2.0.0
VisoMaster-Fusion - v2.0.0

6 hours ago

VisoMaster Fusion 2.0.0 — Release Notes

What's New

This release represents a major overhaul of VisoMaster Fusion, bringing new features, significant performance improvements, extensive bug fixes, and a more polished UI across the board.
Thanks to all contributors!

New Features

Face Re-Aging

A new Face Re-Aging processor lets you transform the apparent age of a face in real time. The age can be adjusted from 0 to 100 using a dedicated slider. The underlying ONNX model runs with device-aware FP16 inference and includes normalization guards and a CPU fallback for compatibility.

Auto Mouth Expression (Mouth Action Detector)

Replaced the previous mouth-open toggle with a full Mouth Action Detector, a TensorFlow-based model that automatically detects mouth movement states and applies hysteresis to prevent rapid oscillation. It supports both 68-point and 203-point landmark formats, includes occlusion timeout handling and EMA decay, and is fully thread-safe.
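
To illustrate how hysteresis keeps a per-frame score from flickering between states, here is a minimal sketch. It is not the project's implementation: the class name, thresholds, and the way the EMA is combined with the thresholds are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class MouthStateTracker:
    """Hypothetical hysteresis tracker; names and thresholds are illustrative."""

    open_threshold: float = 0.6   # smoothed score must exceed this to switch to "open"
    close_threshold: float = 0.4  # smoothed score must drop below this to switch back
    alpha: float = 0.5            # EMA weight given to the newest score
    smoothed: float = 0.0
    is_open: bool = False

    def update(self, score: float) -> bool:
        # The exponential moving average damps single-frame spikes.
        self.smoothed = self.alpha * score + (1.0 - self.alpha) * self.smoothed
        # Two thresholds: the state flips only after crossing the far one,
        # so scores inside the 0.4-0.6 band never change the state.
        if not self.is_open and self.smoothed > self.open_threshold:
            self.is_open = True
        elif self.is_open and self.smoothed < self.close_threshold:
            self.is_open = False
        return self.is_open


tracker = MouthStateTracker()
states = [tracker.update(s) for s in [1.0, 1.0, 0.5, 0.45, 0.0]]
```

With these example values the state stays open while the smoothed score sits between the two thresholds and only closes once it falls below the lower one.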

ByteTrack Multi-Face Tracking

Integrated ByteTrack — a state-of-the-art multi-object tracking algorithm with Kalman filter-based motion prediction — to greatly improve temporal consistency in face detection across frames.

VR180 Tiled Face Detection

Added a new _detect_faces_vr_tiled() pipeline that splits the equirectangular frame into up to 24 perspective crops before detection, dramatically improving detection of faces at the edges or in challenging VR180 layouts. New settings:

  • VR180TileDetectionToggle (default: on)
  • VR180MaxFOVSlider (60–150°, default: 120°)
  • VR180CropResolutionSelection (512 / 768 / 1024, default: 512)
  • VR180EyeModeSelection — Both Eyes / Single Eye

Theatre / Fullscreen Mode

Added a dedicated Theatre Mode with optional true fullscreen support, persistent state snapshots, and smooth transitions. Controlled via TheatreModeUsesFullscreenToggle.

Output Location Controls

New output settings section:

  • Output to Target Location — saves output next to the source media file
  • Cluster Output by Source — organizes output into subfolders by source name
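
One way the two options could combine is sketched below. The helper name, its parameters, and the rule that the subfolder is named after the file stem are assumptions for illustration, not the application's actual logic.

```python
from pathlib import Path


def resolve_output_dir(
    source: Path,
    default_dir: Path,
    output_to_target_location: bool,
    cluster_by_source: bool,
) -> Path:
    """Hypothetical helper: choose the folder an output file is written to."""
    # Either write beside the source media or into the configured output folder.
    base = source.parent if output_to_target_location else default_dir
    # Optionally group results in a subfolder named after the source file stem.
    return base / source.stem if cluster_by_source else base


clip = Path("media/clips/holiday.mp4")
beside_source = resolve_output_dir(clip, Path("out"), True, False)
clustered = resolve_output_dir(clip, Path("out"), False, True)
```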

Global Input Resize

Added GlobalInputResizeToggle and GlobalInputResizeSizeSelection (540p–2160p) to rescale input before processing, enabling better throughput on high-resolution sources.
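
The size arithmetic behind such a resize might look like the sketch below. It assumes the selection names (540p to 2160p) refer to frame height, that smaller inputs are not upscaled, and that widths are rounded to an even number; none of these details are confirmed by the release notes, and the function name is hypothetical.

```python
def scaled_size(width: int, height: int, target_height: int) -> tuple[int, int]:
    """Hypothetical helper: fit a frame to target_height, keeping aspect ratio."""
    if height <= target_height:
        # Assumption: inputs already at or below the target are left untouched.
        return width, height
    scale = target_height / height
    # Round to the nearest even width, which many video codecs require.
    new_width = max(2, round(width * scale / 2) * 2)
    return new_width, target_height


uhd_to_1080 = scaled_size(3840, 2160, 1080)
```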

New UI Themes

Three new QSS themes:

  • Dark Blue — cool dark interface with blue accents
  • OLED Black — true black for OLED displays
  • Windows 11 Dark — matches Windows 11 system dark aesthetic

Collapsible Parameter Sections

Parameter panels now support persistent collapsible sections, reducing clutter and keeping the workspace focused.


Bug Fixes

Video Playback & Seeking

  • Fixed seeking lag; added FrameWorkerDelayDecimalSlider (0–3s) to prevent GPU overload during seeks
  • Fixed playback buffering issues and memory bloat from unbounded display frame queues
  • Fixed stale preview frame when switching between target videos

Audio / Video Sync

  • Fixed skipped-frame audio rebuild with a codec-agnostic approach
  • Fixed audio segmenting in default-style recording mode
  • Fixed audio merge cleanup and timing issues
  • Made skipped-frame audio rebuild fully sync-aware

VR180 Pipeline

  • Added torch.clamp() around asin() argument — prevents NaN when coordinates fall outside valid FOV
  • Changed out-of-FOV grid coordinates from 2.0 → 1.0
  • Changed padding mode from zeros → border to eliminate color fringing at equirectangular edges
  • Fixed align_corners from False → True for correct grid sampling
  • Scaled feather_radius dynamically based on _mask_region_side
  • Added single-eye detection and landmark fallback
  • Added warning log for silently dropped faces
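
The asin() clamp in the first item can be shown with scalars. The release applies torch.clamp() to tensors, where an out-of-domain input to torch.asin() yields NaN; this standard-library sketch shows the same idea, where math.asin() would raise ValueError instead. The function name is illustrative.

```python
import math


def safe_asin(x: float) -> float:
    """Clamp x into asin's domain [-1, 1] before evaluating it."""
    return math.asin(max(-1.0, min(1.0, x)))


# Rounding error can push a normalised coordinate slightly past 1.0.
slightly_out = 1.0 + 1e-9
latitude = safe_asin(slightly_out)  # evaluates asin(1.0) instead of failing
```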

Face Detection & Landmarks

  • Fixed detector input size reading (now uses model-declared input shape instead of hardcoded values)
  • Fixed KPS resize and tracking regressions
  • Separated 203-point KPS model; updated input resize for better upscale results
  • Added _is_kps_valid() guard to skip invalid landmark sets
  • Added guard on kps_5_adj.shape[0] >= 5
  • Fixed grayscale channel check (channel == 2 → channel == 1)
  • Fixed auto-rotation landmark refinement for upside-down faces

Issue Scan System

  • Stabilized issue scan processor state and worker startup/shutdown lifecycle
  • Fixed issue scan UI restore behavior and range normalization
  • Blocked structural UI mutations during active scans
  • Kept issue scan progress tooltip static during scans
  • Added stacktrace logging to issue scanner errors
  • Reduced dense landmark smoothing warning spam
  • Wired issue scan UI to marker-resolved runtime state

Expression Restorer & Re-Aging

  • Fixed ByteTrack + Expression Restorer interaction
  • Fixed re-aging edge cases and dtype/scale normalization at entry point
  • DFM output scaling corrected (scaled ×255)
  • Fixed DFM FW-QUAL-08 threshold (1.0 → 30.0)

Numerical / Safety Guards

  • Added NaN/Inf guards after every from_estimate() call
  • Added None/NaN guard for GhostFace M similarity transform matrix
  • Fixed memory total > 0 guard against division by zero in VRAM reporting
  • Removed tautology in preset application (face_id == face_id)
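
Guards of this kind are typically small predicate checks placed before a value is used. The sketch below shows two plausible forms using only the standard library; the function names and the choice to report 0.0 when the total is unknown are assumptions, not the project's code.

```python
import math
from typing import Optional, Sequence

Matrix = Sequence[Sequence[float]]


def is_finite_matrix(matrix: Optional[Matrix]) -> bool:
    """Return True only if matrix exists and every entry is a finite number."""
    if matrix is None:
        return False
    return all(math.isfinite(value) for row in matrix for value in row)


def vram_usage_percent(used_bytes: int, total_bytes: int) -> float:
    """Hypothetical VRAM report that avoids dividing by a zero total."""
    if total_bytes <= 0:
        # Assumption: report 0.0 when the total is unknown or zero.
        return 0.0
    return 100.0 * used_bytes / total_bytes


good = [[1.0, 0.0, 5.0], [0.0, 1.0, 2.0]]
bad = [[1.0, float("nan"), 5.0], [0.0, 1.0, float("inf")]]
```

A caller would skip the affected face or frame when is_finite_matrix() returns False rather than passing the matrix on to a warp.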

UI Fixes

  • Fixed "Open file location" for both Target and Input panels
  • Fixed eye blend normalize behavior and toggle UX
  • Fixed webcam restore and denoiser row visibility
  • Fixed fullscreen workspace restore and theatre state snapshots
  • Fixed GFPGAN model load issues

Performance Improvements

Frame Worker

  • v2.Resize caching: Resize objects cached by (h, w, interpolation, antialias) — eliminates repeated allocations per frame
  • Border mask skip: Border mask computation skipped when BordermaskEnableToggle is off
  • Lazy snapshot clone: Restoration-1 before snapshot clone made lazy
  • PerspectiveConverter skip: Skipped when no VR crops exist
  • FFT/pooling skip: analyze_image FFT/pooling skipped when debug mode is off
  • Deepcopy reduction: Replaced deepcopy with shallow copy in single-frame mode
  • Reuse: FrameEnhancers and FrameEdits reused from models_processor instead of being recreated per worker

Cache Optimization

  • _transform_cache converted to OrderedDict LRU (max 32)
  • _resize_cache converted to OrderedDict LRU (max 16)
  • _ddim_schedule_cache converted to OrderedDict LRU (max 20)
  • MAX_GABOR_CACHE reduced from 32 → 16
  • EquirectangularConverter cached at FrameWorker level
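
An OrderedDict-backed LRU cache of the kind named above can be sketched as follows. The class and method names are illustrative and the stored value is a placeholder string; only the OrderedDict calls (move_to_end, popitem) are standard-library API.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class LRUCache:
    """Minimal OrderedDict-backed LRU cache (illustrative only)."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get_or_create(self, key: Hashable, factory: Callable[[], Any]) -> Any:
        if key in self._items:
            # A hit makes this entry the most recently used.
            self._items.move_to_end(key)
            return self._items[key]
        value = factory()
        self._items[key] = value
        if len(self._items) > self.max_size:
            # Evict the least recently used entry (the oldest end).
            self._items.popitem(last=False)
        return value

    def __contains__(self, key: Hashable) -> bool:
        return key in self._items

    def __len__(self) -> int:
        return len(self._items)


# Keyed like the resize cache described earlier: (h, w, interpolation, antialias).
resize_cache = LRUCache(max_size=16)
op = resize_cache.get_or_create((512, 512, "bilinear", True), lambda: "resize-op")
```

Bounding each cache trades an occasional rebuild of an evicted entry for a fixed upper limit on memory held by cached objects.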

Memory Management

  • Added torch.cuda.empty_cache() to DFM FIFO eviction
  • Added gc.collect() + torch.cuda.empty_cache() after thread cleanup
  • Cleared _seek_cached_frame on stop_processing()
  • Cleared KV maps and embeddings before deleteLater() in panel removal
  • Added stale-entry pruning to track_history
  • VR crop tensors deleted immediately after stitching
  • Trimmed _color_stats_ema to active faces only
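
Stale-entry pruning generally means dropping records for items that have not been seen recently. The sketch below assumes, purely for illustration, that track_history maps a track ID to the last frame index where it was seen; the real structure and the max_age parameter are not described in the release notes.

```python
from typing import Dict


def prune_stale_tracks(
    track_history: Dict[int, int], current_frame: int, max_age: int
) -> None:
    """Hypothetical pruning: drop tracks not seen within max_age frames."""
    # Collect IDs first so the dict is not modified while iterating over it.
    stale = [
        track_id
        for track_id, last_seen in track_history.items()
        if current_frame - last_seen > max_age
    ]
    for track_id in stale:
        del track_history[track_id]


history = {1: 100, 2: 40, 3: 95}
prune_stale_tracks(history, current_frame=100, max_age=30)
```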

UI / UX Improvements

  • Elapsed time shown beside seek controls
  • Folder shortcuts and standard directory icons in file dialogs
  • Shared face thumbnail size toggle with smaller default
  • Target media thumbnail improvements and filter control refinements
  • Input faces path controls aligned with target media layout
  • Mouse wheel support for parameter widgets (opt-in)
  • Embedding overwrite confirmation before saving over an existing file
  • Recording stop confirmation toggle (ConfirmBeforeStoppingRecordingToggle)
  • Clear-all context actions added to media face and embedding panels
  • Label target faces and polished parameter action buttons
  • Collapsed hidden parameter rows to reduce UI noise
  • Removed deprecated mask and settings controls

New / Updated Settings

Setting                               Description
GlobalInputResizeToggle               Rescale input before processing
GlobalInputResizeSizeSelection        Target resolution (540p–2160p)
FrameWorkerDelayDecimalSlider         Frame worker delay (0–3 s)
FrameSkipStepSlider                   Frame skip count (1–300)
VR180EyeModeSelection                 Both Eyes / Single Eye
VR180TileDetectionToggle              VR tiled face detection
VR180MaxFOVSlider                     FOV range (60–150°)
VR180CropResolutionSelection          Crop resolution (512 / 768 / 1024)
OutputToTargetLocationToggle          Save output next to source media
ClusterOutputBySourceToggle           Cluster output by source name
TheatreModeUsesFullscreenToggle       Theatre mode uses fullscreen
ConfirmBeforeStoppingRecordingToggle  Confirm before stopping recording

Removed / Deprecated

  • CommandLineDebugEnableToggle removed
  • Deprecated mask and settings controls removed from UI
  • LivePortrait TensorRT model list removed (lazy build used instead)
  • Auto mouth expression updated from mouth-open detection to full mouth-action model
  • Custom provider kernel branch separated into standalone test branch (test-custom-kernels)

Documentation

  • Added Quick Start Guide (docs/quickstart.md)
  • Added User Manual (docs/user_manual.md)
  • Updated README with feature descriptions and installation guidance
  • Added job manager workflow documentation

Testing

27+ new test files added covering:

  • Face re-aging dtype handling and logic
  • Mouth openness state machine
  • Frame worker quality fixes (DFM scale, keypoints guard)
  • VR180 pipeline (unit + integration)
  • Audio rebuild and sync
  • Issue scan progress and UI lifecycle
  • Job manager and save/load actions
  • Panel clear-all reset helpers
