github suprepupre/wow-optimize v3.13.0
wow_optimize v3.13.0

5 hours ago

Release v3.13.0 — The Performance & Stability Milestone

This release collects the performance restorations, stability fixes, and runtime enhancements developed since v3.11.0. More than 32 optimization features have been verified, stabilized, and activated, resulting in a significantly smoother, stutter-free experience.

Core Performance & Allocator Updates

  • mimalloc Allocator Redirect: Replaced WoW's statically-linked CRT memory allocator (malloc, free, realloc, calloc, _msize, and _recalloc) with Microsoft's high-performance mimalloc engine. The switch-over is protected by a custom transition guard and atomic activation. The goal is to combat 32-bit virtual-address (VA) fragmentation during long play sessions, character swapping, or teleporting.
  • Adaptive Purge & VA-Pressure Governor: The memory manager dynamically tunes the allocator's purge delay based on OS virtual memory pressure: aggressive cleanup when the largest free VA block is small, gentle cleanup otherwise to avoid recommit page-fault storms.
  • WoW-Internal strlen/memcpy/memset SSE2 Replacements: Inlined hand-written assembly replacements for critical CRT string and memory operations to avoid scalar bottlenecks in asset loading. Includes non-temporal streaming stores for copies >= 256 KB.
  • Free-Wrapper Fast Path: Deallocations now bypass a redundant heap walk on one of the hottest paths in the binary.
  • Hook Enable Batching: Startup is over 1.7 seconds faster because MinHook hooks are now queued during initialization and enabled together in a single-snapshot batch activation instead of one at a time.
  • Removed Artificial Startup Sleep: Removed a synchronous Sleep(5000) call inside the main thread's DLL initialization sequence (DllMain) which was delaying game startup by exactly 5 seconds.
  • ReadFile Cache Lock Optimization (Shared SRWLock): Optimized the read-ahead cache lock in hooked_ReadFile. The lock is now acquired in Shared mode first to check the cache (the hot path for hits). Only on a cache miss, when a new read-ahead block must be fetched, is it taken in Exclusive mode. Threads that hit the cache no longer contend with each other when reading files concurrently.
  • Disabled Unsafe Async MPQ I/O and Background Prefetching: All async prefetch (QueuePrefetch) and async read completion checks are now gated behind the TEST_DISABLE_ASYNC_MPQ_IO flag in version.h and dllmain.cpp. The flag is set to 1, which disables them, to prevent background-thread file pointer races and read corruption on synchronous handles.
  • Optimized Read-Ahead Chunk Sizes: Reduced the read-ahead block sizes from 64 KB/256 KB to 16 KB/64 KB to minimize read amplification on random access patterns while keeping sequential header caching fast.
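
The shared-then-exclusive locking described for the read-ahead cache can be sketched roughly as follows. This is a minimal illustration, not the project's code: std::shared_mutex stands in for the Win32 SRWLock so the sketch is portable, and ReadAheadCache, demo_fetch, and demo_read_ahead are hypothetical names. The 16 KB block size is taken from the notes above. Because the shared lock is released before the exclusive lock is taken, the cache is re-checked after the exclusive acquire in case another thread already fetched the block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Hypothetical single-block read-ahead cache (illustrative names only).
class ReadAheadCache {
public:
    // Callback that fills `out` with the block starting at `block_start`.
    using FetchFn = void (*)(std::uint64_t block_start,
                             std::vector<unsigned char>& out);

    ReadAheadCache(std::size_t block_size, FetchFn fetch)
        : block_size_(block_size), fetch_(fetch) {}

    // Copies `len` bytes at file offset `offset` into `dst`.
    // Returns false if the request does not fit inside one block.
    bool read(std::uint64_t offset, void* dst, std::size_t len) {
        {
            // Hot path: many readers may hold the lock in shared mode at once.
            std::shared_lock<std::shared_mutex> shared(lock_);
            if (try_copy(offset, dst, len)) return true;
        }
        // Miss: the shared lock is released above, then exclusive is taken.
        std::unique_lock<std::shared_mutex> exclusive(lock_);
        // Re-check: another thread may have fetched the block in between.
        if (try_copy(offset, dst, len)) return true;

        const std::uint64_t start = offset - (offset % block_size_);
        if (offset + len > start + block_size_) return false;
        fetch_(start, block_);
        block_start_ = start;
        valid_ = true;
        ++fetch_count_;
        return try_copy(offset, dst, len);
    }

    // Only meaningful when no other thread is calling read().
    int fetch_count() const { return fetch_count_; }

private:
    // Caller must hold lock_ (shared or exclusive).
    bool try_copy(std::uint64_t offset, void* dst, std::size_t len) const {
        if (!valid_) return false;
        if (offset < block_start_) return false;
        if (offset + len > block_start_ + block_.size()) return false;
        std::memcpy(dst, block_.data() + (offset - block_start_), len);
        return true;
    }

    std::shared_mutex lock_;
    std::size_t block_size_;
    FetchFn fetch_;
    std::vector<unsigned char> block_;
    std::uint64_t block_start_ = 0;
    bool valid_ = false;
    int fetch_count_ = 0;
};

// Stand-in for a real disk read: byte i of the file has value (i & 0xFF).
void demo_fetch(std::uint64_t block_start, std::vector<unsigned char>& out) {
    out.resize(16 * 1024);
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = static_cast<unsigned char>((block_start + i) & 0xFF);
}

// Two reads in the same block cost one fetch; a read in the next block
// costs a second one. Returns the total number of fetches.
int demo_read_ahead() {
    ReadAheadCache cache(16 * 1024, demo_fetch);
    unsigned char buf[4] = {};
    bool ok = cache.read(100, buf, 4);          // miss, fetches block 0
    assert(ok && buf[0] == 100);
    ok = cache.read(200, buf, 4);               // hit, shared lock only
    assert(ok && buf[0] == 200);
    ok = cache.read(16 * 1024 + 5, buf, 4);     // miss, fetches next block
    assert(ok && buf[0] == 5);
    (void)ok;
    return cache.fetch_count();
}
```

With a smaller block, a random read that misses pulls in less unneeded data, which is the read-amplification trade-off the chunk-size change targets.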

Lua VM & C-API Inlines

  • Safe Inline Caches & Stack Operations: Restored and verified the optimized inline paths for luaH_getstr (16384 entries with prefetch), lua_rawgeti (8192-entry array direct & hash cache), lua_toboolean, and lua_objlen, all matching the engine's byte layouts exactly.
  • Lua VM Inline Fast-Path Groups: Over 30 inline helpers (Safe Groups 1, 2, and 3) have been stabilized and activated (e.g. string.gsub plain-literal matching, math.fmod, math.modf, string.char, select, rawequal, strjoin, strsplit, etc.).
  • Adaptive GC Pacing: Lua garbage collection intervals scale dynamically depending on frametime limits and VA-pressure triggers.
  • SavedVariables Asynchronous Writer (Re-Enabled): Background writing of settings files in the WTF folder is active again and has been stabilized using OS handle duplication, which prevents settings truncation and corruption caused by handle recycling.
  • Addon Lua Pre-Compiler (Re-Enabled): Prefetches and warms the OS page cache for addon files in the background, pausing during game loading screen transitions to avoid I/O pressure.
  • Sampling Profiler (Re-Enabled): Runs a background thread to sample CPU-heavy game loops and dumps top hotspots on process exit.
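
As an illustration of what a plain-literal fast path for string.gsub might look like, here is a minimal sketch. It assumes the fast path applies only when the pattern contains none of Lua's pattern magic characters (^$()%.[]*+-?) and the replacement contains no % escapes; anything else falls back to the generic matcher. The function names are hypothetical and are not taken from the project.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if `pattern` contains none of Lua's pattern magic characters,
// so it can be matched as a literal substring.
bool is_plain_pattern(const std::string& pattern) {
    return pattern.find_first_of("^$()%.[]*+-?") == std::string::npos;
}

// True if `repl` has no '%' escapes (such as %1), so it can be copied verbatim.
bool is_plain_replacement(const std::string& repl) {
    return repl.find('%') == std::string::npos;
}

// Hypothetical fast path: replaces every literal occurrence of `pattern`
// in `s` with `repl`. Returns false when the fast path does not apply;
// the caller is then expected to use the generic pattern matcher.
bool plain_gsub(const std::string& s, const std::string& pattern,
                const std::string& repl, std::string& out, int& count) {
    // An empty pattern matches at every position in Lua; leave that to
    // the generic path.
    if (pattern.empty() || !is_plain_pattern(pattern) ||
        !is_plain_replacement(repl)) {
        return false;
    }
    out.clear();
    count = 0;
    std::size_t pos = 0;
    for (;;) {
        const std::size_t hit = s.find(pattern, pos);
        if (hit == std::string::npos) break;
        out.append(s, pos, hit - pos);  // text before the match
        out += repl;                    // literal replacement
        pos = hit + pattern.size();     // continue after the match
        ++count;
    }
    out.append(s, pos, std::string::npos);  // trailing text
    return true;
}
```

The benefit of such a path is that a literal search can use a plain substring scan and skip the pattern interpreter entirely.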

SIMD Geometry & Physics

  • Vectorized Möller-Trumbore Raycasting: Vectorized ray-triangle intersections (RayTriangleIntersect at 0x009836B0 and 0x00983490) using SSE2 cross/dot products, speeding up ground/terrain collision tests.
  • Thread-Safe SIMD Statistics Counters: Hardened physics, culling, and rotation statistics counters using atomic 32-bit InterlockedIncrement operations to prevent data races between render and async engine worker threads.
  • PE-Targeted x87 FPU Math Safety: Disabled unstable SSE2 transformations (Frustum culling, matrices, and quaternions) to guarantee 100% stable steering/collision behavior on high-framerate systems, completely fixing the camera steering jitter and associated 10-to-4 FPS drops.
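
For reference, the Möller-Trumbore ray-triangle test mentioned above can be written in scalar form as below. This is a plain C++ sketch of the standard algorithm, not the SSE2 routine from this release; Vec3, ray_triangle_intersect, and the epsilon value are assumptions made for illustration. The two cross products and four dot products are the parts an SSE2 version would vectorize.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 sub(const Vec3& a, const Vec3& b) {
    return {a.x - b.x, a.y - b.y, a.z - b.z};
}
static Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y,
            a.z * b.x - a.x * b.z,
            a.x * b.y - a.y * b.x};
}
static float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Scalar Moller-Trumbore test. On a hit, writes the distance along the
// ray to `t` and returns true. Triangles are treated as two-sided.
bool ray_triangle_intersect(const Vec3& orig, const Vec3& dir,
                            const Vec3& v0, const Vec3& v1, const Vec3& v2,
                            float& t) {
    const float kEps = 1e-7f;  // assumed tolerance for near-parallel rays

    const Vec3 e1 = sub(v1, v0);
    const Vec3 e2 = sub(v2, v0);
    const Vec3 p = cross(dir, e2);
    const float det = dot(e1, p);
    if (std::fabs(det) < kEps) return false;  // ray parallel to triangle

    const float inv_det = 1.0f / det;
    const Vec3 s = sub(orig, v0);
    const float u = dot(s, p) * inv_det;      // first barycentric coordinate
    if (u < 0.0f || u > 1.0f) return false;

    const Vec3 q = cross(s, e1);
    const float v = dot(dir, q) * inv_det;    // second barycentric coordinate
    if (v < 0.0f || u + v > 1.0f) return false;

    t = dot(e2, q) * inv_det;                 // distance along the ray
    return t >= 0.0f;                         // reject hits behind the origin
}
```

A ray pointing straight down from one unit above a ground triangle reports a hit at distance 1, while the same ray moved outside the triangle's footprint reports a miss.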
