WoW Optimize Release Notes
This release ships a direct-threaded Lua bytecode interpreter that replaces World of Warcraft's switch-based opcode dispatch, plus six new optimization modules targeting rendering, math, logic, memory, and async workloads.
What's New
Lua VM Overhaul
- Direct-Threaded Interpreter: Features an 8192-site × 4-way inline cache that eliminates repeated hash-table walks for table lookups. Includes safe inline caches for luaH_getstr (16,384 entries) and lua_rawgeti (8192 entries) with content validation that survives GC rehash and allocator address reuse.
- Optimized Stack Writes: lua_pushnumber writes TValues directly to the Lua stack, skipping the overhead of a full API call.
- Stability Fixes: All VM hooks now bail out during a lua_State swap, fixing the logout and UI reload crash found in version 3.8.0.
- Fast Paths: 20 out of 27 Lua C functions now have fast paths, including string.rep, math.random, math.sqrt, and 7 common string.format patterns.
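For readers curious how a site-indexed, 4-way inline cache with content validation can work, here is a minimal sketch. It is not the shipped code: Table, Node, slowLookup, and InlineCache are hypothetical stand-ins (a real Lua table is a hash of TValue nodes, not a vector of strings), and the round-robin replacement policy is an assumption. The point it illustrates is that a cached slot is trusted only after re-checking that the node still holds the requested key, so a stale entry left behind by a rehash falls back to the slow path instead of returning wrong data.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical toy table: a flat node array standing in for a Lua hash part.
struct Node { std::string key; double value; };
struct Table { std::vector<Node> nodes; };

// Slow path: a linear walk standing in for the real hash-chain walk.
inline const Node* slowLookup(const Table& t, const std::string& key) {
    for (const Node& n : t.nodes)
        if (n.key == key) return &n;
    return nullptr;
}

constexpr std::size_t kSites = 8192;  // cache sets, one per call site (from the notes)
constexpr std::size_t kWays  = 4;     // entries per set (from the notes)

struct CacheEntry {
    const Table* table;     // which table the entry was filled from
    std::size_t  nodeIndex; // where the key was found last time
};

class InlineCache {
public:
    InlineCache() : sets_(kSites), nextVictim_(kSites, 0) {}

    const Node* lookup(std::size_t site, const Table& t, const std::string& key) {
        std::array<CacheEntry, kWays>& set = sets_[site % kSites];
        for (const CacheEntry& e : set) {
            // Content validation: same table, index still in range, and the
            // node at that index still holds the key we are looking for.
            if (e.table == &t && e.nodeIndex < t.nodes.size() &&
                t.nodes[e.nodeIndex].key == key) {
                ++hits;
                return &t.nodes[e.nodeIndex];
            }
        }
        ++misses;
        const Node* n = slowLookup(t, key);
        if (n != nullptr) {
            std::size_t index = static_cast<std::size_t>(n - t.nodes.data());
            assert(index < t.nodes.size());
            std::uint8_t& victim = nextVictim_[site % kSites];
            set[victim] = CacheEntry{&t, index};  // round-robin replacement (assumed)
            victim = static_cast<std::uint8_t>((victim + 1) % kWays);
        }
        return n;
    }

    std::size_t hits = 0;
    std::size_t misses = 0;

private:
    std::vector<std::array<CacheEntry, kWays>> sets_;  // value-initialized to empty
    std::vector<std::uint8_t> nextVictim_;
};
```

In this sketch a table mutation that moves nodes costs one extra miss and then the entry is refilled; nothing has to be flushed eagerly.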
6 CPU-Side Modules
- Off-Screen Animation Throttling: Implements a 3-tier distance-based update rate (full rate / every 4th frame / every 16th frame) to reduce M2 bone-math overhead for models outside the view frustum.
- SSE2 Math Library: Accelerates 4×4 matrix multiply, quaternion normalize, frustum AABB-vs-4-planes cull, BGRA↔ARGB batch swap (via SSSE3), and premultiplied alpha batching.
- Combat Text Ring-Buffer Batching: Uses a 256-entry accumulator flushed once per frame instead of making one heavy D3D call per floating text.
- UI Layout Dirty-Flag Cache: A 4096-slot frame-pointer-keyed cache with generation-based invalidation. It skips deep tree traversal for UI frames that haven't changed since the last layout pass.
- Network Heartbeat Filter: Suppresses redundant CMSG_PING and CMSG_TIME_SYNC_RESP packets when the client has recently transmitted real data.
- Invariant Lua Script Cache: Caches UnitHealth, UnitPower, and UnitClass outcomes within a single frame, avoiding repeated, expensive Lua → C → Lua round-trips.
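As an illustration of the UI layout dirty-flag cache described above, the following is a rough sketch of a 4096-slot, pointer-keyed cache with generation-based invalidation. LayoutCache, LayoutResult, the single global generation counter, and the pointer-shift hash are all assumptions made for the example; the release notes do not describe how the real module hashes frame pointers or scopes its generations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kLayoutSlots = 4096;  // slot count from the notes (power of two)

// Hypothetical cached payload: whatever the layout pass computed for a frame.
struct LayoutResult { float width; float height; };

class LayoutCache {
public:
    LayoutCache() : slots_(kLayoutSlots) {}  // slots start zeroed, generation 0

    // Bumping the generation invalidates every slot in O(1) without touching them.
    void invalidateAll() { ++generation_; }

    // Returns true and fills `out` only if the slot belongs to this frame
    // and was written during the current generation.
    bool tryGet(const void* frame, LayoutResult& out) const {
        const Slot& s = slots_[slotFor(frame)];
        if (s.frame == frame && s.generation == generation_) {
            out = s.result;
            return true;
        }
        return false;
    }

    void put(const void* frame, const LayoutResult& result) {
        assert(frame != nullptr);
        slots_[slotFor(frame)] = Slot{frame, generation_, result};
    }

private:
    struct Slot {
        const void*   frame;
        std::uint32_t generation;
        LayoutResult  result;
    };

    // Assumed hash: drop the low alignment bits, then mask to the slot count.
    static std::size_t slotFor(const void* frame) {
        std::uintptr_t p = reinterpret_cast<std::uintptr_t>(frame);
        return static_cast<std::size_t>(p >> 4) & (kLayoutSlots - 1);
    }

    std::vector<Slot> slots_;
    std::uint32_t generation_ = 1;  // starts above the zeroed slots' generation
};
```

A colliding frame simply overwrites the slot, so the worst case is a recomputed layout rather than a wrong one.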
Memory & Async
- Slab Allocator: A 64-byte aligned, 8-tier slab allocator (64B–8192B, backed by VirtualAlloc) designed for cache-line-aligned hot structures.
- GUID Hash-Table: A 16,384-entry GUID→object FNV-1a hash-table featuring lock-free reads.
- Worker Pool: A 2-thread SPMC (Single Producer Multiple Consumer) worker pool with 2048 slots for fire-and-forget async dispatch. This handles particle SSE2 math, ADT terrain prefetch, and WoW color-code stripping out-of-band.
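To make the GUID hash-table entry concrete, here is a small sketch of hashing a 64-bit GUID with FNV-1a and masking it into 16,384 slots. The FNV-1a constants are the standard 32-bit ones; whether the module uses the 32-bit or 64-bit variant, the byte order it feeds in, and the guidSlot helper name are assumptions. The lock-free read path is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uint32_t kFnvOffsetBasis = 2166136261u;  // standard 32-bit FNV offset basis
constexpr std::uint32_t kFnvPrime       = 16777619u;    // standard 32-bit FNV prime
constexpr std::size_t   kGuidSlots      = 16384;        // table size from the notes

// FNV-1a: XOR each byte into the hash first, then multiply by the prime.
inline std::uint32_t fnv1a32(const unsigned char* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    std::uint32_t h = kFnvOffsetBasis;
    for (std::size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= kFnvPrime;  // unsigned overflow wraps modulo 2^32 by definition
    }
    return h;
}

// Hypothetical helper: hash the GUID's bytes (little-endian order assumed)
// and mask down to a slot index. kGuidSlots is a power of two.
inline std::size_t guidSlot(std::uint64_t guid) {
    unsigned char bytes[8];
    for (int i = 0; i < 8; ++i)
        bytes[i] = static_cast<unsigned char>(guid >> (8 * i));
    return fnv1a32(bytes, sizeof bytes) & (kGuidSlots - 1);
}
```

Because the slot count is a power of two, the mask replaces a modulo, which keeps the lookup branch-free up to the probe.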
Infrastructure
- infra_patch (50 APIs): Manages object pools, deduplication, frame-time smoothing, and adaptive cache TTL.
- hot_patch (20 Features): Adds a datastore lookup cache, tooltip early-exit, cleanup prefetch, and event deduplication.
- Enhanced CrashDumper: Features a 64-slot feature registry and a 256-entry hook call trace, allowing you to see exactly which optimization module was running at the time of a crash.
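The 256-entry hook call trace can be pictured as a fixed-size ring buffer of feature IDs that a crash handler reads back oldest-first. The sketch below assumes that shape; HookTrace, its method names, and the one-byte feature ID are hypothetical, and a real crash-time reader would also need to be safe against concurrent writers, which this example ignores.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kTraceEntries = 256;  // trace length from the notes
constexpr std::size_t kFeatureSlots = 64;   // registry size from the notes

class HookTrace {
public:
    // Called on hook entry; overwrites the oldest entry once the ring is full.
    void record(std::uint8_t featureId) {
        assert(featureId < kFeatureSlots);
        ring_[head_ % kTraceEntries] = featureId;
        ++head_;
    }

    // Returns the surviving entries, oldest first, for a crash report.
    std::vector<std::uint8_t> snapshot() const {
        std::size_t count = head_ < kTraceEntries ? head_ : kTraceEntries;
        std::vector<std::uint8_t> out;
        out.reserve(count);
        for (std::size_t i = head_ - count; i < head_; ++i)
            out.push_back(ring_[i % kTraceEntries]);
        return out;
    }

private:
    std::array<std::uint8_t, kTraceEntries> ring_{};
    std::size_t head_ = 0;  // total number of records ever written
};
```

The last element of the snapshot is the most recently entered hook, which is the first place to look when reading a dump.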
20 New Caches
Includes 20 lookup/transform caches totaling ~4MB of pre-warmed acceleration data:
- Spell history
- M2 model init
- FMOD audio config
- FrameScript opcode
- DBC record index
- SSE2 event name hash
- String interning L2
- Combat log bloom dedup
- Render state batch
- Texture decode prefetch
- BZ2 SSE2
- Vertex transform SSE2
- FMOD IT codec
- Tooltip generator prefetch
- FrameScript dispatch
- M2 model prepare
- Spell batch
- Regex extended
- Audio mixer
- (Unified pre-warmed framework)
Diagnostics That Actually Help
- Freeze Watchdog: Detects when the main thread stops responding for 10+ seconds and automatically dumps a list of active features.
- Priority Watchdog: Prevents Windows/WoW from silently downgrading the process priority.
- Note: Both watchdogs are rate-limited to avoid log spam.
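The freeze watchdog's decision logic can be sketched as a heartbeat timestamp plus a rate limiter. In this example the 10-second threshold comes from the notes above, while the FreezeWatchdog class, the poll/heartbeat split, and the 60-second gap between reports used in the usage line are assumptions. Timestamps are passed in explicitly so the logic can be exercised without real threads or sleeping.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

class FreezeWatchdog {
public:
    FreezeWatchdog(Clock::duration threshold, Clock::duration minReportGap)
        : threshold_(threshold), minReportGap_(minReportGap) {
        assert(threshold > Clock::duration::zero());
    }

    // Called by the main thread once per frame.
    void heartbeat(Clock::time_point now) { lastBeat_ = now; }

    // Called periodically by the watchdog thread. Returns true when a
    // report (for example, the list of active features) should be written.
    bool poll(Clock::time_point now) {
        if (now - lastBeat_ < threshold_) return false;  // main thread is alive
        if (reported_ && now - lastReport_ < minReportGap_)
            return false;                                // rate-limited
        reported_ = true;
        lastReport_ = now;
        return true;
    }

private:
    Clock::duration   threshold_;
    Clock::duration   minReportGap_;
    Clock::time_point lastBeat_{};
    Clock::time_point lastReport_{};
    bool              reported_ = false;
};
```

A caller might construct it as `FreezeWatchdog wd(std::chrono::seconds(10), std::chrono::seconds(60));`, call `heartbeat` from the render loop, and call `poll` from a background thread every second or so.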
Verified Hook Addresses
Populated 7 previously-unresolved hook targets discovered via binary analysis:
- GUID→object resolver
- GUID entry creation
- UnitHealth
- UnitPower
- UnitClass
- CM2Model::AdvanceTime
- Combat text event dispatch
What Testers Should Focus On
- Dalaran with ElvUI + WeakAuras + DBM: The new Lua VM engine and safe inline caches should produce measurably smoother, more consistent frametimes in heavy UI environments.
- Raid/Dungeon Combat: Focus on heavy AoE pulls. Combat text batching and GC micro-stepping should eliminate the classic 0.5–1.0s micro-stutters.
- Logout → Character Screen → Login Loops: This was the primary crash vector in v3.8.0. The new IsReloading guards on all VM hooks are designed to completely fix this issue.
- Zone Transitions: Test loading times via /hearthstone or portals. Smoothness should be drastically improved by the heap compactor deferral and cache pre-warming.
- Long Sessions (3+ Hours): The VirtualAlloc (VA) fragmentation monitor should now prevent the late-session Out-Of-Memory (OOM) crashes that frequently plague HD client setups.
Installation
Download wow_optimize.dll and version.dll from the release assets.
Copy both files to your WoW 3.3.5a folder. Launch normally — the version.dll
proxy loads the optimizer automatically.
Requires the !LuaBoost addon for
GC mode synchronization, loading state detection, and DLL/addon bridging.
Built with Visual Studio, tested on Windows 10/11, Wine 9.x, and macOS via
WoWSilicon/Rosetta 2. 32-bit x86, static MSVC runtime.