github suprepupre/wow-optimize v3.9.0
wow_optimize v3.9.0 — Lua VM Engine + 6 CPU-Side Modules

WoW Optimize Release Notes

This release ships a direct-threaded Lua bytecode interpreter that replaces World of Warcraft's switch-based opcode dispatch, plus six new optimization modules targeting rendering, math, logic, memory, and async workloads.

What's New

Lua VM Overhaul

  • Direct-Threaded Interpreter: Features an 8,192-site × 4-way inline cache that eliminates repeated hash-table walks for table lookups. Includes safe inline caches for luaH_getstr (16,384 entries) and lua_rawgeti (8,192 entries) with content validation that survives GC rehash and allocator address reuse.
  • Optimized Stack Writes: lua_pushnumber writes TValues directly to the Lua stack, skipping the overhead of a full API call.
  • Stability Fixes: All VM hooks now bail out during a lua_State swap, fixing the logout and UI reload crash found in v3.8.0.
  • Fast Paths: 20 out of 27 Lua C functions now have fast paths, including string.rep, math.random, math.sqrt, and 7 common string.format patterns.
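
To make the "content validation" idea concrete, here is a minimal sketch of a 4-way inline cache keyed by lookup site. It is not the project's code: Node, Table, slow_find, Way, and InlineCache are hypothetical names, and the toy table uses a linear walk in place of Lua's real hash walk. The point it illustrates is that a cached slot is trusted only if the key stored there still matches, so a stale entry left behind by a rehash or by address reuse simply falls through to the slow path.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for a Lua table's node array (hypothetical, not WoW's layout).
struct Node { std::string key; double value; };
struct Table { std::vector<Node> nodes; };

// Slow path: walk the table until the key is found (stands in for the hash walk).
inline const Node* slow_find(const Table& t, const std::string& key) {
    for (const Node& n : t.nodes)
        if (n.key == key) return &n;
    return nullptr;
}

// One cached answer: which table it was recorded for and where the key sat.
struct Way {
    const Table* table = nullptr;
    std::size_t slot = 0;
};

class InlineCache {
public:
    static constexpr std::size_t kSites = 8192;  // site count from the release notes
    static constexpr std::size_t kWays = 4;      // associativity from the release notes

    InlineCache() : sets_(kSites), next_(kSites) {}

    const Node* lookup(std::uint32_t site, const Table& t, const std::string& key) {
        const std::size_t index = site % kSites;
        std::array<Way, kWays>& set = sets_[index];
        for (const Way& w : set) {
            // Content validation: a matching table pointer alone is not trusted,
            // because the node may have moved (rehash) or the address may now
            // belong to a different table. The key at the slot must still match.
            if (w.table == &t && w.slot < t.nodes.size() && t.nodes[w.slot].key == key) {
                ++hits;
                return &t.nodes[w.slot];
            }
        }
        ++misses;
        const Node* found = slow_find(t, key);
        if (found != nullptr) {
            // Round-robin replacement within the 4 ways of this site.
            Way& victim = set[next_[index] % kWays];
            next_[index] = static_cast<std::uint8_t>(next_[index] + 1);
            victim.table = &t;
            victim.slot = static_cast<std::size_t>(found - t.nodes.data());
            assert(victim.slot < t.nodes.size());
        }
        return found;
    }

    std::size_t hits = 0;
    std::size_t misses = 0;

private:
    std::vector<std::array<Way, kWays>> sets_;
    std::vector<std::uint8_t> next_;
};
```

Looking up the same key twice from the same site should count one miss and then one hit. If the table's nodes are reordered in between, the stale way fails validation and the lookup misses again instead of returning the wrong node.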

6 CPU-Side Modules

  • Off-Screen Animation Throttling: Implements a 3-tier distance-based update rate (full rate / every 4th frame / every 16th frame) to reduce M2 bone-math overhead for models outside the view frustum.
  • SSE2 Math Library: Accelerates 4×4 matrix multiply, quaternion normalize, frustum AABB-vs-4-planes cull, BGRA↔ARGB batch swap (via SSSE3), and premultiplied alpha batching.
  • Combat Text Ring-Buffer Batching: Uses a 256-entry accumulator flushed once per frame instead of issuing one expensive D3D call per floating-text message.
  • UI Layout Dirty-Flag Cache: A 4096-slot frame-pointer-keyed cache with generation-based invalidation. It skips deep tree traversal for UI frames that haven't changed since the last layout pass.
  • Network Heartbeat Filter: Suppresses redundant CMSG_PING and CMSG_TIME_SYNC_RESP packets when the client has recently transmitted real data.
  • Invariant Lua Script Cache: Caches UnitHealth, UnitPower, and UnitClass results within a single frame, avoiding repeated, expensive Lua → C → Lua round-trips.
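
The per-frame script cache and the generation-based layout cache both rest on the same idea: stamp each entry with a counter and treat any entry with an old stamp as stale. The sketch below shows that idea under stated assumptions. FrameCache, begin_frame, and get are hypothetical names, the value is simplified to an int, and the query callback stands in for the real round-trip into the game client.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

class FrameCache {
public:
    // Call once at the start of every frame. Bumping the generation makes every
    // existing entry stale without touching or clearing the map.
    void begin_frame() { ++generation_; }

    // Returns the value cached for `unit` if it was computed during this frame.
    // Otherwise runs `query` (the expensive path) once and remembers the result.
    int get(const std::string& unit,
            const std::function<int(const std::string&)>& query) {
        assert(static_cast<bool>(query));  // an empty callback would throw on call
        Entry& e = entries_[unit];
        if (e.generation != generation_) {
            e.value = query(unit);
            e.generation = generation_;
        }
        return e.value;
    }

private:
    struct Entry {
        std::uint64_t generation = 0;  // 0 never equals a live generation
        int value = 0;
    };
    std::uint64_t generation_ = 1;
    std::unordered_map<std::string, Entry> entries_;
};
```

With this shape, ten addons asking for the same unit's health in one frame cost one real query. The trade-off is that a value changing mid-frame is not observed until the next begin_frame call.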

Memory & Async

  • Slab Allocator: A 64-byte aligned, 8-tier slab allocator (64B–8192B, backed by VirtualAlloc) designed for cache-line-aligned hot structures.
  • GUID Hash-Table: A 16,384-entry GUID→object FNV-1a hash-table featuring lock-free reads.
  • Worker Pool: A 2-thread SPMC (Single Producer Multiple Consumer) worker pool with 2048 slots for fire-and-forget async dispatch. This handles particle SSE2 math, ADT terrain prefetch, and WoW color-code stripping out-of-band.
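
Eight tiers spanning 64B to 8192B are consistent with power-of-two size classes (64 × 2^7 = 8192), although the notes do not list the individual tier sizes. Assuming that layout, tier selection can be sketched as follows. The names slab_tier and tier_size are hypothetical, and the sketch covers only size-class selection, not the VirtualAlloc backing or free lists.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kMinSlab = 64;    // smallest tier, one cache line
constexpr std::size_t kMaxSlab = 8192;  // largest tier
constexpr int kTierCount = 8;           // assumed: 64, 128, 256, ..., 8192

// Sanity check on the power-of-two assumption: 8 doublings-by-tier reach 8192.
static_assert((kMinSlab << (kTierCount - 1)) == kMaxSlab,
              "8 power-of-two tiers span 64B..8192B");

// Returns the tier index (0..7) that fits `bytes`, or -1 when the request is
// too large for the slab allocator and must go to a general-purpose heap.
constexpr int slab_tier(std::size_t bytes) {
    if (bytes > kMaxSlab) return -1;
    int tier = 0;
    std::size_t capacity = kMinSlab;
    while (capacity < bytes) {
        capacity <<= 1;
        ++tier;
    }
    return tier;
}

// Block size served by a given tier.
inline std::size_t tier_size(int tier) {
    assert(tier >= 0 && tier < kTierCount);
    return kMinSlab << tier;
}
```

Under these assumptions a 3000-byte request lands in the 4096-byte tier, and anything above 8192 bytes is rejected by the slab path.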

Infrastructure

  • infra_patch (50 APIs): Manages object pools, deduplication, frame-time smoothing, and adaptive cache TTL.
  • hot_patch (20 Features): Adds a datastore lookup cache, tooltip early-exit, cleanup prefetch, and event deduplication.
  • Enhanced CrashDumper: Features a 64-slot feature registry and a 256-entry hook call trace, allowing you to see exactly which optimization module was running at the time of a crash.
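
A fixed-size call trace like the one CrashDumper describes is usually a ring buffer that overwrites its oldest entry. The following is a rough sketch of that structure, not the project's implementation. HookTrace, record, and snapshot are hypothetical names, and each entry is reduced to a feature ID below 64 to mirror the 64-slot registry.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class HookTrace {
public:
    static constexpr std::size_t kCapacity = 256;  // entry count from the release notes

    // Called on hook entry. Overwrites the oldest entry once the ring is full.
    void record(std::uint8_t feature_id) {
        assert(feature_id < 64);  // 64-slot feature registry
        ring_[head_ % kCapacity] = feature_id;
        ++head_;
    }

    // Returns the retained entries ordered oldest to newest, as a crash handler
    // would want to print them.
    std::vector<std::uint8_t> snapshot() const {
        std::size_t count = head_;
        if (count > kCapacity) count = kCapacity;
        std::vector<std::uint8_t> out;
        out.reserve(count);
        for (std::size_t i = head_ - count; i < head_; ++i)
            out.push_back(ring_[i % kCapacity]);
        return out;
    }

private:
    std::array<std::uint8_t, kCapacity> ring_{};
    std::size_t head_ = 0;  // total number of records ever written
};
```

The last element of the snapshot is the hook that ran most recently, which is the first place to look after a crash. This sketch is single-threaded; a trace shared between threads would need atomic or per-thread indices.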

20 New Caches

Includes 20 lookup/transform caches totaling ~4MB of pre-warmed acceleration data:

  1. Spell history
  2. M2 model init
  3. FMOD audio config
  4. FrameScript opcode
  5. DBC record index
  6. SSE2 event name hash
  7. String interning L2
  8. Combat log bloom dedup
  9. Render state batch
  10. Texture decode prefetch
  11. BZ2 SSE2
  12. Vertex transform SSE2
  13. FMOD IT codec
  14. Tooltip generator prefetch
  15. FrameScript dispatch
  16. M2 model prepare
  17. Spell batch
  18. Regex extended
  19. Audio mixer
  20. (Unified pre-warmed framework)

Diagnostics That Actually Help

  • Freeze Watchdog: Detects when the main thread stops responding for 10+ seconds and automatically dumps a list of active features.
  • Priority Watchdog: Prevents Windows/WoW from silently downgrading the process priority.
  • Note: Both watchdogs are rate-limited to avoid log spam.
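
The freeze check and its rate limit can be expressed as a small piece of timing logic. The sketch below is illustrative only: FreezeDetector and its methods are hypothetical names, the 10-second threshold comes from the notes, and the 60-second gap between reports used in the example is an assumed value. Time points are passed in explicitly so the logic can be exercised without real threads. An actual watchdog would run should_report on its own thread and keep the heartbeat in a std::atomic.

```cpp
#include <cassert>
#include <chrono>

class FreezeDetector {
public:
    using Clock = std::chrono::steady_clock;
    using TimePoint = Clock::time_point;

    FreezeDetector(std::chrono::seconds freeze_after,
                   std::chrono::seconds min_report_gap)
        : freeze_after_(freeze_after), min_report_gap_(min_report_gap) {
        assert(freeze_after.count() > 0);
    }

    // The main thread calls this once per frame to prove it is still running.
    void heartbeat(TimePoint now) { last_beat_ = now; }

    // The watchdog calls this periodically. Returns true when a dump of the
    // active features should be written.
    bool should_report(TimePoint now) {
        if (now - last_beat_ < freeze_after_) return false;  // main thread is alive
        if (reported_ && now - last_report_ < min_report_gap_)
            return false;                                     // rate limit
        reported_ = true;
        last_report_ = now;
        return true;
    }

private:
    std::chrono::seconds freeze_after_;
    std::chrono::seconds min_report_gap_;
    TimePoint last_beat_{};
    TimePoint last_report_{};
    bool reported_ = false;
};
```

With a 10-second threshold and a 60-second gap, a stall is reported once when it crosses 10 seconds and again only after another 60 seconds without recovery, which keeps a long freeze from flooding the log.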

Verified Hook Addresses

Populated 7 previously unresolved hook targets discovered via binary analysis:

  • GUID→object resolver
  • GUID entry creation
  • UnitHealth
  • UnitPower
  • UnitClass
  • CM2Model::AdvanceTime
  • Combat text event dispatch

What Testers Should Focus On

  • Dalaran with ElvUI + WeakAuras + DBM: The new Lua VM engine and safe inline caches should produce measurably smoother, more consistent frametimes in heavy UI environments.
  • Raid/Dungeon Combat: Focus on heavy AoE pulls. Combat text batching and GC micro-stepping should eliminate the classic 0.5–1.0s micro-stutters.
  • Logout → Character Screen → Login Loops: This was the primary crash vector in v3.8.0. The new IsReloading guards on all VM hooks are designed to completely fix this issue.
  • Zone Transitions: Test loading times via /hearthstone or portals. Smoothness should be drastically improved by the heap compactor deferral and cache pre-warming.
  • Long Sessions (3+ Hours): The VirtualAlloc (VA) fragmentation monitor should now prevent the late-session out-of-memory (OOM) crashes that frequently plague HD client setups.

Installation

Download wow_optimize.dll and version.dll from the release assets.
Copy both files to your WoW 3.3.5a folder. Launch normally — the version.dll
proxy loads the optimizer automatically.

Requires the !LuaBoost addon for
GC mode synchronization, loading state detection, and DLL/addon bridging.

Built with Visual Studio, tested on Windows 10/11, Wine 9.x, and macOS via
WoWSilicon/Rosetta 2. 32-bit x86, static MSVC runtime.
