v0.1.14.post3 (Hotfix)
Bug fixes
- Fix model memory not being freed after engine unload (#62)
- When a model was evicted (via LRU, TTL expiration, or memory enforcer), the engine and scheduler retained direct Python references to the model weights, preventing garbage collection from reclaiming GPU memory
- This caused mx.get_active_memory() to report stale high values, blocking subsequent model loads with "projected memory would exceed limit" errors