github jundot/omlx v0.2.3.post4


Hotfix: Fix crash when running multiple models simultaneously

Fixed a bug where the server process terminated when two or more models received requests at the same time.

Symptom: Server crashes when multiple models are used concurrently (e.g., VLM as interface model + LLM for chat in Open WebUI)

Cause: Each model engine ran its GPU operations on a separate thread, causing Metal command buffer races on Apple Silicon

Fix: All model GPU operations now run on a single shared thread. No impact on single-model performance.
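The fix described above can be sketched as serializing all GPU work through a single-worker executor. This is a minimal illustration, not omlx's actual implementation; `run_on_gpu_thread` and `fake_forward` are hypothetical names, and a real Metal-backed model step stands in for the lambda-free placeholder here.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch: funnel all GPU work onto one shared thread so that
# concurrent model engines never issue Metal commands from different threads.
_gpu_executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="gpu")

def run_on_gpu_thread(fn, *args, **kwargs):
    """Submit a GPU operation to the shared thread and block until done."""
    return _gpu_executor.submit(fn, *args, **kwargs).result()

def fake_forward(name, x):
    # Stand-in for a Metal-backed model step; records which thread ran it.
    return (name, threading.current_thread().name, x * 2)

# Two "engines" submitting concurrently: requests interleave, but every
# GPU operation executes sequentially on the single shared worker thread.
results = []
threads = [
    threading.Thread(
        target=lambda n=n: results.append(
            run_on_gpu_thread(fake_forward, f"model{n}", n)
        )
    )
    for n in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All GPU ops ran on the same thread, so no two command buffers race.
assert len({thread_name for _, thread_name, _ in results}) == 1
```

Since only one op is in flight at a time, a single-model workload sees no extra contention, which matches the "no impact on single-model performance" note above.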

Closes #85 / Ref #80
