github jundot/omlx v0.2.3.post4


Hotfix: Fix crash when running multiple models simultaneously

Fixed a bug where the server process terminated when two or more models received requests at the same time.

Symptom: Server crashes when multiple models are used concurrently (e.g., VLM as interface model + LLM for chat in Open WebUI)

Cause: Each model engine ran its GPU operations on a separate thread, causing Metal command buffer races on Apple Silicon

Fix: All model GPU operations now run on a single shared thread. No impact on single-model performance.
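The fix described above can be sketched as serializing all GPU work through a single-worker executor. This is a minimal illustration, not omlx's actual implementation; `run_on_gpu_thread` and `fake_forward` are hypothetical names, and a real Metal-backed model step stands in for the lambda-free placeholder here.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch: funnel all GPU work onto one shared thread so that
# concurrent model engines never issue Metal commands from different threads.
_gpu_executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="gpu")

def run_on_gpu_thread(fn, *args, **kwargs):
    """Submit a GPU operation to the shared thread and block until done."""
    return _gpu_executor.submit(fn, *args, **kwargs).result()

def fake_forward(name, x):
    # Stand-in for a Metal-backed model step; records which thread ran it.
    return (name, threading.current_thread().name, x * 2)

# Two "engines" submitting concurrently: requests interleave, but every
# GPU operation executes sequentially on the single shared worker thread.
results = []
threads = [
    threading.Thread(
        target=lambda n=n: results.append(
            run_on_gpu_thread(fake_forward, f"model{n}", n)
        )
    )
    for n in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All GPU ops ran on the same thread, so no two command buffers race.
assert len({thread_name for _, thread_name, _ in results}) == 1
```

Since only one op is in flight at a time, a single-model workload sees no extra contention, which matches the "no impact on single-model performance" note above.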

Closes #85 / Ref #80
