jundot/omlx v0.2.3.post3 on GitHub

Hotfix

Fix GPU race condition in concurrent VLM requests that caused TransferEncodingError and server crashes (#80)
- Remove mx.clear_cache() from event loop thread to prevent Metal GPU contention with _mlx_executor during concurrent VLM requests
- Always synchronize generation_stream on request completion regardless of cache setting (previously skipped when oMLX cache was disabled)
- Add clear_pending_embeddings() to normal completion path for consistency with abort path
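
The general idea behind these changes can be sketched as follows: route every GPU-touching call through one single-worker executor, so the event loop thread never issues GPU work while a generation step is running. This is a minimal illustration, not oMLX's actual code. Only the name _mlx_executor comes from the notes above; fake_gpu_op, finish_request, handle_request, serve_concurrently, and run_demo are hypothetical, and the GPU calls are replaced by a stand-in that records which thread ran them. Whether oMLX dispatches its cleanup to the executor in exactly this way is an assumption.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the single-worker executor the notes call _mlx_executor.
_mlx_executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="mlx")

# Records (operation, thread name) for each simulated GPU operation.
gpu_call_threads = []


def fake_gpu_op(name):
    # Placeholder for real GPU work (generation, stream sync, cache clearing).
    gpu_call_threads.append((name, threading.current_thread().name))


def finish_request(request_id):
    # Hypothetical completion hook. It runs on the executor thread and always
    # synchronizes before cleanup, independent of any cache setting.
    fake_gpu_op(f"synchronize:{request_id}")
    fake_gpu_op(f"cleanup:{request_id}")


async def handle_request(request_id):
    loop = asyncio.get_running_loop()
    # Generation runs on the executor thread.
    await loop.run_in_executor(_mlx_executor, fake_gpu_op, f"generate:{request_id}")
    # Completion work is dispatched to the same executor instead of being
    # called directly on the event loop thread.
    await loop.run_in_executor(_mlx_executor, finish_request, request_id)


async def serve_concurrently(n):
    # Simulate n requests arriving at the same time.
    await asyncio.gather(*(handle_request(i) for i in range(n)))


def run_demo(n=3):
    gpu_call_threads.clear()
    asyncio.run(serve_concurrently(n))
    return list(gpu_call_threads)
```

Calling run_demo(3) returns the recorded operations. Because the executor has a single worker, all of them run on the same thread, and each request's synchronize step is immediately followed by its cleanup step.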