Hotfix
Bug fixes
- Fix VLM concurrent request GPU race condition causing TransferEncodingError and server crash (#80)
- Remove `mx.clear_cache()` from event loop thread to prevent Metal GPU contention with `_mlx_executor` during concurrent VLM requests
- Always synchronize `generation_stream` on request completion regardless of cache setting (previously skipped when oMLX cache was disabled)
- Add `clear_pending_embeddings()` to normal completion path for consistency with abort path
- Remove