v0.2.22 Release Notes
Bug Fixes
- Fix GPU synchronization before `batch_generator.remove()` in request abort path
- Fix prefill performance regression from unnecessary per-chunk `_sync_and_clear_cache()` calls (#396)
- Fix images being stripped from Anthropic `tool_result` content for VLM models (#393)
- Fix GPTQ axis mismatch: align dequantize-quantize grouping with `mx.quantize`
- Fix GPTQ `group_size` fallback crash on non-power-of-2 output dimensions
- Fix accuracy benchmark forcing LM engine to avoid VLM empty responses
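
To illustrate the `group_size` fallback item, here is a minimal sketch of picking a group size that evenly divides a dimension that is not a power of 2. The helper name `pick_group_size`, the candidate sizes (128, 64, 32), and the preference order are assumptions for illustration only, not the project's actual implementation.

```python
from typing import Optional, Sequence


def pick_group_size(
    dim: int,
    preferred: int = 64,
    candidates: Sequence[int] = (128, 64, 32),  # assumed candidate sizes
) -> Optional[int]:
    """Hypothetical helper: choose a quantization group size that divides `dim`.

    Returns `preferred` when it divides `dim`; otherwise the largest candidate
    no bigger than `preferred` that divides `dim`; otherwise None so the caller
    can skip quantizing that layer instead of crashing.
    """
    if dim <= 0:
        raise ValueError("dim must be positive")
    if dim % preferred == 0:
        return preferred
    for size in sorted(candidates, reverse=True):
        if size <= preferred and dim % size == 0:
            return size
    return None


if __name__ == "__main__":
    for dim in (4096, 96, 100):
        print(dim, pick_group_size(dim))
```

Returning `None` rather than raising leaves the decision to the caller, which is one way to avoid a crash on dimensions that no candidate divides.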
Improvements
- Support `x-api-key` header for Anthropic SDK compatibility (#379)
- oQ: MLP asymmetry for dense models: reduce `up_proj` bits while protecting `gate_proj`/`down_proj`
- oQ: GPTQ performance and stability improvements, rename enhanced suffix to `e`
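
As a rough illustration of the `x-api-key` item, the sketch below accepts a key from either an `Authorization: Bearer ...` header or the `x-api-key` header that the Anthropic SDK sends. The function name `extract_api_key` and the precedence between the two headers are assumptions; the server's real handling may differ.

```python
from typing import Mapping, Optional


def extract_api_key(headers: Mapping[str, str]) -> Optional[str]:
    """Hypothetical helper: read an API key from common auth headers.

    Header names are matched case-insensitively. A Bearer token is checked
    first, then `x-api-key` (assumed precedence).
    """
    normalized = {name.lower(): value.strip() for name, value in headers.items()}

    # OpenAI-style clients: "Authorization: Bearer <key>"
    scheme, _, token = normalized.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and token.strip():
        return token.strip()

    # Anthropic SDK clients: "x-api-key: <key>"
    key = normalized.get("x-api-key", "")
    return key or None


if __name__ == "__main__":
    print(extract_api_key({"X-Api-Key": "abc"}))
    print(extract_api_key({"Authorization": "Bearer xyz"}))
```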