jundot/omlx v0.4.2.dev3 on GitHub

This development release adds native MarkItDown document processing and VLM-based PDF processing in oMLX, improves Gemma 4 tool-call stability, and hardens multimodal precision, cache, memory, and engine scheduling.


Added native MarkItDown document processing and VLM-based PDF processing. Uploaded files can now be converted through MarkItDown, and PDFs can use either MarkItDown or VLM OCR from the selected processing engine.
Improved Gemma 4 tool-call stability. Multi-turn Gemma 4 MoE tool conversations now strip stray tool-call close markers before re-rendering conversation history (by @kreeger in #1665).
Improved raw tool-call JSON recovery. Tool calls with raw tabs or newlines inside generated JSON string values are now recovered and returned as valid structured tool calls.
Improved multimodal oQ precision. Protected vision and audio tensors are preserved in float32 during oQ conversion to avoid FP16 overflow and multimodal quality loss (by @dodams258 in #1682).
Improved engine eviction safety. Embedding and rerank engines are now leased while in use, which prevents acquire-vs-use eviction races, and leaked activity counters are reset on teardown (by @Cmerrill1713 in #1668).
Improved cache and prefill backpressure. Hot-cache budget is shared across models, cache-heavy prefills wait while cache-store cleanup is full, and idle wakeups are guarded for partial engine cores.
Improved small-system memory behavior. Sub-24GB Apple Silicon systems now use the small-system reserve path, reducing over-reservation from tiered defaults.
Reduced idle CPU overhead. Loaded models now avoid unnecessary idle wakeups while remaining ready for requests.
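
To illustrate the raw tool-call JSON recovery item above, here is a minimal sketch of one way such recovery could work. It is not oMLX's actual code: the helper names escape_raw_controls and recover_tool_call and the sample payload are hypothetical. The idea is to try a strict parse first and, only if that fails, escape raw control characters that sit inside JSON string literals before parsing again.

```python
import json

# Hypothetical helpers for illustration; not taken from the oMLX source.
_ESCAPES = {"\n": "\\n", "\r": "\\r", "\t": "\\t"}


def escape_raw_controls(text: str) -> str:
    """Escape raw control characters that appear inside JSON string literals.

    Whitespace between tokens is left untouched, because it is already valid JSON.
    """
    out = []
    in_string = False   # currently inside a "..." literal
    escaped = False     # previous character was a backslash inside a string
    for ch in text:
        if not in_string:
            if ch == '"':
                in_string = True
            out.append(ch)
        elif escaped:
            escaped = False
            out.append(ch)
        elif ch == "\\":
            escaped = True
            out.append(ch)
        elif ch == '"':
            in_string = False
            out.append(ch)
        elif ord(ch) < 0x20:
            # Raw control character inside a string: replace with its escape.
            out.append(_ESCAPES.get(ch, f"\\u{ord(ch):04x}"))
        else:
            out.append(ch)
    return "".join(out)


def recover_tool_call(raw: str) -> dict:
    """Parse a tool call, repairing raw tabs/newlines in string values if needed."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return json.loads(escape_raw_controls(raw))


# The string value below contains a real tab and a real newline.
raw_call = '{"name": "write_file", "arguments": {"content": "a\tb\nc"}}'
call = recover_tool_call(raw_call)
```

Python's standard parser can also accept such input directly with json.loads(raw, strict=False); the explicit pass shown here is useful when the repaired text itself has to be passed on to a stricter consumer.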
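
The engine eviction item above describes leasing engines while they are in use. The following is a rough sketch of that general pattern under stated assumptions, not the oMLX implementation: the LeasedEngine class and its lease, try_evict, and teardown methods are hypothetical names. A lease count is incremented under a lock when a caller acquires the engine, eviction is refused while the count is non-zero, and teardown clears any count left behind by a caller that never released.

```python
import threading
from contextlib import contextmanager


class LeasedEngine:
    """Hypothetical wrapper that tracks active leases so eviction skips busy engines."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._lock = threading.Lock()
        self._leases = 0
        self._evicted = False

    @contextmanager
    def lease(self):
        # Acquire and "mark in use" happen under one lock, so an evictor
        # can never observe an acquired-but-not-yet-counted engine.
        with self._lock:
            if self._evicted:
                raise RuntimeError(f"{self.name} has been evicted")
            self._leases += 1
        try:
            yield self
        finally:
            with self._lock:
                self._leases -= 1

    def try_evict(self) -> bool:
        """Evict only when no lease is outstanding; report whether it happened."""
        with self._lock:
            if self._leases > 0:
                return False
            self._evicted = True
            return True

    def teardown(self) -> None:
        """Force eviction and reset any leaked lease count."""
        with self._lock:
            self._leases = 0
            self._evicted = True


engine = LeasedEngine("embedding")
with engine.lease():
    evicted_while_busy = engine.try_evict()   # refused: a lease is active
evicted_when_idle = engine.try_evict()        # allowed: no leases remain
```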

New Contributors

@Cmerrill1713 made their first contribution in #1668
@kreeger made their first contribution in #1665
@sje397 made their first contribution in #1671
@dodams258 made their first contribution in #1682
