github jundot/omlx v0.3.10


This is the 0.3.10 release, a follow-up to 0.3.9 focused on stability and post-release bug fixes. For the new-feature lineup, see the 0.3.9 release notes. Thanks to everyone who filed issues and sent fixes since 0.3.9 shipped. If you hit a bug, please open an issue.

Bug Fixes

  • OOM under sustained load: OMLX_MAX_PROCESS_MEMORY wasn't actually being enforced on batched / VLM engines, and KV caches from finished requests piled up in an unbounded SSD-write queue. Together, these could push memory past the cap and get the server killed. Both are fixed, and in-flight load now scales with memory pressure (#1383).
  • OpenClaw / Codex getting empty replies: tag-free output from non-reasoning Qwen / Llama models was misclassified as thinking and dropped. It's now treated as content (#1348).
  • Native MTP crash on Qwen3.6 / Qwopus3.6 derivatives: MTP-quantized variants crashed with "speculative_call() got unexpected keyword argument 'n_confirmed'" after a dflash hot-swap. MTP patches now self-heal on every engine start, and the dflash hook stays wrapped for the session (#1388).
  • Late aborts during engine teardown: cancelling right as the engine unloaded printed a traceback and could leave a request slot stuck. These aborts are now absorbed cleanly (#1389, thanks @glasses666).
  • DFlash dropping images from multimodal requests: content was flattened before the image check, so image + text requests went down the text-only path. Multimodal content is now detected first and routed to the VLM fallback (#1344, thanks @ivaniguarans).
  • omlx launch breaking Codex on the macOS DMG: the bundled-Python env vars (PYTHONHOME / PYTHONPATH) leaked into the launched agent and confused its venv. Env scrub now runs for every launch target, not just claude (#1350).
  • oQ-quantized VLMs loading as text-only: processor_config.json wasn't being copied to the output, so the artifact lost its vision capability. Now copied through (#1386, thanks @a4501150).
  • Tool-calling fixes from @Mearman: tools: [] was treated as tools: None, so clients that explicitly disabled tools were ignored. Separately, the thinking-model tool-call extractor dropped real tool calls when the model added a note after </think>, because the filter judged calls by the shape of the text instead of matching tool names (#1392, #1393).
  • Wrong tool-call index in Anthropic streaming: when a streaming response had thinking followed by a tool call, the tool block's index was wrong, which broke client-side assembly. Indices are now sequential (#1356, thanks @lvsijian8).
  • HF download cancel not stopping Xet repos: cancel was a no-op on Xet-backed downloads. Xet is now disabled in the downloader so cancel works again.
  • ModelScope recommended cards missing params / size: the recommended row showed those fields for HF cards but not for ModelScope cards, so the two couldn't be compared. Both fields are now fetched for ModelScope too (#1351, thanks @popfido).
  • Dashboard charts filling with empty points after idle: idle dashboards kept appending null points to the timeline. Series now expire after an idle TTL (#1349, thanks @imi4u36d).
  • Browse Models name column too narrow: long names were chopped with no way to see the rest. The column is now wider, with a hover tooltip for the full name (#1369).
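The OOM fix combines two ideas: a bounded queue for pending SSD cache writes, and an in-flight limit that shrinks as memory approaches the cap. The sketch below illustrates both under stated assumptions. The helper names, the linear scaling rule, and the choice to skip a cache write when the queue is full are all hypothetical, not omlx's actual implementation; cap_bytes simply stands in for whatever limit OMLX_MAX_PROCESS_MEMORY sets.

```python
import queue


def allowed_inflight(max_inflight: int, used_bytes: int, cap_bytes: int) -> int:
    """Hypothetical rule: scale the in-flight limit linearly with memory headroom.

    At or above the cap no new requests are admitted; below it, at least one is.
    """
    if cap_bytes <= 0:
        raise ValueError("cap_bytes must be positive")
    headroom = max(0.0, 1.0 - used_bytes / cap_bytes)
    floor = 1 if headroom > 0 else 0
    return max(floor, int(max_inflight * headroom))


def enqueue_cache_write(q: "queue.Queue[bytes]", blob: bytes) -> bool:
    """Queue a KV-cache blob for SSD persistence without blocking.

    Returns False (the write is skipped) when the bounded queue is full, so the
    memory held by pending writes can never grow past the queue's maxsize.
    """
    try:
        q.put_nowait(blob)
        return True
    except queue.Full:
        return False
```

Skipping a write only loses a cache entry that could be recomputed, which is why dropping is used here in place of blocking the request path. A real server might prefer backpressure instead.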
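The empty-reply fix for OpenClaw / Codex depends on when output counts as "thinking". Here is a minimal sketch of one reasonable rule, assuming <think>...</think> delimiters and a per-model flag saying whether the model reasons (for example, because its chat template opens the think block). The function name and flag are hypothetical.

```python
THINK_OPEN, THINK_CLOSE = "<think>", "</think>"


def split_reasoning(text: str, model_reasons: bool) -> tuple[str, str]:
    """Return (reasoning, content) for a finished or partial completion."""
    if THINK_CLOSE in text:
        # Everything before </think> is reasoning; everything after is the reply.
        reasoning, _, content = text.partition(THINK_CLOSE)
        return reasoning.replace(THINK_OPEN, "", 1).strip(), content.strip()
    if THINK_OPEN in text or model_reasons:
        # A think block was opened (explicitly or by the template) but not closed yet.
        return text.replace(THINK_OPEN, "", 1).strip(), ""
    # Tag-free output from a non-reasoning model is plain content, not thinking.
    return "", text
```

The last branch is the case the release note describes. Without it, a Qwen or Llama model that never emits tags would have its whole answer routed to the reasoning channel, and the client would see an empty reply.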
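The DFlash multimodal fix is about ordering: inspect the structured message content for images before flattening it to text, because flattening discards the image parts. A sketch follows, assuming OpenAI-style content parts ("text" and "image_url", plus "image" for Anthropic-style inputs). The route labels and helper names are hypothetical.

```python
from typing import Any

Message = dict[str, Any]
IMAGE_PART_TYPES = {"image_url", "image"}


def has_image(messages: list[Message]) -> bool:
    """True if any message carries an image content part."""
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list) and any(
            isinstance(part, dict) and part.get("type") in IMAGE_PART_TYPES
            for part in content
        ):
            return True
    return False


def flatten_text(messages: list[Message]) -> str:
    """Join the text of all messages, dropping non-text parts."""
    chunks: list[str] = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, str):
            chunks.append(content)
        elif isinstance(content, list):
            chunks.extend(
                part.get("text", "")
                for part in content
                if isinstance(part, dict) and part.get("type") == "text"
            )
    return "\n".join(chunks)


def route(messages: list[Message]) -> tuple[str, Any]:
    # Detect images on the raw structure first; only text-only requests get flattened.
    if has_image(messages):
        return "vlm_fallback", messages
    return "dflash_text", flatten_text(messages)
```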
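Both tool-calling fixes from @Mearman come down to precise checks. A plain truthiness test (`if not tools`) makes None and [] look the same, and filtering parsed calls by the names the client declared is more robust than guessing from the surrounding text. The sketch assumes OpenAI-style tool definitions ({"type": "function", "function": {"name": ...}}) and parsed calls shaped like {"name", "arguments"}. Both helper names are hypothetical.

```python
from typing import Any, Optional

Tool = dict[str, Any]


def tools_disabled(tools: Optional[list[Tool]]) -> bool:
    """An explicit empty list means the client turned tools off.

    `not tools` would be True for both None and [], which is the conflation
    the fix removes; None here means "field not sent".
    """
    return tools is not None and len(tools) == 0


def filter_tool_calls(calls: list[dict[str, Any]], tools: Optional[list[Tool]]) -> list[dict[str, Any]]:
    """Keep parsed calls whose name matches a declared tool, regardless of any
    extra text the model wrote around them."""
    declared = {t["function"]["name"] for t in (tools or [])}
    return [call for call in calls if call.get("name") in declared]
```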
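In Anthropic's Messages streaming format, each content block (thinking, text, tool_use) opens with a content_block_start event that carries an index, and clients use that index to attach later deltas to the right block. The sketch below shows sequential assignment from one shared counter, so a tool_use block that follows a thinking block gets index 1. The ContentBlockIndexer class is a hypothetical illustration, not omlx's code.

```python
from typing import Any


class ContentBlockIndexer:
    """Hands out content-block indices in the order blocks are opened."""

    def __init__(self) -> None:
        self._next = 0

    def start(self, block: dict[str, Any]) -> dict[str, Any]:
        """Build a content_block_start event and advance the shared counter."""
        event = {"type": "content_block_start", "index": self._next, "content_block": block}
        self._next += 1
        return event

    @staticmethod
    def stop(index: int) -> dict[str, Any]:
        """Build the matching content_block_stop event for an opened block."""
        return {"type": "content_block_stop", "index": index}
```

Because every block type draws from the same counter, the indices stay sequential no matter how thinking, text, and tool blocks are interleaved.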

New Contributors

Thank you to everyone making their first contribution in 0.3.10:

@Mearman, @glasses666, @lvsijian8, @popfido, @imi4u36d.
