github jundot/omlx v0.1.14.post4

27 days ago

v0.1.14.post4 (Hotfix)

Bug fixes

  • Interrupt prefill on abort to prevent wasted GPU compute (#62)

    • When a client disconnected during a long prefill (e.g. 17k tokens, ~47s), the abort was enqueued, but the prefill loop had no way to check for it and kept processing all remaining tokens before the abort could execute
    • Added an abort-check callback at every 1024-token chunk boundary during prefill, reducing wasted time from ~32 seconds to ~1.6 seconds
  • Send SSE keepalive immediately and reduce interval from 30s to 10s (#62)

    • The first SSE keepalive was sent 30 seconds after the stream opened, but clients like openclaw have a ~15s read timeout, so they disconnected prematurely during long prefills
    • The server now sends an initial keepalive comment immediately when the SSE stream opens, keeping the connection alive through the entire prefill phase
  • Refresh model list in admin dashboard after global settings change

    • After changing cache settings, models were correctly unloaded on the server, but the UI still showed them as "Loaded" until a manual refresh
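
The prefill abort fix (#62) can be illustrated with a minimal sketch. This is not the omlx implementation: the names `prefill`, `process_chunk`, and `should_abort` are hypothetical, and only the 1024-token chunk size comes from the note above. The idea shown is that polling an abort callback at each chunk boundary limits wasted work to at most one chunk.

```python
from typing import Callable, Sequence

CHUNK_SIZE = 1024  # chunk size stated in the release note


def prefill(
    tokens: Sequence[int],
    process_chunk: Callable[[Sequence[int]], None],  # hypothetical model call
    should_abort: Callable[[], bool],                # hypothetical abort check
    chunk_size: int = CHUNK_SIZE,
) -> int:
    """Process tokens chunk by chunk and return how many were processed.

    should_abort() is polled before every chunk, so a pending abort takes
    effect at the next chunk boundary instead of after the whole prompt.
    """
    processed = 0
    for start in range(0, len(tokens), chunk_size):
        if should_abort():
            break  # stop before spending compute on the next chunk
        chunk = tokens[start:start + chunk_size]
        process_chunk(chunk)
        processed += len(chunk)
    return processed
```

With a callback like this, the cost of a late abort is bounded by the time to process one chunk rather than the remainder of the prompt.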
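
The SSE keepalive change (#62) can be sketched in a similar way. This is an illustrative generator, not omlx's code: `sse_stream`, the queue-based event source, and the `None` end-of-stream sentinel are assumptions made for the example. It relies on the fact that an SSE line starting with `:` is a comment that clients ignore, and it uses the 10-second interval from the note above.

```python
import asyncio
from typing import AsyncIterator, Optional

KEEPALIVE = ": keepalive\n\n"  # SSE comment frame; ignored by clients
KEEPALIVE_INTERVAL_S = 10.0    # interval stated in the release note


async def sse_stream(
    events: "asyncio.Queue[Optional[str]]",  # hypothetical event source
    interval: float = KEEPALIVE_INTERVAL_S,
) -> AsyncIterator[str]:
    """Yield SSE frames, with a keepalive up front and during idle periods."""
    # Send a keepalive immediately so the client's read timeout is reset
    # before a long prefill starts.
    yield KEEPALIVE
    while True:
        try:
            item = await asyncio.wait_for(events.get(), timeout=interval)
        except asyncio.TimeoutError:
            yield KEEPALIVE  # nothing to send yet; keep the connection warm
            continue
        if item is None:  # assumed sentinel: the stream is finished
            return
        # Assumes a single-line payload; multi-line data needs one
        # "data:" line per payload line.
        yield f"data: {item}\n\n"
```

A 10-second interval stays below the ~15s client read timeout mentioned above, which is why the immediate first frame plus the shorter interval together cover the whole prefill phase.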
