jundot/omlx v0.1.14.post4 on GitHub

v0.1.14.post4 (Hotfix)

Interrupt prefill on abort to prevent wasted GPU compute (#62)
- When a client disconnected during a long prefill (e.g. 17k tokens, ~47s), the abort was enqueued, but the prefill loop had no way to check for it and kept processing all remaining tokens before the abort could execute
- Added an abort-check callback at every 1024-token chunk boundary during prefill, reducing wasted time from ~32 seconds to ~1.6 seconds
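
The chunk-boundary check described above can be illustrated with a minimal sketch. This is not the project's actual code: `prefill`, `process_chunk`, and `should_abort` are hypothetical names, and only the 1024-token chunk size comes from the note above.

```python
from typing import Callable, Sequence

CHUNK_SIZE = 1024  # chunk boundary size taken from the release note


def prefill(
    tokens: Sequence[int],
    process_chunk: Callable[[Sequence[int]], None],  # hypothetical: runs one chunk on the model
    should_abort: Callable[[], bool],                # hypothetical: True once an abort is pending
    chunk_size: int = CHUNK_SIZE,
) -> int:
    """Process the prompt chunk by chunk and return how many tokens were processed."""
    processed = 0
    for start in range(0, len(tokens), chunk_size):
        # Poll for a pending abort before starting each chunk, so a
        # disconnect costs at most one chunk of extra compute.
        if should_abort():
            break
        chunk = tokens[start:start + chunk_size]
        process_chunk(chunk)
        processed += len(chunk)
    return processed
```

With a check at every boundary, the worst-case wasted work after an abort is a single chunk rather than the rest of the prompt.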
Send SSE keepalive immediately and reduce interval from 30s to 10s (#62)
- The first SSE keepalive was sent 30 seconds after the stream opened, but clients such as openclaw have a ~15s read timeout, which caused premature disconnects during long prefills
- Now sends an initial keepalive comment immediately when the SSE stream opens, keeping the connection alive through the entire prefill phase
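
As a rough sketch of the keepalive behaviour described above (not the project's implementation), an async generator can emit an SSE comment line immediately and then again whenever no event arrives within the interval. The names `sse_stream` and `KEEPALIVE`, the comment text, and the use of `None` as an end-of-stream marker are assumptions for illustration; only the 10-second interval comes from the note.

```python
import asyncio
from typing import AsyncIterator

KEEPALIVE_INTERVAL = 10.0      # seconds, per the release note
KEEPALIVE = ": keepalive\n\n"  # SSE comment line; the text after ':' is an assumption


async def sse_stream(
    queue: asyncio.Queue,
    interval: float = KEEPALIVE_INTERVAL,
) -> AsyncIterator[str]:
    """Yield SSE frames from `queue`, padding idle periods with keepalive comments."""
    # Send one keepalive as soon as the stream opens, before any prefill work finishes.
    yield KEEPALIVE
    while True:
        try:
            item = await asyncio.wait_for(queue.get(), timeout=interval)
        except asyncio.TimeoutError:
            # No event arrived within the interval: keep the connection alive.
            yield KEEPALIVE
            continue
        if item is None:  # hypothetical end-of-stream marker
            return
        yield f"data: {item}\n\n"
```

Lines beginning with a colon are comments in the SSE format, so conforming clients ignore them while their read timeout is still reset by the incoming bytes.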
Refresh model list in admin dashboard after global settings change
- After cache settings were changed, models were correctly unloaded on the server, but the UI still showed them as "Loaded" until a manual refresh