v0.1.14.post4 (Hotfix)
Bug fixes
- Interrupt prefill on abort to prevent wasted GPU compute (#62)
  - When a client disconnected during a long prefill (e.g. 17k tokens, ~47s), the abort was enqueued, but the prefill loop had no way to check for it and kept processing all remaining tokens before the abort could execute
  - Added an abort-check callback at every 1024-token chunk boundary during prefill, reducing wasted time from ~32 seconds to ~1.6 seconds
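The chunk-boundary abort check described in this fix can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `chunked_prefill`, `process_chunk`, `should_abort`, and `PrefillAborted` are hypothetical names, and only the 1024-token chunk size comes from the note above.

```python
from typing import Callable, Sequence

CHUNK_SIZE = 1024  # chunk size named in the release note


class PrefillAborted(Exception):
    """Hypothetical exception raised when an abort is seen between chunks."""


def chunked_prefill(
    tokens: Sequence[int],
    process_chunk: Callable[[Sequence[int]], None],  # stand-in for the model forward pass
    should_abort: Callable[[], bool],                # stand-in for the abort-check callback
    chunk_size: int = CHUNK_SIZE,
) -> int:
    """Process tokens chunk by chunk and return the number of tokens processed.

    should_abort is polled before every chunk, so at most one chunk of
    compute is spent after an abort has been requested.
    """
    processed = 0
    for start in range(0, len(tokens), chunk_size):
        if should_abort():
            raise PrefillAborted(f"aborted after {processed} tokens")
        chunk = tokens[start:start + chunk_size]
        process_chunk(chunk)
        processed += len(chunk)
    return processed
```

Polling only at chunk boundaries keeps the check cheap while bounding the wasted work to a single chunk, which is consistent with the drop from tens of seconds to under two seconds reported above.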
- Send SSE keepalive immediately and reduce interval from 30s to 10s (#62)
  - The first SSE keepalive was sent 30 seconds after the stream opened, but clients such as openclaw have a ~15s read timeout, so they disconnected prematurely during long prefills
  - An initial keepalive comment is now sent as soon as the SSE stream opens, keeping the connection alive through the entire prefill phase
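The keepalive behavior in this fix can be approximated with a small `asyncio` wrapper. This is a sketch under assumptions, not the server's real implementation: `with_keepalive` and the `": keepalive"` comment text are made up for illustration. The only standard fact relied on is that SSE lines beginning with a colon are comments that clients ignore.

```python
import asyncio
from typing import AsyncIterator

KEEPALIVE = ": keepalive\n\n"  # SSE comment line; the text after ':' is arbitrary


async def with_keepalive(
    events: AsyncIterator[str], interval: float = 10.0
) -> AsyncIterator[str]:
    """Wrap an SSE event stream, emitting comments while the source is idle."""
    yield KEEPALIVE  # sent immediately when the stream opens
    iterator = events.__aiter__()
    pending = asyncio.ensure_future(iterator.__anext__())
    try:
        while True:
            # asyncio.wait does not cancel the pending future on timeout,
            # so the slow producer (e.g. a long prefill) keeps running.
            done, _ = await asyncio.wait({pending}, timeout=interval)
            if not done:
                yield KEEPALIVE
                continue
            try:
                item = pending.result()
            except StopAsyncIteration:
                return  # source stream finished
            yield item
            pending = asyncio.ensure_future(iterator.__anext__())
    finally:
        pending.cancel()  # no-op if the future has already completed
```

With a 10-second interval, a client whose read timeout is around 15 seconds always receives some bytes before it would give up, even when the first real token is far away.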
- Refresh model list in admin dashboard after global settings change
  - After cache settings were changed, models were correctly unloaded on the server, but the UI still showed them as "Loaded" until a manual refresh