v0.1.14.post2 (Hotfix)
Bug fixes
- Fix ghost request left behind after a client disconnect, which caused the server to stop processing subsequent requests (#62)
- When a client disconnected mid-request, a race condition in the cleanup path could leave a ghost request in the scheduler's active batch; the ghost kept generating tokens that no client would receive while occupying a running slot, blocking or delaying all subsequent requests until it reached max_tokens
- This bug has existed since v0.1.6 but became more visible in recent versions because inline memory checks and error propagation increased abort frequency
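
For readers unfamiliar with this class of bug, the following is a minimal sketch of how an abort that races with admission can leave a ghost request in a running batch. It is an illustration only, not the project's actual code or the actual fix: `ToyScheduler`, `pop_for_admission`, `admit`, the `guard_aborts` flag, and the tombstone set are all hypothetical names invented for this example.

```python
from collections import deque


class ToyScheduler:
    """Hypothetical scheduler used only to illustrate the race."""

    def __init__(self, guard_aborts: bool):
        self.waiting = deque()   # request ids not yet admitted
        self.running = {}        # request id -> tokens generated so far
        self.aborted = set()     # tombstones for aborts that found nothing to remove
        self.guard_aborts = guard_aborts

    def add(self, request_id: str) -> None:
        self.waiting.append(request_id)

    def pop_for_admission(self) -> str:
        # After this call the request is "in flight": it is in neither
        # `waiting` nor `running`, so cleanup cannot see it.
        return self.waiting.popleft()

    def abort(self, request_id: str) -> None:
        found = False
        if request_id in self.waiting:
            self.waiting.remove(request_id)
            found = True
        if self.running.pop(request_id, None) is not None:
            found = True
        # Assumed mitigation: remember aborts that arrived mid-admission.
        if not found and self.guard_aborts:
            self.aborted.add(request_id)

    def admit(self, request_id: str) -> bool:
        if self.guard_aborts and request_id in self.aborted:
            self.aborted.discard(request_id)
            return False         # drop the request instead of running it
        self.running[request_id] = 0
        return True


def simulate(guard_aborts: bool) -> list:
    scheduler = ToyScheduler(guard_aborts)
    scheduler.add("req-1")
    request_id = scheduler.pop_for_admission()
    scheduler.abort(request_id)  # client disconnects while the request is in flight
    scheduler.admit(request_id)
    return sorted(scheduler.running)


print(simulate(guard_aborts=False))  # the ghost request occupies a running slot
print(simulate(guard_aborts=True))   # the aborted request is never admitted
```

Without the guard, the aborted request ends up in `running` even though no client is waiting for it, which mirrors the symptom described above. Checking a record of aborted ids at admission time is one common way to close such a window; the real fix may work differently.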