v0.50.289 — TCP keepalive on accepted connections
1 PR by external contributor @happy5318. Closes #1580.
What's fixed
TCP keepalive on accepted connections to clean up dead CLOSE-WAIT sockets (#1581 by @happy5318; closes #1580)
Long-running Linux WebUI servers were accumulating CLOSE-WAIT zombie connections after clients crashed or lost their network without sending FIN. Without TCP keepalive enabled, threads blocked in recv() waiting for the next request had no way to detect the dead peer.
Fix: new Handler.setup() override in server.py that, on every accepted connection, sets:
- SO_KEEPALIVE=1 (master switch: enables TCP keepalive on this socket)
- TCP_NODELAY=1 (disables Nagle for HTTP small-burst latency)
- TCP_KEEPIDLE=10 / TCP_KEEPINTVL=5 / TCP_KEEPCNT=3 (kernel starts probing a connection idle for 10s, probes every 5s, drops after 3 failed probes, so a dead peer is detected in ~25s)
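As a rough illustration of the approach, here is a minimal sketch of a setup() override that applies these options to each accepted socket. It is not the code merged in #1581: the helper name enable_keepalive and the per-option guarding are assumptions made for this example, and only the option names and values come from the notes above.

```python
import socket
from http.server import BaseHTTPRequestHandler

# Values quoted in the release notes; the tuple layout is illustrative.
KEEPALIVE_TUNING = (("TCP_KEEPIDLE", 10), ("TCP_KEEPINTVL", 5), ("TCP_KEEPCNT", 3))


def enable_keepalive(sock):
    """Apply keepalive options to one socket; return the names that were set.

    Hypothetical helper, not part of server.py.
    """
    applied = []
    # Master switch: without it the TCP_KEEP* values have no effect.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied.append("SO_KEEPALIVE")
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    applied.append("TCP_NODELAY")
    for name, value in KEEPALIVE_TUNING:
        option = getattr(socket, name, None)  # None where the platform lacks it
        if option is None:
            continue
        try:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
        except OSError:
            continue  # constant exists but the OS rejected it
        applied.append(name)
    return applied


class Handler(BaseHTTPRequestHandler):
    def setup(self):
        # self.request is the accepted socket; it is set before setup() runs.
        enable_keepalive(self.request)
        super().setup()
```

Looking each constant up with getattr means a platform missing one TCP_KEEP* name still gets the remaining options, which is one possible way to handle the cross-platform case.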
Healthy SSE streams are unaffected. Their existing 30s app-level : keepalive\n\n heartbeat is longer than the 10s idle threshold, so the kernel does send probes between heartbeats, but a live peer ACKs each probe and the connection stays open. Only sockets whose peer has genuinely gone away fail 3 consecutive probes and get cleaned up.
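The ~25s detection figure follows directly from the three keepalive settings: the kernel waits TCP_KEEPIDLE seconds after the last traffic, then sends up to TCP_KEEPCNT probes spaced TCP_KEEPINTVL seconds apart before giving up. A small sketch of that arithmetic (the function name is illustrative, and real timing can vary slightly with kernel timer granularity):

```python
def keepalive_detection_seconds(idle, interval, count):
    """Approximate seconds from last traffic until a dead peer is dropped."""
    # First probe goes out after `idle`; each unanswered probe costs `interval`.
    return idle + interval * count


print(keepalive_detection_seconds(idle=10, interval=5, count=3))  # 25
```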
Cross-platform: graceful no-op on macOS/Windows where TCP_KEEP* constants raise AttributeError. Linux production target gets the full benefit. (See #1583 for follow-up to extend macOS coverage.)
Tests
4094 → 4094 passing. No new tests: a kernel-level networking change is impractical to test reliably in the unit suite without a multi-process integration fixture.
Pre-release verification
- Independent reviewer (nesquena) APPROVED end-to-end.
- Pre-release Opus advisor: SHIP AS-IS — no MUST-FIX. All verification questions cleared.
- Full test suite: 4094 passed, 0 regressions.
- Live verification post-deploy: ss -tnoe on the production server shows timer:(keepalive,...) on accepted sockets, confirming SO_KEEPALIVE=1 is active on the server-side connection.
Maintainer in-stage actions
- PR rebase (REBASE-DEFAULT rule): PR base was 111 commits behind origin/master (forked at 6c3ff3ff, pre-v0.50.275). Rebased onto current master (v0.50.288). Clean, no conflicts.
Known follow-ups (filed as #1583)
- QuietHTTPServer.server_bind() block contains harmless dead code (TCP_KEEP* without SO_KEEPALIVE on listening socket = no-op; redundant SO_REUSEADDR already set by parent class).
- macOS gets TCP_NODELAY only: the TCP_KEEPIDLE AttributeError aborts the entire try block before SO_KEEPALIVE=1 is reached. Linux production target unaffected.
Both deferred to a small cleanup PR.
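The macOS item comes down to statement order inside a single try block. The sketch below is a hypothetical reconstruction for illustration, not the code from server.py: consts_without_keepidle stands in for the socket module on a platform that lacks TCP_KEEPIDLE, and both function names and the exact statement order are assumptions.

```python
import socket
from types import SimpleNamespace

# Stand-in for the socket module on a platform without TCP_KEEPIDLE
# (hypothetical; built from real constants so setsockopt still works).
consts_without_keepidle = SimpleNamespace(
    SOL_SOCKET=socket.SOL_SOCKET,
    SO_KEEPALIVE=socket.SO_KEEPALIVE,
    IPPROTO_TCP=socket.IPPROTO_TCP,
    TCP_NODELAY=socket.TCP_NODELAY,
)


def apply_in_one_try(sock, consts):
    """Assumed ordering: keepalive tuning is attempted before SO_KEEPALIVE."""
    applied = []
    try:
        sock.setsockopt(consts.IPPROTO_TCP, consts.TCP_NODELAY, 1)
        applied.append("TCP_NODELAY")
        # The attribute lookup fails here, so nothing below this line runs.
        sock.setsockopt(consts.IPPROTO_TCP, consts.TCP_KEEPIDLE, 10)
        applied.append("TCP_KEEPIDLE")
        sock.setsockopt(consts.SOL_SOCKET, consts.SO_KEEPALIVE, 1)
        applied.append("SO_KEEPALIVE")
    except AttributeError:
        pass
    return applied


def apply_master_switch_first(sock, consts):
    """One possible cleanup: portable options first, tuning guarded separately."""
    applied = []
    sock.setsockopt(consts.SOL_SOCKET, consts.SO_KEEPALIVE, 1)
    applied.append("SO_KEEPALIVE")
    sock.setsockopt(consts.IPPROTO_TCP, consts.TCP_NODELAY, 1)
    applied.append("TCP_NODELAY")
    try:
        sock.setsockopt(consts.IPPROTO_TCP, consts.TCP_KEEPIDLE, 10)
        applied.append("TCP_KEEPIDLE")
    except AttributeError:
        pass  # platform lacks the constant; keepalive uses OS defaults
    return applied


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    print(apply_in_one_try(s, consts_without_keepidle))
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    print(apply_master_switch_first(s, consts_without_keepidle))
```

With the second ordering, a platform missing TCP_KEEPIDLE would still get keepalive enabled, only with the operating system's default timings rather than the 10s/5s/3 values.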
Thanks @happy5318 for the diagnosis and fix!
Full changelog: https://github.com/nesquena/hermes-webui/blob/master/CHANGELOG.md