Cancel-path cluster-crash fixes
A patch release rolling up two crash fixes on the worker cancellation path. (Supersedes the briefly-published v2.0.1, whose tag was retired; both fixes below are included here.)
1. No more SIGKILL escalation on cancel (was v2.0.1)
pg_background_cancel(pid, cookie, grace_ms) (and the batch / by-label variants) previously escalated to kill(pid, SIGKILL) when a worker had not stopped within the grace window. PostgreSQL's postmaster treats any child that dies from an uncaught signal as a crash and responds with a cluster-wide restart (terminating every backend and reinitializing shared memory), so a single cancel could take down the whole server.
This was benign on fast machines — the worker almost always exited cooperatively within grace_ms — but caused a deterministic crash on slow or loaded hosts (observed on the Debian/pgdg PG 18 build farm, reproduced under Valgrind), where a worker still in bgworker startup when the grace timer expired was killed before it began executing. Because SIGKILL is uncatchable the worker logged nothing, which had been misdiagnosed as an upstream PG SIGSEGV.
- Removed the
SIGKILLescalation. Cancellation is now cooperative only:SIGTERMplus, forgrace_ms > 0, a bounded wait so the caller can observe whether the worker stopped. This matches the documented contract — cancel requests termination; it does not guarantee an immediate stop. - An unresponsive, CPU-bound worker that never reaches an interrupt check is not force-stopped; bound such work with
statement_timeoutor thepg_background.worker_timeoutGUC.
2. No ResourceOwnerDelete(NULL) on pre-start cancel (new)
When a cancel was requested before the worker began executing, the worker's early-exit path called ResourceOwnerDelete(CurrentResourceOwner). By that point the GUC-restore transaction has already committed, so CurrentResourceOwner is NULL; deleting a NULL owner trips Assert(owner != CurrentResourceOwner) on assert-enabled builds, aborting the worker (SIGABRT) and triggering a cluster restart. Observed as a worker crash on the PostgreSQL 19 beta PGDG assert build; the regular (non-assert) installcheck matrix did not surface it.
- The early-cancel path now simply
proc_exit(0)s and lets backend shutdown clean up, matching the guarded cleanup already used on the normal exit path. - Added a best-effort regression test (grace=0 cancel immediately after launch) that exercises this branch under the assert/sanitizer CI lanes.
Upgrade
No SQL or API changes — the extension version stays 2.0. Upgrade by rebuilding/replacing pg_background.so (make install and restart sessions/workers); no ALTER EXTENSION step is required.
Validation
All Docker test suites pass on the working tree: installcheck (PG 14–19), assert-enabled build (PG 18), ASan+UBSan (PG 18), relocatable (PG 19), and the 1.8→…→2.0 upgrade path (PG 18). The new pre-start-cancel test passes with no abnormal worker exits.