github vibhorkum/pg_background v2.0.1
pg_background v2.0.1 — cancel cluster-crash fix


Cancel cluster-crash fix

Fixes a server crash in worker cancellation.

pg_background_cancel(pid, cookie, grace_ms) (and the batch / by-label variants) previously escalated to kill(pid, SIGKILL) when a worker had not stopped within the grace window. PostgreSQL's postmaster treats any child process that is terminated by a signal as a crash and responds with a cluster-wide restart (terminating every backend and reinitializing shared memory), so a single cancel could take down the whole server.
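The distinction a supervising process draws here can be illustrated with a small standalone C sketch. This is not postmaster or pg_background code: exit_looks_like_crash() and stop_child_with() are hypothetical helpers, and the rule "exit code 0 or 1 is normal, anything else is a crash" is a simplified model assumed for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Simplified model (an assumption, not postmaster source): a child that
 * was killed by a signal, or that exited with a code other than 0 or 1,
 * is classified as a crash. */
bool exit_looks_like_crash(int status)
{
    if (WIFSIGNALED(status))
        return true;
    if (WIFEXITED(status))
        return WEXITSTATUS(status) != 0 && WEXITSTATUS(status) != 1;
    return true;
}

/* SIGTERM handler for the child: leave cooperatively with exit code 1. */
static void exit_on_sigterm(int signo)
{
    (void) signo;
    _exit(1);               /* async-signal-safe */
}

/* Hypothetical helper: fork an idle child, send it `signo`, and return
 * the wait status the parent observes. */
int stop_child_with(int signo)
{
    sigset_t term, old;
    int status = 0;

    /* Block SIGTERM across fork() so the child cannot receive it before
     * its handler is installed. SIGKILL cannot be blocked or caught. */
    sigemptyset(&term);
    sigaddset(&term, SIGTERM);
    sigprocmask(SIG_BLOCK, &term, &old);

    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = exit_on_sigterm;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGTERM, &sa, NULL);
        sigprocmask(SIG_UNBLOCK, &term, NULL);
        for (;;)
            pause();        /* idle until a signal arrives */
    }

    sigprocmask(SIG_SETMASK, &old, NULL);
    kill(pid, signo);
    pid_t reaped = waitpid(pid, &status, 0);
    assert(reaped == pid);
    return status;
}
```

Under this model, stop_child_with(SIGKILL) yields a status that is classified as a crash, while stop_child_with(SIGTERM) yields a normal exit with code 1.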

This was benign on fast machines: the worker almost always exited cooperatively within grace_ms, so the SIGKILL never fired. On slow or loaded hosts it caused a deterministic crash, observed on the Debian/pgdg PostgreSQL 18 build farm and reproduced locally under Valgrind. There, a worker that was still in PostgreSQL's bgworker startup path when the grace timer expired was killed before it ever began executing. Because SIGKILL cannot be caught, the worker logged nothing, and the resulting crash had previously been misdiagnosed as an upstream PG SIGSEGV.

What changed

  • Removed the SIGKILL escalation. Cancellation is now cooperative only: SIGTERM plus, for grace_ms > 0, a bounded wait so the caller can observe whether the worker stopped. This matches the documented contract — cancel requests termination; it does not guarantee an immediate stop.
  • An unresponsive, CPU-bound worker that never reaches an interrupt check is not force-stopped; bound such work with statement_timeout or the pg_background.worker_timeout GUC.
  • Docs corrected (README "Known Limitations §10", ARCHITECTURE.md) and the cancel SQL COMMENT / migration notes updated.
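The cooperative-only behaviour described in the list above can be sketched in plain C as "send SIGTERM, then wait up to grace_ms, and never escalate". This is an illustration under stated assumptions, not the extension's implementation: cooperative_cancel() and spawn_child() are hypothetical names, the 10 ms poll interval is arbitrary, and the sketch waits on a direct child with waitpid(), whereas a real backend is not the parent of its worker and has to observe the stop some other way.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: request termination with SIGTERM, then poll for
 * up to grace_ms. Returns true only if the child was seen to exit in
 * that window. There is deliberately no SIGKILL fallback. */
bool cooperative_cancel(pid_t pid, int grace_ms, int *status_out)
{
    const struct timespec tick = { 0, 10 * 1000 * 1000 };   /* 10 ms */

    assert(pid > 0);
    if (kill(pid, SIGTERM) != 0)
        return false;

    for (int waited_ms = 0; ; waited_ms += 10) {
        pid_t r = waitpid(pid, status_out, WNOHANG);
        if (r == pid)
            return true;                 /* stopped within the window */
        if (r < 0 && errno != EINTR)
            return false;                /* nothing to wait for */
        if (waited_ms >= grace_ms)
            return false;                /* still running: report it */
        nanosleep(&tick, NULL);
    }
}

static void exit_on_sigterm(int signo)
{
    (void) signo;
    _exit(1);                            /* cooperative exit, code 1 */
}

/* Hypothetical demo child. cooperative=true exits with code 1 on
 * SIGTERM; cooperative=false ignores SIGTERM, standing in for a worker
 * that never reaches an interrupt check. */
pid_t spawn_child(bool cooperative)
{
    sigset_t term, old;

    /* Block SIGTERM across fork() so the child's disposition is set
     * before any SIGTERM can be delivered. */
    sigemptyset(&term);
    sigaddset(&term, SIGTERM);
    sigprocmask(SIG_BLOCK, &term, &old);

    pid_t pid = fork();
    if (pid == 0) {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = cooperative ? exit_on_sigterm : SIG_IGN;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGTERM, &sa, NULL);
        sigprocmask(SIG_UNBLOCK, &term, NULL);
        for (;;)
            pause();
    }
    sigprocmask(SIG_SETMASK, &old, NULL);
    return pid;
}
```

With a cooperative child, cooperative_cancel(pid, 2000, &status) returns true and the status shows a normal exit with code 1. With the SIGTERM-ignoring child it returns false once the window expires and leaves the process running, which is why such work needs its own bound, such as a timeout.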

Upgrade

No SQL or API changes — the extension version stays 2.0. Upgrade by rebuilding/replacing pg_background.so (make install and restart sessions/workers); no ALTER EXTENSION step is required.

Validation

Standard installcheck passes on PG 18 with no expected-output change; the Valgrind run that previously crashed at the cancel test now completes cleanly (no abnormal worker exits — cancelled workers exit with code 1, never by signal).
