github ozontech/pg_doorman v3.3.2
v3.3.2 — The one with PAUSE/RESUME

5 hours ago

Highlights: This release adds PAUSE, RESUME, and RECONNECT admin commands for zero-downtime PostgreSQL maintenance. Backend health monitoring now detects dead connections while a client holds an idle transaction. Nine bug fixes address connection lifetime defaults, idle timeout behavior, and retain cycle fairness.

Features

  • Add PAUSE, RESUME, RECONNECT admin commands for runtime pool control. PAUSE [db] blocks new backend connections (active transactions finish). RESUME [db] lifts the pause. RECONNECT [db] rotates all connections — idle ones close immediately, active ones close on return. Use case: PAUSE → upgrade PostgreSQL → RESUME. (#141 by @vadv)

  • Add stale backend detection during client idle-in-transaction. Previously, if the backend died (failover, OOM kill, network partition) while a client held a transaction open, pg_doorman only discovered this when the client sent the next query. Until then the dead connection occupied a pool slot, blocking new clients from being served. Now pg_doorman actively probes the backend during idle-in-transaction and detects failures within ~100ms, freeing the pool slot immediately. (#144 by @vadv)

  • Add pool_size column to SHOW POOLS / SHOW POOLS_EXTENDED and new pg_doorman_pool_size Prometheus gauge. Compare active connections against configured capacity without checking the config file. (#149 by @Fingolfin-Anekane)

  • Add auth_query.min_pool_size for dynamic user pools in passthrough mode. Keeps a minimum number of backend connections per user, prewarmed on pool creation and replenished by the retain cycle. Pools with min_pool_size > 0 are never garbage-collected. Default 0. (#147 by @Fingolfin-Anekane)
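
The PAUSE / RESUME / RECONNECT semantics described above can be pictured with a toy model. This is only a sketch, not pg_doorman's implementation: the `ToyPool` and `Conn` names and the epoch counter are hypothetical, and the toy raises an error where the real proxy blocks. Reusing idle connections while paused is also an assumption of the model.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare connections by identity, not by field values
class Conn:
    epoch: int            # pool epoch at the time the connection was opened
    closed: bool = False


@dataclass
class ToyPool:
    """Toy model of PAUSE / RESUME / RECONNECT (hypothetical, not pg_doorman code)."""
    paused: bool = False
    epoch: int = 0
    idle: list = field(default_factory=list)
    active: list = field(default_factory=list)

    def pause(self):
        # PAUSE: stop opening new backend connections.
        self.paused = True

    def resume(self):
        # RESUME: lift the pause.
        self.paused = False

    def reconnect(self):
        # RECONNECT: idle connections close immediately; active ones are
        # marked stale by the epoch bump and close when they are returned.
        self.epoch += 1
        for conn in self.idle:
            conn.closed = True
        self.idle.clear()

    def checkout(self):
        if self.idle:
            conn = self.idle.pop()
        else:
            if self.paused:
                # Assumption: the toy raises instead of making the client wait.
                raise RuntimeError("pool is paused: no new backend connections")
            conn = Conn(epoch=self.epoch)
        self.active.append(conn)
        return conn

    def checkin(self, conn):
        self.active.remove(conn)
        if conn.epoch != self.epoch:
            conn.closed = True      # opened before the last RECONNECT
        else:
            self.idle.append(conn)
```

In this model a connection checked out before `reconnect()` stays usable until `checkin()`, which mirrors "active ones close on return".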

Changes

  • Rename auth_query.pool_size → auth_query.workers (executor connections) and auth_query.default_pool_size → auth_query.pool_size (data connections). Migration: rename pool_size to workers and default_pool_size to pool_size in your auth_query config. (#148 by @Fingolfin-Anekane)

  • Change idle_timeout default from 300,000,000ms (~83 hours) to 600,000ms (10 minutes). The old default effectively disabled idle cleanup. (#139)

  • Change server_lifetime default from 5 minutes to 20 minutes. The old value was shorter than idle_timeout, preventing idle cleanup from ever triggering. (#139)
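
The auth_query rename reuses the key pool_size with a new meaning, so the order of the two renames matters. A minimal sketch of the migration, assuming the auth_query section has already been loaded into a plain dict; the helper name `migrate_auth_query` and the sample values are hypothetical:

```python
def migrate_auth_query(cfg: dict) -> dict:
    """Rename pre-3.3.2 auth_query keys to their 3.3.2 names.

    Assumes `cfg` is an old-style auth_query section. A section that already
    has `workers` is rejected, because its `pool_size` would be ambiguous.
    """
    if "workers" in cfg:
        raise ValueError("config already uses the new key names")
    out = dict(cfg)  # do not mutate the caller's dict
    # Step 1: free the `pool_size` key (old meaning: executor connections).
    if "pool_size" in out:
        out["workers"] = out.pop("pool_size")
    # Step 2: only now reuse `pool_size` (new meaning: data connections).
    if "default_pool_size" in out:
        out["pool_size"] = out.pop("default_pool_size")
    return out


old = {"pool_size": 2, "default_pool_size": 40}   # hypothetical values
new = migrate_auth_query(old)
```

Doing the renames in the opposite order would overwrite the old pool_size value before it is moved to workers.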

Fixes

  • Fix session mode connections destroyed after ordinary SQL errors. A failed query (e.g. SELECT 1/0) marked the backend as bad, destroying a working connection at session end. (#152 by @vadv)

  • Fix pool-level server_lifetime and idle_timeout overrides silently ignored — global values were always used. (#139 by @vadv)

  • Fix idle_timeout = 0 closing connections after ~1ms instead of disabling idle cleanup. Now matches PgBouncer's server_idle_timeout = 0 semantics. (#139)

  • Fix idle timeout having no jitter — all connections expired simultaneously after a traffic burst, causing mass closures. Now applies ±20% per-connection jitter. (#139)

  • Fix retain_connections_max draining a random pool when quota was exhausted. A zero remainder was misinterpreted as "unlimited", closing all idle connections in one cycle. (#139)

  • Fix unfair retain quota distribution — HashMap iteration gave the same pool priority every cycle, starving others. Fixed by shuffling iteration order. (#139)

  • Fix retain and replenish using different pool snapshots when config reload happened between phases. (#139)
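
Two of the fixes above are easy to picture in code: the ±20% idle-timeout jitter with `idle_timeout = 0` meaning "disabled", and a retain quota that stops at zero and visits pools in shuffled order. The following is a sketch under assumptions, not pg_doorman's code: both function names are hypothetical, and an unlimited quota is modelled as `None` so that a zero remainder can never be mistaken for it.

```python
import random


def jittered_idle_timeout_ms(idle_timeout_ms, rng=random, spread=0.20):
    """Per-connection idle timeout in ms, or None when idle cleanup is disabled."""
    if idle_timeout_ms == 0:
        return None                      # 0 disables cleanup, it is not "expire now"
    # Each connection gets its own factor in [1 - spread, 1 + spread].
    return idle_timeout_ms * (1.0 + rng.uniform(-spread, spread))


def close_idle_with_quota(idle_counts, quota, rng=random):
    """Decide how many idle connections to close per pool in one cycle.

    idle_counts: pool name -> number of closable idle connections
    quota:       max closures this cycle, or None for unlimited
    """
    names = list(idle_counts)
    rng.shuffle(names)                   # no pool gets first pick every cycle
    closed = {}
    remaining = quota
    for name in names:
        if remaining is None:
            closed[name] = idle_counts[name]
            continue
        if remaining == 0:
            break                        # exhausted quota means stop
        n = min(idle_counts[name], remaining)
        closed[name] = n
        remaining -= n
    return closed
```

With the new 600,000ms default, `jittered_idle_timeout_ms` spreads expirations over roughly 480,000ms to 720,000ms, so connections opened in the same burst do not all expire at once.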
