pg_textsearch v1.1.0
The 1.1.0 release is organized around three themes: write-path performance for update-heavy workloads, operational readiness, and new indexing features. Lots of good stuff in this one!
Write-path performance
-
Concurrent inserts (#307). Finer-grained memtable locking lets multiple backends write to the same BM25 index with better parallel throughput.
Result on the MSMARCO 8.8M-passage benchmark (8 vCPU, pgbench-driven concurrent INSERT):
Clients v1.0.0 TPS v1.1.0 TPS ParadeDB TPS 1 2,546 2,450 2,590 2 4,314 4,605 4,964 4 4,196 7,427 8,255 8 4,254 11,593 12,620 v1.0.0 plateaued near 4,300 TPS regardless of client count; v1.1.0 scales to ~11,600 TPS at 8 clients, roughly at parity with ParadeDB.
-
Fast VACUUM via per-segment alive bitsets (#317). VACUUM now flips bits in a pre-allocated alive bitset (1 bit per doc) instead of rebuilding segments. Cost is
O(dead_docs)rather thanO(all_docs); segments with no live docs are dropped entirely. Dead docs are filtered during scoring and physically removed at the next merge. Segment format is bumped to v5 — backward-compatible with v3 and v4, upgraded in place at the next VACUUM or merge. -
Subtransaction safety (#316). Indexes created inside a
SAVEPOINTthat later rolls back are now cleaned up properly. -
Parallel build race fix (#312). After a parallel
CREATE INDEX, shared state is now created in the registry before the building backend's memtable could become invisible to concurrent scans.
Operational readiness
- Memory limit GUC (#293). The new
pg_textsearch.memory_limit(default 2 GB) caps shared memory used by memtables. Three internal thresholds are derived automatically: a per-index soft limit triggers a spill, a global soft limit evicts the largest memtable, and a hard limit rejects further inserts before the OOM killer can take down the server. - Memory usage visibility (#293).
SELECT * FROM bm25_memory_usage()exposes total DSA bytes, estimated bytes, the configured limit, and percent utilization. - Schema-qualified SQL objects (#300). All extension functions, types, and operators are now resolved via
@extschema@, closing search-path-based shadowing attacks. - Coverity-flagged defects (#331, #329). Fixed dead assignments, double-read atomics, and a V3-segment VACUUM offset bug that could corrupt the previous segment's header.
New indexing features
-
Native
text[]support (#309). Indextext[](andvarchar[],bpchar[]) columns directly. Array elements are concatenated with spaces before tokenization, producing scores identical to an equivalent scalartextindex.CREATE TABLE posts (id serial, tags text[]); CREATE INDEX ON posts USING bm25 (tags) WITH (text_config='english'); SELECT * FROM posts ORDER BY tags <@> 'database' LIMIT 10;
-
Expression and partial indexes (#308). Index arbitrary IMMUTABLE expressions, and add
WHEREpredicates to scope the index to a subset of rows. Covers JSONB extraction, multi-column concatenation, and per-language partial indexes.-- JSONB extraction CREATE INDEX ON events USING bm25 ((data->>'description')) WITH (text_config='english'); -- One partial index per language CREATE INDEX docs_en ON docs USING bm25 (content) WITH (text_config='english') WHERE lang = 'en'; CREATE INDEX docs_de ON docs USING bm25 (content) WITH (text_config='german') WHERE lang = 'de';
Upgrading from 1.0.0
ALTER EXTENSION pg_textsearch UPDATE is sufficient — no REINDEX needed. As always, install the new binary and restart Postgres before running the SQL upgrade.