github timescale/pg_textsearch v1.1.0

15 hours ago

pg_textsearch v1.1.0

The 1.1.0 release is organized around three themes: write-path performance for update-heavy workloads, operational readiness, and new indexing features. Lots of good stuff in this one!

Write-path performance

  • Concurrent inserts (#307). Finer-grained memtable locking lets multiple backends write to the same BM25 index with better parallel throughput.

    Result on the MSMARCO 8.8M-passage benchmark (8 vCPU, pgbench-driven concurrent INSERT):

    Clients v1.0.0 TPS v1.1.0 TPS ParadeDB TPS
    1 2,546 2,450 2,590
    2 4,314 4,605 4,964
    4 4,196 7,427 8,255
    8 4,254 11,593 12,620

    v1.0.0 plateaued near 4,300 TPS regardless of client count; v1.1.0 scales to ~11,600 TPS at 8 clients, roughly at parity with ParadeDB.

  • Fast VACUUM via per-segment alive bitsets (#317). VACUUM now flips bits in a pre-allocated alive bitset (1 bit per doc) instead of rebuilding segments. Cost is O(dead_docs) rather than O(all_docs); segments with no live docs are dropped entirely. Dead docs are filtered during scoring and physically removed at the next merge. Segment format is bumped to v5 — backward-compatible with v3 and v4, upgraded in place at the next VACUUM or merge.

  • Subtransaction safety (#316). Indexes created inside a SAVEPOINT that later rolls back are now cleaned up properly.

  • Parallel build race fix (#312). After a parallel CREATE INDEX, shared state is now created in the registry before the building backend's memtable could become invisible to concurrent scans.

Operational readiness

  • Memory limit GUC (#293). The new pg_textsearch.memory_limit (default 2 GB) caps shared memory used by memtables. Three internal thresholds are derived automatically: a per-index soft limit triggers a spill, a global soft limit evicts the largest memtable, and a hard limit rejects further inserts before the OOM killer can take down the server.
  • Memory usage visibility (#293). SELECT * FROM bm25_memory_usage() exposes total DSA bytes, estimated bytes, the configured limit, and percent utilization.
  • Schema-qualified SQL objects (#300). All extension functions, types, and operators are now resolved via @extschema@, closing search-path-based shadowing attacks.
  • Coverity-flagged defects (#331, #329). Fixed dead assignments, double-read atomics, and a V3-segment VACUUM offset bug that could corrupt the previous segment's header.

New indexing features

  • Native text[] support (#309). Index text[] (and varchar[], bpchar[]) columns directly. Array elements are concatenated with spaces before tokenization, producing scores identical to an equivalent scalar text index.

    CREATE TABLE posts (id serial, tags text[]);
    CREATE INDEX ON posts USING bm25 (tags) WITH (text_config='english');
    SELECT * FROM posts ORDER BY tags <@> 'database' LIMIT 10;
  • Expression and partial indexes (#308). Index arbitrary IMMUTABLE expressions, and add WHERE predicates to scope the index to a subset of rows. Covers JSONB extraction, multi-column concatenation, and per-language partial indexes.

    -- JSONB extraction
    CREATE INDEX ON events USING bm25 ((data->>'description'))
        WITH (text_config='english');
    
    -- One partial index per language
    CREATE INDEX docs_en ON docs USING bm25 (content)
        WITH (text_config='english') WHERE lang = 'en';
    CREATE INDEX docs_de ON docs USING bm25 (content)
        WITH (text_config='german')  WHERE lang = 'de';

Upgrading from 1.0.0

ALTER EXTENSION pg_textsearch UPDATE is sufficient — no REINDEX needed. As always, install the new binary and restart Postgres before running the SQL upgrade.

Don't miss a new pg_textsearch release

NewReleases is sending notifications on new releases.