What's Changed
- Numerous performance improvements and stability fixes addressing workloads at scale
- On MS MARCO v2 (133M documents, 47GB base data), query throughput is now 3.5x higher than that of the leading Postgres-based BM25 extension. Indexing performance is also greatly improved, with much lower RAM consumption
- See the benchmark comparison overview at https://timescale.github.io/pg_textsearch/benchmarks/comparison.html
- pg_textsearch now requires an entry in shared_preload_libraries
- Requiring the preload entry addresses a variety of stability issues arising from mismatched library versions during upgrades
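
A minimal sketch of the configuration change, assuming the library is registered under the name `pg_textsearch` (these notes do not state the exact value, so check it against the installation instructions for your build):

```conf
# postgresql.conf
# Assumption: the preload name matches the extension name.
# If other libraries are already listed, append to the existing
# comma-separated value instead of replacing it.
shared_preload_libraries = 'pg_textsearch'
```

Changes to shared_preload_libraries take effect only after a server restart, so plan the upgrade to include one.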
Gory Details
- bench: add MS MARCO v2 dataset and weighted-average latency metric by @tjgreen42 in #226
- feat: widen segment offsets from uint32 to uint64 (V4 format) by @tjgreen42 in #220
- chore: improve code coverage toward 90% by @tjgreen42 in #222
- fix: use min fieldnorm for BMW skip entries in parallel build by @tjgreen42 in #230
- bench: add VACUUM step to ParadeDB benchmarks for segment compaction by @tjgreen42 in #233
- feat: version the shared library filename by @tjgreen42 in #232
- feat: reduce build chatter for partitioned tables by @tjgreen42 in #214
- fix: resolve test failures on PG17 and /tmp environments by @tjgreen42 in #234
- Revert versioned shared library filename by @tjgreen42 in #238
- fix: TOCTOU race in parallel build loses documents by @tjgreen42 in #240
- fix: run BM25 validation in benchmark CI workflow by @tjgreen42 in #239
- feat: require shared_preload_libraries for pg_textsearch by @tjgreen42 in #235
- feat: detect stale binary after upgrade via library version check by @tjgreen42 in #241
- feat: rewrite index build with arena allocator and parallel page pool by @tjgreen42 in #231
- fix: add coverage gate to block PRs on coverage reduction by @tjgreen42 in #245
- feat: leader-only merge for parallel index build by @tjgreen42 in #244
- bench: add insert benchmarks; fix insert performance regression by @tjgreen42 in #242
- fix: resolve all compiler warnings in extension source by @tjgreen42 in #246
- fix: crash when creating BM25 index on temp table by @tjgreen42 in #248
- perf: use pointer indirection for BMW term state ordering by @tjgreen42 in #249
- perf: SIMD-accelerated bitpack decoding by @tjgreen42 in #250
- fix: security hardening for user-input-facing code paths by @tjgreen42 in #251
- perf: skip empty memtable during query scoring by @tjgreen42 in #252
- perf: stack-allocate decode buffers in tp_decompress_block by @tjgreen42 in #253
- Release v0.6.0 by @tjgreen42 in #254
Full Changelog: v0.5.1...v0.6.0