github lightonai/next-plaid v1.4.0
v1.4.0 — grep-parity regex

Highlights

This release brings colgrep's -e mode to functional parity with grep / grep -P. Stress-tested across ~75 regex features against grep on a 291-file repo: 72 exact-parity, 3 minor numeric diffs, 0 functional failures.

New regex capabilities

  • Lookahead foo(?=bar) / negative foo(?!bar)
  • Lookbehind (?<=foo)bar / negative (?<!foo)bar
  • Backreferences (\w+)\s*=\s*\1
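
As a rough illustration of what these three features match, the sketch below runs the same patterns through Python's re module. This is only a stand-in for the semantics: colgrep itself uses fancy-regex, and the sample strings are made up for the example.

```python
import re

def starts(pattern, text):
    """Return the start offset of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

# Lookahead: "foo" only when "bar" follows ("bar" itself is not consumed).
lookahead = starts(r"foo(?=bar)", "foobar foobaz")
# Negative lookahead: "foo" only when "bar" does NOT follow.
negative_lookahead = starts(r"foo(?!bar)", "foobar foobaz")

# Lookbehind: "bar" only when preceded by "foo".
lookbehind = starts(r"(?<=foo)bar", "foobar bazbar")
# Negative lookbehind: "bar" only when NOT preceded by "foo".
negative_lookbehind = starts(r"(?<!foo)bar", "foobar bazbar")

# Backreference: \1 must repeat whatever group 1 captured.
self_assign = re.search(r"(\w+)\s*=\s*\1", "total = total") is not None
other_assign = re.search(r"(\w+)\s*=\s*\1", "total = other") is not None
```

With colgrep the same patterns would be passed to -e, for example colgrep -e 'foo(?=bar)'.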

Backed by fancy-regex, which delegates to the regex crate for non-fancy patterns (so previously working patterns keep their speed) and falls back to its own backtracking engine only when a pattern uses a feature the regex crate cannot express. ReDoS protection comes from fancy-regex's default backtrack limit (1M); is_match errors are treated as "no match", so a single pathological line cannot abort a scan.
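
The "errors count as no match" policy can be sketched as follows. This is a minimal Python illustration, not colgrep's code: BacktrackLimitExceeded, scan and demo_matcher are hypothetical names, and the error is simulated because Python's re module has no backtrack limit.

```python
import re

class BacktrackLimitExceeded(Exception):
    """Hypothetical stand-in for an engine error such as a backtrack-limit hit."""

def scan(lines, is_match):
    """Return the lines that match; a matcher error counts as 'no match'."""
    hits = []
    for line in lines:
        try:
            matched = is_match(line)
        except BacktrackLimitExceeded:
            matched = False  # one pathological line must not abort the scan
        if matched:
            hits.append(line)
    return hits

def demo_matcher(line):
    # Simulated failure: pretend that very long lines exhaust the limit.
    if len(line) > 40:
        raise BacktrackLimitExceeded(line[:10])
    return re.search(r"foo(?=bar)", line) is not None

lines = ["foobar", "a" * 50, "foobaz", "xfoobar"]
result = scan(lines, demo_matcher)
```

The trade-off is that a line which hits the limit is silently skipped rather than reported, which keeps the scan going at the cost of a possible missed match on that line.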

New flag

  • -s / --case-sensitive — opt into case-sensitive matching. Default stays case-insensitive (historical colgrep behaviour); passing -s drops the (?i) inline flag end-to-end (SQLite filter + per-line matcher + literal fallback). The semantic-side hybrid filter is always case-insensitive — ColBERT embeddings handle case fuzzily and we want broad recall.
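
A minimal sketch of the flag's effect on the pattern, using Python's re module for illustration. build_pattern is a hypothetical helper; the only detail taken from the notes above is that the default prepends (?i) and -s drops it.

```python
import re

def build_pattern(pattern: str, case_sensitive: bool = False) -> str:
    """Prepend the (?i) inline flag unless case-sensitive matching is requested."""
    return pattern if case_sensitive else "(?i)" + pattern

# Default behaviour (no -s): case-insensitive.
default_rx = re.compile(build_pattern("TODO"))
# With -s: the pattern is used as written.
strict_rx = re.compile(build_pattern("TODO", case_sensitive=True))

default_hit = default_rx.search("todo: fix this") is not None
strict_miss = strict_rx.search("todo: fix this") is not None
strict_hit = strict_rx.search("TODO: fix this") is not None
```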

Behaviour changes

  • -k is now a document cap in regex mode. colgrep -e foo returns every matching line in every matching document (grep parity). colgrep -e foo -k N returns top-N scored documents and every match inside those. Semantic-only mode is unchanged — default -k 15, path:start-end per chunk.
  • Per-line anchors. ^ and $ now anchor to line boundaries inside chunk text (via (?m)), matching grep. Previously, ^use matched only chunks whose text began with use.
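
Both changes can be illustrated with a short Python sketch. It is not colgrep's implementation: the (path, score, text) document shape and the regex_search helper are hypothetical, and it assumes the -k cap is applied to matching documents in descending score order.

```python
import re

chunk = "mod a;\nuse std::fmt;\nfn main() {}\n"

# Without (?m), ^ anchors only to the start of the whole chunk.
whole_chunk = re.search(r"^use", chunk)
# With (?m), ^ anchors to the start of every line, as grep does.
per_line = re.search(r"(?m)^use", chunk)

def regex_search(docs, pattern, k=None):
    """Return (path, [(line_no, line), ...]) for matching documents.

    docs is a list of (path, score, text) tuples. With k=None every
    matching document is returned; otherwise only the k best-scored ones.
    Every matching line inside a returned document is always kept.
    """
    rx = re.compile(pattern)
    ranked = sorted(docs, key=lambda d: d[1], reverse=True)
    hits = []
    for path, _score, text in ranked:
        lines = [(n, line) for n, line in enumerate(text.splitlines(), 1)
                 if rx.search(line)]
        if lines:
            hits.append((path, lines))
    return hits if k is None else hits[:k]

docs = [
    ("a.rs", 0.9, "foo\nbar\nfoo2"),
    ("b.rs", 0.5, "nothing here"),
    ("c.rs", 0.7, "x foo"),
]
all_hits = regex_search(docs, "foo")       # like: colgrep -e foo
top_one = regex_search(docs, "foo", k=1)   # like: colgrep -e foo -k 1
```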

Bug fixes

  • \{, \}, \+, \? are now literal in ERE mode. escape_literal_braces no longer mangles \{ into \[{]; bre_to_ere no longer strips the backslash from \+/\?, which had silently turned ERE-literal escapes into one-or-more quantifiers.
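
The difference between the escaped and unescaped forms is easy to see in any engine that follows the same convention. The sketch below uses Python's re module purely as an illustration of the intended semantics; it does not exercise escape_literal_braces or bre_to_ere.

```python
import re

# Escaped: the metacharacter is matched literally.
plus_literal = re.findall(r"a\+", "a+ aa")
brace_literal = re.findall(r"x\{2\}", "x{2} xx")
question_literal = re.findall(r"ab\?", "ab? a")

# Unescaped: the same characters act as quantifiers.
plus_quantifier = re.findall(r"a+", "a+ aa")
brace_quantifier = re.findall(r"x{2}", "x{2} xx")
question_quantifier = re.findall(r"ab?", "ab? a")
```

Before the fix, stripping the backslash from \+ or \? meant the first group of patterns behaved like the second.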

Other

  • PatternMatcher refactored from a bare enum to a { kind: PatternKind, case_sensitive: bool } struct so the line-match logic has a single source of truth.
  • regex crate is kept for the BRE→ERE preprocessor (pure text manipulation); the matching engine is fancy-regex.
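
The shape of the refactored matcher might look roughly like the Python sketch below. Only the names PatternMatcher, PatternKind, kind and case_sensitive come from the notes above; the LITERAL and REGEX variants, the pattern field and the is_match body are assumptions made for illustration, and the real type is a Rust struct.

```python
import re
from dataclasses import dataclass
from enum import Enum

class PatternKind(Enum):
    LITERAL = "literal"  # hypothetical variant
    REGEX = "regex"      # hypothetical variant

@dataclass(frozen=True)
class PatternMatcher:
    kind: PatternKind
    pattern: str                  # hypothetical field
    case_sensitive: bool = False  # default mirrors colgrep's case-insensitive default

    def is_match(self, line: str) -> bool:
        """Single place where kind and case sensitivity are both applied."""
        if self.kind is PatternKind.LITERAL:
            if self.case_sensitive:
                return self.pattern in line
            return self.pattern.lower() in line.lower()
        flags = 0 if self.case_sensitive else re.IGNORECASE
        return re.search(self.pattern, line, flags) is not None

literal_loose = PatternMatcher(PatternKind.LITERAL, "Foo").is_match("a foo b")
literal_strict = PatternMatcher(PatternKind.LITERAL, "Foo", True).is_match("a foo b")
regex_loose = PatternMatcher(PatternKind.REGEX, r"^use").is_match("USE x")
regex_strict = PatternMatcher(PatternKind.REGEX, r"^use", True).is_match("USE x")
```

Keeping case_sensitive next to kind means every code path consults the same flag instead of each enum variant carrying its own copy.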

PRs included

  • #105 — Emit every matching line in regex mode
  • #106 — Per-line anchors, -k semantics, escape-bug fixes, fancy-regex, -s

Compatibility

No CLI surface removed. The only behaviour change is that -e without an explicit -k no longer truncates the printed lines. Patterns that silently returned nothing before (lookaround, backref, \+/\?/\{) now return matches.
