Highlights
This release brings colgrep's -e mode to functional parity with grep / grep -P. Stress-tested across ~75 regex features against grep on a 291-file repo: 72 exact-parity, 3 minor numeric diffs, 0 functional failures.
New regex capabilities
- Lookahead: foo(?=bar) / negative foo(?!bar)
- Lookbehind: (?<=foo)bar / negative (?<!foo)bar
- Backreferences: (\w+)\s*=\s*\1
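To make the three feature families concrete, here is a minimal sketch using Python's standard `re` module, which accepts the same lookaround and backreference syntax. It is only an illustration of the pattern semantics, not colgrep's engine; the helper name `matches` is made up for this example.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text (grep-style search)."""
    return re.search(pattern, text) is not None


# Lookahead: "foo" only when "bar" follows; negative lookahead is the inverse.
lookahead_hit = matches(r"foo(?=bar)", "foobar")
negative_lookahead_hit = matches(r"foo(?!bar)", "foobaz")

# Lookbehind: "bar" only when "foo" precedes; negative lookbehind is the inverse.
lookbehind_hit = matches(r"(?<=foo)bar", "foobar")
negative_lookbehind_hit = matches(r"(?<!foo)bar", "bazbar")

# Backreference: the same word on both sides of "=", e.g. a self-assignment.
self_assignment_hit = matches(r"(\w+)\s*=\s*\1", "x = x")
```

A self-assignment such as `x = x` matches the backreference pattern, while `x = y` does not, because `\1` must repeat exactly what the first group captured.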
Backed by fancy-regex, which delegates to the regex crate for non-fancy patterns (so previously-working patterns keep their speed) and only falls back to its own backtracking engine when a pattern uses a feature that regex cannot express. ReDoS-protected by fancy-regex's default backtrack limit (1M); is_match failures are treated as "no match" so a single pathological line cannot abort a scan.
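The "failure means no match" policy can be sketched as follows. This is a Python illustration under stated assumptions, not the actual Rust code: `BacktrackLimitExceeded`, `matching_lines` and `toy_matcher` are hypothetical names, and the toy matcher merely simulates an engine that gives up on a pathological line.

```python
from typing import Callable, Iterable, List


class BacktrackLimitExceeded(Exception):
    """Hypothetical stand-in for an engine error raised on a pathological input."""


def matching_lines(lines: Iterable[str], is_match: Callable[[str], bool]) -> List[str]:
    """Collect matching lines, treating a matcher error as 'no match'."""
    hits = []
    for line in lines:
        try:
            matched = is_match(line)
        except BacktrackLimitExceeded:
            matched = False  # one bad line must not abort the whole scan
        if matched:
            hits.append(line)
    return hits


def toy_matcher(line: str) -> bool:
    """Simulated matcher: pretends that long lines exhaust the backtrack budget."""
    if len(line) > 20:
        raise BacktrackLimitExceeded(line)
    return "foo" in line
```

With this shape, a line that trips the limit is skipped and the scan continues with the next line.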
New flag
-s / --case-sensitive: opt into case-sensitive matching. The default stays case-insensitive (historical colgrep behaviour); passing -s drops the (?i) inline flag end-to-end (SQLite filter + per-line matcher + literal fallback). The semantic-side hybrid filter is always case-insensitive, because ColBERT embeddings handle case fuzzily and we want broad recall.
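A minimal sketch of how an inline (?i) flag can be toggled by such an option, written with Python's `re` for illustration. The function name `compile_line_matcher` and its parameter are assumptions for this example and do not mirror colgrep's internals.

```python
import re


def compile_line_matcher(pattern: str, case_sensitive: bool = False) -> re.Pattern:
    """Prefix the pattern with (?i) unless case-sensitive matching was requested."""
    prefix = "" if case_sensitive else "(?i)"
    return re.compile(prefix + pattern)


default_matcher = compile_line_matcher("Foo")                      # like the default
strict_matcher = compile_line_matcher("Foo", case_sensitive=True)  # like passing -s
```

Here `default_matcher` finds `Foo` in the text `foo bar`, while `strict_matcher` only matches the exact casing.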
Behaviour changes
- -k is now a document cap in regex mode. colgrep -e foo returns every matching line in every matching document (grep parity). colgrep -e foo -k N returns the top-N scored documents and every match inside those. Semantic-only mode is unchanged: default -k 15, path:start-end per chunk.
- Per-line anchors. ^ and $ now anchor to line boundaries inside chunk text (via (?m)), matching grep. Previously ^use only matched chunks whose text began with use.
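The effect of the (?m) flag on anchors can be reproduced with Python's `re`, whose multi-line mode behaves the same way for this purpose. The sample chunk and the helper `chunk_matches` are invented for illustration.

```python
import re

# A hypothetical chunk of source text spanning several lines.
chunk = "fn main() {}\nuse std::io;\nlet user = 1;"


def chunk_matches(pattern: str, text: str, per_line_anchors: bool = True) -> bool:
    """Search a chunk, optionally letting ^ and $ anchor at every line boundary."""
    prefix = "(?m)" if per_line_anchors else ""
    return re.search(prefix + pattern, text) is not None


new_behaviour = chunk_matches(r"^use", chunk)                          # per-line anchors
old_behaviour = chunk_matches(r"^use", chunk, per_line_anchors=False)  # chunk-start only
```

With per-line anchors the `use std::io;` line on the second row is found; without them `^` only matches at the very start of the chunk, so the same pattern misses it.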
Bug fixes
\{, \}, \+, \? are now literal in ERE mode. escape_literal_braces no longer mangles \{ into \[{]; bre_to_ere no longer strips the backslash from \+ / \?, which had silently turned ERE-literal escapes into quantifiers.
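Why a stripped backslash changes the meaning can be seen in any engine that follows the ERE convention for these escapes. The sketch below uses Python's `re`, which treats \+, \? and \{ as literals in the same way; the helper `found` and the sample strings are illustrative only.

```python
import re


def found(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None


# Escaped: a literal plus sign between "a" and "b".
literal_plus = found(r"a\+b", "a+b")

# Backslash stripped: "+" becomes a quantifier, so the literal text no longer matches.
stripped_plus = found(r"a+b", "a+b")

# Escaped braces match literal braces in source code.
literal_braces = found(r"\{\}", "fn f() {}")

# The mangled form \[{] looks for the literal text "[{]" instead of "{".
mangled_brace = found(r"\[{]", "fn f() {}")
```

The stripped pattern `a+b` instead matches text such as `aab`, which is how the old behaviour produced wrong or empty results without reporting an error.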
Other
- PatternMatcher refactored from a bare enum to a { kind: PatternKind, case_sensitive: bool } struct so the line-match logic has a single source of truth.
- The regex crate is kept for the BRE→ERE preprocessor (pure text manipulation); the matching engine is fancy-regex.
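The shape of that refactor can be approximated in Python as a small dataclass. This is a loose analogue of the Rust struct, not a transcription: the variant names LITERAL and REGEX, the `pattern` field and the `is_match` method are all assumptions made for the example, since the notes only name `kind` and `case_sensitive`.

```python
import re
from dataclasses import dataclass
from enum import Enum


class PatternKind(Enum):
    LITERAL = "literal"  # hypothetical variant
    REGEX = "regex"      # hypothetical variant


@dataclass(frozen=True)
class PatternMatcher:
    kind: PatternKind
    pattern: str  # hypothetical field, added so the sketch is self-contained
    case_sensitive: bool = False

    def is_match(self, line: str) -> bool:
        """Single place where case handling is decided for every pattern kind."""
        if self.kind is PatternKind.LITERAL:
            if self.case_sensitive:
                return self.pattern in line
            return self.pattern.lower() in line.lower()
        flags = 0 if self.case_sensitive else re.IGNORECASE
        return re.search(self.pattern, line, flags) is not None
```

Keeping `case_sensitive` next to `kind` means the literal and regex paths cannot disagree about casing, which is the "single source of truth" idea in miniature.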
PRs included
- #105: Emit every matching line in regex mode
- #106: Per-line anchors, -k semantics, escape-bug fixes, fancy-regex, -s
Compatibility
No CLI surface removed. The only behaviour change is that -e without an explicit -k no longer truncates the printed lines. Patterns that silently returned nothing before (lookaround, backref, \+ / \? / \{) now return matches.
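The -k semantics in regex mode can be summarised in a few lines of Python. This is a sketch of the described behaviour only: the function `select_matches`, the input shape (path mapped to a score and its matching lines) and the sample data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def select_matches(
    docs: Dict[str, Tuple[float, List[str]]],
    k: Optional[int] = None,
) -> Dict[str, List[str]]:
    """Return every matching line per document, capped to the top-k documents if k is set."""
    ranked = sorted(docs.items(), key=lambda item: item[1][0], reverse=True)
    if k is not None:
        ranked = ranked[:k]  # k caps documents, never the lines inside them
    return {path: lines for path, (_score, lines) in ranked}


# Hypothetical scored results: path -> (score, matching lines).
results = {
    "a.rs": (0.9, ["line 1", "line 2"]),
    "b.rs": (0.5, ["line 3"]),
    "c.rs": (0.7, ["line 4", "line 5", "line 6"]),
}
everything = select_matches(results)   # like: colgrep -e foo
top_two = select_matches(results, 2)   # like: colgrep -e foo -k 2
```

Without k, all three documents and all six lines are returned; with k set to 2, only the two best-scored documents remain, each with its full list of matches.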