github lightonai/next-plaid v1.2.0

Install colgrep 1.2.0

Install prebuilt binaries via shell script

curl --proto '=https' --tlsv1.2 -LsSf https://github.com/lightonai/next-plaid/releases/download/v1.2.0/colgrep-installer.sh | sh

Install prebuilt binaries via powershell script

powershell -ExecutionPolicy Bypass -c "irm https://github.com/lightonai/next-plaid/releases/download/v1.2.0/colgrep-installer.ps1 | iex"

Download colgrep 1.2.0

File                                     Platform             Checksum
colgrep-aarch64-apple-darwin.tar.xz      Apple Silicon macOS  checksum
colgrep-x86_64-apple-darwin.tar.xz       Intel macOS          checksum
colgrep-x86_64-pc-windows-msvc.zip       x64 Windows          checksum
colgrep-x86_64-unknown-linux-gnu.tar.xz  x64 Linux            checksum

What's new in 1.2.0

Hybrid search (semantic + keyword)

ColGREP and NextPlaid now combine ColBERT semantic search with FTS5 full-text keyword search, fused via Reciprocal Rank Fusion. This means queries match both by meaning and by exact terms — especially useful for symbol names, error codes, and anything where the exact string matters as much as the intent. Hybrid search is on by default; disable per-query with --no-fts or globally with colgrep settings --no-hybrid-search. The NextPlaid API gains unified text_query, alpha, and fusion fields on the /search endpoint, and the Python SDK exposes them directly.
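Reciprocal Rank Fusion itself is a simple, well-known scheme: a document's fused score is the sum of 1 / (k + rank) over every result list it appears in. A minimal Python sketch of the idea (hypothetical function and document names, not ColGREP's internals):

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs via Reciprocal Rank Fusion.

    A document's fused score is the sum of 1 / (k + rank) over every
    list it appears in; k=60 is the constant from the original RRF paper.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Toy example: semantic and keyword passes disagree on ordering
semantic = ["utils.py:parse", "main.rs:run", "lib.rs:search"]
keyword = ["lib.rs:search", "utils.py:parse", "cli.rs:args"]
fused = rrf_fuse([semantic, keyword])
```

Documents ranked highly in both lists float to the top even when neither pass put them first, which is why RRF works well for fusing scores on incompatible scales.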

Pipelined indexing (up to 7x faster)

Indexing has been rewritten around a multi-stage pipeline architecture. Encoding, index construction, and I/O now run concurrently via bounded channels instead of sequentially. Combined with parallelized call-graph construction (~20x faster), transactional metadata batching, token-budget-based micro-batching for GPU plan reuse, deduplication of identical code units (~8% savings), and a decomposed per-chunk streaming index build, wall-clock indexing time drops dramatically — from ~25 minutes to ~3.5 minutes on the Zed codebase.
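The stage-decoupling idea is independent of the Rust implementation: each stage runs in its own thread, connected to the next by a bounded channel so a fast stage blocks instead of piling up unbounded work ahead of a slow one. A toy Python analogue, using `queue.Queue` as the bounded channel and placeholder stage functions:

```python
import queue
import threading

def stage(worker, inbox, outbox):
    # Pull items until the None sentinel, process, push downstream.
    while (item := inbox.get()) is not None:
        outbox.put(worker(item))
    outbox.put(None)  # propagate shutdown to the next stage

def run_pipeline(items, workers, capacity=4):
    # Bounded channels provide backpressure between stages.
    channels = [queue.Queue(maxsize=capacity) for _ in range(len(workers) + 1)]
    threads = [
        threading.Thread(target=stage, args=(w, channels[i], channels[i + 1]))
        for i, w in enumerate(workers)
    ]
    for t in threads:
        t.start()
    for item in items:
        channels[0].put(item)
    channels[0].put(None)
    results = []
    while (out := channels[-1].get()) is not None:
        results.append(out)
    for t in threads:
        t.join()
    return results

# Placeholder stages standing in for encode -> build-index -> write
encoded = run_pipeline(range(5), [lambda x: x * 2, lambda x: x + 1])
```

With one thread per stage and FIFO channels, output order is preserved while all stages overlap in time, which is the source of the wall-clock win when the stages have comparable cost.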

Query latency (~4x faster on large repos)

On large repositories (e.g. PyTorch, 240K code units), query time dropped from ~5s to ~1.2s by eliminating per-query overhead: mtime-based fast paths skip content hashing when files haven't changed; orphan cleanup runs periodically instead of on every query; index chunk validation uses an epsilon-tolerant mtime comparison to avoid expensive re-validation caused by JSON serialization rounding; and query embeddings are cached across the dual semantic and keyword search passes.
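The mtime fast path with epsilon tolerance can be sketched in a few lines of Python (hypothetical function and epsilon value, not the actual ColGREP code): hashing only happens when the stored mtime genuinely disagrees with the filesystem.

```python
import hashlib
import os
import tempfile

MTIME_EPSILON = 1e-3  # tolerate float rounding from JSON round-trips (assumed value)

def needs_rehash(path, cached_mtime, cached_hash):
    """Return True only when the file may actually have changed.

    Fast path: if the stored mtime matches within epsilon, skip
    hashing entirely. Slow path: hash the content and compare.
    """
    mtime = os.stat(path).st_mtime
    if abs(mtime - cached_mtime) <= MTIME_EPSILON:
        return False  # unchanged as far as mtime can tell
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest() != cached_hash

# Demo on a throwaway file standing in for a cached index entry
with tempfile.NamedTemporaryFile("wb", delete=False, suffix=".rs") as f:
    f.write(b"fn main() {}")
    demo_path = f.name
demo_mtime = os.stat(demo_path).st_mtime
demo_hash = hashlib.sha256(b"fn main() {}").hexdigest()
unchanged = needs_rehash(demo_path, demo_mtime, demo_hash)
```

Note the asymmetry: a matching mtime is trusted outright, while a mismatched mtime still falls back to a content-hash comparison, so a touched-but-identical file is not re-indexed.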

Grep-like output for regex matches

When using -e with a regex pattern, ColGREP now prints every matching line in file:line:content format (like grep) instead of just showing code unit ranges. Results are still ranked by semantic relevance.
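The output format itself is the classic grep convention: one `file:line:content` row per matching line. A small Python sketch of that formatting step (hypothetical helper, operating on already-loaded file contents):

```python
import re

def grep_lines(path_text_pairs, pattern):
    # Emit every regex match as file:line:content, like grep -n.
    rx = re.compile(pattern)
    out = []
    for path, text in path_text_pairs:
        for lineno, line in enumerate(text.splitlines(), start=1):
            if rx.search(line):
                out.append(f"{path}:{lineno}:{line}")
    return out

# Toy input: one Rust file with two function definitions
hits = grep_lines(
    [("src/lib.rs", "fn search() {\n    todo!()\n}\nfn main() {}")],
    r"fn \w+",
)
```

This format is what editors and agents already know how to parse, which is presumably why it was chosen over code-unit ranges for regex mode.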

Python SDK CLI

A new next-plaid CLI (pip install "next-plaid-client[cli]") provides full SDK parity: index management, document add/delete, search (semantic/keyword/hybrid), metadata operations, encoding, and reranking. Designed for agents: non-interactive flags, --dry-run/--yes for destructive ops, stdin support, and actionable errors. Ships with 80 unit tests.

CUDA compatibility

A series of fixes for CUDA driver/toolkit mismatches: PTX is now compiled targeting the device's actual compute capability, cudarc is pinned to CUDA 11.8 symbols for maximum driver compatibility, and panic-based error output during GPU initialization is replaced with clear fallback messages. cuDNN notices only appear during full index creation.

Windows & platform improvements

Windows verbatim paths (\\?\C:\...) are normalized before display. New --force-cpu / --force-gpu flags and NEXT_PLAID_FORCE_CPU / NEXT_PLAID_FORCE_GPU env vars give explicit control over acceleration. Relative paths are now the default in search output, saving ~35% tokens when feeding results to LLMs.

Other improvements

  • Session hook now activates for small projects (≤50 files) even without a pre-existing index
  • Score display removed from compact terminal output (still available in JSON)
  • Recency-weighted ETA estimator for the encoding progress bar
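A recency-weighted ETA can be as simple as an exponential moving average of seconds-per-item, so recent throughput dominates the estimate. A hypothetical sketch of that idea (not the actual estimator):

```python
class RecencyWeightedETA:
    """ETA from an exponential moving average of per-item durations.

    Recent timings dominate, so the estimate adapts quickly when
    throughput shifts (e.g. GPU warm-up, larger files mid-run).
    """

    def __init__(self, alpha=0.3):
        self.alpha = alpha  # weight of the newest observation
        self.avg = None     # EMA of seconds per item

    def update(self, seconds_per_item):
        if self.avg is None:
            self.avg = seconds_per_item
        else:
            self.avg = self.alpha * seconds_per_item + (1 - self.alpha) * self.avg

    def eta(self, items_remaining):
        # Seconds left, or None before any observation
        return None if self.avg is None else self.avg * items_remaining

# Throughput improves on the last batch; the EMA follows it down
est = RecencyWeightedETA(alpha=0.5)
for t in [2.0, 2.0, 1.0]:
    est.update(t)
```

Compared with a plain mean over all batches, the EMA stops over-reporting the ETA after a slow start, which is exactly the failure mode of naive progress bars.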

Contributors

Thanks to Johnny Salazar (@cepera_ang) for the pipelined indexing architecture, encoding optimizations, and acceleration mode groundwork; Nick (@NickSdot) for relative paths by default, saving ~35% tokens for LLM usage; and Ivan Chechenev (@Jus2Cat) for the Windows verbatim path fix.
