Performance
The headline improvements:
- sem diff --patch: 61s → 410ms (149x faster)
- Graph resolution: O(n²) → O(n)
- 71K-file TS monorepo: DNF (did not finish) → 6.5s with cache
Under the hood:
- Streaming JSON output — results stream as they're computed instead of buffering in memory
- mimalloc allocator — significant win for allocation-heavy tree-sitter parsing
- Topology-only cache — caches dependency graph structure so repeated queries skip re-parsing unchanged files
- Content fingerprinting — skips re-analysis of files that haven't changed based on hash comparison
- Batched file fingerprint refresh — batched SQLite roundtrips instead of one-at-a-time
- Import table allocation reduction in bag-of-words resolver
- SQLite cache schema v4 with file_imports table for faster import resolution
- Partial topology cache for large impact runs: cold sem impact on 100K-file repos drops from 90s to 34s, warm runs under 2s
- Direct deps path for large impact runs
- Stabilized graph output for ambiguous calls
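
To make the content fingerprinting and batched refresh items above concrete, here is a minimal Python sketch of the general technique. It is not sem's actual implementation: the SHA-256 hash, the fingerprints table, and the function names are all assumptions made for illustration.

```python
from __future__ import annotations

import hashlib
import sqlite3


def fingerprint(data: bytes) -> str:
    """Return a stable content hash (SHA-256 is an assumption, not sem's choice)."""
    return hashlib.sha256(data).hexdigest()


def open_cache() -> sqlite3.Connection:
    # Hypothetical schema: one row per file holding its last-seen hash.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fingerprints (path TEXT PRIMARY KEY, hash TEXT NOT NULL)"
    )
    return conn


def refresh(conn: sqlite3.Connection, files: dict[str, bytes]) -> list[str]:
    """Return the paths whose content changed, then update the cache in one batch."""
    # A single SELECT loads every known hash instead of one query per file.
    known = dict(conn.execute("SELECT path, hash FROM fingerprints"))
    current = {path: fingerprint(data) for path, data in files.items()}
    changed = sorted(p for p, h in current.items() if known.get(p) != h)
    # One executemany inside one transaction replaces per-file round trips.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO fingerprints (path, hash) VALUES (?, ?)",
            [(p, current[p]) for p in changed],
        )
    return changed
```

Only the paths returned by refresh would need re-analysis; a second call with identical contents returns an empty list, which is the case a warm run benefits from.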
Consolidation
Major internal cleanup across the codebase:
- Consolidated identity matching, diff output filtering, diff path/patch parsing
- Consolidated cache and verify behavior
- Consolidated graph and scope resolution
- Consolidated TypeScript/JavaScript parser resolution
- Consolidated Swift extraction
Features
- LaTeX language support: regex-based parser for .tex, .latex, .cls, and .sty files with preamble, command, section, and environment entity extraction
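
As a rough illustration of what regex-based entity extraction for LaTeX can look like, the Python sketch below pulls out the four entity kinds named above. The patterns, the extract_entities name, and the tuple layout are hypothetical; the real parser presumably covers comments, nested braces, and optional arguments, which this sketch ignores.

```python
from __future__ import annotations

import re

# Hypothetical patterns, deliberately simple.
SECTION = re.compile(r"\\(section|subsection|subsubsection)\*?\{([^}]*)\}")
ENVIRONMENT = re.compile(r"\\begin\{([^}]+)\}")
COMMAND = re.compile(r"\\(?:re)?newcommand\*?\{?\\([A-Za-z@]+)\}?")


def extract_entities(source: str) -> list[tuple[str, str, int]]:
    """Return (kind, name, line_number) tuples found in a LaTeX source string."""
    entities: list[tuple[str, str, int]] = []
    in_preamble = True
    for lineno, line in enumerate(source.splitlines(), start=1):
        if in_preamble and r"\begin{document}" in line:
            in_preamble = False
            if lineno > 1:
                # Treat everything before \begin{document} as one preamble entity.
                entities.insert(0, ("preamble", "preamble", 1))
            continue
        for match in COMMAND.finditer(line):
            entities.append(("command", match.group(1), lineno))
        for match in SECTION.finditer(line):
            entities.append(("section", match.group(2), lineno))
        for match in ENVIRONMENT.finditer(line):
            entities.append(("environment", match.group(1), lineno))
    return entities
```

A line-by-line regex pass like this trades accuracy for speed and simplicity: it needs no grammar, but it will miss constructs that span multiple lines.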
Fixes
- Skip binary and generated files in repo scans
- Handle cross-language file diffs
- Return MCP tool errors for bad inputs instead of generic failures
- MCP entity lookup failures now name the requested file and include candidate suggestions
- Stabilized sem graph --json output ordering
- Bounded sem log history scans
- Fixed subdirectory path resolution in CLI commands
- Portable format flag test across platforms
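
The entity lookup fix above (errors that name the requested file and suggest candidates) can be sketched in Python as follows. This is an assumption-laden illustration, not the MCP server's code: EntityLookupError, find_entity, and the index layout are invented here, and the fuzzy matching uses the standard library's difflib rather than whatever sem uses.

```python
from __future__ import annotations

import difflib


class EntityLookupError(LookupError):
    """Hypothetical error type carrying a message meant for the caller."""


def find_entity(index: dict[str, list[str]], file: str, entity: str) -> str:
    """Return the entity name if present, else raise an error with suggestions."""
    names = index.get(file)
    if names is None:
        raise EntityLookupError(f"file not indexed: {file}")
    if entity in names:
        return entity
    # Offer up to three near matches so the caller can correct the request.
    candidates = difflib.get_close_matches(entity, names, n=3, cutoff=0.6)
    hint = f"; did you mean: {', '.join(candidates)}" if candidates else ""
    raise EntityLookupError(f"entity '{entity}' not found in {file}{hint}")
```

With an index such as {"src/app.ts": ["parseConfig", "loadConfig", "render"]}, a request for parseConfg in src/app.ts raises an error whose message names the file and lists parseConfig among the suggestions.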