Performance
Five optimizations make sem significantly faster:
- xxHash64 replaces SHA-256 for content and structural hashing (~10x faster)
- Zero-allocation streaming structural hash hashes AST tokens directly from source bytes
- Cached tree resolution resolves git trees once per scope instead of per file (31% faster on ranges)
- LTO + codegen-units=1 for better cross-crate inlining
- HashSet<&str> eliminates String allocations in entity matching
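
As a rough illustration of the streaming structural hash idea, the sketch below feeds token slices straight from the source buffer into a hasher, so no intermediate `String` is built per token. It is not sem's implementation: the toy lexer stands in for a real AST walk, `DefaultHasher` from the standard library stands in for xxHash64, and `structural_hash` and `is_word_byte` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hash the token stream of `src` without allocating per-token Strings.
/// Assumptions: a toy lexer replaces the AST walk, and DefaultHasher
/// (SipHash) replaces xxHash64 to keep the sketch dependency-free.
fn structural_hash(src: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    let mut i = 0;
    while i < src.len() {
        let b = src[i];
        if b.is_ascii_whitespace() {
            i += 1; // whitespace is cosmetic, so it never reaches the hasher
            continue;
        }
        let start = i;
        if is_word_byte(b) {
            // Identifier or number: consume the whole run.
            while i < src.len() && is_word_byte(src[i]) {
                i += 1;
            }
        } else {
            i += 1; // any other byte is treated as a one-byte token
        }
        // Feed the token as a borrowed slice of the source buffer.
        hasher.write(&src[start..i]);
        // Separator byte so "ab c" and "a bc" produce different streams
        // (0xff never occurs in valid UTF-8).
        hasher.write_u8(0xff);
    }
    hasher.finish()
}

fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}
```

Under these assumptions, two inputs that differ only in whitespace hash to the same value, while changing a token changes the hash.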
Benchmarks (hyperfine -N, 50 runs)
| Scenario | v0.2.0 | v0.3.0 | Change |
|---|---|---|---|
| Small (1 file) | 7ms | 5ms | 29% faster |
| Medium (5 files) | 10ms | 8ms | 20% faster |
| Large (13 files) | 22ms | 19ms | 14% faster |
| Range (8 commits) | 35ms | 24ms | 31% faster |
sem diff is now faster than git diff on medium commits (8ms vs 9ms) while providing full entity-level semantic analysis.
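
The Change column in the table above appears to be the relative reduction in runtime, rounded to a whole percent. That reading is an assumption, but it reproduces all four rows. A minimal sketch, with `percent_faster` as a hypothetical helper name:

```rust
/// Relative runtime reduction from `before_ms` to `after_ms`,
/// rounded to the nearest whole percent.
fn percent_faster(before_ms: f64, after_ms: f64) -> i64 {
    ((before_ms - after_ms) / before_ms * 100.0).round() as i64
}
```

For example, the Range row works out as (35 - 24) / 35, which rounds to 31%.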
New since v0.2.0
- PHP support (13 languages total)
- Fortran support
- C++, Ruby, C# support
- Java and C support
- sem blame for entity-level blame
- sem graph for cross-file dependency graph
- sem impact for transitive impact analysis
- Parallel entity extraction via rayon
- Cosmetic vs structural change detection via AST-normalized structural hashing
- Incremental graph updates
- llms.txt for AI agent consumption
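
To give a sense of what transitive impact analysis involves, here is a minimal sketch of a breadth-first walk over a reverse dependency map. It is not taken from sem's source: `transitive_impact`, the `dependents` map, and the entity names in the usage note are all hypothetical. The visited set is a `HashSet<&str>` that borrows names from the map rather than allocating a `String` per entity.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Collect every entity that depends on `changed`, directly or indirectly.
/// `dependents` maps an entity name to the entities that use it directly
/// (a hypothetical shape for the dependency graph).
fn transitive_impact<'a>(
    dependents: &HashMap<&'a str, Vec<&'a str>>,
    changed: &'a str,
) -> HashSet<&'a str> {
    let mut impacted: HashSet<&'a str> = HashSet::new();
    let mut queue: VecDeque<&'a str> = VecDeque::from([changed]);
    while let Some(entity) = queue.pop_front() {
        if let Some(users) = dependents.get(entity) {
            for &user in users {
                // `insert` returns true only on first sight, so each entity
                // is queued once and cycles terminate.
                if user != changed && impacted.insert(user) {
                    queue.push_back(user);
                }
            }
        }
    }
    impacted
}
```

With a map where `parse` is used by `diff` and `blame`, and `diff` is used by `cli`, calling `transitive_impact(&map, "parse")` returns the set containing `diff`, `blame`, and `cli`.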