Real-time token savings, visible to humans. The estimated context-savings metric introduced in 2.3.4 was available only as JSON. In 2.3.5 it appears as a boxed panel on the CLI and can be checked against a real tokenizer with a single flag, so when you use code-review-graph to review a change, you can immediately see how many tokens the graph kept out of your context window.
Highlights
- 🪟 Token Savings panel on both `code-review-graph detect-changes --brief` and the new `code-review-graph update --brief`. Per-category breakdown (Functions / Tests / Risk / Other) that sums exactly to the graph response size.
- ✅ `--verify` flag cross-checks the displayed numbers against OpenAI's `cl100k_base` tokenizer. Calibration shows the estimate stays within +0.5% of real GPT-4 tokens in aggregate across 222 mixed-language source files (data in `docs/REPRODUCING.md`).
- 🔁 Deterministic eval pipeline: pinned upstream SHAs, full clones with `returncode` checks, fixed Leiden seed. Two contributors running the benchmark recipe on different machines on different days now produce identical numbers.
- 🎯 Multi-hop retrieval benchmark + richer embedding text + identifier-aware search boost lift compound-query accuracy from 0.545 → 0.909.
- 📦 `code-review-graph embed` CLI subcommand for explicit embedding generation. Previously only reachable via MCP.
What the panel looks like
```text
┌─────────────────────── Token Savings ────────────────────────┐
│ Full context would be: 12,921 tokens │
│ Graph context used: 762 tokens │
│ Saved: 12,159 tokens (~94%) │
│ Breakdown: Functions 244 · Tests 191 · Risk 244 · Other 83 │
└──────────────────────────────────────────────────────────────┘
```
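The figures in the panel are related by simple arithmetic, which the Python sketch below reproduces. The names `SavingsSummary` and `summarize_savings` are hypothetical and are not taken from the code-review-graph source; only the numbers come from the example panel.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SavingsSummary:
    full_tokens: int
    graph_tokens: int
    saved_tokens: int
    saved_percent: float


def summarize_savings(full_tokens: int, breakdown: Dict[str, int]) -> SavingsSummary:
    """Derive the panel's headline numbers from a per-category breakdown."""
    # The graph response size is the sum of the per-category counts.
    graph_tokens = sum(breakdown.values())
    saved = full_tokens - graph_tokens
    # Guard against an empty baseline to avoid dividing by zero.
    percent = 100.0 * saved / full_tokens if full_tokens else 0.0
    return SavingsSummary(full_tokens, graph_tokens, saved, percent)


# Numbers copied from the example panel.
summary = summarize_savings(
    12_921, {"Functions": 244, "Tests": 191, "Risk": 244, "Other": 83}
)
print(f"Graph context used: {summary.graph_tokens:,} tokens")
print(f"Saved: {summary.saved_tokens:,} tokens (~{summary.saved_percent:.0f}%)")
```

With the panel's inputs, the breakdown sums to 762 tokens and the saving is 12,159 tokens, or about 94% of the 12,921-token baseline.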
Add `--verify` to the command to append a `Verified (tiktoken)` row to the panel, so the numbers are checked against a real tokenizer rather than being only an estimate.
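As a rough illustration of what such a cross-check involves, the sketch below counts tokens with the `tiktoken` package's `cl100k_base` encoding and computes the aggregate error of a set of estimates. It is not the tool's implementation: both helper names are hypothetical, and the estimates are assumed to come from whatever estimator you want to calibrate.

```python
from typing import Iterable, Tuple


def count_cl100k_tokens(text: str) -> int:
    """Count tokens with the cl100k_base encoding (needs `pip install tiktoken`)."""
    import tiktoken  # imported lazily so the error helper works without it

    encoding = tiktoken.get_encoding("cl100k_base")
    # disallowed_special=() treats strings such as "<|endoftext|>" as plain
    # text instead of raising, which matters when tokenizing source files.
    return len(encoding.encode(text, disallowed_special=()))


def aggregate_error_percent(pairs: Iterable[Tuple[int, int]]) -> float:
    """Signed error of summed estimates vs. summed real counts, in percent.

    Each pair is (estimated_tokens, real_tokens) for one file. Summing before
    dividing yields an aggregate figure, so per-file over- and
    under-estimates can cancel out.
    """
    total_estimated = 0
    total_real = 0
    for estimated, real in pairs:
        total_estimated += estimated
        total_real += real
    if total_real == 0:
        raise ValueError("no real tokens to compare against")
    return 100.0 * (total_estimated - total_real) / total_real
```

Because the error is computed on the totals, `aggregate_error_percent([(105, 100), (95, 100)])` is `0.0` even though each file is off by 5%. An "in aggregate" figure should be read with that in mind.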
Reproduction
The end-to-end recipe, with canonical numbers, is in `docs/REPRODUCING.md`. All 6 test repos pin upstream SHAs, embeddings are deterministic on CPU, and Leiden detection is seeded.
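One way to get pinned checkouts that fail fast is sketched below. It is an illustration under assumptions, not the project's eval script: `run_checked` and `clone_at_sha` are hypothetical helpers, and the caller supplies the repository URL and the full 40-character commit SHA.

```python
import subprocess
from typing import Optional, Sequence


def run_checked(cmd: Sequence[str], cwd: Optional[str] = None) -> str:
    """Run a command and raise if it exits non-zero, rather than continuing."""
    result = subprocess.run(list(cmd), cwd=cwd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


def clone_at_sha(url: str, sha: str, dest: str) -> str:
    """Make a full (non-shallow) clone, then check out one pinned commit."""
    run_checked(["git", "clone", url, dest])
    run_checked(["git", "checkout", "--detach", sha], cwd=dest)
    # Confirm the working tree really is at the pinned commit.
    head = run_checked(["git", "rev-parse", "HEAD"], cwd=dest).strip()
    if head != sha:
        raise RuntimeError(f"expected {sha}, got {head}")
    return head
```

Checking the return code of every step means a failed clone stops the run instead of silently producing numbers from a partial or stale checkout.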
