⚠️ Breaking change — node IDs changed
Node IDs now include the full repo-relative path, fixing silent data loss when same-named files live in different directories. Existing graphs migrate automatically on the next `build`/`update` (no LLM re-bill). Run `graphify extract --force` to recover nodes that previously collided. If you push to a persisted Neo4j store, re-import after upgrading. Saved GraphML layouts (Gephi/yEd) go stale, and consumers should query by label rather than persisting node IDs.
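The following Python sketch contrasts the old and new stem rules and shows why the old one lost data. `old_stem`, `new_stem` and `build_nodes` are hypothetical helpers, not graphify's actual API. The lowercase/underscore normalization is an assumption chosen only to reproduce the examples given in this entry: `docs/v1/api/README.md` becomes `api_readme` under the old rule and `docs_v1_api_readme` under the new one, and `setup.py` becomes `setup` under both. Keying a plain dict by the old stem reproduces the last-writer-wins collision.

```python
import re
from pathlib import PurePosixPath


def _normalize(parts):
    # Assumption: lowercase, then collapse runs of non-alphanumerics to "_".
    joined = "_".join(parts).lower()
    return re.sub(r"[^a-z0-9]+", "_", joined).strip("_")


def old_stem(rel_path: str) -> str:
    """Old rule: immediate parent dir + filename (extension dropped)."""
    p = PurePosixPath(rel_path)
    parent = p.parent.name  # "" for top-level files
    return _normalize([parent, p.stem] if parent else [p.stem])


def new_stem(rel_path: str) -> str:
    """New rule: the full repo-relative path (extension dropped)."""
    p = PurePosixPath(rel_path)
    return _normalize([*p.parent.parts, p.stem])


def build_nodes(paths, stem_fn):
    nodes = {}
    for path in paths:
        # Plain dict assignment: a later file with the same ID silently
        # overwrites the earlier one (last writer wins).
        nodes[stem_fn(path)] = path
    return nodes


paths = ["docs/v1/api/README.md", "docs/v2/api/README.md", "setup.py"]
print(build_nodes(paths, old_stem))
# {'api_readme': 'docs/v2/api/README.md', 'setup': 'setup.py'}
print(build_nodes(paths, new_stem))
# {'docs_v1_api_readme': 'docs/v1/api/README.md',
#  'docs_v2_api_readme': 'docs/v2/api/README.md', 'setup': 'setup.py'}
```

Under the old rule the `docs/v1` README is gone after the build. Under the new rule all three files keep distinct IDs, and the top-level `setup.py` keeps the same ID as before.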
- Breaking: node IDs now include the full repo-relative path (#1504, #1509). The node-ID stem used to be the immediate parent directory plus the filename. As a result, same-named files in different directories collided into a single last-writer-wins node and graph content was silently dropped (`docs/v1/api/README.md` and `docs/v2/api/README.md` both → `api_readme`). The stem is now the full repo-relative path (`docs_v1_api_readme` vs `docs_v2_api_readme`); top-level files are unchanged (`setup.py` → `setup`). The AST extractor, the LLM system prompt, the extraction-spec, and the two hand-copied stem helpers now all follow this one rule, which fixes the #1509 AST↔LLM divergence that produced ghost duplicates. `build_from_json` deterministically re-keys any cached or older semantic fragment onto the new IDs using its `source_file`, so the unversioned semantic cache survives without ghosts or a re-bill. Existing graphs migrate to the new ID format automatically on the next `build`/`update` (no re-bill). Note: same-named files in different directories that previously collided into one node are recovered as distinct nodes only by a fresh extraction. Run `graphify extract --force` to rebuild and gain them; migrating an already-collided graph or cache cannot resurrect nodes that were already dropped. If you push to a persisted Neo4j store, re-import after upgrading, because re-exported IDs change. Saved Gephi/yEd (GraphML) layouts go stale, and MCP/Cypher consumers should query by label rather than persisting node IDs across rebuilds.
- Feat: a `--timing` flag on `graphify extract` and `graphify cluster-only` prints per-stage wall-clock timings to stderr (#1490). It shows how long each pipeline stage takes (`extract`: detect → AST → semantic → build → cluster → analyze → export; `cluster-only`: load → cluster → analyze → label → report → export), plus a final total, so slow stages are visible on large corpora. It is off by default and uses the monotonic `perf_counter`. Output goes to stderr only, so machine-read stdout and `graph.json` are unchanged.
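The following Python sketch shows one way a per-stage timer like the one described above can work: it measures each stage with the monotonic `time.perf_counter`, prints only to stderr, and finishes with a total. `StageTimer`, its `stage()` context manager, and the `[timing]` output format are hypothetical illustrations, not graphify's implementation. The stage names are taken from the `extract` pipeline listed above.

```python
import sys
import time
from contextlib import contextmanager


class StageTimer:
    """Hypothetical per-stage wall-clock timer that reports to stderr only."""

    def __init__(self, enabled: bool = False, stream=None):
        self.enabled = enabled          # off by default, like --timing
        self.stream = stream            # None means "sys.stderr at print time"
        self.timings = []               # (stage name, seconds) in run order
        self._start = time.perf_counter()

    def _emit(self, message: str) -> None:
        # Printing is gated on `enabled`; stdout is never touched.
        if self.enabled:
            print(message, file=self.stream or sys.stderr)

    @contextmanager
    def stage(self, name: str):
        t0 = time.perf_counter()        # monotonic, high-resolution clock
        try:
            yield
        finally:
            elapsed = time.perf_counter() - t0
            self.timings.append((name, elapsed))
            self._emit(f"[timing] {name}: {elapsed:.3f}s")

    def report_total(self) -> float:
        total = time.perf_counter() - self._start
        self._emit(f"[timing] total: {total:.3f}s")
        return total


if __name__ == "__main__":
    timer = StageTimer(enabled=True)
    for name in ("detect", "AST", "semantic", "build", "cluster", "analyze", "export"):
        with timer.stage(name):
            time.sleep(0.01)            # placeholder for the real stage work
    timer.report_total()
```

The stream is resolved when each line is printed rather than bound as a default argument, so redirecting `sys.stderr` (or passing a buffer) captures the report. Measurement always happens; only printing depends on `enabled`.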