github safishamsi/graphify v0.9.0
v0.9.0 — full-path node IDs (breaking)

latest release: v0.9.1
4 hours ago

⚠️ Breaking change — node IDs changed

Node IDs now include the full repo-relative path, fixing silent data loss when same-named files live in different directories. Existing graphs migrate automatically on the next build/update (no LLM re-bill). Run graphify extract --force to recover nodes that previously collided. If you push to a persisted Neo4j store, re-import after upgrading. Saved GraphML/Gephi layouts go stale. Query by label rather than persisting node IDs across rebuilds.
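
The collision can be illustrated with a minimal sketch. This is not graphify's actual code: old_stem, new_stem and the normalization in _slug (lowercase, drop the extension, replace non-alphanumerics with underscores) are assumptions chosen only to reproduce the example IDs given in these notes.

```python
import re
from pathlib import PurePosixPath


def _slug(parts):
    # Assumed normalization: join with "_", lowercase, collapse anything
    # that is not a-z or 0-9 into a single underscore.
    joined = "_".join(parts)
    return re.sub(r"[^a-z0-9]+", "_", joined.lower()).strip("_")


def old_stem(rel_path):
    # Pre-0.9.0 rule as described: immediate parent dir + filename.
    p = PurePosixPath(rel_path)
    parent = p.parent.name  # empty string for top-level files
    return _slug([parent, p.stem] if parent else [p.stem])


def new_stem(rel_path):
    # 0.9.0 rule as described: full repo-relative path.
    p = PurePosixPath(rel_path)
    return _slug([*p.parent.parts, p.stem])


paths = ["docs/v1/api/README.md", "docs/v2/api/README.md", "setup.py"]

# Keying a dict by ID mimics a last-writer-wins node store.
old_nodes = {old_stem(p): p for p in paths}
new_nodes = {new_stem(p): p for p in paths}

print(len(old_nodes), sorted(old_nodes))
print(len(new_nodes), sorted(new_nodes))
```

Under the old rule the two README files share one key and the first is overwritten. Under the new rule all three paths keep distinct IDs, and the top-level file keeps the same ID in both schemes.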

  • Breaking: node IDs now include the full repo-relative path (#1504, #1509). Previously the node-ID stem was the immediate parent dir + filename, so same-named files in different directories collided into one last-writer-wins node and silently dropped graph content (docs/v1/api/README.md and docs/v2/api/README.md both → api_readme). The stem is now the full repo-relative path (docs_v1_api_readme vs docs_v2_api_readme); top-level files are unchanged (setup.py → setup). The AST extractor, the LLM system prompt, the extraction-spec, and the two hand-copied stem helpers are all aligned to this one rule (fixing the #1509 AST↔LLM divergence that produced ghost duplicates), and build_from_json deterministically re-keys any cached/older semantic fragment onto the new IDs from its source_file, so the unversioned semantic cache survives without ghosts or a re-bill. Existing graphs migrate to the new ID format automatically on the next build/update (no re-bill). Note: same-named files in different directories that previously collided into one node are only recovered as distinct nodes by a fresh extraction. Run graphify extract --force to rebuild and gain them; migrating an already-collided graph/cache can't resurrect nodes that were already dropped. If you push to a persisted Neo4j store, re-import after upgrading (re-exported IDs change). Saved Gephi/yEd (GraphML) layouts go stale, and MCP/Cypher consumers should query by label rather than persisting node IDs across rebuilds.
  • Feat: --timing flag on graphify extract and graphify cluster-only prints per-stage wall-clock timings to stderr (#1490). Shows how long each pipeline stage takes — extract: detect → AST → semantic → build → cluster → analyze → export; cluster-only: load → cluster → analyze → label → report → export — plus a final total, so slow stages are visible on large corpora. Off by default (monotonic perf_counter, stderr-only); machine-read stdout / graph.json are unchanged.
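
The kind of per-stage timing described for --timing can be sketched as follows. This is an illustration only, not graphify's implementation: stage_timer, run_pipeline and the "[timing]" line format are hypothetical, and the stage names are simply taken from the extract pipeline listed above.

```python
import sys
import time
from contextlib import contextmanager


@contextmanager
def stage_timer(name, timings, enabled, stream):
    # When timing is off, run the stage without measuring or printing.
    if not enabled:
        yield
        return
    start = time.perf_counter()  # monotonic, suited to measuring intervals
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[name] = elapsed
        print(f"[timing] {name}: {elapsed:.3f}s", file=stream)


def run_pipeline(stages, timing=False, stream=None):
    # Timing lines go to stderr by default so stdout stays machine-readable.
    stream = sys.stderr if stream is None else stream
    timings = {}
    for name, fn in stages:
        with stage_timer(name, timings, timing, stream):
            fn()
    if timing:
        total = sum(timings.values())
        print(f"[timing] total: {total:.3f}s", file=stream)
    return timings


# Placeholder stages standing in for the real work (hypothetical).
EXTRACT_STAGES = [
    (name, lambda: None)
    for name in ("detect", "AST", "semantic", "build", "cluster", "analyze", "export")
]

if __name__ == "__main__":
    run_pipeline(EXTRACT_STAGES, timing=True)
```

Keeping the measurements on stderr and behind a flag means anything that parses stdout or reads graph.json sees identical output whether or not timing is enabled.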
