What's new
Fixes
- Ollama VRAM exhaustion (#798): `num_ctx` is now derived from the actual chunk size instead of the hardcoded 131072. With `--token-budget 8192`, the old value forced Ollama to allocate 128k KV-cache slots on a 31B model; by chunk 4, four 128k allocations had accumulated and caused an OOM. New formula: `min(input_tokens + output_cap + 2000, 131072)`, so an 8k chunk gets ~26k instead.
- Hollow-response warning improved: it now mentions VRAM pressure and points to the `GRAPHIFY_OLLAMA_NUM_CTX`/`GRAPHIFY_OLLAMA_KEEP_ALIVE` env vars as tuning knobs.
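The new context-size derivation can be sketched as follows. The function name and the 16k output cap are illustrative; the 2000-token overhead and the 131072 ceiling come from the formula above.

```python
def derive_num_ctx(input_tokens: int, output_cap: int) -> int:
    """Sketch of the new num_ctx formula: chunk size plus output budget
    plus a 2000-token overhead, capped at the old hardcoded 131072."""
    return min(input_tokens + output_cap + 2000, 131072)

# An 8k chunk with an assumed 16k output cap gets ~26k slots, not 128k:
print(derive_num_ctx(8192, 16384))    # 26576
# Oversized inputs still hit the 131072 ceiling:
print(derive_num_ctx(200_000, 16384)) # 131072
```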
Features
- `graphify export callflow-html` (#797): generates a self-contained Mermaid architecture/call-flow HTML page from `graphify-out/graph.json`, with community sections, interactive flowcharts with zoom/pan, call detail tables, and graph report highlights.
- Living architecture diagram (#800): the callflow HTML now auto-regenerates on every `--watch` rebuild and on the post-commit hook if the file already exists. Run the export once and the page stays current.
Upgrade
```shell
uv tool upgrade graphifyy
# or: pip install --upgrade graphifyy
```