Patch release covering quality and reliability improvements landed since 0.29.0.
Extraction quality
Attribute-hallucination guards (#1498) — addresses a customer report of up to 9KB of LLM meta-reasoning landing in entity attribute fields (e.g. phones,
industry). The LLM was dumping internal deliberation and echoing schema description text back into values. Three layered defenses:
- Prompt-level: extract_attributes prompts for both nodes and edges now have hard rules forbidding parenthetical reasoning, deliberative phrases, candidate
alternatives, schema-description echoes, and null/N/A sentence stand-ins. Includes positive/negative examples. - Combined-extraction prompt: new OUTPUT DISCIPLINE block closes the "topic-as-Person" failure mode (full-sentence entity names) and pre-empts reasoning leaks
into fact text and relation_type. - Provider-level preamble: a shared _apply_attribute_extraction_preamble on LLMClient mutates the system message across OpenAI, Anthropic, Gemini, and GLiNER2
so the "field descriptions are format specs, not values" instruction reaches every provider regardless of how structured output is wired. Sentinel-based
idempotency.
Structural backstop — cap_string_attributes: drops any string attribute >250 chars before it reaches the graph. Configurable per-field via Pydantic
Field(max_length=...) or globally via GRAPHITI_ATTRIBUTE_MAX_LENGTH. Also handles list[str] (per-item + aggregate cap of max_len * 8). Required-field carve-out
retains the value with a louder WARNING rather than dropping (avoids ValidationError-ing the whole entity). Edge writes use scoped merge for over-cap drops only
— fields the LLM legitimately omitted still clear. group_id flows to the log line for tenant correlation.
Combined-extraction entity & edge precision (#1498) — targets six classes of low-quality entities (multi-choice fragments, clock times, quantities, coordinates,
imperative tip phrases, slogans) plus two systematic edge regressions (location-fragmentation through scenery intermediaries, dropped multi-episode setup
facts).
- ENTITY RULE 6 expanded into 8 sub-bullets covering the skip classes; ENTITY RULE 9 added for didactic / tutorial scaffolding.
- FACT RULE 4 strengthened around narrative announcements, forward commitments, plans, and emotional setup in conversational episodes; FACT RULE 8 added
requiring direct speaker-to-target edges over fragmenting through scenery intermediaries. - Negative-examples block (A–G) added with paired source-and-extraction guidance per class.
Locomo 500-example eval (prompt change only):
- Entity F1: 0.674 (flat); precision −0.028, recall +0.025
- Edge volume: +7.7% over baseline; gold-edge recall +0.9pp
- Fact-text Jaccard on matched edges: 0.341 → 0.381
- Targeted entity classes on LME-derivative sample: quantities −93%, imperative tips −40%
Saga summarization
Episode-time watermarks for sagas (#1498) — saga (thread summary) timestamps now reflect originating-episode time rather than wall-clock time. SagaNode now
carries two deliberately distinct watermarks:
- last_summarized_at (wall-clock) — the filter watermark. summarize_saga picks up any episode whose created_at > this value. Wall-clock + created_at comparison
keeps backfilled episodes reachable on the next run. - last_summarized_episode_valid_at (episode time) — the temporal watermark. Max valid_at across episodes covered by the current summary; advances monotonically.
Public/temporal consumers ("how recent is this summary's content in event-time?") should use this field.
saga_get_episode_contents now returns list[(content, valid_at)] tuples so callers can compute the watermark. _get_or_create_saga renames now → created_at;
callers pass the episode's valid_at so a newly minted saga's created_at matches the episode that produced it.
Docker / packaging
- fix(mcp): install MCP provider extras in Docker images (#1461)
- fix(docker): mount FalkorDB volume to actual data path (#1462) — earlier mount path didn't match FalkorDB's data directory, so data wasn't actually persisted
across container restarts. - chore(deps): bump uv group across 2 directories (#1473) — jupyterlab 4.5.7, plus indirect updates to jupyter-server 2.18.0, mistune 3.2.1, python-multipart
0.0.27.
Docs
- docs(graphiti): drop stale add_episode_bulk warning (#1476) — removed an outdated docstring that warned the bulk path skipped edge invalidation and date
extraction. That hasn't been true since the bulk pipeline was rewritten to share per-episode primitives (resolve_extracted_edges, extract_edges both run on the
bulk path).
What's Changed
- docs(graphiti): drop stale add_episode_bulk edge-invalidation/date-extraction note by @prasmussen15 in #1476
- Bump the uv group across 2 directories with 4 updates by @dependabot[bot] in #1473
- fix(docker): mount FalkorDB volume to actual data path by @fengfeng-zi in #1462
- fix: install MCP provider extras in Docker images by @fengfeng-zi in #1461
- Forward-port: attribute-hallucination guards, combined-extraction precision, saga episode-time watermarks by @prasmussen15 in #1498
- Bump graphiti-core version to 0.29.1 by @prasmussen15 in #1499
New Contributors
- @fengfeng-zi made their first contribution in #1462
Full Changelog: v0.29.0...v0.29.1