graphiti/SUMMARY_PIPELINE_STATE.md
2025-11-01 18:26:56 -07:00

126 lines
7.3 KiB
Markdown

# Graphiti Node Summary Pipelines
This document outlines how entity summaries are generated today and the pragmatic changes proposed to gate expensive LLM calls using novelty and recency heuristics.
## Current Execution Flow
```
+------------------+ +-------------------------------+
| Episode Ingest | ----> | retrieve_episodes(last=3) |
+------------------+ +-------------------------------+
| |
v v
+----------------------+ +---------------------------+
| extract_nodes | | extract_edges |
| (LLM + reflexion) | | (LLM fact extraction) |
+----------------------+ +---------------------------+
| |
| dedupe candidates |
v v
+--------------------------+ +---------------------------+
| resolve_extracted_nodes | | resolve_extracted_edges |
| (deterministic + LLM) | | (LLM dedupe, invalidates)|
+--------------------------+ +---------------------------+
| |
+----------- parallel ----+
|
v
+-----------------------------------------+
| extract_attributes_from_nodes |
| - LLM attribute fill (optional) |
| - LLM summary refresh (always runs) |
+-----------------------------------------+
```
### Current Data & Timing Characteristics
- **Previous context**: only the latest `EPISODE_WINDOW_LEN = 3` episodes are retrieved before any LLM call.
- **Fact availability**: raw `EntityEdge.fact` strings exist immediately after edge extraction; embeddings are produced inside `resolve_extracted_edges`, concurrently with summary generation.
- **Summary inputs**: the summary prompt only sees `node.summary`, node attributes, episode content, and the three-episode window. It does *not* observe resolved fact invalidations or final edge embeddings.
- **LLM usage**: every node that survives dedupe invokes the summary prompt, even when the episode is low-information or repeat content.
## Proposed Execution Flow
```
+------------------+ +-------------------------------+
| Episode Ingest | ----> | retrieve_episodes(last=3) |
+------------------+ +-------------------------------+
| |
v v
+----------------------+ +---------------------------+
| extract_nodes | | extract_edges |
| (LLM + reflexion) | | (LLM fact extraction) |
+----------------------+ +---------------------------+
| |
| dedupe candidates |
v v
+--------------------------+ +---------------------------+
| resolve_extracted_nodes | | build_node_deltas |
| (deterministic + LLM) | | - new facts per node |
+--------------------------+ | - embed facts upfront |
| | - track candidate flips |
| +-------------+-------------+
| |
| v
| +---------------------------+
| | resolve_extracted_edges |
| | (uses prebuilt embeddings|
| | updates NodeDelta state)|
| +-------------+-------------+
| |
| v
| +---------------------------+
| | summary_gate.should_refresh|
| | - fact hash drift |
| | - embedding drift |
| | - negation / invalidation |
| | - burst & staleness rules |
| +------+------+------------+
| | |
| | +------------------------------+
| | |
| v v
| +---------------------------+ +-----------------------------+
| | skip summary (log cause) | | extract_summary LLM call |
| | update metadata only | | update summary & metadata |
| +---------------------------+ +-----------------------------+
v
+-------------------------------+
| add_nodes_and_edges_bulk |
+-------------------------------+
```
### Key Proposed Changes
1. **NodeDelta staging**
- Generate fact embeddings immediately after edge extraction (`create_entity_edge_embeddings`) and group fact deltas by target node.
- Record potential invalidations and episode timestamps within the delta so the summary gate has full context.
2. **Deterministic novelty checks**
- Maintain a stable hash of active facts (new facts minus invalidated). Skip summarisation when the hash matches the stored value.
- Compare pooled fact embeddings to the persisted summary embedding; trigger refresh only when cosine drift exceeds a tuned threshold.
- Force refresh whenever the delta indicates polarity changes (contradiction/negation cues).
3. **Recency & burst handling**
- Track recent episode timestamps per node in metadata. If multiple episodes arrive within a short window, accumulate deltas and defer the summary until the burst ends or a hard cap is reached.
- Enforce a staleness SLA (`last_summary_ts`) so long-lived nodes eventually refresh even if novelty remains low.
4. **Metadata persistence**
- Persist gate state on each entity (`_graphiti_meta`: `fact_hash`, `summary_embedding`, `last_summary_ts`, `_recent_episode_times`, `_burst_active`).
- Update metadata whether the summary runs or is skipped to keep the gating logic deterministic across ingests.
5. **Observability**
- Emit counters for `summary_skipped`, `summary_refreshed`, and reasons (unchanged hash, low drift, burst deferral, staleness). Sample a small percentage of skipped cases to validate heuristics.
## Implementation Snapshot
| Area | Current | Proposed |
| ---- | ------- | -------- |
| Summary trigger | Always per deduped node | Controlled by `summary_gate` using fact hash, embedding drift, negation, burst, staleness |
| Fact embeddings | Produced during edge resolution (parallel) | Produced immediately after extraction and reused downstream |
| Fact availability in summaries | Not available | Encapsulated in `NodeDelta` passed into summary gate |
| Metadata on node | Summary text + organic attributes | Summary text + `_graphiti_meta` (hash, embedding, timestamps, burst state) |
| Recency handling | None | Deque of recent episode timestamps + burst deferral |
| Negation detection | LLM-only inside edge resolution | Propagated into gate to force summary refresh |
These adjustments retain the existing inline execution model—no scheduled jobs—while reducing unnecessary LLM calls and improving determinism by grounding summaries in the same fact set that backs the graph.