# Implementation Plan: Graph Exploration Tools for General PKM

**Status:** Ready for Implementation
**Priority:** High
**Estimated Effort:** 2-3 hours
**Created:** 2025-11-15
**Branch:** `claude/read-documentation-01J1sX9uuuitPsuoqWR6H9pA`
## Executive Summary
Add two new MCP tools and significantly enhance workflow instructions to enable effective Personal Knowledge Management (PKM) across all use cases. This implementation addresses the core issue of disconnected/duplicate entities by providing the LLM with complete graph visibility before adding new information.
## Use Cases (All Equally Important)
This implementation serves general-purpose PKM, including:
- 🏗️ Architecture & Technical Decisions - Database choices, API design patterns, technology evaluations
- 💼 Work Projects - Project constraints, team decisions, deployment considerations
- 🧠 AI Coaching Sessions - Personal insights, behavioral patterns, goal tracking
- 📚 Learning & Research - Technical concepts, article summaries, connected ideas
- 🔧 Problem Solving - Troubleshooting notes, solution patterns, lessons learned
- 💭 General Knowledge - Any information worth remembering and connecting
**Key Insight:** All use cases benefit from complete graph exploration and temporal tracking, whether tracking architectural decisions over time or personal coaching insights.
## Problem Analysis

### Current State

- **Existing tools:** `search_nodes`, `search_memory_facts`, `add_memory`
- **Issue:** LLM creates disconnected/duplicate entities instead of forming proper relations
- **Root cause:** A workflow problem, not an architecture limitation

### Why Semantic Search Isn't Sufficient

**Critical limitations of `search_memory_facts` for exploration:**
- Requires query formulation - Must guess what queries to run
- No completeness guarantee - Returns "relevant" results, may miss connections
- Expensive for systematic exploration - Multiple speculative semantic searches
- Poor temporal tracking - Doesn't return chronological episode history
Why graph traversal is needed (see the sketch after this list):
- Pattern recognition - Requires COMPLETE connection data
- Proactive exploration - Direct traversal vs guessing semantic queries
- Cost efficiency - Single graph query vs multiple semantic searches
- Temporal analysis - Chronological episode history for evolution tracking
- Reliability - Guaranteed complete results vs probabilistic semantic matching
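
To make the contrast concrete, here is a minimal sketch of the two exploration styles, using the tool-call shapes proposed in this plan (the entity UUID and query strings are illustrative only):

```python
# Semantic-only exploration: several speculative queries, each returning
# "relevant" matches with no guarantee that every connection is covered.
facts = search_memory_facts(query="what relates to PostgreSQL?")
more_facts = search_memory_facts(query="PostgreSQL constraints and services")

# Direct graph traversal: a single call returns every edge touching the entity.
connections = get_entity_connections(entity_uuid=postgres_uuid)
```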
## Solution: Two New Tools + Enhanced Instructions

### Tool 1: `get_entity_connections`

**Purpose:** Direct graph traversal to show ALL relationships connected to an entity

**Use Cases:**
- "Show me everything connected to PostgreSQL decision"
- "What constraints are linked to the deployment pipeline?"
- "What's connected to my stress mentions?" (coaching)
- "All considerations related to authentication approach"
**Leverages:** `EntityEdge.get_by_node_uuid()` from `graphiti_core/edges.py:456`
### Tool 2: `get_entity_timeline`

**Purpose:** Chronological episode history showing when/how an entity was discussed

**Use Cases:**
- "When did we first discuss microservices architecture?"
- "Show project status updates over time"
- "How has my perspective on work-life balance evolved?" (coaching)
- "Timeline of GraphQL research and decisions"
**Leverages:** `EpisodicNode.get_by_entity_node_uuid()` from `graphiti_core/nodes.py:415`
## Implementation Details

### File to Modify

`mcp_server/src/graphiti_mcp_server.py`

- ✅ Allowed per CLAUDE.md (mcp_server modifications permitted)
- ❌ No changes to `graphiti_core/` (uses existing features only)
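
The new tools reuse the server's existing response models (`FactSearchResponse`, `EpisodeSearchResponse`, `ErrorResponse`). For orientation, here is a minimal sketch of the shapes assumed by the code in this plan — the field sets are inferred from usage here, not confirmed against the server source:

```python
from pydantic import BaseModel

# Assumed shapes only - inferred from how the new tools construct responses.
# The authoritative definitions live in mcp_server/src/graphiti_mcp_server.py.
class ErrorResponse(BaseModel):
    error: str

class FactSearchResponse(BaseModel):
    message: str
    facts: list[dict]

class EpisodeSearchResponse(BaseModel):
    message: str
    episodes: list[dict]
```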
### Changes Required

#### 1. Import Verification (Line ~18)

Verify `EpisodicNode` is imported:
```python
from graphiti_core.nodes import EpisodeType, EpisodicNode
```
(Already present - no change needed)
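
The `get_entity_connections` tool below also calls `EntityEdge.get_by_node_uuid()`, so `EntityEdge` must be importable as well. Assuming it is not already imported, add:

```python
from graphiti_core.edges import EntityEdge
```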
#### 2. Enhanced MCP Instructions (Lines 117-145)

Replace the entire `GRAPHITI_MCP_INSTRUCTIONS` with:
````python
GRAPHITI_MCP_INSTRUCTIONS = """
Graphiti is a memory service for AI agents built on a knowledge graph. It transforms information
into a richly connected knowledge network of entities and relationships.

The system organizes data as:
- **Episodes**: Content snippets (conversations, notes, documents)
- **Nodes**: Entities (people, projects, concepts, decisions, anything)
- **Facts**: Relationships between entities with temporal metadata

## Core Workflow: SEARCH FIRST, THEN ADD

**Always explore existing knowledge before adding new information.**

### WHEN ADDING INFORMATION:

User provides information
  ↓
search_nodes(extract key entities/concepts)
  ↓
IF entities found:
  → get_entity_connections(entity_uuid) - See what's already linked
  → Optional: get_entity_timeline(entity_uuid) - Understand history
  ↓
add_memory(episode_body with explicit references to found entities)
  ↓
Result: New episode automatically connects to existing knowledge

### WHEN RETRIEVING INFORMATION:

User asks question
  ↓
search_nodes(extract keywords from question)
  ↓
For relevant entities:
  → get_entity_connections(uuid) - Explore neighborhood
  → get_entity_timeline(uuid) - See evolution/history
  ↓
Optional: search_memory_facts(semantic query) - Find specific relationships
  ↓
Result: Comprehensive answer from complete graph context

## Tool Selection Guide

**Finding entities:**
- `search_nodes` - Semantic search for entities by keywords/description

**Exploring connections:**
- `get_entity_connections` - **ALL** relationships for an entity (complete, direct graph traversal)
- `search_memory_facts` - Semantic search for relationships (query-driven, may miss some)

**Understanding history:**
- `get_entity_timeline` - **ALL** episodes mentioning an entity (chronological, complete)

**Adding information:**
- `add_memory` - Store new episodes (AFTER searching existing knowledge)

**Retrieval vs Graph Traversal:**
- Use `get_entity_connections` when you need COMPLETE data (pattern detection, exploration)
- Use `search_memory_facts` when you have a specific semantic query

## Examples

### Example 1: Adding Technical Decision

```python
# ❌ BAD: Creates disconnected node
add_memory(
    name="Database choice",
    episode_body="Chose PostgreSQL for new service"
)

# ✅ GOOD: Connects to existing knowledge
nodes = search_nodes(query="database architecture microservices")
# Found: MySQL (main db), Redis (cache), microservices pattern

connections = get_entity_connections(entity_uuid=nodes[0]['uuid'])
# Sees: MySQL connects to user-service, payment-service

add_memory(
    name="Database choice",
    episode_body="Chose PostgreSQL for new notification-service. Different from "
                 "existing MySQL used by user-service and payment-service because "
                 "we need better JSON support for notification templates."
)
# Result: Rich connections created automatically
```

### Example 2: Exploring Project Context

```python
# User asks: "What database considerations do we have?"
nodes = search_nodes(query="database")
# Returns: PostgreSQL, MySQL, Redis entities

for node in nodes:
    connections = get_entity_connections(entity_uuid=node['uuid'])
    # Shows ALL services, constraints, decisions connected to each database
    timeline = get_entity_timeline(entity_uuid=node['uuid'])
    # Shows when each database was discussed, decisions made over time

# Synthesize comprehensive answer from COMPLETE data
```

### Example 3: Pattern Recognition (Coaching Context)

```python
# User: "I'm feeling stressed today"
nodes = search_nodes(query="stress")

connections = get_entity_connections(entity_uuid=nodes[0]['uuid'])
# Discovers: stress ↔ work, sleep, project-deadline, coffee-intake

timeline = get_entity_timeline(entity_uuid=nodes[0]['uuid'])
# Shows: First mentioned 3 months ago, frequency increasing

# Can now make informed observations:
# "I notice stress is connected to work deadlines, sleep patterns, and
# increased coffee. Mentions have tripled since the new project started."

add_memory(
    name="Stress discussion",
    episode_body="Discussed stress today. User recognizes connection to "
                 "project deadlines and sleep quality from our past conversations."
)
```

## Key Principles

- **Always search before adding** - Even for trivial data
- **Reference found entities by name** - Creates automatic connections
- **Use complete data for patterns** - Graph traversal, not semantic guessing
- **Track evolution over time** - Timeline shows how understanding changed

## Technical Notes

- All tools respect `group_id` filtering for namespace isolation
- Temporal metadata tracks both creation time and domain time
- Facts can be invalidated (superseded) without deletion
- Processing is asynchronous - episodes queued for background processing
"""
````
#### 3. Add Tool: `get_entity_connections` (After existing tools, ~line 850)
```python
@mcp.tool(
    annotations={
        'title': 'Get Entity Connections',
        'readOnlyHint': True,
        'destructiveHint': False,
        'idempotentHint': True,
        'openWorldHint': True,
    },
)
async def get_entity_connections(
    entity_uuid: str,
    group_ids: list[str] | None = None,
    max_connections: int = 50,
) -> FactSearchResponse | ErrorResponse:
    """Get ALL relationships connected to a specific entity. **Complete graph traversal.**

    **Use this to explore what's already known about an entity before adding new information.**

    Unlike search_memory_facts, which requires a semantic query, this performs direct graph
    traversal to return EVERY relationship where the entity is involved. Guarantees completeness.

    WHEN TO USE THIS TOOL:
    - Exploring entity neighborhood → get_entity_connections **USE THIS**
    - Before adding related info → get_entity_connections **USE THIS**
    - Pattern detection (need complete data) → get_entity_connections **USE THIS**
    - Understanding full context without formulating queries

    WHEN NOT to use:
    - Specific semantic query → use search_memory_facts instead
    - Finding entities → use search_nodes instead

    Use Cases:
    - "Show everything connected to PostgreSQL decision"
    - "What's linked to the authentication service?"
    - "All relationships for Django project"
    - Before adding: "Let me see what's already known about X"

    Args:
        entity_uuid: UUID of entity to explore (from search_nodes or previous results)
        group_ids: Optional list of namespaces to filter connections
        max_connections: Maximum relationships to return (default: 50)

    Returns:
        FactSearchResponse with all connected relationships and temporal metadata

    Examples:
        # After finding entity
        nodes = search_nodes(query="Django project")
        django_uuid = nodes[0]['uuid']

        # Get all connections
        connections = get_entity_connections(entity_uuid=django_uuid)

        # Limited results for specific namespace
        connections = get_entity_connections(
            entity_uuid=django_uuid,
            group_ids=["work"],
            max_connections=20
        )
    """
    global graphiti_service

    if graphiti_service is None:
        return ErrorResponse(error='Graphiti service not initialized')

    try:
        client = await graphiti_service.get_client()

        # Use existing EntityEdge.get_by_node_uuid() method
        edges = await EntityEdge.get_by_node_uuid(client.driver, entity_uuid)

        # Filter by group_ids if provided
        if group_ids:
            edges = [e for e in edges if e.group_id in group_ids]

        # Limit results
        edges = edges[:max_connections]

        if not edges:
            return FactSearchResponse(
                message=f'No connections found for entity {entity_uuid}',
                facts=[]
            )

        # Format using existing formatter
        facts = [format_fact_result(edge) for edge in edges]

        return FactSearchResponse(
            message=f'Found {len(facts)} connection(s) for entity',
            facts=facts
        )
    except Exception as e:
        error_msg = str(e)
        logger.error(f'Error getting entity connections: {error_msg}')
        return ErrorResponse(error=f'Error getting entity connections: {error_msg}')
```
#### 4. Add Tool: `get_entity_timeline` (After `get_entity_connections`)

```python
@mcp.tool(
    annotations={
        'title': 'Get Entity Timeline',
        'readOnlyHint': True,
        'destructiveHint': False,
        'idempotentHint': True,
        'openWorldHint': True,
    },
)
async def get_entity_timeline(
    entity_uuid: str,
    group_ids: list[str] | None = None,
    max_episodes: int = 20,
) -> EpisodeSearchResponse | ErrorResponse:
    """Get chronological episode history for an entity. **Shows evolution over time.**

    **Use this to understand how an entity was discussed across conversations.**

    Returns ALL episodes where this entity was mentioned, in chronological order.
    Shows the full conversational history and temporal evolution of concepts, decisions,
    or any tracked entity.

    WHEN TO USE THIS TOOL:
    - Understanding entity history → get_entity_timeline **USE THIS**
    - "When did we discuss X?" → get_entity_timeline **USE THIS**
    - Tracking evolution over time → get_entity_timeline **USE THIS**
    - Seeing context from original sources

    WHEN NOT to use:
    - Semantic search across all episodes → use search_episodes instead
    - Finding entities → use search_nodes instead

    Use Cases:
    - "When did we first discuss microservices architecture?"
    - "Show all mentions of the deployment pipeline"
    - "Timeline of stress mentions" (coaching context)
    - "How did our understanding of GraphQL evolve?"

    Args:
        entity_uuid: UUID of entity (from search_nodes or previous results)
        group_ids: Optional list of namespaces to filter episodes
        max_episodes: Maximum episodes to return (default: 20, chronological)

    Returns:
        EpisodeSearchResponse with episodes ordered chronologically

    Examples:
        # After finding entity
        nodes = search_nodes(query="microservices")
        arch_uuid = nodes[0]['uuid']

        # Get conversation history
        timeline = get_entity_timeline(entity_uuid=arch_uuid)

        # Recent episodes only for specific namespace
        timeline = get_entity_timeline(
            entity_uuid=arch_uuid,
            group_ids=["architecture"],
            max_episodes=10
        )
    """
    global graphiti_service

    if graphiti_service is None:
        return ErrorResponse(error='Graphiti service not initialized')

    try:
        client = await graphiti_service.get_client()

        # Use existing EpisodicNode.get_by_entity_node_uuid() method
        episodes = await EpisodicNode.get_by_entity_node_uuid(
            client.driver,
            entity_uuid
        )

        # Filter by group_ids if provided
        if group_ids:
            episodes = [e for e in episodes if e.group_id in group_ids]

        # Sort chronologically by valid_at (fall back to created_at so a
        # missing valid_at cannot break the comparison)
        episodes.sort(key=lambda e: e.valid_at or e.created_at)

        # Limit results
        episodes = episodes[:max_episodes]

        if not episodes:
            return EpisodeSearchResponse(
                message=f'No episodes found mentioning entity {entity_uuid}',
                episodes=[]
            )

        # Format episodes
        episode_results = []
        for ep in episodes:
            episode_results.append({
                'uuid': ep.uuid,
                'name': ep.name,
                'content': ep.content,
                'valid_at': ep.valid_at.isoformat() if ep.valid_at else None,
                'created_at': ep.created_at.isoformat() if ep.created_at else None,
                'source': ep.source.value if ep.source else None,
                'group_id': ep.group_id,
            })

        return EpisodeSearchResponse(
            message=f'Found {len(episode_results)} episode(s) mentioning entity',
            episodes=episode_results
        )
    except Exception as e:
        error_msg = str(e)
        logger.error(f'Error getting entity timeline: {error_msg}')
        return ErrorResponse(error=f'Error getting entity timeline: {error_msg}')
```
#### 5. Update `add_memory` Docstring (Line ~385)

Add after line 387 (`"""Add information to memory...`):
"""Add information to memory. **This is the PRIMARY method for storing information.**
**⚠️ IMPORTANT: SEARCH FIRST, THEN ADD**
Before using this tool, always:
1. Search for related entities: search_nodes(query="relevant keywords")
2. Explore their connections: get_entity_connections(entity_uuid=found_uuid)
3. Then add with context: Reference found entities by name in episode_body
This creates rich automatic connections instead of isolated nodes.
**PRIORITY: Use this AFTER exploring existing knowledge.**
## Testing Plan

### Manual Testing Scenarios

**Test 1: Connection Exploration**

```python
# Setup
add_memory(name="Tech stack", episode_body="Using Django with PostgreSQL")
nodes = search_nodes(query="Django")
django_uuid = nodes[0]['uuid']

# Test
connections = get_entity_connections(entity_uuid=django_uuid)

# Verify
assert len(connections['facts']) > 0
assert any('PostgreSQL' in str(fact) for fact in connections['facts'])
```
**Test 2: Timeline Tracking**

```python
# Setup: Add multiple episodes over time
add_memory(name="Week 1", episode_body="Started Django project evaluation")
add_memory(name="Week 2", episode_body="Django project approved, beginning development")
add_memory(name="Week 3", episode_body="Django project hit first milestone")

nodes = search_nodes(query="Django project")
project_uuid = nodes[0]['uuid']

# Test
timeline = get_entity_timeline(entity_uuid=project_uuid)

# Verify
assert len(timeline['episodes']) >= 3
assert timeline['episodes'][0]['name'] == "Week 1"  # chronological
```
**Test 3: Search-First Workflow**

```python
# Step 1: Search
nodes = search_nodes(query="database architecture")
# Found: PostgreSQL, MySQL entities

# Step 2: Explore
postgres_uuid = nodes[0]['uuid']
connections = get_entity_connections(entity_uuid=postgres_uuid)
timeline = get_entity_timeline(entity_uuid=postgres_uuid)

# Step 3: Add with context
add_memory(
    name="New service database",
    episode_body="New notification service will use PostgreSQL like our "
                 "existing user service, chosen for JSON support"
)

# Verify: New episode should connect to existing PostgreSQL entity
new_connections = get_entity_connections(entity_uuid=postgres_uuid)
assert len(new_connections['facts']) > len(connections['facts'])
```
**Test 4: Cross-Domain Usage**

```python
# Architecture decision
add_memory(name="API Design", episode_body="Using REST for public API, GraphQL internal")

# Coaching note
add_memory(name="Stress check-in", episode_body="Feeling stressed about API deadlines")

# Research note
add_memory(name="GraphQL Learning", episode_body="Studied GraphQL federation patterns")

# Test: All should work with same tools
for query in ["API", "stress", "GraphQL"]:
    nodes = search_nodes(query=query)
    if nodes:
        connections = get_entity_connections(entity_uuid=nodes[0]['uuid'])
        timeline = get_entity_timeline(entity_uuid=nodes[0]['uuid'])
        assert connections is not None
        assert timeline is not None
```
### Integration Testing
- Test with LibreChat MCP integration
- Verify tool discovery in MCP inspector
- Test error handling (invalid UUIDs, missing entities)
- Verify group_id filtering works correctly
- Test with various entity types
### Error Handling Tests

```python
# Invalid UUID
result = get_entity_connections(entity_uuid="invalid-uuid")
assert 'error' in result

# Empty results
result = get_entity_connections(entity_uuid="non-existent-uuid")
assert 'No connections' in result['message']

# Group filtering
result = get_entity_connections(
    entity_uuid=valid_uuid,
    group_ids=["non-existent-group"]
)
assert len(result['facts']) == 0
```
## Deployment Strategy

### Phase 1: Implementation (Current Branch)

- Make code changes on `claude/read-documentation-01J1sX9uuuitPsuoqWR6H9pA`
- Manual testing with local MCP server
- Verify with MCP inspector

### Phase 2: Docker Build

- Update version in `mcp_server/pyproject.toml`
- Trigger custom Docker build workflow
- Test in containerized environment

### Phase 3: Documentation

- Update `mcp_server/README.md` with new tools
- Add workflow examples
- Document best practices
## Success Metrics

### Qualitative
- ✅ LLM can explore complete entity context before adding data
- ✅ Users can ask "what do I know about X?" and get comprehensive view
- ✅ Reduced duplicate/disconnected entities
- ✅ Temporal tracking enables evolution analysis
- ✅ Works equally well for technical decisions, projects, and personal notes
### Quantitative
- Implementation: ~150 lines of code (tools + enhanced instructions)
- No new dependencies required
- 100% reuses existing Graphiti features
- Implementation time: 2-3 hours
- Backward compatible: adds tools, doesn't modify existing
## Timeline

| Phase | Duration | Tasks |
|---|---|---|
| Code Implementation | 45-60 min | Add two tools, update instructions, update add_memory docstring |
| Manual Testing | 30-45 min | Run test scenarios, verify functionality |
| Integration Testing | 30-45 min | Test with MCP server, error handling, edge cases |
| Documentation | 15-30 min | Code comments, commit messages |
| **Total** | **2-3 hours** | |
## Rejected Alternatives

### Alternative 1: Enhanced Instructions Only

**Why Rejected:**
- `search_memory_facts` requires semantic query formulation (guessing)
- No completeness guarantee for pattern detection
- No chronological episode history for temporal tracking
- More expensive (multiple semantic searches vs single graph query)
### Alternative 2: Build Episode Reprocessing Tool

**Why Rejected:**
- Episodes are immutable - reprocessing yields identical results
- Expensive (re-runs LLM on all episodes)
- Doesn't solve root problem (LLM needs visibility, not reprocessing)
### Alternative 3: Community Clustering Tool

**Why Rejected:**
- No clear trigger for when to call it
- Expensive with unclear value
- Current search doesn't use communities
- Better as manual admin operation
## References

### Code References

- `graphiti_core/edges.py:456` - `EntityEdge.get_by_node_uuid()`
- `graphiti_core/nodes.py:415` - `EpisodicNode.get_by_entity_node_uuid()`
- `mcp_server/src/graphiti_mcp_server.py` - MCP tool definitions
- `mcp_server/src/utils/formatting.py` - Result formatters

### Related Documents

- `CLAUDE.md` - Project modification guidelines
- `DOCS/MCP-Tool-Descriptions-Final-Revision.md` - MCP tool patterns
- Investigation documents (from other branch) - PKM use case analysis
## Risk Assessment

**LOW RISK:**
- ✅ No changes to `graphiti_core/` (respects CLAUDE.md)
- ✅ Only exposes existing, tested functionality
- ✅ Backward compatible (adds tools, doesn't modify)
- ✅ Standard FastMCP patterns and annotations
- ✅ Error handling follows existing patterns

**POTENTIAL ISSUES:**
- Large result sets (mitigated by `max_connections`/`max_episodes` limits)
- Invalid UUIDs (handled with try/except and error responses)
- Missing entities (returns empty results with clear message)
## Post-Implementation

### Monitoring
- Watch for usage patterns in logs
- Monitor performance (graph queries should be fast)
- Track user feedback
### Future Enhancements (Optional)
- Path finding between two entities
- Subgraph extraction (entity + N-hop neighborhood) - see the sketch after this list
- Temporal filters (episodes/connections within time range)
- Bidirectional entity influence tracking
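
As a rough sketch of the subgraph idea, the new connections tool could be composed into an N-hop walk. This is illustrative only: it assumes the fact dicts produced by `format_fact_result` expose endpoint UUIDs, and the `source_node_uuid`/`target_node_uuid` field names here are assumptions, not confirmed API:

```python
# Hypothetical sketch: N-hop neighborhood built on get_entity_connections.
async def get_subgraph(entity_uuid: str, hops: int = 2) -> list[dict]:
    seen = {entity_uuid}       # entities already expanded or queued
    frontier = [entity_uuid]   # entities to expand on the current hop
    facts: list[dict] = []

    for _ in range(hops):
        next_frontier: list[str] = []
        for uuid in frontier:
            result = await get_entity_connections(entity_uuid=uuid)
            if not isinstance(result, FactSearchResponse):
                continue  # skip error responses in this sketch
            for fact in result.facts:
                facts.append(fact)
                # Endpoint field names are assumed, not confirmed
                for endpoint in (fact.get('source_node_uuid'), fact.get('target_node_uuid')):
                    if endpoint and endpoint not in seen:
                        seen.add(endpoint)
                        next_frontier.append(endpoint)
        frontier = next_frontier

    return facts
```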
## Notes
- General-purpose design works for ALL PKM use cases
- Coaching analysis proved need for complete data, but tools stay domain-agnostic
- Instructions use mixed examples (technical, project, personal) to emphasize versatility
- Tools leverage existing Graphiti features - zero core changes needed
Ready for implementation upon approval.