graphiti/DOCS/BACKLOG-Neo4j-Database-Configuration-Fix.md
Lars Varming 341efd8c3d Fix: Critical database parameter bug + index creation error handling
CRITICAL FIX - Database Parameter (graphiti_core):
- Fixed graphiti_core/driver/neo4j_driver.py execute_query method
- database_ parameter was incorrectly added to params dict instead of kwargs
- Now correctly passed as keyword argument to Neo4j driver
- Impact: All queries now execute in configured database (not default 'neo4j')
- Root cause: Violated Neo4j Python driver API contract

Technical Details:
Previous code (BROKEN):
  params.setdefault('database_', self._database)  # Wrong - in params dict
  result = await self.client.execute_query(cypher_query_, parameters_=params, **kwargs)

Fixed code (CORRECT):
  kwargs.setdefault('database_', self._database)  # Correct - in kwargs
  result = await self.client.execute_query(cypher_query_, parameters_=params, **kwargs)

FIX - Index Creation Error Handling (MCP server):
- Added graceful handling for Neo4j IF NOT EXISTS bug
- Prevents MCP server crash when indices already exist
- Logs warning instead of failing initialization
- Handles EquivalentSchemaRuleAlreadyExists error gracefully

Files Modified:
- graphiti_core/driver/neo4j_driver.py (3 lines changed)
- mcp_server/src/graphiti_mcp_server.py (12 lines added error handling)
- mcp_server/pyproject.toml (version bump to 1.0.5)

Testing:
- Python syntax validation: PASSED
- Ruff formatting: PASSED
- Ruff linting: PASSED

Closes issues with:
- Data being stored in wrong Neo4j database
- MCP server crashing on startup with EquivalentSchemaRuleAlreadyExists
- NEO4J_DATABASE environment variable being ignored

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-10 11:37:16 +01:00

11 KiB

BACKLOG: Neo4j Database Configuration Fix

Status: Ready for Implementation Priority: Medium Type: Bug Fix + Architecture Improvement Date: 2025-11-09

Problem Statement

The MCP server does not pass the database parameter when initializing the Graphiti client with Neo4j, causing unexpected database behavior and user confusion.

Current Behavior

  1. Configuration Issue:

    • User configures NEO4J_DATABASE=graphiti in environment
    • MCP server reads this value into config but does not pass it to Graphiti constructor
    • Neo4jDriver defaults to database='neo4j' (hardcoded default)
  2. Runtime Behavior:

    • graphiti-core tries to switch databases when group_id != driver._database (line 698-700)
    • Calls driver.clone(database=group_id) to create new driver
    • Neo4jDriver does not implement clone() - inherits no-op base implementation
    • Database switching silently fails, continues using 'neo4j' database
    • Data saved with group_id property in 'neo4j' database (not 'graphiti')
  3. User Experience:

    • User expects data in 'graphiti' database (configured in env)
    • Neo4j Browser shows 'graphiti' database as empty
    • Data actually exists in 'neo4j' database with proper group_id filtering
    • Queries still work (property-based filtering) but confusing architecture

Root Causes

  1. Incomplete Implementation in graphiti-core:

    • Base GraphDriver.clone() returns self (no-op)
    • FalkorDriver implements clone() properly
    • Neo4jDriver does not implement clone()
    • Database switching only works for FalkorDB, not Neo4j
  2. Missing Parameter in MCP Server:

    • mcp_server/src/graphiti_mcp_server.py:233-240
    • Neo4j initialization does not pass database parameter
    • FalkorDB initialization correctly passes database parameter
  3. Architectural Mismatch:

    • Code comments suggest intent to use group_id as database name
    • Neo4j best practices recommend property-based multi-tenancy
    • Neo4j databases are heavyweight (not suitable for per-user isolation)

Architecture: Single database with property-based multi-tenancy

Design Principles

  1. ONE database named via configuration (default: 'graphiti')
  2. MULTIPLE users each with unique group_id
  3. Property-based isolation using WHERE n.group_id = 'user_id'
  4. Neo4j best practices for multi-tenant SaaS applications

Why This Approach?

  • Performance: Neo4j databases are heavyweight; property filtering is efficient
  • Operational: Simpler backup, monitoring, index management
  • Scalability: Proven pattern for multi-tenant Neo4j applications
  • Current State: Already working this way (by accident), just needs cleanup

Implementation Changes

File: mcp_server/src/graphiti_mcp_server.py

Location: Lines 233-240 (Neo4j initialization)

Current Code:

# For Neo4j (default), use the original approach
self.client = Graphiti(
    uri=db_config['uri'],
    user=db_config['user'],
    password=db_config['password'],
    llm_client=llm_client,
    embedder=embedder_client,
    max_coroutines=self.semaphore_limit,
    # ❌ MISSING: database parameter not passed!
)

Fixed Code:

# For Neo4j (default), use configured database with property-based multi-tenancy
database_name = (
    config.database.providers.neo4j.database
    if config.database.providers.neo4j
    else 'graphiti'
)

self.client = Graphiti(
    uri=db_config['uri'],
    user=db_config['user'],
    password=db_config['password'],
    llm_client=llm_client,
    embedder=embedder_client,
    max_coroutines=self.semaphore_limit,
    database=database_name,  # ✅ Pass configured database name
)

Why this works:

  • Sets driver._database = database_name (e.g., 'graphiti')
  • Prevents clone attempt at line 698: if 'lvarming73' != 'graphiti' → True, attempts clone
  • Clone returns same driver (no-op), continues using 'graphiti' database
  • Wait, this still has the problem! Let me reconsider...

Actually, we need a different approach:

The issue is graphiti-core's line 698-700 logic assumes group_id == database. For property-based multi-tenancy, we need to bypass this check.

Better Fix (requires graphiti-core understanding):

Since Neo4jDriver.clone() is a no-op, the current behavior is:

  1. Line 698: if group_id != driver._database → True (user_id != 'graphiti')
  2. Line 700: driver.clone(database=group_id) → Returns same driver
  3. Data saved with group_id property in current database

This actually works! The problem is just initialization. Let's fix it properly:

# For Neo4j (default), use configured database with property-based multi-tenancy
# Pass database parameter to ensure correct initial database selection
neo4j_database = (
    config.database.providers.neo4j.database
    if config.database.providers.neo4j
    else 'neo4j'
)

self.client = Graphiti(
    uri=db_config['uri'],
    user=db_config['user'],
    password=db_config['password'],
    llm_client=llm_client,
    embedder=embedder_client,
    max_coroutines=self.semaphore_limit,
    database=neo4j_database,  # ✅ Use configured database (from NEO4J_DATABASE env var)
)

Note: This ensures the driver starts with the correct database. The clone() call will be a no-op, but data will be in the right database from the start.

File: mcp_server/src/services/factories.py

Location: Lines 393-399

Current Code:

return {
    'uri': uri,
    'user': username,
    'password': password,
    # Note: database and use_parallel_runtime would need to be passed
    # to the driver after initialization if supported
}

Fixed Code:

return {
    'uri': uri,
    'user': username,
    'password': password,
    'database': neo4j_config.database,  # ✅ Include database in config
}

This ensures the database parameter is available in the config dictionary.

Testing Plan

  1. Unit Test: Verify database parameter is passed correctly
  2. Integration Test: Verify data saved to configured database
  3. Multi-User Test: Create episodes with different group_ids, verify isolation
  4. Query Test: Verify hybrid search respects group_id filtering

Cleanup Steps

Prerequisites

  • Backup current Neo4j data before any operations
  • Note current data location: neo4j database with group_id='lvarming73'

Step 1: Verify Current Data Location

// In Neo4j Browser
:use neo4j

// Count nodes by group_id
MATCH (n)
WHERE n.group_id IS NOT NULL
RETURN n.group_id, count(*) as node_count

// Verify data exists
MATCH (n:Entity {group_id: 'lvarming73'})
RETURN count(n) as entity_count

Step 2: Implement Code Fix

  1. Update mcp_server/src/services/factories.py (add database to config)
  2. Update mcp_server/src/graphiti_mcp_server.py (pass database parameter)
  3. Test with unit tests

Step 3: Create Target Database

// In Neo4j Browser or Neo4j Desktop
CREATE DATABASE graphiti

Step 4: Migrate Data (Option A - Manual Copy)

// Switch to source database
:use neo4j

// Export data to temporary storage (if needed)
MATCH (n) WHERE n.group_id IS NOT NULL
WITH collect(n) as nodes
// Copy to graphiti database using APOC or manual approach

Note: This requires APOC procedures or manual export/import. See Option B for easier approach.

Step 4: Migrate Data (Option B - Restart Fresh)

Recommended if data is test/development data:

  1. Stop MCP server
  2. Delete 'graphiti' database if exists: DROP DATABASE graphiti IF EXISTS
  3. Create fresh 'graphiti' database: CREATE DATABASE graphiti
  4. Deploy code fix
  5. Restart MCP server (will use 'graphiti' database)
  6. Let users re-add their data naturally

Step 5: Configuration Update

Verify environment configuration in LibreChat:

# In LibreChat MCP configuration
env:
  NEO4J_DATABASE: "graphiti"  # ✅ Already configured
  GRAPHITI_GROUP_ID: "lvarming73"  # User's group ID
  # ... other vars

Step 6: Verify Fix

// In Neo4j Browser
:use graphiti

// Verify data is in correct database
MATCH (n:Entity {group_id: 'lvarming73'})
RETURN count(n) as entity_count

// Check relationships
MATCH (n:Entity)-[r]->(m:Entity)
WHERE n.group_id = 'lvarming73'
RETURN count(r) as relationship_count

Step 7: Cleanup Old Database (Optional)

Only after confirming everything works:

// Delete data from old location
:use neo4j
MATCH (n) WHERE n.group_id = 'lvarming73'
DETACH DELETE n

Expected Outcomes

After Implementation

  1. Correct Database Usage:

    • MCP server uses database from NEO4J_DATABASE env var
    • Default: 'graphiti' (or 'neo4j' if not configured)
    • Data appears in expected location
  2. Multi-Tenant Architecture:

    • Single database shared across users
    • Each user has unique group_id
    • Property-based isolation via Cypher queries
    • Follows Neo4j best practices
  3. Operational Clarity:

    • Neo4j Browser shows data in expected database
    • Configuration matches runtime behavior
    • Easier to monitor and backup
  4. Code Consistency:

    • Neo4j initialization matches FalkorDB pattern
    • Database parameter explicitly passed
    • Clear architectural intent

References

Code Locations

  • Bug Location: mcp_server/src/graphiti_mcp_server.py:233-240
  • Factory Fix: mcp_server/src/services/factories.py:393-399
  • Neo4j Driver: graphiti_core/driver/neo4j_driver.py:34-47
  • Database Switching: graphiti_core/graphiti.py:698-700
  • Property Storage: graphiti_core/nodes.py:491
  • Query Pattern: graphiti_core/nodes.py:566-568
  • SEMAPHORE_LIMIT configuration (resolved - commit ba938c9)
  • Rate limiting with OpenAI Tier 1 (resolved via SEMAPHORE_LIMIT=3)
  • Database visibility confusion (this issue)

Neo4j Multi-Tenancy Resources

Implementation Checklist

  • Update factories.py to include database in config dict
  • Update graphiti_mcp_server.py to pass database parameter
  • Add unit test verifying database parameter is passed
  • Create 'graphiti' database in Neo4j
  • Migrate or recreate data in correct database
  • Verify queries work with correct database
  • Update documentation/README with correct architecture
  • Remove temporary test data from 'neo4j' database
  • Commit changes with descriptive message
  • Update Serena memory with architectural decisions

Notes

  • The graphiti-core library's database switching logic (lines 698-700) is partially implemented
  • FalkorDriver has full clone() implementation (multi-database isolation)
  • Neo4jDriver inherits no-op clone() (property-based isolation by default)
  • This "accidental" architecture is actually the correct Neo4j pattern
  • Fix makes the implicit behavior explicit and configurable