# BACKLOG: Neo4j Database Configuration Fix

**Status:** Ready for Implementation
**Priority:** Medium
**Type:** Bug Fix + Architecture Improvement
**Date:** 2025-11-09

## Problem Statement

The MCP server does not pass the `database` parameter when it initializes the Graphiti client with Neo4j. This causes unexpected database behavior and user confusion.

### Current Behavior

1. **Configuration Issue:**
   - The user configures `NEO4J_DATABASE=graphiti` in the environment
   - The MCP server reads this value into its config but **does not pass it** to the Graphiti constructor
   - `Neo4jDriver` falls back to its hardcoded default, `database='neo4j'`

2. **Runtime Behavior:**
   - graphiti-core tries to switch databases when `group_id != driver._database` (`graphiti_core/graphiti.py`, lines 698-700)
   - It calls `driver.clone(database=group_id)` to create a new driver
   - **`Neo4jDriver` does not implement `clone()`**; it inherits the no-op base implementation
   - The database switch silently fails, and the driver keeps using the 'neo4j' database
   - Data is saved with a `group_id` property in the 'neo4j' database (not in 'graphiti')

3. **User Experience:**
   - The user expects the data in the 'graphiti' database (as configured in the environment)
   - Neo4j Browser shows the 'graphiti' database as empty
   - The data actually exists in the 'neo4j' database, with correct `group_id` filtering
   - Queries still work (thanks to property-based filtering), but the architecture is confusing

### Root Causes

1. **Incomplete Implementation in graphiti-core:**
   - Base `GraphDriver.clone()` returns `self` (no-op)
   - `FalkorDriver` implements `clone()` properly
   - `Neo4jDriver` does not implement `clone()`
   - Database switching therefore works only for FalkorDB, not for Neo4j

2. **Missing Parameter in MCP Server:**
   - `mcp_server/src/graphiti_mcp_server.py:233-240`
   - Neo4j initialization does not pass the `database` parameter
   - FalkorDB initialization correctly passes the `database` parameter

3. **Architectural Mismatch:**
   - Code comments suggest an intent to use `group_id` as the database name
   - Neo4j best practices recommend property-based multi-tenancy
   - Neo4j databases are heavyweight (not suitable for per-user isolation)

## Solution: Option 2 (Recommended)

**Architecture:** Single database with property-based multi-tenancy

### Design Principles

1. **ONE database**, named via configuration (default: 'graphiti')
2. **MULTIPLE users**, each with a unique `group_id`
3. **Property-based isolation** using `WHERE n.group_id = 'user_id'`
4. **Neo4j best practices** for multi-tenant SaaS applications

### Why This Approach?

- **Performance:** Neo4j databases are heavyweight; property filtering is efficient
- **Operational:** Simpler backup, monitoring, and index management
- **Scalability:** Proven pattern for multi-tenant Neo4j applications
- **Current State:** The system already works this way (by accident); it just needs cleanup

### Implementation Changes

#### File: `mcp_server/src/graphiti_mcp_server.py`

**Location:** Lines 233-240 (Neo4j initialization)

**Current Code:**

```python
# For Neo4j (default), use the original approach
self.client = Graphiti(
    uri=db_config['uri'],
    user=db_config['user'],
    password=db_config['password'],
    llm_client=llm_client,
    embedder=embedder_client,
    max_coroutines=self.semaphore_limit,
    # ❌ MISSING: database parameter not passed!
)
```

**Initial Fix (superseded):**

```python
# For Neo4j (default), use configured database with property-based multi-tenancy
database_name = (
    config.database.providers.neo4j.database
    if config.database.providers.neo4j
    else 'graphiti'
)

self.client = Graphiti(
    uri=db_config['uri'],
    user=db_config['user'],
    password=db_config['password'],
    llm_client=llm_client,
    embedder=embedder_client,
    max_coroutines=self.semaphore_limit,
    database=database_name,  # ✅ Pass configured database name
)
```

**What this changes:**

- It sets `driver._database = database_name` (e.g., 'graphiti')
- It does **not** prevent the clone attempt at line 698: `if 'lvarming73' != 'graphiti'` is `True`, so graphiti-core still calls `clone()`
- `clone()` returns the same driver (no-op), so the driver keeps using the 'graphiti' database

At first glance this looks like the same problem: the logic at lines 698-700 assumes `group_id == database`, and property-based multi-tenancy would seem to require bypassing that check.

**Why no bypass is needed:** Because `Neo4jDriver.clone()` is a no-op, the runtime behavior is:

1. Line 698: `if group_id != driver._database` → `True` (user_id != 'graphiti')
2. Line 700: `driver.clone(database=group_id)` → returns the same driver
3. Data is saved with the `group_id` property in the current database

**This already works.** The only problem is initialization, so the fix is to pass the configured database name, falling back to the driver's own default ('neo4j'):

**Fixed Code:**

```python
# For Neo4j (default), use configured database with property-based multi-tenancy
# Pass database parameter to ensure correct initial database selection
neo4j_database = (
    config.database.providers.neo4j.database
    if config.database.providers.neo4j
    else 'neo4j'
)

self.client = Graphiti(
    uri=db_config['uri'],
    user=db_config['user'],
    password=db_config['password'],
    llm_client=llm_client,
    embedder=embedder_client,
    max_coroutines=self.semaphore_limit,
    database=neo4j_database,  # ✅ Use configured database (from NEO4J_DATABASE env var)
)
```

**Note:** This ensures the driver starts with the correct database.
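
The fallback logic in the fixed code can be checked without a running Neo4j instance. Below is a minimal, hypothetical sketch of such a unit test: the dataclasses only mimic the attribute path `config.database.providers.neo4j.database`, and `FakeGraphiti`, `build_client`, and `resolve_neo4j_database` are illustrative stand-ins that record constructor arguments instead of importing graphiti-core or the real MCP server config classes. The LLM and embedder clients are omitted because they do not affect the `database` argument.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


# Hypothetical stand-ins that only mimic the attribute path
# config.database.providers.neo4j.database used in the fix.
@dataclass
class Neo4jProviderConfig:
    database: str = 'graphiti'


@dataclass
class ProvidersConfig:
    neo4j: Optional[Neo4jProviderConfig] = None


@dataclass
class DatabaseConfig:
    providers: ProvidersConfig = field(default_factory=ProvidersConfig)


@dataclass
class Config:
    database: DatabaseConfig = field(default_factory=DatabaseConfig)


def resolve_neo4j_database(config: Config) -> str:
    """Same fallback as the fixed code: configured name, else 'neo4j'."""
    provider = config.database.providers.neo4j
    return provider.database if provider else 'neo4j'


class FakeGraphiti:
    """Records constructor kwargs instead of connecting to Neo4j (hypothetical)."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


def build_client(config: Config, db_config: dict) -> FakeGraphiti:
    # Mirrors the fixed initialization, minus the LLM/embedder clients.
    return FakeGraphiti(
        uri=db_config['uri'],
        user=db_config['user'],
        password=db_config['password'],
        database=resolve_neo4j_database(config),
    )


db_config = {'uri': 'bolt://localhost:7687', 'user': 'neo4j', 'password': 'secret'}

configured = Config(DatabaseConfig(ProvidersConfig(Neo4jProviderConfig('graphiti'))))
unconfigured = Config()

# The configured name is passed through; without provider config we fall back to 'neo4j'.
assert build_client(configured, db_config).kwargs['database'] == 'graphiti'
assert build_client(unconfigured, db_config).kwargs['database'] == 'neo4j'
```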
The later `clone(database=group_id)` call is still a no-op for `Neo4jDriver`, so the driver stays on the configured database and data lands in the right place from the start.

#### File: `mcp_server/src/services/factories.py`

**Location:** Lines 393-399

**Current Code:**

```python
return {
    'uri': uri,
    'user': username,
    'password': password,
    # Note: database and use_parallel_runtime would need to be passed
    # to the driver after initialization if supported
}
```

**Fixed Code:**

```python
return {
    'uri': uri,
    'user': username,
    'password': password,
    'database': neo4j_config.database,  # ✅ Include database in config
}
```

This makes the database name available in the config dictionary.

### Testing Plan

1. **Unit Test:** Verify the database parameter is passed correctly
2. **Integration Test:** Verify data is saved to the configured database
3. **Multi-User Test:** Create episodes with different `group_id` values and verify isolation
4. **Query Test:** Verify hybrid search respects `group_id` filtering

## Cleanup Steps

### Prerequisites

- Back up the current Neo4j data before any operations
- Note the current data location: `neo4j` database with `group_id='lvarming73'`

### Step 1: Verify Current Data Location

```cypher
// In Neo4j Browser
:use neo4j

// Count nodes by group_id
MATCH (n)
WHERE n.group_id IS NOT NULL
RETURN n.group_id, count(*) AS node_count;

// Verify data exists
MATCH (n:Entity {group_id: 'lvarming73'})
RETURN count(n) AS entity_count;
```

### Step 2: Implement Code Fix

1. Update `mcp_server/src/services/factories.py` (add database to config)
2. Update `mcp_server/src/graphiti_mcp_server.py` (pass database parameter)
3. Test with unit tests

### Step 3: Create Target Database

```cypher
// In Neo4j Browser or Neo4j Desktop
CREATE DATABASE graphiti
```

**Note:** `CREATE DATABASE` requires Neo4j Enterprise Edition; Community Edition supports only a single user database.

### Step 4: Migrate Data (Option A - Manual Copy)

```cypher
// Switch to source database
:use neo4j

// Export data to temporary storage (if needed).
// This query only identifies the nodes to move; the copy itself
// needs APOC or a manual export/import.
MATCH (n)
WHERE n.group_id IS NOT NULL
RETURN count(n) AS nodes_to_migrate;
```

**Note:** This requires APOC procedures or a manual export/import. See Option B for an easier approach.

### Step 4: Migrate Data (Option B - Restart Fresh)

**Recommended if the data is test/development data:**

1. Stop the MCP server
2. Delete the 'graphiti' database if it exists: `DROP DATABASE graphiti IF EXISTS`
3. Create a fresh 'graphiti' database: `CREATE DATABASE graphiti`
4. Deploy the code fix
5. Restart the MCP server (it will use the 'graphiti' database)
6. Let users re-add their data naturally

### Step 5: Configuration Update

Verify the environment configuration in LibreChat:

```yaml
# In LibreChat MCP configuration
env:
  NEO4J_DATABASE: "graphiti"       # ✅ Already configured
  GRAPHITI_GROUP_ID: "lvarming73"  # User's group ID
  # ... other vars
```

### Step 6: Verify Fix

```cypher
// In Neo4j Browser
:use graphiti

// Verify data is in the correct database
MATCH (n:Entity {group_id: 'lvarming73'})
RETURN count(n) AS entity_count;

// Check relationships
MATCH (n:Entity)-[r]->(m:Entity)
WHERE n.group_id = 'lvarming73'
RETURN count(r) AS relationship_count;
```

### Step 7: Cleanup Old Database (Optional)

**Only after confirming everything works:**

```cypher
// Delete data from old location
:use neo4j
MATCH (n)
WHERE n.group_id = 'lvarming73'
DETACH DELETE n
```

## Expected Outcomes

### After Implementation

1. **Correct Database Usage:**
   - The MCP server uses the database from the `NEO4J_DATABASE` env var
   - Default: 'graphiti' (or 'neo4j' if not configured)
   - Data appears in the expected location

2. **Multi-Tenant Architecture:**
   - Single database shared across users
   - Each user has a unique `group_id`
   - Property-based isolation via Cypher queries
   - Follows Neo4j best practices

3. **Operational Clarity:**
   - Neo4j Browser shows data in the expected database
   - Configuration matches runtime behavior
   - Easier to monitor and back up

4. **Code Consistency:**
   - Neo4j initialization matches the FalkorDB pattern
   - Database parameter is passed explicitly
   - Clear architectural intent

## References

### Code Locations

- **Bug Location:** `mcp_server/src/graphiti_mcp_server.py:233-240`
- **Factory Fix:** `mcp_server/src/services/factories.py:393-399`
- **Neo4j Driver:** `graphiti_core/driver/neo4j_driver.py:34-47`
- **Database Switching:** `graphiti_core/graphiti.py:698-700`
- **Property Storage:** `graphiti_core/nodes.py:491`
- **Query Pattern:** `graphiti_core/nodes.py:566-568`

### Related Issues

- SEMAPHORE_LIMIT configuration (resolved in commit ba938c9)
- Rate limiting with OpenAI Tier 1 (resolved via `SEMAPHORE_LIMIT=3`)
- Database visibility confusion (this issue)

### Neo4j Multi-Tenancy Resources

- [Neo4j Multi-Tenancy Guide](https://neo4j.com/developer/multi-tenancy-worked-example/)
- [Property-based isolation](https://neo4j.com/docs/operations-manual/current/database-administration/multi-tenancy/)
- FalkorDB uses Redis databases (lightweight, so per-user databases make sense)
- Neo4j databases are heavyweight (property-based filtering recommended)

## Implementation Checklist

- [ ] Update `factories.py` to include database in config dict
- [ ] Update `graphiti_mcp_server.py` to pass database parameter
- [ ] Add unit test verifying database parameter is passed
- [ ] Create 'graphiti' database in Neo4j
- [ ] Migrate or recreate data in correct database
- [ ] Verify queries work with correct database
- [ ] Update documentation/README with correct architecture
- [ ] Remove temporary test data from 'neo4j' database
- [ ] Commit changes with descriptive message
- [ ] Update Serena memory with architectural decisions

## Notes

- The graphiti-core library's database switching logic (lines 698-700) is only partially implemented
- `FalkorDriver` has a full `clone()` implementation (multi-database isolation)
- `Neo4jDriver` inherits the no-op `clone()` (property-based isolation by default)
- This "accidental" architecture is actually the correct Neo4j pattern
- The fix makes the implicit behavior explicit and configurable
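
The notes above can be summarized in a small, self-contained Python model. `GraphDriverModel`, `Neo4jDriverModel`, `FalkorDriverModel`, and `driver_for_group` are hypothetical stand-ins, not graphiti-core's API; they only reproduce the behavior this backlog describes: the base `clone()` returns `self`, `FalkorDriver` overrides it, and the check at `graphiti_core/graphiti.py:698-700` calls `clone()` whenever `group_id` differs from the driver's database. The real code may structure this differently.

```python
from __future__ import annotations


class GraphDriverModel:
    """Hypothetical stand-in for the base driver: clone() is a no-op."""

    def __init__(self, database: str) -> None:
        self._database = database

    def clone(self, database: str) -> GraphDriverModel:
        # Base behavior described above: ignore `database` and return self.
        return self


class Neo4jDriverModel(GraphDriverModel):
    """Inherits the no-op clone(), like Neo4jDriver."""


class FalkorDriverModel(GraphDriverModel):
    """Overrides clone(), like FalkorDriver: a new driver per database."""

    def clone(self, database: str) -> FalkorDriverModel:
        return FalkorDriverModel(database)


def driver_for_group(driver: GraphDriverModel, group_id: str) -> GraphDriverModel:
    # Simplified model of the check at graphiti_core/graphiti.py:698-700.
    if group_id != driver._database:
        driver = driver.clone(database=group_id)
    return driver


neo4j = driver_for_group(Neo4jDriverModel('graphiti'), 'lvarming73')
falkor = driver_for_group(FalkorDriverModel('graphiti'), 'lvarming73')

print(neo4j._database)   # graphiti: stays on the configured database
print(falkor._database)  # lvarming73: switches to a per-group database
```

With the no-op `clone()`, a Neo4j driver initialized with the configured database keeps writing there, and `group_id` acts only as a property filter. That is why passing `database` at initialization is sufficient for the Neo4j path.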