feat: Configure KuzuDB Docker deployment with persistent storage

- Update docker-compose-kuzu.yml to use persistent volume by default
- Set KUZU_DB to /data/graphiti.kuzu for persistent storage
- Increase max concurrent queries from 1 to 10 for better performance
- Add dedicated config-docker-kuzu.yaml for Docker deployments
- Create README-kuzu.md with usage instructions and troubleshooting
- Configure volume with local driver for data persistence

The setup now provides:
- Automatic data persistence across container restarts
- No external database dependencies
- Simple backup/restore procedures
- Lower resource usage compared to Neo4j

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Author: Daniel Chalef
Date: 2025-08-30 08:43:17 -07:00
Parent: 2e345698e4
Commit: fb2ebeba50
3 changed files with 222 additions and 10 deletions

config-docker-kuzu.yaml

@@ -0,0 +1,90 @@
# Graphiti MCP Server Configuration for Docker with KuzuDB
# This configuration is optimized for running with docker-compose-kuzu.yml
# It uses persistent KuzuDB storage at /data/graphiti.kuzu

server:
  transport: "sse"  # SSE for HTTP access from Docker
  host: "0.0.0.0"
  port: 8000

llm:
  provider: "openai"  # Options: openai, azure_openai, anthropic, gemini, groq
  model: "gpt-4o"
  temperature: 0.0
  max_tokens: 4096
  providers:
    openai:
      api_key: ${OPENAI_API_KEY}
      api_url: ${OPENAI_API_URL:https://api.openai.com/v1}
      organization_id: ${OPENAI_ORGANIZATION_ID:}
    azure_openai:
      api_key: ${AZURE_OPENAI_API_KEY}
      api_url: ${AZURE_OPENAI_ENDPOINT}
      api_version: ${AZURE_OPENAI_API_VERSION:2024-10-21}
      deployment_name: ${AZURE_OPENAI_DEPLOYMENT}
      use_azure_ad: ${USE_AZURE_AD:false}
    anthropic:
      api_key: ${ANTHROPIC_API_KEY}
      api_url: ${ANTHROPIC_API_URL:https://api.anthropic.com}
      max_retries: 3
    gemini:
      api_key: ${GOOGLE_API_KEY}
      project_id: ${GOOGLE_PROJECT_ID:}
      location: ${GOOGLE_LOCATION:us-central1}
    groq:
      api_key: ${GROQ_API_KEY}
      api_url: ${GROQ_API_URL:https://api.groq.com/openai/v1}

embedder:
  provider: "openai"  # Options: openai, azure_openai, gemini, voyage
  model: "text-embedding-ada-002"
  dimensions: 1536
  providers:
    openai:
      api_key: ${OPENAI_API_KEY}
      api_url: ${OPENAI_API_URL:https://api.openai.com/v1}
      organization_id: ${OPENAI_ORGANIZATION_ID:}
    azure_openai:
      api_key: ${AZURE_OPENAI_API_KEY}
      api_url: ${AZURE_OPENAI_EMBEDDINGS_ENDPOINT}
      api_version: ${AZURE_OPENAI_API_VERSION:2024-10-21}
      deployment_name: ${AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT}
      use_azure_ad: ${USE_AZURE_AD:false}
    gemini:
      api_key: ${GOOGLE_API_KEY}
      project_id: ${GOOGLE_PROJECT_ID:}
      location: ${GOOGLE_LOCATION:us-central1}
    voyage:
      api_key: ${VOYAGE_API_KEY}
      api_url: ${VOYAGE_API_URL:https://api.voyageai.com/v1}
      model: "voyage-3"

database:
  provider: "kuzu"  # Using KuzuDB for this configuration
  providers:
    kuzu:
      # Use environment variable if set, otherwise use persistent storage at /data
      db: ${KUZU_DB:/data/graphiti.kuzu}
      max_concurrent_queries: ${KUZU_MAX_CONCURRENT_QUERIES:10}

graphiti:
  group_id: ${GRAPHITI_GROUP_ID:main}
  episode_id_prefix: ${EPISODE_ID_PREFIX:}
  user_id: ${USER_ID:mcp_user}
  entity_types:
    - name: "Requirement"
      description: "Represents a requirement"
    - name: "Preference"
      description: "User preferences and settings"
    - name: "Procedure"
      description: "Standard operating procedures"

README-kuzu.md

@@ -0,0 +1,122 @@
# Running MCP Server with KuzuDB
This guide explains how to run the Graphiti MCP server with KuzuDB as the graph database backend using Docker Compose.
## Why KuzuDB?
KuzuDB is an embedded graph database that provides several advantages:
- **No external dependencies**: Unlike Neo4j, KuzuDB runs embedded within the application
- **Persistent storage**: Data is stored in a single file/directory
- **High performance**: Optimized for analytical workloads
- **Low resource usage**: Minimal memory and CPU requirements
## Quick Start
1. **Set up environment variables**:
Create a `.env` file in the `docker` directory:
```bash
OPENAI_API_KEY=your-api-key-here
# Optional: Override default settings
GRAPHITI_GROUP_ID=my-group
KUZU_MAX_CONCURRENT_QUERIES=20
```
2. **Start the server**:
```bash
docker-compose -f docker-compose-kuzu.yml up
```
3. **Access the MCP server**:
The server will be available at `http://localhost:8000`.
## Configuration
### Persistent Storage
By default, KuzuDB data is stored at `/data/graphiti.kuzu` inside a named Docker volume, so your data persists across container restarts.
To use a different location, set the `KUZU_DB` environment variable:
```bash
KUZU_DB=/data/my-custom-db.kuzu
```
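Both the `KUZU_DB=...` override above and the compose file's `${KUZU_DB:-default}` form use ordinary shell-style default expansion. A minimal sketch of how the override resolves (paths taken from this setup):

```shell
# How the KUZU_DB default resolves, demonstrated with the shell's own
# ${VAR:-default} expansion (the same form docker-compose interpolates).

unset KUZU_DB
echo "db=${KUZU_DB:-/data/graphiti.kuzu}"   # unset -> the default is used

KUZU_DB=":memory:"
echo "db=${KUZU_DB:-/data/graphiti.kuzu}"   # set -> the override wins
```

Setting the variable in `.env` (or the environment) therefore always takes precedence over the baked-in default.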
### In-Memory Mode
For testing or temporary usage, you can run KuzuDB in memory-only mode:
```bash
KUZU_DB=:memory:
```
Note: Data will be lost when the container stops.
### Performance Tuning
Adjust the maximum number of concurrent queries:
```bash
KUZU_MAX_CONCURRENT_QUERIES=20 # Default is 10
```
## Data Management
### Backup
To back up your KuzuDB data (the volume name below includes your Compose project prefix; confirm the exact name with `docker volume ls`):
```bash
# Create a backup of the volume
docker run --rm -v mcp_server_kuzu_data:/data -v $(pwd):/backup alpine tar czf /backup/kuzu-backup.tar.gz -C /data .
```
### Restore
To restore from a backup:
```bash
# Restore from backup
docker run --rm -v mcp_server_kuzu_data:/data -v $(pwd):/backup alpine tar xzf /backup/kuzu-backup.tar.gz -C /data
```
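The tar round trip above can be rehearsed locally without Docker, using throwaway directories in place of the data volume (all paths here are illustrative):

```shell
# Local dry run of the backup/restore commands, with temp directories
# standing in for the Docker volume. Paths are illustrative only.
set -eu

src=$(mktemp -d)   # stands in for the volume mounted at /data
dst=$(mktemp -d)   # stands in for the freshly restored volume
bak=$(mktemp -d)   # stands in for the host backup directory

echo "graph-bytes" > "$src/graphiti.kuzu"

# Backup: archive the volume contents (same flags as the docker command)
tar czf "$bak/kuzu-backup.tar.gz" -C "$src" .

# Restore: unpack into the fresh "volume"
tar xzf "$bak/kuzu-backup.tar.gz" -C "$dst"

cat "$dst/graphiti.kuzu"   # -> graph-bytes
rm -rf "$src" "$dst" "$bak"
```

Because `-C /data .` archives the directory contents rather than the directory itself, the restore lands files directly under the new volume root, which is what the container expects.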
### Clear Data
To completely clear the KuzuDB data:
```bash
# Stop the container
docker-compose -f docker-compose-kuzu.yml down
# Remove the volume
docker volume rm docker_kuzu_data
# Restart (will create fresh volume)
docker-compose -f docker-compose-kuzu.yml up
```
## Switching from Neo4j
If you're migrating from Neo4j to KuzuDB:
1. Export your data from Neo4j (if needed)
2. Stop the Neo4j-based setup: `docker-compose down`
3. Start the KuzuDB setup: `docker-compose -f docker-compose-kuzu.yml up`
4. Re-import your data through the MCP API
## Troubleshooting
### Container won't start
- Check that port 8000 is not in use: `lsof -i :8000`
- Verify your `.env` file has valid API keys
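If `lsof` is not installed, a bash-only stand-in for the port check (using the shell's `/dev/tcp` pseudo-device; the port number is this setup's default) is:

```shell
# Check whether anything is already listening on the MCP port.
# Bash-only sketch: a successful /dev/tcp connect means the port is taken.
port=8000
if (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
  echo "port $port is already in use"
else
  echo "port $port is free"
fi
```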
### Data not persisting
- Ensure the volume is properly mounted: `docker volume ls`
- Check volume permissions: `docker exec graphiti-mcp ls -la /data`
### Performance issues
- Increase `KUZU_MAX_CONCURRENT_QUERIES` for better parallelism
- Monitor container resources: `docker stats graphiti-mcp`
## Comparison with Neo4j Setup
| Feature | KuzuDB | Neo4j |
|---------|--------|-------|
| External Database | No | Yes |
| Memory Usage | Low (~100MB) | High (~512MB min) |
| Startup Time | Instant | 30+ seconds |
| Persistent Storage | Single file | Multiple files |
| Docker Services | 1 | 2 |
| Default Port | 8000 (MCP only) | 8000, 7474, 7687 |

docker-compose-kuzu.yml

@@ -8,9 +8,9 @@ services:
       - path: .env
         required: false  # Makes the file optional. Default value is 'true'
     environment:
-      # Database configuration for KuzuDB
-      - KUZU_DB=${KUZU_DB:-:memory:}
-      - KUZU_MAX_CONCURRENT_QUERIES=${KUZU_MAX_CONCURRENT_QUERIES:-1}
+      # Database configuration for KuzuDB - using persistent storage
+      - KUZU_DB=${KUZU_DB:-/data/graphiti.kuzu}
+      - KUZU_MAX_CONCURRENT_QUERIES=${KUZU_MAX_CONCURRENT_QUERIES:-10}
       # LLM provider configurations
       - OPENAI_API_KEY=${OPENAI_API_KEY}
       - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
@@ -29,14 +29,14 @@ services:
       - CONFIG_PATH=/app/config/config.yaml
       - PATH=/root/.local/bin:${PATH}
     volumes:
-      - ./config.yaml:/app/config/config.yaml:ro
-      # Optional: If you want to persist KuzuDB data, uncomment the following line
-      # and change KUZU_DB to /data/kuzu.db
-      # - kuzu_data:/data
+      - ../config/config-docker-kuzu.yaml:/app/config/config.yaml:ro
+      # Persistent KuzuDB data storage
+      - kuzu_data:/data
     ports:
       - "8000:8000"  # Expose the MCP server via HTTP for SSE transport
     command: ["uv", "run", "graphiti_mcp_server.py", "--transport", "sse", "--config", "/app/config/config.yaml"]

-# Optional: Volume for persistent KuzuDB storage
-# volumes:
-#   kuzu_data:
+# Volume for persistent KuzuDB storage
+volumes:
+  kuzu_data:
+    driver: local