OpenAI-Compatible Custom Endpoint Support in Graphiti

Overview

This document analyzes how Graphiti handles OpenAI-compatible custom endpoints (such as OpenRouter, NagaAI, and Together.ai) and provides recommendations for improving that support.

Current Architecture

Graphiti has three main OpenAI-compatible client implementations:

1. OpenAIClient (Default)

File: graphiti_core/llm_client/openai_client.py

  • Extends BaseOpenAIClient
  • Uses the new OpenAI Responses API (/v1/responses endpoint)
  • Uses client.responses.parse() for structured outputs (OpenAI SDK v1.91+)
  • This is the default client exported in the public API
response = await self.client.responses.parse(
    model=model,
    input=messages,
    temperature=temperature,
    max_output_tokens=max_tokens,
    text_format=response_model,
    reasoning={'effort': reasoning},
    text={'verbosity': verbosity},
)

2. OpenAIGenericClient (Legacy)

File: graphiti_core/llm_client/openai_generic_client.py

  • Uses the standard Chat Completions API (/v1/chat/completions)
  • Uses client.chat.completions.create()
  • Only supports unstructured JSON responses (not Pydantic schemas)
  • Currently not exported in __init__.py (hidden from public API)
response = await self.client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=temperature,
    max_tokens=max_tokens,
    response_format={'type': 'json_object'},
)

3. AzureOpenAILLMClient

File: graphiti_core/llm_client/azure_openai_client.py

  • Azure-specific implementation
  • Also uses responses.parse() like OpenAIClient
  • Handles Azure-specific authentication and endpoints (see the sketch below)
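
For orientation, a hedged configuration sketch (not copied from the repo): AzureOpenAILLMClient is built around a pre-constructed AsyncAzureOpenAI client from the OpenAI SDK. The exact constructor parameters should be checked against azure_openai_client.py.

```python
# Hedged sketch: wiring AzureOpenAILLMClient. The constructor arguments shown here are
# assumptions; verify against graphiti_core/llm_client/azure_openai_client.py.
from openai import AsyncAzureOpenAI

from graphiti_core.llm_client.azure_openai_client import AzureOpenAILLMClient
from graphiti_core.llm_client.config import LLMConfig

azure_client = AsyncAzureOpenAI(
    api_key="your-azure-key",
    api_version="2024-08-01-preview",  # structured outputs need a recent API version
    azure_endpoint="https://your-resource.openai.azure.com",
)
llm_client = AzureOpenAILLMClient(azure_client, config=LLMConfig(model="your-deployment-name"))
```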

The Root Problem

Issue Description

When users configure Graphiti with custom OpenAI-compatible endpoints, they encounter errors because:

  1. OpenAIClient uses the new /v1/responses endpoint via client.responses.parse()

    • This is a new OpenAI API (introduced in OpenAI SDK v1.91.0) for structured outputs
    • This endpoint is proprietary to OpenAI and not part of the standard OpenAI-compatible API specification
  2. Most OpenAI-compatible services (OpenRouter, NagaAI, Ollama, Together.ai, etc.) only implement the standard /v1/chat/completions endpoint

    • They do NOT implement /v1/responses
  3. When you configure a base_url pointing to these services, Graphiti tries to call:

    https://your-custom-endpoint.com/v1/responses
    

    Instead of the expected:

    https://your-custom-endpoint.com/v1/chat/completions
    

Example Error Scenario

from graphiti_core import Graphiti
from graphiti_core.llm_client import OpenAIClient, LLMConfig

config = LLMConfig(
    api_key="sk-or-v1-...",
    model="meta-llama/llama-3-8b-instruct",
    base_url="https://openrouter.ai/api/v1"
)

llm_client = OpenAIClient(config=config)
graphiti = Graphiti(uri, user, password, llm_client=llm_client)

# This will fail because OpenRouter doesn't have /v1/responses endpoint
# Error: 404 Not Found - https://openrouter.ai/api/v1/responses

Current Workaround (Documented)

The README documents using OpenAIGenericClient with Ollama:

from graphiti_core.llm_client.openai_generic_client import OpenAIGenericClient
from graphiti_core.llm_client.config import LLMConfig

llm_config = LLMConfig(
    api_key="ollama",
    model="deepseek-r1:7b",
    base_url="http://localhost:11434/v1"
)

llm_client = OpenAIGenericClient(config=llm_config)

Limitations of Current Workaround

  • OpenAIGenericClient doesn't support structured outputs with Pydantic models
  • It only returns raw JSON and manually validates schemas (illustrated in the sketch below)
  • It's not the recommended/default client
  • It's not exported in the public API (graphiti_core.llm_client)
  • Users must know to import from the internal module path
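
To make that limitation concrete, a minimal sketch of what "raw JSON plus manual validation" means in practice (the model class and payload below are illustrative, not Graphiti's actual prompt models):

```python
import json

from pydantic import BaseModel, ValidationError


class ExtractedEntities(BaseModel):  # illustrative schema, not from graphiti_core
    names: list[str]


# JSON mode returns plain text from /v1/chat/completions ...
raw = '{"names": ["Alice", "Bob"]}'

# ... and the caller must parse and validate it itself, with no SDK-level schema enforcement.
try:
    entities = ExtractedEntities.model_validate(json.loads(raw))
except (json.JSONDecodeError, ValidationError) as exc:
    raise ValueError(f'LLM output did not match the expected schema: {exc}') from exc
```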

Priority 1: Quick Wins (High Priority)

1.1 Export OpenAIGenericClient in Public API

File: graphiti_core/llm_client/__init__.py

Current:

from .client import LLMClient
from .config import LLMConfig
from .errors import RateLimitError
from .openai_client import OpenAIClient

__all__ = ['LLMClient', 'OpenAIClient', 'LLMConfig', 'RateLimitError']

Proposed:

from .client import LLMClient
from .config import LLMConfig
from .errors import RateLimitError
from .openai_client import OpenAIClient
from .openai_generic_client import OpenAIGenericClient

__all__ = ['LLMClient', 'OpenAIClient', 'OpenAIGenericClient', 'LLMConfig', 'RateLimitError']
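
With the export in place, both clients become importable from the public package rather than the internal module path:

```python
# Works once OpenAIGenericClient is added to __all__ as proposed above.
from graphiti_core.llm_client import LLMConfig, OpenAIClient, OpenAIGenericClient
```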

1.2 Add Clear Documentation

File: README.md

Add a dedicated section:

### Using OpenAI-Compatible Endpoints (OpenRouter, NagaAI, Together.ai, etc.)

Most OpenAI-compatible services only support the standard Chat Completions API,
not OpenAI's newer Responses API. Use `OpenAIGenericClient` for these services:

**OpenRouter Example**:
```python
from graphiti_core import Graphiti
from graphiti_core.llm_client import OpenAIGenericClient, LLMConfig

config = LLMConfig(
    api_key="sk-or-v1-...",
    model="meta-llama/llama-3-8b-instruct",
    base_url="https://openrouter.ai/api/v1"
)

llm_client = OpenAIGenericClient(config=config)
graphiti = Graphiti(uri, user, password, llm_client=llm_client)
```

**Together.ai Example**:
```python
config = LLMConfig(
    api_key="your-together-api-key",
    model="meta-llama/Llama-3-70b-chat-hf",
    base_url="https://api.together.xyz/v1"
)
llm_client = OpenAIGenericClient(config=config)
```

**Note**: OpenAIGenericClient has limited structured output support compared to the default OpenAIClient. It uses JSON mode instead of Pydantic schema validation.


1.3 Add Better Error Messages

File: graphiti_core/llm_client/openai_client.py

Add error handling that detects the issue:

async def _create_structured_completion(self, ...):
    try:
        response = await self.client.responses.parse(...)
        return response
    except openai.NotFoundError as e:
        if self.config.base_url and "api.openai.com" not in self.config.base_url:
            raise Exception(
                f"The OpenAI Responses API (/v1/responses) is not available at {self.config.base_url}. "
                f"Most OpenAI-compatible services only support /v1/chat/completions. "
                f"Please use OpenAIGenericClient instead of OpenAIClient for custom endpoints. "
                f"See: https://help.getzep.com/graphiti/guides/custom-endpoints"
            ) from e
        raise

Priority 2: Better UX (Medium Priority)

2.1 Add Auto-Detection Logic

File: graphiti_core/llm_client/config.py

class LLMConfig:
    def __init__(
        self,
        api_key: str | None = None,
        model: str | None = None,
        base_url: str | None = None,
        temperature: float = DEFAULT_TEMPERATURE,
        max_tokens: int = DEFAULT_MAX_TOKENS,
        small_model: str | None = None,
        use_responses_api: bool | None = None,  # NEW: Auto-detect if None
    ):
        self.base_url = base_url
        self.api_key = api_key
        self.model = model
        self.small_model = small_model
        self.temperature = temperature
        self.max_tokens = max_tokens
        
        # Auto-detect API style based on base_url
        if use_responses_api is None:
            self.use_responses_api = self._should_use_responses_api()
        else:
            self.use_responses_api = use_responses_api
    
    def _should_use_responses_api(self) -> bool:
        """Determine if we should use the Responses API based on base_url."""
        if self.base_url is None:
            return True  # Default OpenAI
        
        # Known services that support Responses API
        supported_services = ["api.openai.com", "azure.com"]
        return any(service in self.base_url for service in supported_services)
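
For illustration, the intended behavior of the heuristic, assuming the proposal above lands as written (the unit tests later in this document cover the same cases):

```python
# Expected results from the proposed _should_use_responses_api() heuristic.
assert LLMConfig(api_key="sk-...").use_responses_api is True                          # no base_url -> native OpenAI
assert LLMConfig(base_url="https://api.openai.com/v1").use_responses_api is True      # OpenAI endpoint
assert LLMConfig(base_url="https://openrouter.ai/api/v1").use_responses_api is False  # compatible service
```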

2.2 Create a Unified Smart Client

Option A: Modify OpenAIClient to Fall Back

class OpenAIClient(BaseOpenAIClient):
    def __init__(self, config: LLMConfig | None = None, ...):
        if config is None:
            config = LLMConfig()
        super().__init__(config, ...)

        self.use_responses_api = config.use_responses_api
        self.client = AsyncOpenAI(api_key=config.api_key, base_url=config.base_url)
    
    async def _create_structured_completion(self, ...):
        if self.use_responses_api:
            # Use responses.parse() for OpenAI native
            return await self.client.responses.parse(...)
        else:
            # Fall back to chat.completions with JSON schema for compatibility
            return await self.client.chat.completions.create(
                model=model,
                messages=messages,
                temperature=temperature,
                max_tokens=max_tokens,
                response_format={
                    "type": "json_schema",
                    "json_schema": {
                        "name": response_model.__name__,
                        "schema": response_model.model_json_schema(),
                        "strict": False
                    }
                }
            )
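
One detail the fallback branch still needs: responses.parse() returns output already parsed into the Pydantic model, while chat.completions.create() returns raw JSON text. A hedged sketch of the missing post-processing step (the helper name is hypothetical):

```python
from pydantic import BaseModel


def parse_fallback_content(raw_content: str | None, response_model: type[BaseModel]) -> BaseModel:
    """Hypothetical helper: convert the chat-completions fallback output
    (choices[0].message.content) into the same Pydantic object that the
    responses.parse() branch yields. Raises pydantic.ValidationError on mismatch."""
    return response_model.model_validate_json(raw_content or '{}')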

Option B: Create a Factory Function

# graphiti_core/llm_client/__init__.py

def create_openai_client(
    config: LLMConfig | None = None,
    cache: bool = False,
    **kwargs
) -> LLMClient:
    """
    Factory to create the appropriate OpenAI-compatible client.
    
    Automatically selects between OpenAIClient (for native OpenAI)
    and OpenAIGenericClient (for OpenAI-compatible services).
    
    Args:
        config: LLM configuration including base_url
        cache: Whether to enable caching
        **kwargs: Additional arguments passed to the client
    
    Returns:
        LLMClient: Either OpenAIClient or OpenAIGenericClient
    
    Example:
        >>> # Automatically uses OpenAIGenericClient for OpenRouter
        >>> config = LLMConfig(
        ...     api_key="sk-or-v1-...",
        ...     model="meta-llama/llama-3-8b-instruct",
        ...     base_url="https://openrouter.ai/api/v1"
        ... )
        >>> client = create_openai_client(config)
    """
    if config is None:
        config = LLMConfig()
    
    # Auto-detect based on base_url
    if config.base_url is None or "api.openai.com" in config.base_url:
        return OpenAIClient(config, cache, **kwargs)
    else:
        return OpenAIGenericClient(config, cache, **kwargs)

2.3 Enhance OpenAIGenericClient with Better Structured Output Support

File: graphiti_core/llm_client/openai_generic_client.py

async def _generate_response(
    self,
    messages: list[Message],
    response_model: type[BaseModel] | None = None,
    max_tokens: int = DEFAULT_MAX_TOKENS,
    model_size: ModelSize = ModelSize.medium,
) -> dict[str, typing.Any]:
    openai_messages: list[ChatCompletionMessageParam] = []
    for m in messages:
        m.content = self._clean_input(m.content)
        if m.role == 'user':
            openai_messages.append({'role': 'user', 'content': m.content})
        elif m.role == 'system':
            openai_messages.append({'role': 'system', 'content': m.content})

    try:
        # Try to use json_schema format (supported by more providers)
        if response_model:
            response = await self.client.chat.completions.create(
                model=self.model or DEFAULT_MODEL,
                messages=openai_messages,
                temperature=self.temperature,
                max_tokens=max_tokens or self.max_tokens,
                response_format={
                    "type": "json_schema",
                    "json_schema": {
                        "name": response_model.__name__,
                        "schema": response_model.model_json_schema(),
                        "strict": False  # Most providers don't support strict mode
                    }
                }
            )
        else:
            response = await self.client.chat.completions.create(
                model=self.model or DEFAULT_MODEL,
                messages=openai_messages,
                temperature=self.temperature,
                max_tokens=max_tokens or self.max_tokens,
                response_format={'type': 'json_object'},
            )
        
        result = response.choices[0].message.content or '{}'
        return json.loads(result)
    except Exception as e:
        logger.error(f'Error in generating LLM response: {e}')
        raise
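
A possible extension of the snippet above, since some providers reject the json_schema response format outright: catch the SDK's 400 error and retry in plain JSON mode (the method name below is hypothetical):

```python
import openai  # BadRequestError is the OpenAI SDK's exception for HTTP 400 responses


async def _create_with_schema_fallback(self, openai_messages, response_model, max_tokens):
    """Hypothetical OpenAIGenericClient helper: try json_schema first, then fall back to
    plain JSON mode for providers that reject the json_schema response_format."""
    schema_format = {
        'type': 'json_schema',
        'json_schema': {
            'name': response_model.__name__,
            'schema': response_model.model_json_schema(),
            'strict': False,
        },
    }
    try:
        return await self.client.chat.completions.create(
            model=self.model or DEFAULT_MODEL,
            messages=openai_messages,
            temperature=self.temperature,
            max_tokens=max_tokens or self.max_tokens,
            response_format=schema_format,
        )
    except openai.BadRequestError:
        # Provider does not understand json_schema; retry with generic JSON mode.
        return await self.client.chat.completions.create(
            model=self.model or DEFAULT_MODEL,
            messages=openai_messages,
            temperature=self.temperature,
            max_tokens=max_tokens or self.max_tokens,
            response_format={'type': 'json_object'},
        )
```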

Priority 3: Nice to Have (Low Priority)

3.1 Provider-Specific Clients

Create convenience clients for popular providers:

# graphiti_core/llm_client/openrouter_client.py
class OpenRouterClient(OpenAIGenericClient):
    """Pre-configured client for OpenRouter.
    
    Example:
        >>> client = OpenRouterClient(
        ...     api_key="sk-or-v1-...",
        ...     model="meta-llama/llama-3-8b-instruct"
        ... )
    """
    def __init__(
        self,
        api_key: str,
        model: str,
        temperature: float = DEFAULT_TEMPERATURE,
        max_tokens: int = DEFAULT_MAX_TOKENS,
        **kwargs
    ):
        config = LLMConfig(
            api_key=api_key,
            model=model,
            base_url="https://openrouter.ai/api/v1",
            temperature=temperature,
            max_tokens=max_tokens
        )
        super().__init__(config=config, **kwargs)

# graphiti_core/llm_client/together_client.py
class TogetherClient(OpenAIGenericClient):
    """Pre-configured client for Together.ai.
    
    Example:
        >>> client = TogetherClient(
        ...     api_key="your-together-key",
        ...     model="meta-llama/Llama-3-70b-chat-hf"
        ... )
    """
    def __init__(
        self,
        api_key: str,
        model: str,
        temperature: float = DEFAULT_TEMPERATURE,
        max_tokens: int = DEFAULT_MAX_TOKENS,
        **kwargs
    ):
        config = LLMConfig(
            api_key=api_key,
            model=model,
            base_url="https://api.together.xyz/v1",
            temperature=temperature,
            max_tokens=max_tokens
        )
        super().__init__(config=config, **kwargs)
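
Usage would mirror any other LLM client (a sketch; uri, user, and password stand in for the Neo4j connection settings used in the earlier examples):

```python
from graphiti_core import Graphiti

llm_client = OpenRouterClient(
    api_key="sk-or-v1-...",
    model="meta-llama/llama-3-8b-instruct",
)
graphiti = Graphiti(uri, user, password, llm_client=llm_client)
```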

3.2 Provider Compatibility Matrix

Add to documentation:

| Provider | Standard Client | Generic Client | Structured Outputs | Notes |
|---|---|---|---|---|
| OpenAI | OpenAIClient | — | Full (Responses API) | Recommended: use OpenAIClient |
| Azure OpenAI | AzureOpenAILLMClient | — | Full (Responses API) | Requires API version 2024-08-01-preview+ |
| OpenRouter | — | OpenAIGenericClient | ⚠️ Limited (JSON Schema) | Use OpenAIGenericClient |
| Together.ai | — | OpenAIGenericClient | ⚠️ Limited (JSON Schema) | Use OpenAIGenericClient |
| Ollama | — | OpenAIGenericClient | ⚠️ Limited (JSON mode) | Local deployment |
| Groq | — | OpenAIGenericClient | ⚠️ Limited (JSON Schema) | Very fast inference |
| Perplexity | — | OpenAIGenericClient | ⚠️ Limited (JSON mode) | Primarily for search |

Testing Recommendations

Unit Tests

  1. Endpoint detection logic

    def test_should_use_responses_api():
        # OpenAI native should use Responses API
        config = LLMConfig(base_url="https://api.openai.com/v1")
        assert config.use_responses_api is True
    
        # Custom endpoints should not
        config = LLMConfig(base_url="https://openrouter.ai/api/v1")
        assert config.use_responses_api is False
    
  2. Client selection

    def test_create_openai_client_auto_selection():
        # Should return OpenAIClient for OpenAI
        config = LLMConfig(api_key="test")
        client = create_openai_client(config)
        assert isinstance(client, OpenAIClient)
    
        # Should return OpenAIGenericClient for others
        config = LLMConfig(api_key="test", base_url="https://openrouter.ai/api/v1")
        client = create_openai_client(config)
        assert isinstance(client, OpenAIGenericClient)
    

Integration Tests

  1. Mock server tests with responses for both endpoints (see the sketch after this list)
  2. Real provider tests (optional, may require API keys):
    • OpenRouter
    • Together.ai
    • Ollama (local)
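
A hedged sketch of the mock-server idea for the chat-completions case. It assumes pytest-asyncio and respx (which intercepts the httpx transport the OpenAI SDK uses); neither dependency nor the Message import path is confirmed against Graphiti's actual test stack:

```python
import pytest
import respx
from httpx import Response

from graphiti_core.llm_client import LLMConfig
from graphiti_core.llm_client.openai_generic_client import OpenAIGenericClient
from graphiti_core.prompts.models import Message  # assumed location of the Message model


@pytest.mark.asyncio
@respx.mock
async def test_generic_client_hits_chat_completions():
    # Fake an OpenAI-compatible /v1/chat/completions response.
    route = respx.post('https://example.test/v1/chat/completions').mock(
        return_value=Response(
            200,
            json={
                'id': 'chatcmpl-test',
                'object': 'chat.completion',
                'created': 0,
                'model': 'test-model',
                'choices': [
                    {
                        'index': 0,
                        'finish_reason': 'stop',
                        'message': {'role': 'assistant', 'content': '{"ok": true}'},
                    }
                ],
            },
        )
    )

    config = LLMConfig(api_key='test', model='test-model', base_url='https://example.test/v1')
    client = OpenAIGenericClient(config=config)
    result = await client.generate_response([Message(role='user', content='hi')])

    assert route.called
    assert isinstance(result, dict)  # parsed JSON from the mocked completion
```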

Manual Testing Checklist

  • OpenRouter with Llama models
  • Together.ai with various models
  • Ollama with local models
  • Groq with fast models
  • Verify error messages are helpful
  • Test both structured and unstructured outputs

Summary of Issues

| Issue | Current State | Impact | Priority |
|---|---|---|---|
| /v1/responses endpoint usage | Used by default OpenAIClient | Breaks all non-OpenAI providers | High |
| OpenAIGenericClient not exported | Hidden from public API | Users can't easily use it | High |
| Poor error messages | Generic 404 errors | Confusing for users | High |
| No auto-detection | Must manually choose client | Poor DX | Medium |
| Limited docs | Only Ollama example | Users don't know how to configure other providers | High |
| No structured output in generic client | Only supports loose JSON | Reduced type safety for custom endpoints | Medium |
| No provider-specific helpers | Generic configuration only | More setup required | Low |

Implementation Roadmap

Phase 1: Quick Fixes (1-2 days)

  1. Export OpenAIGenericClient in public API
  2. Add documentation section for custom endpoints
  3. Improve error messages in OpenAIClient
  4. Add examples for OpenRouter, Together.ai

Phase 2: Enhanced Support (3-5 days)

  1. Add auto-detection logic to LLMConfig
  2. Create factory function for client selection
  3. Enhance OpenAIGenericClient with better JSON schema support
  4. Add comprehensive tests

Phase 3: Polish (2-3 days)

  1. Create provider-specific client classes
  2. Build compatibility matrix documentation
  3. Add integration tests with real providers
  4. Update all examples and guides

Contributing

If you're implementing these changes, please ensure:

  1. All changes follow the repository guidelines in AGENTS.md
  2. Run make format before committing
  3. Run make lint and make test to verify changes
  4. Update documentation for any new public APIs
  5. Add examples demonstrating the new functionality

Questions or Issues?