OpenAI-Compatible Custom Endpoint Support in Graphiti

Overview

This document analyzes how Graphiti handles OpenAI-compatible custom endpoints (such as OpenRouter, NagaAI, and Together.ai) and provides recommendations for improving that support.

Current Architecture

Graphiti has three main OpenAI-compatible client implementations:

1. OpenAIClient (Default)

File: graphiti_core/llm_client/openai_client.py

  • Extends BaseOpenAIClient
  • Uses the new OpenAI Responses API (/v1/responses endpoint)
  • Uses client.responses.parse() for structured outputs (OpenAI SDK v1.91+)
  • This is the default client exported in the public API
response = await self.client.responses.parse(
    model=model,
    input=messages,
    temperature=temperature,
    max_output_tokens=max_tokens,
    text_format=response_model,
    reasoning={'effort': reasoning},
    text={'verbosity': verbosity},
)

2. OpenAIGenericClient (Legacy)

File: graphiti_core/llm_client/openai_generic_client.py

  • Uses the standard Chat Completions API (/v1/chat/completions)
  • Uses client.chat.completions.create()
  • Only supports unstructured JSON responses (not Pydantic schemas)
  • Currently not exported in __init__.py (hidden from public API)
response = await self.client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=temperature,
    max_tokens=max_tokens,
    response_format={'type': 'json_object'},
)

3. AzureOpenAILLMClient

File: graphiti_core/llm_client/azure_openai_client.py

  • Azure-specific implementation
  • Also uses responses.parse() like OpenAIClient
  • Handles Azure-specific authentication and endpoints (see the sketch below)
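
For orientation, a hedged configuration sketch (not copied from the repo): AzureOpenAILLMClient is built around a pre-constructed AsyncAzureOpenAI client from the OpenAI SDK. The exact constructor parameters should be checked against azure_openai_client.py.

```python
# Hedged sketch: wiring AzureOpenAILLMClient. The constructor arguments shown here are
# assumptions; verify against graphiti_core/llm_client/azure_openai_client.py.
from openai import AsyncAzureOpenAI

from graphiti_core.llm_client.azure_openai_client import AzureOpenAILLMClient
from graphiti_core.llm_client.config import LLMConfig

azure_client = AsyncAzureOpenAI(
    api_key="your-azure-key",
    api_version="2024-08-01-preview",  # structured outputs need a recent API version
    azure_endpoint="https://your-resource.openai.azure.com",
)
llm_client = AzureOpenAILLMClient(azure_client, config=LLMConfig(model="your-deployment-name"))
```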

The Root Problem

Issue Description

When users configure Graphiti with custom OpenAI-compatible endpoints, they encounter errors because:

  1. OpenAIClient uses the new /v1/responses endpoint via client.responses.parse()

    • This is a new OpenAI API (introduced in OpenAI SDK v1.91.0) for structured outputs
    • This endpoint is proprietary to OpenAI and not part of the standard OpenAI-compatible API specification
  2. Most OpenAI-compatible services (OpenRouter, NagaAI, Ollama, Together.ai, etc.) only implement the standard /v1/chat/completions endpoint

    • They do NOT implement /v1/responses
  3. When you configure a base_url pointing to these services, Graphiti tries to call:

    https://your-custom-endpoint.com/v1/responses
    

    Instead of the expected:

    https://your-custom-endpoint.com/v1/chat/completions
    

Example Error Scenario

from graphiti_core import Graphiti
from graphiti_core.llm_client import OpenAIClient, LLMConfig

config = LLMConfig(
    api_key="sk-or-v1-...",
    model="meta-llama/llama-3-8b-instruct",
    base_url="https://openrouter.ai/api/v1"
)

llm_client = OpenAIClient(config=config)
graphiti = Graphiti(uri, user, password, llm_client=llm_client)

# This will fail because OpenRouter doesn't have /v1/responses endpoint
# Error: 404 Not Found - https://openrouter.ai/api/v1/responses

Current Workaround (Documented)

The README documents using OpenAIGenericClient with Ollama:

from graphiti_core.llm_client.openai_generic_client import OpenAIGenericClient
from graphiti_core.llm_client.config import LLMConfig

llm_config = LLMConfig(
    api_key="ollama",
    model="deepseek-r1:7b",
    base_url="http://localhost:11434/v1"
)

llm_client = OpenAIGenericClient(config=llm_config)

Limitations of Current Workaround

  • OpenAIGenericClient doesn't support structured outputs with Pydantic models
  • It only returns raw JSON and manually validates schemas (illustrated in the sketch below)
  • It's not the recommended/default client
  • It's not exported in the public API (graphiti_core.llm_client)
  • Users must know to import from the internal module path
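
To make that limitation concrete, a minimal sketch of what "raw JSON plus manual validation" means in practice (the model class and payload below are illustrative, not Graphiti's actual prompt models):

```python
import json

from pydantic import BaseModel, ValidationError


class ExtractedEntities(BaseModel):  # illustrative schema, not from graphiti_core
    names: list[str]


# JSON mode returns plain text from /v1/chat/completions ...
raw = '{"names": ["Alice", "Bob"]}'

# ... and the caller must parse and validate it itself, with no SDK-level schema enforcement.
try:
    entities = ExtractedEntities.model_validate(json.loads(raw))
except (json.JSONDecodeError, ValidationError) as exc:
    raise ValueError(f'LLM output did not match the expected schema: {exc}') from exc
```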

Priority 1: Quick Wins (High Priority)

1.1 Export OpenAIGenericClient in Public API

File: graphiti_core/llm_client/__init__.py

Current:

from .client import LLMClient
from .config import LLMConfig
from .errors import RateLimitError
from .openai_client import OpenAIClient

__all__ = ['LLMClient', 'OpenAIClient', 'LLMConfig', 'RateLimitError']

Proposed:

from .client import LLMClient
from .config import LLMConfig
from .errors import RateLimitError
from .openai_client import OpenAIClient
from .openai_generic_client import OpenAIGenericClient

__all__ = ['LLMClient', 'OpenAIClient', 'OpenAIGenericClient', 'LLMConfig', 'RateLimitError']
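
With the export in place, both clients become importable from the public package rather than the internal module path:

```python
# Works once OpenAIGenericClient is added to __all__ as proposed above.
from graphiti_core.llm_client import LLMConfig, OpenAIClient, OpenAIGenericClient
```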

1.2 Add Clear Documentation

File: README.md

Add a dedicated section:

### Using OpenAI-Compatible Endpoints (OpenRouter, NagaAI, Together.ai, etc.)

Most OpenAI-compatible services only support the standard Chat Completions API,
not OpenAI's newer Responses API. Use `OpenAIGenericClient` for these services:

**OpenRouter Example**:
```python
from graphiti_core import Graphiti
from graphiti_core.llm_client import OpenAIGenericClient, LLMConfig

config = LLMConfig(
    api_key="sk-or-v1-...",
    model="meta-llama/llama-3-8b-instruct",
    base_url="https://openrouter.ai/api/v1"
)

llm_client = OpenAIGenericClient(config=config)
graphiti = Graphiti(uri, user, password, llm_client=llm_client)
```

**Together.ai Example**:
```python
config = LLMConfig(
    api_key="your-together-api-key",
    model="meta-llama/Llama-3-70b-chat-hf",
    base_url="https://api.together.xyz/v1"
)
llm_client = OpenAIGenericClient(config=config)
```

**Note**: OpenAIGenericClient has limited structured output support compared to the default OpenAIClient. It uses JSON mode instead of Pydantic schema validation.


1.3 Add Better Error Messages

File: graphiti_core/llm_client/openai_client.py

Add error handling that detects the issue:

async def _create_structured_completion(self, ...):
    try:
        response = await self.client.responses.parse(...)
        return response
    except openai.NotFoundError as e:
        if self.config.base_url and "api.openai.com" not in self.config.base_url:
            raise Exception(
                f"The OpenAI Responses API (/v1/responses) is not available at {self.config.base_url}. "
                f"Most OpenAI-compatible services only support /v1/chat/completions. "
                f"Please use OpenAIGenericClient instead of OpenAIClient for custom endpoints. "
                f"See: https://help.getzep.com/graphiti/guides/custom-endpoints"
            ) from e
        raise

Priority 2: Better UX (Medium Priority)

2.1 Add Auto-Detection Logic

File: graphiti_core/llm_client/config.py

class LLMConfig:
    def __init__(
        self,
        api_key: str | None = None,
        model: str | None = None,
        base_url: str | None = None,
        temperature: float = DEFAULT_TEMPERATURE,
        max_tokens: int = DEFAULT_MAX_TOKENS,
        small_model: str | None = None,
        use_responses_api: bool | None = None,  # NEW: Auto-detect if None
    ):
        self.base_url = base_url
        self.api_key = api_key
        self.model = model
        self.small_model = small_model
        self.temperature = temperature
        self.max_tokens = max_tokens
        
        # Auto-detect API style based on base_url
        if use_responses_api is None:
            self.use_responses_api = self._should_use_responses_api()
        else:
            self.use_responses_api = use_responses_api
    
    def _should_use_responses_api(self) -> bool:
        """Determine if we should use the Responses API based on base_url."""
        if self.base_url is None:
            return True  # Default OpenAI
        
        # Known services that support Responses API
        supported_services = ["api.openai.com", "azure.com"]
        return any(service in self.base_url for service in supported_services)
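
For illustration, the intended behavior of the heuristic, assuming the proposal above lands as written (the unit tests later in this document cover the same cases):

```python
# Expected results from the proposed _should_use_responses_api() heuristic.
assert LLMConfig(api_key="sk-...").use_responses_api is True                          # no base_url -> native OpenAI
assert LLMConfig(base_url="https://api.openai.com/v1").use_responses_api is True      # OpenAI endpoint
assert LLMConfig(base_url="https://openrouter.ai/api/v1").use_responses_api is False  # compatible service
```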

2.2 Create a Unified Smart Client

Option A: Modify OpenAIClient to Fall Back

class OpenAIClient(BaseOpenAIClient):
    def __init__(self, config: LLMConfig | None = None, ...):
        if config is None:
            config = LLMConfig()
        super().__init__(config, ...)

        self.use_responses_api = config.use_responses_api
        self.client = AsyncOpenAI(api_key=config.api_key, base_url=config.base_url)
    
    async def _create_structured_completion(self, ...):
        if self.use_responses_api:
            # Use responses.parse() for OpenAI native
            return await self.client.responses.parse(...)
        else:
            # Fall back to chat.completions with JSON schema for compatibility
            return await self.client.chat.completions.create(
                model=model,
                messages=messages,
                temperature=temperature,
                max_tokens=max_tokens,
                response_format={
                    "type": "json_schema",
                    "json_schema": {
                        "name": response_model.__name__,
                        "schema": response_model.model_json_schema(),
                        "strict": False
                    }
                }
            )
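
One detail the fallback branch still needs: responses.parse() returns output already parsed into the Pydantic model, while chat.completions.create() returns raw JSON text. A hedged sketch of the missing post-processing step (the helper name is hypothetical):

```python
from pydantic import BaseModel


def parse_fallback_content(raw_content: str | None, response_model: type[BaseModel]) -> BaseModel:
    """Hypothetical helper: convert the chat-completions fallback output
    (choices[0].message.content) into the same Pydantic object that the
    responses.parse() branch yields. Raises pydantic.ValidationError on mismatch."""
    return response_model.model_validate_json(raw_content or '{}')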

Option B: Create a Factory Function

# graphiti_core/llm_client/__init__.py

def create_openai_client(
    config: LLMConfig | None = None,
    cache: bool = False,
    **kwargs
) -> LLMClient:
    """
    Factory to create the appropriate OpenAI-compatible client.
    
    Automatically selects between OpenAIClient (for native OpenAI)
    and OpenAIGenericClient (for OpenAI-compatible services).
    
    Args:
        config: LLM configuration including base_url
        cache: Whether to enable caching
        **kwargs: Additional arguments passed to the client
    
    Returns:
        LLMClient: Either OpenAIClient or OpenAIGenericClient
    
    Example:
        >>> # Automatically uses OpenAIGenericClient for OpenRouter
        >>> config = LLMConfig(
        ...     api_key="sk-or-v1-...",
        ...     model="meta-llama/llama-3-8b-instruct",
        ...     base_url="https://openrouter.ai/api/v1"
        ... )
        >>> client = create_openai_client(config)
    """
    if config is None:
        config = LLMConfig()
    
    # Auto-detect based on base_url
    if config.base_url is None or "api.openai.com" in config.base_url:
        return OpenAIClient(config, cache, **kwargs)
    else:
        return OpenAIGenericClient(config, cache, **kwargs)

2.3 Enhance OpenAIGenericClient with Better Structured Output Support

File: graphiti_core/llm_client/openai_generic_client.py

async def _generate_response(
    self,
    messages: list[Message],
    response_model: type[BaseModel] | None = None,
    max_tokens: int = DEFAULT_MAX_TOKENS,
    model_size: ModelSize = ModelSize.medium,
) -> dict[str, typing.Any]:
    openai_messages: list[ChatCompletionMessageParam] = []
    for m in messages:
        m.content = self._clean_input(m.content)
        if m.role == 'user':
            openai_messages.append({'role': 'user', 'content': m.content})
        elif m.role == 'system':
            openai_messages.append({'role': 'system', 'content': m.content})

    try:
        # Try to use json_schema format (supported by more providers)
        if response_model:
            response = await self.client.chat.completions.create(
                model=self.model or DEFAULT_MODEL,
                messages=openai_messages,
                temperature=self.temperature,
                max_tokens=max_tokens or self.max_tokens,
                response_format={
                    "type": "json_schema",
                    "json_schema": {
                        "name": response_model.__name__,
                        "schema": response_model.model_json_schema(),
                        "strict": False  # Most providers don't support strict mode
                    }
                }
            )
        else:
            response = await self.client.chat.completions.create(
                model=self.model or DEFAULT_MODEL,
                messages=openai_messages,
                temperature=self.temperature,
                max_tokens=max_tokens or self.max_tokens,
                response_format={'type': 'json_object'},
            )
        
        result = response.choices[0].message.content or '{}'
        return json.loads(result)
    except Exception as e:
        logger.error(f'Error in generating LLM response: {e}')
        raise
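
A possible extension of the snippet above, since some providers reject the json_schema response format outright: catch the SDK's 400 error and retry in plain JSON mode (the method name below is hypothetical):

```python
import openai  # BadRequestError is the OpenAI SDK's exception for HTTP 400 responses


async def _create_with_schema_fallback(self, openai_messages, response_model, max_tokens):
    """Hypothetical OpenAIGenericClient helper: try json_schema first, then fall back to
    plain JSON mode for providers that reject the json_schema response_format."""
    schema_format = {
        'type': 'json_schema',
        'json_schema': {
            'name': response_model.__name__,
            'schema': response_model.model_json_schema(),
            'strict': False,
        },
    }
    try:
        return await self.client.chat.completions.create(
            model=self.model or DEFAULT_MODEL,
            messages=openai_messages,
            temperature=self.temperature,
            max_tokens=max_tokens or self.max_tokens,
            response_format=schema_format,
        )
    except openai.BadRequestError:
        # Provider does not understand json_schema; retry with generic JSON mode.
        return await self.client.chat.completions.create(
            model=self.model or DEFAULT_MODEL,
            messages=openai_messages,
            temperature=self.temperature,
            max_tokens=max_tokens or self.max_tokens,
            response_format={'type': 'json_object'},
        )
```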

Priority 3: Nice to Have (Low Priority)

3.1 Provider-Specific Clients

Create convenience clients for popular providers:

# graphiti_core/llm_client/openrouter_client.py
class OpenRouterClient(OpenAIGenericClient):
    """Pre-configured client for OpenRouter.
    
    Example:
        >>> client = OpenRouterClient(
        ...     api_key="sk-or-v1-...",
        ...     model="meta-llama/llama-3-8b-instruct"
        ... )
    """
    def __init__(
        self,
        api_key: str,
        model: str,
        temperature: float = DEFAULT_TEMPERATURE,
        max_tokens: int = DEFAULT_MAX_TOKENS,
        **kwargs
    ):
        config = LLMConfig(
            api_key=api_key,
            model=model,
            base_url="https://openrouter.ai/api/v1",
            temperature=temperature,
            max_tokens=max_tokens
        )
        super().__init__(config=config, **kwargs)

# graphiti_core/llm_client/together_client.py
class TogetherClient(OpenAIGenericClient):
    """Pre-configured client for Together.ai.
    
    Example:
        >>> client = TogetherClient(
        ...     api_key="your-together-key",
        ...     model="meta-llama/Llama-3-70b-chat-hf"
        ... )
    """
    def __init__(
        self,
        api_key: str,
        model: str,
        temperature: float = DEFAULT_TEMPERATURE,
        max_tokens: int = DEFAULT_MAX_TOKENS,
        **kwargs
    ):
        config = LLMConfig(
            api_key=api_key,
            model=model,
            base_url="https://api.together.xyz/v1",
            temperature=temperature,
            max_tokens=max_tokens
        )
        super().__init__(config=config, **kwargs)
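
Usage would mirror any other LLM client (a sketch; uri, user, and password stand in for the Neo4j connection settings used in the earlier examples):

```python
from graphiti_core import Graphiti

llm_client = OpenRouterClient(
    api_key="sk-or-v1-...",
    model="meta-llama/llama-3-8b-instruct",
)
graphiti = Graphiti(uri, user, password, llm_client=llm_client)
```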

3.2 Provider Compatibility Matrix

Add to documentation:

| Provider | Standard Client | Generic Client | Structured Outputs | Notes |
|---|---|---|---|---|
| OpenAI | OpenAIClient | — | Full (Responses API) | Recommended: use OpenAIClient |
| Azure OpenAI | AzureOpenAILLMClient | — | Full (Responses API) | Requires API version 2024-08-01-preview+ |
| OpenRouter | — | OpenAIGenericClient | ⚠️ Limited (JSON Schema) | Use OpenAIGenericClient |
| Together.ai | — | OpenAIGenericClient | ⚠️ Limited (JSON Schema) | Use OpenAIGenericClient |
| Ollama | — | OpenAIGenericClient | ⚠️ Limited (JSON mode) | Local deployment |
| Groq | — | OpenAIGenericClient | ⚠️ Limited (JSON Schema) | Very fast inference |
| Perplexity | — | OpenAIGenericClient | ⚠️ Limited (JSON mode) | Primarily for search |

Testing Recommendations

Unit Tests

  1. Endpoint detection logic

    def test_should_use_responses_api():
        # OpenAI native should use Responses API
        config = LLMConfig(base_url="https://api.openai.com/v1")
        assert config.use_responses_api is True
    
        # Custom endpoints should not
        config = LLMConfig(base_url="https://openrouter.ai/api/v1")
        assert config.use_responses_api is False
    
  2. Client selection

    def test_create_openai_client_auto_selection():
        # Should return OpenAIClient for OpenAI
        config = LLMConfig(api_key="test")
        client = create_openai_client(config)
        assert isinstance(client, OpenAIClient)
    
        # Should return OpenAIGenericClient for others
        config = LLMConfig(api_key="test", base_url="https://openrouter.ai/api/v1")
        client = create_openai_client(config)
        assert isinstance(client, OpenAIGenericClient)
    

Integration Tests

  1. Mock server tests with responses for both endpoints (see the sketch after this list)
  2. Real provider tests (optional, may require API keys):
    • OpenRouter
    • Together.ai
    • Ollama (local)
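
A hedged sketch of the mock-server idea for the chat-completions case. It assumes pytest-asyncio and respx (which intercepts the httpx transport the OpenAI SDK uses); neither dependency nor the Message import path is confirmed against Graphiti's actual test stack:

```python
import pytest
import respx
from httpx import Response

from graphiti_core.llm_client import LLMConfig
from graphiti_core.llm_client.openai_generic_client import OpenAIGenericClient
from graphiti_core.prompts.models import Message  # assumed location of the Message model


@pytest.mark.asyncio
@respx.mock
async def test_generic_client_hits_chat_completions():
    # Fake an OpenAI-compatible /v1/chat/completions response.
    route = respx.post('https://example.test/v1/chat/completions').mock(
        return_value=Response(
            200,
            json={
                'id': 'chatcmpl-test',
                'object': 'chat.completion',
                'created': 0,
                'model': 'test-model',
                'choices': [
                    {
                        'index': 0,
                        'finish_reason': 'stop',
                        'message': {'role': 'assistant', 'content': '{"ok": true}'},
                    }
                ],
            },
        )
    )

    config = LLMConfig(api_key='test', model='test-model', base_url='https://example.test/v1')
    client = OpenAIGenericClient(config=config)
    result = await client.generate_response([Message(role='user', content='hi')])

    assert route.called
    assert isinstance(result, dict)  # parsed JSON from the mocked completion
```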

Manual Testing Checklist

  • OpenRouter with Llama models
  • Together.ai with various models
  • Ollama with local models
  • Groq with fast models
  • Verify error messages are helpful
  • Test both structured and unstructured outputs

Summary of Issues

| Issue | Current State | Impact | Priority |
|---|---|---|---|
| /v1/responses endpoint usage | Used by default OpenAIClient | Breaks all non-OpenAI providers | High |
| OpenAIGenericClient not exported | Hidden from public API | Users can't easily use it | High |
| Poor error messages | Generic 404 errors | Confusing for users | High |
| No auto-detection | Must manually choose client | Poor DX | Medium |
| Limited docs | Only Ollama example | Users don't know how to configure other providers | High |
| No structured output in generic client | Only supports loose JSON | Reduced type safety for custom endpoints | Medium |
| No provider-specific helpers | Generic configuration only | More setup required | Low |

Implementation Roadmap

Phase 1: Quick Fixes (1-2 days)

  1. Export OpenAIGenericClient in public API
  2. Add documentation section for custom endpoints
  3. Improve error messages in OpenAIClient
  4. Add examples for OpenRouter, Together.ai

Phase 2: Enhanced Support (3-5 days)

  1. Add auto-detection logic to LLMConfig
  2. Create factory function for client selection
  3. Enhance OpenAIGenericClient with better JSON schema support
  4. Add comprehensive tests

Phase 3: Polish (2-3 days)

  1. Create provider-specific client classes
  2. Build compatibility matrix documentation
  3. Add integration tests with real providers
  4. Update all examples and guides

Contributing

If you're implementing these changes, please ensure:

  1. All changes follow the repository guidelines in AGENTS.md
  2. Run make format before committing
  3. Run make lint and make test to verify changes
  4. Update documentation for any new public APIs
  5. Add examples demonstrating the new functionality

Questions or Issues?