# OpenAI-Compatible Custom Endpoint Support in Graphiti
## Overview
This document analyzes how Graphiti handles OpenAI-compatible custom endpoints (such as OpenRouter, NagaAI, and Together.ai) and provides recommendations for improving support.
## Current Architecture
Graphiti has **three main OpenAI-compatible client implementations**:
### 1. OpenAIClient (Default)
**File**: `graphiti_core/llm_client/openai_client.py`
- Extends `BaseOpenAIClient`
- Uses the **new OpenAI Responses API** (`/v1/responses` endpoint)
- Uses `client.responses.parse()` for structured outputs (OpenAI SDK v1.91+)
- This is the **default client** exported in the public API
```python
response = await self.client.responses.parse(
    model=model,
    input=messages,
    temperature=temperature,
    max_output_tokens=max_tokens,
    text_format=response_model,
    reasoning={'effort': reasoning},
    text={'verbosity': verbosity},
)
```
### 2. OpenAIGenericClient (Legacy)
**File**: `graphiti_core/llm_client/openai_generic_client.py`
- Uses the **standard Chat Completions API** (`/v1/chat/completions`)
- Uses `client.chat.completions.create()`
- **Only supports unstructured JSON responses** (not Pydantic schemas)
- Currently **not exported** in `__init__.py` (hidden from public API)
```python
response = await self.client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=temperature,
    max_tokens=max_tokens,
    response_format={'type': 'json_object'},
)
```
### 3. AzureOpenAILLMClient
**File**: `graphiti_core/llm_client/azure_openai_client.py`
- Azure-specific implementation
- Also uses `responses.parse()` like `OpenAIClient`
- Handles Azure-specific authentication and endpoints (see the configuration sketch below)
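For orientation, a minimal configuration sketch. The `AsyncAzureOpenAI` constructor below is standard OpenAI SDK usage; the `AzureOpenAILLMClient` arguments (`azure_client`, `config`) are assumptions here, so check the class for its actual signature:
```python
from openai import AsyncAzureOpenAI

from graphiti_core.llm_client import LLMConfig
from graphiti_core.llm_client.azure_openai_client import AzureOpenAILLMClient

# Standard OpenAI SDK client for Azure; endpoint, key, and API version are placeholders.
azure_client = AsyncAzureOpenAI(
    api_key="your-azure-api-key",
    api_version="2024-08-01-preview",
    azure_endpoint="https://your-resource.openai.azure.com",
)

# Assumption: AzureOpenAILLMClient wraps a pre-built Azure client plus an LLMConfig.
llm_client = AzureOpenAILLMClient(
    azure_client=azure_client,
    config=LLMConfig(model="gpt-4o-mini"),
)
```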
## The Root Problem
### Issue Description
When users configure Graphiti with custom OpenAI-compatible endpoints, they encounter errors because:
1. **`OpenAIClient` uses the new `/v1/responses` endpoint** via `client.responses.parse()`
   - This is a **new OpenAI API** (introduced in OpenAI SDK v1.91.0) for structured outputs
   - This endpoint is **proprietary to OpenAI** and **not part of the standard OpenAI-compatible API specification**
2. **Most OpenAI-compatible services** (OpenRouter, NagaAI, Ollama, Together.ai, etc.) **only implement** the standard `/v1/chat/completions` endpoint
   - They do **NOT** implement `/v1/responses`
3. When you configure a `base_url` pointing to these services, Graphiti tries to call:
   ```
   https://your-custom-endpoint.com/v1/responses
   ```
   instead of the expected:
   ```
   https://your-custom-endpoint.com/v1/chat/completions
   ```
### Example Error Scenario
```python
from graphiti_core import Graphiti
from graphiti_core.llm_client import OpenAIClient, LLMConfig
config = LLMConfig(
    api_key="sk-or-v1-...",
    model="meta-llama/llama-3-8b-instruct",
    base_url="https://openrouter.ai/api/v1"
)
llm_client = OpenAIClient(config=config)
graphiti = Graphiti(uri, user, password, llm_client=llm_client)
# This will fail because OpenRouter doesn't have /v1/responses endpoint
# Error: 404 Not Found - https://openrouter.ai/api/v1/responses
```
## Current Workaround (Documented)
The README documents using `OpenAIGenericClient` with Ollama:
```python
from graphiti_core.llm_client.openai_generic_client import OpenAIGenericClient
from graphiti_core.llm_client.config import LLMConfig
llm_config = LLMConfig(
    api_key="ollama",
    model="deepseek-r1:7b",
    base_url="http://localhost:11434/v1"
)
llm_client = OpenAIGenericClient(config=llm_config)
```
### Limitations of Current Workaround
- `OpenAIGenericClient` **doesn't support structured outputs with Pydantic models**
- It only returns raw JSON and manually validates schemas (see the validation sketch below)
- It's not the recommended/default client
- It's **not exported** in the public API (`graphiti_core.llm_client`)
- Users must know to import from the internal module path
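If you need typed results with the current workaround, one option is to validate the generic client's raw JSON yourself. A minimal sketch, assuming the client hands back a plain `dict`; the `ExtractedEntity` model is illustrative and not part of Graphiti:
```python
from pydantic import BaseModel, ValidationError


class ExtractedEntity(BaseModel):
    """Illustrative schema; not one of Graphiti's prompt models."""

    name: str
    entity_type: str


def validate_llm_output(raw: dict) -> ExtractedEntity | None:
    """Turn loose JSON from OpenAIGenericClient into a typed object, or None on mismatch."""
    try:
        return ExtractedEntity.model_validate(raw)
    except ValidationError as e:
        # Caller can retry the LLM call or fall back to a default here.
        print(f'LLM output did not match the schema: {e}')
        return None
```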
## Recommended Solutions
### Priority 1: Quick Wins (High Priority)
#### 1.1 Export `OpenAIGenericClient` in Public API
**File**: `graphiti_core/llm_client/__init__.py`
**Current**:
```python
from .client import LLMClient
from .config import LLMConfig
from .errors import RateLimitError
from .openai_client import OpenAIClient
__all__ = ['LLMClient', 'OpenAIClient', 'LLMConfig', 'RateLimitError']
```
**Proposed**:
```python
from .client import LLMClient
from .config import LLMConfig
from .errors import RateLimitError
from .openai_client import OpenAIClient
from .openai_generic_client import OpenAIGenericClient
__all__ = ['LLMClient', 'OpenAIClient', 'OpenAIGenericClient', 'LLMConfig', 'RateLimitError']
```
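Once exported, the client could be imported from the public package path rather than the internal module (hypothetical until the change above is merged):
```python
# Only works after the export above ships; today the internal module path is required.
from graphiti_core.llm_client import LLMConfig, OpenAIGenericClient

llm_client = OpenAIGenericClient(config=LLMConfig(base_url="https://openrouter.ai/api/v1"))
```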
#### 1.2 Add Clear Documentation
**File**: `README.md`
Add a dedicated section:
````markdown
### Using OpenAI-Compatible Endpoints (OpenRouter, NagaAI, Together.ai, etc.)
Most OpenAI-compatible services only support the standard Chat Completions API,
not OpenAI's newer Responses API. Use `OpenAIGenericClient` for these services:
**OpenRouter Example**:
```python
from graphiti_core import Graphiti
from graphiti_core.llm_client import OpenAIGenericClient, LLMConfig
config = LLMConfig(
    api_key="sk-or-v1-...",
    model="meta-llama/llama-3-8b-instruct",
    base_url="https://openrouter.ai/api/v1"
)
llm_client = OpenAIGenericClient(config=config)
graphiti = Graphiti(uri, user, password, llm_client=llm_client)
```
**Together.ai Example**:
```python
config = LLMConfig(
    api_key="your-together-api-key",
    model="meta-llama/Llama-3-70b-chat-hf",
    base_url="https://api.together.xyz/v1"
)
llm_client = OpenAIGenericClient(config=config)
```
**Note**: `OpenAIGenericClient` has limited structured output support compared to
the default `OpenAIClient`. It uses JSON mode instead of Pydantic schema validation.
````
#### 1.3 Add Better Error Messages
**File**: `graphiti_core/llm_client/openai_client.py`
Add error handling that detects the issue:
```python
async def _create_structured_completion(self, ...):
    try:
        response = await self.client.responses.parse(...)
        return response
    except openai.NotFoundError as e:
        if self.config.base_url and "api.openai.com" not in self.config.base_url:
            raise Exception(
                f"The OpenAI Responses API (/v1/responses) is not available at {self.config.base_url}. "
                f"Most OpenAI-compatible services only support /v1/chat/completions. "
                f"Please use OpenAIGenericClient instead of OpenAIClient for custom endpoints. "
                f"See: https://help.getzep.com/graphiti/guides/custom-endpoints"
            ) from e
        raise
```
### Priority 2: Better UX (Medium Priority)
#### 2.1 Add Auto-Detection Logic
**File**: `graphiti_core/llm_client/config.py`
```python
class LLMConfig:
    def __init__(
        self,
        api_key: str | None = None,
        model: str | None = None,
        base_url: str | None = None,
        temperature: float = DEFAULT_TEMPERATURE,
        max_tokens: int = DEFAULT_MAX_TOKENS,
        small_model: str | None = None,
        use_responses_api: bool | None = None,  # NEW: Auto-detect if None
    ):
        self.base_url = base_url
        self.api_key = api_key
        self.model = model
        self.small_model = small_model
        self.temperature = temperature
        self.max_tokens = max_tokens

        # Auto-detect API style based on base_url
        if use_responses_api is None:
            self.use_responses_api = self._should_use_responses_api()
        else:
            self.use_responses_api = use_responses_api

    def _should_use_responses_api(self) -> bool:
        """Determine if we should use the Responses API based on base_url."""
        if self.base_url is None:
            return True  # Default OpenAI

        # Known services that support the Responses API
        supported_services = ["api.openai.com", "azure.com"]
        return any(service in self.base_url for service in supported_services)
```
#### 2.2 Create a Unified Smart Client
**Option A**: Modify `OpenAIClient` to Fall Back
```python
class OpenAIClient(BaseOpenAIClient):
    def __init__(self, config: LLMConfig | None = None, ...):
        if config is None:
            config = LLMConfig()
        super().__init__(config, ...)
        self.use_responses_api = config.use_responses_api
        self.client = AsyncOpenAI(api_key=config.api_key, base_url=config.base_url)

    async def _create_structured_completion(self, ...):
        if self.use_responses_api:
            # Use responses.parse() for OpenAI native
            return await self.client.responses.parse(...)
        else:
            # Fall back to chat.completions with a JSON schema for compatibility
            return await self.client.chat.completions.create(
                model=model,
                messages=messages,
                temperature=temperature,
                max_tokens=max_tokens,
                response_format={
                    "type": "json_schema",
                    "json_schema": {
                        "name": response_model.__name__,
                        "schema": response_model.model_json_schema(),
                        "strict": False,
                    },
                },
            )
```
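With Option A in place, a single client class could cover both cases; a hypothetical usage sketch, where `use_responses_api` is the proposed flag from section 2.1 rather than an existing parameter:
```python
# Hypothetical usage once Option A and the LLMConfig flag from 2.1 exist.
config = LLMConfig(
    api_key="sk-or-v1-...",
    model="meta-llama/llama-3-8b-instruct",
    base_url="https://openrouter.ai/api/v1",
    use_responses_api=False,  # force the chat.completions fallback
)
llm_client = OpenAIClient(config=config)
```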
**Option B**: Create a Factory Function
```python
# graphiti_core/llm_client/__init__.py
def create_openai_client(
    config: LLMConfig | None = None,
    cache: bool = False,
    **kwargs,
) -> LLMClient:
    """
    Factory to create the appropriate OpenAI-compatible client.

    Automatically selects between OpenAIClient (for native OpenAI)
    and OpenAIGenericClient (for OpenAI-compatible services).

    Args:
        config: LLM configuration including base_url
        cache: Whether to enable caching
        **kwargs: Additional arguments passed to the client

    Returns:
        LLMClient: Either OpenAIClient or OpenAIGenericClient

    Example:
        >>> # Automatically uses OpenAIGenericClient for OpenRouter
        >>> config = LLMConfig(
        ...     api_key="sk-or-v1-...",
        ...     model="meta-llama/llama-3-8b-instruct",
        ...     base_url="https://openrouter.ai/api/v1",
        ... )
        >>> client = create_openai_client(config)
    """
    if config is None:
        config = LLMConfig()

    # Auto-detect based on base_url
    if config.base_url is None or "api.openai.com" in config.base_url:
        return OpenAIClient(config, cache, **kwargs)
    else:
        return OpenAIGenericClient(config, cache, **kwargs)
```
#### 2.3 Enhance `OpenAIGenericClient` with Better Structured Output Support
**File**: `graphiti_core/llm_client/openai_generic_client.py`
```python
async def _generate_response(
    self,
    messages: list[Message],
    response_model: type[BaseModel] | None = None,
    max_tokens: int = DEFAULT_MAX_TOKENS,
    model_size: ModelSize = ModelSize.medium,
) -> dict[str, typing.Any]:
    openai_messages: list[ChatCompletionMessageParam] = []
    for m in messages:
        m.content = self._clean_input(m.content)
        if m.role == 'user':
            openai_messages.append({'role': 'user', 'content': m.content})
        elif m.role == 'system':
            openai_messages.append({'role': 'system', 'content': m.content})

    try:
        # Try to use the json_schema format (supported by more providers)
        if response_model:
            response = await self.client.chat.completions.create(
                model=self.model or DEFAULT_MODEL,
                messages=openai_messages,
                temperature=self.temperature,
                max_tokens=max_tokens or self.max_tokens,
                response_format={
                    "type": "json_schema",
                    "json_schema": {
                        "name": response_model.__name__,
                        "schema": response_model.model_json_schema(),
                        "strict": False,  # Most providers don't support strict mode
                    },
                },
            )
        else:
            response = await self.client.chat.completions.create(
                model=self.model or DEFAULT_MODEL,
                messages=openai_messages,
                temperature=self.temperature,
                max_tokens=max_tokens or self.max_tokens,
                response_format={'type': 'json_object'},
            )
        result = response.choices[0].message.content or '{}'
        return json.loads(result)
    except Exception as e:
        logger.error(f'Error in generating LLM response: {e}')
        raise
```
### Priority 3: Nice to Have (Low Priority)
#### 3.1 Provider-Specific Clients
Create convenience clients for popular providers:
```python
# graphiti_core/llm_client/openrouter_client.py
class OpenRouterClient(OpenAIGenericClient):
    """Pre-configured client for OpenRouter.

    Example:
        >>> client = OpenRouterClient(
        ...     api_key="sk-or-v1-...",
        ...     model="meta-llama/llama-3-8b-instruct",
        ... )
    """

    def __init__(
        self,
        api_key: str,
        model: str,
        temperature: float = DEFAULT_TEMPERATURE,
        max_tokens: int = DEFAULT_MAX_TOKENS,
        **kwargs,
    ):
        config = LLMConfig(
            api_key=api_key,
            model=model,
            base_url="https://openrouter.ai/api/v1",
            temperature=temperature,
            max_tokens=max_tokens,
        )
        super().__init__(config=config, **kwargs)
```
```python
# graphiti_core/llm_client/together_client.py
class TogetherClient(OpenAIGenericClient):
    """Pre-configured client for Together.ai.

    Example:
        >>> client = TogetherClient(
        ...     api_key="your-together-key",
        ...     model="meta-llama/Llama-3-70b-chat-hf",
        ... )
    """

    def __init__(
        self,
        api_key: str,
        model: str,
        temperature: float = DEFAULT_TEMPERATURE,
        max_tokens: int = DEFAULT_MAX_TOKENS,
        **kwargs,
    ):
        config = LLMConfig(
            api_key=api_key,
            model=model,
            base_url="https://api.together.xyz/v1",
            temperature=temperature,
            max_tokens=max_tokens,
        )
        super().__init__(config=config, **kwargs)
```
#### 3.2 Provider Compatibility Matrix
Add to documentation:
| Provider | Standard Client | Generic Client | Structured Outputs | Notes |
|----------|----------------|----------------|-------------------|-------|
| OpenAI | ✅ `OpenAIClient` | ✅ | ✅ Full (Responses API) | Recommended: Use `OpenAIClient` |
| Azure OpenAI | ✅ `AzureOpenAILLMClient` | ✅ | ✅ Full (Responses API) | Requires API version 2024-08-01-preview+ |
| OpenRouter | ❌ | ✅ `OpenAIGenericClient` | ⚠️ Limited (JSON Schema) | Use `OpenAIGenericClient` |
| Together.ai | ❌ | ✅ `OpenAIGenericClient` | ⚠️ Limited (JSON Schema) | Use `OpenAIGenericClient` |
| Ollama | ❌ | ✅ `OpenAIGenericClient` | ⚠️ Limited (JSON mode) | Local deployment |
| Groq | ❌ | ✅ `OpenAIGenericClient` | ⚠️ Limited (JSON Schema) | Very fast inference |
| Perplexity | ❌ | ✅ `OpenAIGenericClient` | ⚠️ Limited (JSON mode) | Primarily for search |
## Testing Recommendations
### Unit Tests
1. **Endpoint detection logic**
```python
def test_should_use_responses_api():
    # OpenAI native should use the Responses API
    config = LLMConfig(base_url="https://api.openai.com/v1")
    assert config.use_responses_api is True

    # Custom endpoints should not
    config = LLMConfig(base_url="https://openrouter.ai/api/v1")
    assert config.use_responses_api is False
```
2. **Client selection**
```python
def test_create_openai_client_auto_selection():
    # Should return OpenAIClient for OpenAI
    config = LLMConfig(api_key="test")
    client = create_openai_client(config)
    assert isinstance(client, OpenAIClient)

    # Should return OpenAIGenericClient for others
    config = LLMConfig(api_key="test", base_url="https://openrouter.ai/api/v1")
    client = create_openai_client(config)
    assert isinstance(client, OpenAIGenericClient)
```
### Integration Tests
1. **Mock server tests** with responses for both endpoints (see the sketch after this list)
2. **Real provider tests** (optional, may require API keys):
- OpenRouter
- Together.ai
- Ollama (local)
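A minimal sketch of such a mock-server fixture, assuming `respx` and `pytest-asyncio` are available (neither is asserted to be a current test dependency). It only simulates a provider that serves `/chat/completions` but not `/responses`; wiring Graphiti's clients through the mock is left to the real test:
```python
import httpx
import pytest
import respx

BASE_URL = 'https://openrouter.ai/api/v1'  # any OpenAI-compatible provider


@pytest.mark.asyncio
async def test_provider_without_responses_endpoint():
    """Simulate a provider that implements /chat/completions but not /responses."""
    with respx.mock(base_url=BASE_URL) as router:
        router.post('/responses').mock(return_value=httpx.Response(404))
        router.post('/chat/completions').mock(
            return_value=httpx.Response(
                200, json={'choices': [{'message': {'content': '{}'}}]}
            )
        )

        async with httpx.AsyncClient(base_url=BASE_URL) as http:
            assert (await http.post('/responses', json={})).status_code == 404
            assert (await http.post('/chat/completions', json={})).status_code == 200
```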
### Manual Testing Checklist
- [ ] OpenRouter with Llama models
- [ ] Together.ai with various models
- [ ] Ollama with local models
- [ ] Groq with fast models
- [ ] Verify error messages are helpful
- [ ] Test both structured and unstructured outputs
## Summary of Issues
| Issue | Current State | Impact | Priority |
|-------|---------------|--------|----------|
| `/v1/responses` endpoint usage | Used by default `OpenAIClient` | **BREAKS** all non-OpenAI providers | High |
| `OpenAIGenericClient` not exported | Hidden from public API | Users can't easily use it | High |
| Poor error messages | Generic 404 errors | Confusing for users | High |
| No auto-detection | Must manually choose client | Poor DX | Medium |
| Limited docs | Only Ollama example | Users don't know how to configure other providers | High |
| No structured output in Generic client | Only supports loose JSON | Reduced type safety for custom endpoints | Medium |
| No provider-specific helpers | Generic configuration only | More setup required | Low |
## Implementation Roadmap
### Phase 1: Quick Fixes (1-2 days)
1. Export `OpenAIGenericClient` in public API
2. Add documentation section for custom endpoints
3. Improve error messages in `OpenAIClient`
4. Add examples for OpenRouter, Together.ai
### Phase 2: Enhanced Support (3-5 days)
1. Add auto-detection logic to `LLMConfig`
2. Create factory function for client selection
3. Enhance `OpenAIGenericClient` with better JSON schema support
4. Add comprehensive tests
### Phase 3: Polish (2-3 days)
1. Create provider-specific client classes
2. Build compatibility matrix documentation
3. Add integration tests with real providers
4. Update all examples and guides
## References
- OpenAI SDK v1.91.0+ Responses API: https://platform.openai.com/docs/api-reference/responses
- OpenAI Chat Completions API: https://platform.openai.com/docs/api-reference/chat
- OpenRouter API: https://openrouter.ai/docs
- Together.ai API: https://docs.together.ai/docs/openai-api-compatibility
- Ollama OpenAI compatibility: https://github.com/ollama/ollama/blob/main/docs/openai.md
## Contributing
If you're implementing these changes, please ensure:
1. All changes follow the repository guidelines in `AGENTS.md`
2. Run `make format` before committing
3. Run `make lint` and `make test` to verify changes
4. Update documentation for any new public APIs
5. Add examples demonstrating the new functionality
## Questions or Issues?
- Open an issue: https://github.com/getzep/graphiti/issues
- Discussion: https://github.com/getzep/graphiti/discussions
- Documentation: https://help.getzep.com/graphiti