CRITICAL FIX - Database Parameter (graphiti_core):
- Fixed graphiti_core/driver/neo4j_driver.py execute_query method
- database_ parameter was incorrectly added to params dict instead of kwargs
- Now correctly passed as keyword argument to Neo4j driver
- Impact: All queries now execute in configured database (not default 'neo4j')
- Root cause: Violated Neo4j Python driver API contract
Technical Details:
Previous code (BROKEN):
params.setdefault('database_', self._database) # Wrong - in params dict
result = await self.client.execute_query(cypher_query_, parameters_=params, **kwargs)
Fixed code (CORRECT):
kwargs.setdefault('database_', self._database) # Correct - in kwargs
result = await self.client.execute_query(cypher_query_, parameters_=params, **kwargs)
FIX - Index Creation Error Handling (MCP server):
- Added graceful handling for Neo4j IF NOT EXISTS bug
- Prevents MCP server crash when indices already exist
- Logs warning instead of failing initialization
- Handles EquivalentSchemaRuleAlreadyExists error gracefully
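Illustrative sketch of the handling (not the actual diff; names in graphiti_mcp_server.py may differ):
try:
    await driver.execute_query(index_query)
except Exception as e:
    if 'EquivalentSchemaRuleAlreadyExists' in str(e):
        logger.warning(f'Index already exists, skipping: {e}')
    else:
        raise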
Files Modified:
- graphiti_core/driver/neo4j_driver.py (3 lines changed)
- mcp_server/src/graphiti_mcp_server.py (12 lines of error handling added)
- mcp_server/pyproject.toml (version bump to 1.0.5)
Testing:
- Python syntax validation: PASSED
- Ruff formatting: PASSED
- Ruff linting: PASSED
Closes issues with:
- Data being stored in wrong Neo4j database
- MCP server crashing on startup with EquivalentSchemaRuleAlreadyExists
- NEO4J_DATABASE environment variable being ignored
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
# OpenAI-Compatible Custom Endpoint Support in Graphiti

## Overview

This document analyzes how Graphiti handles OpenAI-compatible custom endpoints (like OpenRouter, NagaAI, Together.ai, etc.) and provides recommendations for improving support.

## Current Architecture

Graphiti has **three main OpenAI-compatible client implementations**:

### 1. OpenAIClient (Default)

**File**: `graphiti_core/llm_client/openai_client.py`

- Extends `BaseOpenAIClient`
- Uses the **new OpenAI Responses API** (`/v1/responses` endpoint)
- Uses `client.responses.parse()` for structured outputs (OpenAI SDK v1.91+)
- This is the **default client** exported in the public API

```python
response = await self.client.responses.parse(
    model=model,
    input=messages,
    temperature=temperature,
    max_output_tokens=max_tokens,
    text_format=response_model,
    reasoning={'effort': reasoning},
    text={'verbosity': verbosity},
)
```

### 2. OpenAIGenericClient (Legacy)

**File**: `graphiti_core/llm_client/openai_generic_client.py`

- Uses the **standard Chat Completions API** (`/v1/chat/completions`)
- Uses `client.chat.completions.create()`
- **Only supports unstructured JSON responses** (not Pydantic schemas)
- Currently **not exported** in `__init__.py` (hidden from public API)

```python
response = await self.client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=temperature,
    max_tokens=max_tokens,
    response_format={'type': 'json_object'},
)
```

### 3. AzureOpenAILLMClient

**File**: `graphiti_core/llm_client/azure_openai_client.py`

- Azure-specific implementation
- Also uses `responses.parse()` like `OpenAIClient`
- Handles Azure-specific authentication and endpoints

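A hypothetical wiring sketch for orientation only: the `AsyncAzureOpenAI` usage is the `openai` SDK's documented pattern, but the `AzureOpenAILLMClient` parameter names shown here are assumptions and should be checked against the class itself.

```python
# Hypothetical wiring sketch; AzureOpenAILLMClient's exact constructor is an
# assumption here -- verify against graphiti_core/llm_client/azure_openai_client.py.
from openai import AsyncAzureOpenAI

from graphiti_core.llm_client.azure_openai_client import AzureOpenAILLMClient
from graphiti_core.llm_client.config import LLMConfig

azure_client = AsyncAzureOpenAI(
    api_key="your-azure-api-key",
    api_version="2024-08-01-preview",
    azure_endpoint="https://your-resource.openai.azure.com",
)

llm_client = AzureOpenAILLMClient(
    azure_client=azure_client,  # assumed parameter name
    config=LLMConfig(model="gpt-4.1-mini"),  # placeholder deployment/model name
)
```
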
## The Root Problem

### Issue Description

When users configure Graphiti with custom OpenAI-compatible endpoints, they encounter errors because:

1. **`OpenAIClient` uses the new `/v1/responses` endpoint** via `client.responses.parse()`
   - This is a **new OpenAI API** (introduced in OpenAI SDK v1.91.0) for structured outputs
   - This endpoint is **proprietary to OpenAI** and **not part of the standard OpenAI-compatible API specification**

2. **Most OpenAI-compatible services** (OpenRouter, NagaAI, Ollama, Together.ai, etc.) **only implement** the standard `/v1/chat/completions` endpoint
   - They do **NOT** implement `/v1/responses`

3. When you configure a `base_url` pointing to these services, Graphiti tries to call:

   ```
   https://your-custom-endpoint.com/v1/responses
   ```

   instead of the expected:

   ```
   https://your-custom-endpoint.com/v1/chat/completions
   ```

### Example Error Scenario

```python
from graphiti_core import Graphiti
from graphiti_core.llm_client import OpenAIClient, LLMConfig

config = LLMConfig(
    api_key="sk-or-v1-...",
    model="meta-llama/llama-3-8b-instruct",
    base_url="https://openrouter.ai/api/v1"
)

llm_client = OpenAIClient(config=config)
graphiti = Graphiti(uri, user, password, llm_client=llm_client)

# This will fail because OpenRouter doesn't have a /v1/responses endpoint
# Error: 404 Not Found - https://openrouter.ai/api/v1/responses
```

## Current Workaround (Documented)

The README documents using `OpenAIGenericClient` with Ollama:

```python
from graphiti_core.llm_client.openai_generic_client import OpenAIGenericClient
from graphiti_core.llm_client.config import LLMConfig

llm_config = LLMConfig(
    api_key="ollama",
    model="deepseek-r1:7b",
    base_url="http://localhost:11434/v1"
)

llm_client = OpenAIGenericClient(config=llm_config)
```

### Limitations of Current Workaround

- `OpenAIGenericClient` **doesn't support structured outputs with Pydantic models**
- It only returns raw JSON and manually validates schemas
- It's not the recommended/default client
- It's **not exported** in the public API (`graphiti_core.llm_client`)
- Users must know to import from the internal module path

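For context, "returns raw JSON and manually validates" means the pattern below: request JSON mode, then validate the payload client-side. This is a minimal illustration of that pattern, not the client's actual code.

```python
# Minimal illustration of "JSON mode + manual validation"; not the client's actual code.
import json

from pydantic import BaseModel, ValidationError


class ExtractedEntity(BaseModel):
    name: str
    summary: str


raw = '{"name": "Alice", "summary": "example entity"}'  # what a JSON-mode response might contain

try:
    entity = ExtractedEntity.model_validate(json.loads(raw))
except (json.JSONDecodeError, ValidationError):
    # Plain JSON mode gives no server-side schema enforcement, so malformed
    # output has to be caught (and typically retried) on the client.
    raise
```
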
## Recommended Solutions

### Priority 1: Quick Wins (High Priority)

#### 1.1 Export `OpenAIGenericClient` in Public API

**File**: `graphiti_core/llm_client/__init__.py`

**Current**:

```python
from .client import LLMClient
from .config import LLMConfig
from .errors import RateLimitError
from .openai_client import OpenAIClient

__all__ = ['LLMClient', 'OpenAIClient', 'LLMConfig', 'RateLimitError']
```

**Proposed**:

```python
from .client import LLMClient
from .config import LLMConfig
from .errors import RateLimitError
from .openai_client import OpenAIClient
from .openai_generic_client import OpenAIGenericClient

__all__ = ['LLMClient', 'OpenAIClient', 'OpenAIGenericClient', 'LLMConfig', 'RateLimitError']
```

#### 1.2 Add Clear Documentation

**File**: `README.md`

Add a dedicated section:

````markdown
### Using OpenAI-Compatible Endpoints (OpenRouter, NagaAI, Together.ai, etc.)

Most OpenAI-compatible services only support the standard Chat Completions API,
not OpenAI's newer Responses API. Use `OpenAIGenericClient` for these services:

**OpenRouter Example**:
```python
from graphiti_core import Graphiti
from graphiti_core.llm_client import OpenAIGenericClient, LLMConfig

config = LLMConfig(
    api_key="sk-or-v1-...",
    model="meta-llama/llama-3-8b-instruct",
    base_url="https://openrouter.ai/api/v1"
)

llm_client = OpenAIGenericClient(config=config)
graphiti = Graphiti(uri, user, password, llm_client=llm_client)
```

**Together.ai Example**:
```python
config = LLMConfig(
    api_key="your-together-api-key",
    model="meta-llama/Llama-3-70b-chat-hf",
    base_url="https://api.together.xyz/v1"
)
llm_client = OpenAIGenericClient(config=config)
```

**Note**: `OpenAIGenericClient` has limited structured output support compared to
the default `OpenAIClient`. It uses JSON mode instead of Pydantic schema validation.
````

#### 1.3 Add Better Error Messages

**File**: `graphiti_core/llm_client/openai_client.py`

Add error handling that detects the issue:

```python
async def _create_structured_completion(self, ...):
    try:
        response = await self.client.responses.parse(...)
        return response
    except openai.NotFoundError as e:
        if self.config.base_url and "api.openai.com" not in self.config.base_url:
            raise Exception(
                f"The OpenAI Responses API (/v1/responses) is not available at {self.config.base_url}. "
                f"Most OpenAI-compatible services only support /v1/chat/completions. "
                f"Please use OpenAIGenericClient instead of OpenAIClient for custom endpoints. "
                f"See: https://help.getzep.com/graphiti/guides/custom-endpoints"
            ) from e
        raise
```

### Priority 2: Better UX (Medium Priority)

#### 2.1 Add Auto-Detection Logic

**File**: `graphiti_core/llm_client/config.py`

```python
class LLMConfig:
    def __init__(
        self,
        api_key: str | None = None,
        model: str | None = None,
        base_url: str | None = None,
        temperature: float = DEFAULT_TEMPERATURE,
        max_tokens: int = DEFAULT_MAX_TOKENS,
        small_model: str | None = None,
        use_responses_api: bool | None = None,  # NEW: auto-detect if None
    ):
        self.base_url = base_url
        self.api_key = api_key
        self.model = model
        self.small_model = small_model
        self.temperature = temperature
        self.max_tokens = max_tokens

        # Auto-detect API style based on base_url
        if use_responses_api is None:
            self.use_responses_api = self._should_use_responses_api()
        else:
            self.use_responses_api = use_responses_api

    def _should_use_responses_api(self) -> bool:
        """Determine if we should use the Responses API based on base_url."""
        if self.base_url is None:
            return True  # Default OpenAI

        # Known services that support the Responses API
        supported_services = ["api.openai.com", "azure.com"]
        return any(service in self.base_url for service in supported_services)
```

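A short usage sketch of the proposed parameter (it does not exist in the released `LLMConfig` yet): auto-detection picks the API style from `base_url`, and the flag overrides it explicitly.

```python
# Usage sketch for the proposed use_responses_api flag (not part of the current release).
from graphiti_core.llm_client.config import LLMConfig

# Auto-detection: a non-OpenAI base_url would select the Chat Completions path.
auto_config = LLMConfig(
    api_key="sk-or-v1-...",
    base_url="https://openrouter.ai/api/v1",
)

# Explicit override: force the Responses API off regardless of base_url.
manual_config = LLMConfig(
    api_key="sk-...",
    base_url="https://my-gateway.example.com/v1",  # hypothetical gateway URL
    use_responses_api=False,
)
```
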
#### 2.2 Create a Unified Smart Client

**Option A**: Modify `OpenAIClient` to Fall Back

```python
class OpenAIClient(BaseOpenAIClient):
    def __init__(self, config: LLMConfig | None = None, ...):
        if config is None:
            config = LLMConfig()
        super().__init__(config, ...)

        self.use_responses_api = config.use_responses_api
        self.client = AsyncOpenAI(api_key=config.api_key, base_url=config.base_url)

    async def _create_structured_completion(self, ...):
        if self.use_responses_api:
            # Use responses.parse() for OpenAI native
            return await self.client.responses.parse(...)
        else:
            # Fall back to chat.completions with a JSON schema for compatibility
            return await self.client.chat.completions.create(
                model=model,
                messages=messages,
                temperature=temperature,
                max_tokens=max_tokens,
                response_format={
                    "type": "json_schema",
                    "json_schema": {
                        "name": response_model.__name__,
                        "schema": response_model.model_json_schema(),
                        "strict": False
                    }
                }
            )
```

**Option B**: Create a Factory Function

```python
# graphiti_core/llm_client/__init__.py

def create_openai_client(
    config: LLMConfig | None = None,
    cache: bool = False,
    **kwargs
) -> LLMClient:
    """
    Factory to create the appropriate OpenAI-compatible client.

    Automatically selects between OpenAIClient (for native OpenAI)
    and OpenAIGenericClient (for OpenAI-compatible services).

    Args:
        config: LLM configuration including base_url
        cache: Whether to enable caching
        **kwargs: Additional arguments passed to the client

    Returns:
        LLMClient: Either OpenAIClient or OpenAIGenericClient

    Example:
        >>> # Automatically uses OpenAIGenericClient for OpenRouter
        >>> config = LLMConfig(
        ...     api_key="sk-or-v1-...",
        ...     model="meta-llama/llama-3-8b-instruct",
        ...     base_url="https://openrouter.ai/api/v1"
        ... )
        >>> client = create_openai_client(config)
    """
    if config is None:
        config = LLMConfig()

    # Auto-detect based on base_url
    if config.base_url is None or "api.openai.com" in config.base_url:
        return OpenAIClient(config, cache, **kwargs)
    else:
        return OpenAIGenericClient(config, cache, **kwargs)
```

#### 2.3 Enhance `OpenAIGenericClient` with Better Structured Output Support

**File**: `graphiti_core/llm_client/openai_generic_client.py`

```python
async def _generate_response(
    self,
    messages: list[Message],
    response_model: type[BaseModel] | None = None,
    max_tokens: int = DEFAULT_MAX_TOKENS,
    model_size: ModelSize = ModelSize.medium,
) -> dict[str, typing.Any]:
    openai_messages: list[ChatCompletionMessageParam] = []
    for m in messages:
        m.content = self._clean_input(m.content)
        if m.role == 'user':
            openai_messages.append({'role': 'user', 'content': m.content})
        elif m.role == 'system':
            openai_messages.append({'role': 'system', 'content': m.content})

    try:
        # Try to use json_schema format (supported by more providers)
        if response_model:
            response = await self.client.chat.completions.create(
                model=self.model or DEFAULT_MODEL,
                messages=openai_messages,
                temperature=self.temperature,
                max_tokens=max_tokens or self.max_tokens,
                response_format={
                    "type": "json_schema",
                    "json_schema": {
                        "name": response_model.__name__,
                        "schema": response_model.model_json_schema(),
                        "strict": False  # Most providers don't support strict mode
                    }
                }
            )
        else:
            response = await self.client.chat.completions.create(
                model=self.model or DEFAULT_MODEL,
                messages=openai_messages,
                temperature=self.temperature,
                max_tokens=max_tokens or self.max_tokens,
                response_format={'type': 'json_object'},
            )

        result = response.choices[0].message.content or '{}'
        return json.loads(result)
    except Exception as e:
        logger.error(f'Error in generating LLM response: {e}')
        raise
```

### Priority 3: Nice to Have (Low Priority)

#### 3.1 Provider-Specific Clients

Create convenience clients for popular providers:

```python
# graphiti_core/llm_client/openrouter_client.py
class OpenRouterClient(OpenAIGenericClient):
    """Pre-configured client for OpenRouter.

    Example:
        >>> client = OpenRouterClient(
        ...     api_key="sk-or-v1-...",
        ...     model="meta-llama/llama-3-8b-instruct"
        ... )
    """

    def __init__(
        self,
        api_key: str,
        model: str,
        temperature: float = DEFAULT_TEMPERATURE,
        max_tokens: int = DEFAULT_MAX_TOKENS,
        **kwargs
    ):
        config = LLMConfig(
            api_key=api_key,
            model=model,
            base_url="https://openrouter.ai/api/v1",
            temperature=temperature,
            max_tokens=max_tokens
        )
        super().__init__(config=config, **kwargs)
```

```python
# graphiti_core/llm_client/together_client.py
class TogetherClient(OpenAIGenericClient):
    """Pre-configured client for Together.ai.

    Example:
        >>> client = TogetherClient(
        ...     api_key="your-together-key",
        ...     model="meta-llama/Llama-3-70b-chat-hf"
        ... )
    """

    def __init__(
        self,
        api_key: str,
        model: str,
        temperature: float = DEFAULT_TEMPERATURE,
        max_tokens: int = DEFAULT_MAX_TOKENS,
        **kwargs
    ):
        config = LLMConfig(
            api_key=api_key,
            model=model,
            base_url="https://api.together.xyz/v1",
            temperature=temperature,
            max_tokens=max_tokens
        )
        super().__init__(config=config, **kwargs)
```

#### 3.2 Provider Compatibility Matrix

Add to documentation:

| Provider | Standard Client | Generic Client | Structured Outputs | Notes |
|----------|-----------------|----------------|--------------------|-------|
| OpenAI | ✅ `OpenAIClient` | ✅ | ✅ Full (Responses API) | Recommended: Use `OpenAIClient` |
| Azure OpenAI | ✅ `AzureOpenAILLMClient` | ✅ | ✅ Full (Responses API) | Requires API version 2024-08-01-preview+ |
| OpenRouter | ❌ | ✅ `OpenAIGenericClient` | ⚠️ Limited (JSON Schema) | Use `OpenAIGenericClient` |
| Together.ai | ❌ | ✅ `OpenAIGenericClient` | ⚠️ Limited (JSON Schema) | Use `OpenAIGenericClient` |
| Ollama | ❌ | ✅ `OpenAIGenericClient` | ⚠️ Limited (JSON mode) | Local deployment |
| Groq | ❌ | ✅ `OpenAIGenericClient` | ⚠️ Limited (JSON Schema) | Very fast inference |
| Perplexity | ❌ | ✅ `OpenAIGenericClient` | ⚠️ Limited (JSON mode) | Primarily for search |

## Testing Recommendations

### Unit Tests

1. **Endpoint detection logic**

   ```python
   def test_should_use_responses_api():
       # OpenAI native should use the Responses API
       config = LLMConfig(base_url="https://api.openai.com/v1")
       assert config.use_responses_api is True

       # Custom endpoints should not
       config = LLMConfig(base_url="https://openrouter.ai/api/v1")
       assert config.use_responses_api is False
   ```

2. **Client selection**

   ```python
   def test_create_openai_client_auto_selection():
       # Should return OpenAIClient for OpenAI
       config = LLMConfig(api_key="test")
       client = create_openai_client(config)
       assert isinstance(client, OpenAIClient)

       # Should return OpenAIGenericClient for others
       config = LLMConfig(api_key="test", base_url="https://openrouter.ai/api/v1")
       client = create_openai_client(config)
       assert isinstance(client, OpenAIGenericClient)
   ```

### Integration Tests

1. **Mock server tests** with responses for both endpoints (a stub-based sketch follows this list)
2. **Real provider tests** (optional, may require API keys):
   - OpenRouter
   - Together.ai
   - Ollama (local)

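Standing up a full mock HTTP server is one option; a lighter-weight approximation is to stub the SDK call itself. A hypothetical pytest sketch (it assumes `pytest-asyncio` is installed and that the import paths and `_generate_response` signature match the current code):

```python
# Hypothetical test sketch: stubs chat.completions.create so the generic
# client's JSON handling can be exercised without a real provider.
# Import paths and signatures are assumptions -- verify against the codebase.
import json
from types import SimpleNamespace
from unittest.mock import AsyncMock

import pytest

from graphiti_core.llm_client.config import LLMConfig
from graphiti_core.llm_client.openai_generic_client import OpenAIGenericClient
from graphiti_core.prompts.models import Message


@pytest.mark.asyncio
async def test_generic_client_parses_json_from_stubbed_provider():
    client = OpenAIGenericClient(
        config=LLMConfig(api_key='test', model='meta-llama/llama-3-8b-instruct')
    )

    # Canned Chat Completions-style response: choices[0].message.content holds JSON.
    fake_message = SimpleNamespace(content=json.dumps({'summary': 'ok'}))
    fake_response = SimpleNamespace(choices=[SimpleNamespace(message=fake_message)])
    client.client.chat.completions.create = AsyncMock(return_value=fake_response)

    result = await client._generate_response(
        messages=[Message(role='user', content='Summarize this episode.')]
    )

    assert result == {'summary': 'ok'}
```
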
### Manual Testing Checklist

- [ ] OpenRouter with Llama models
- [ ] Together.ai with various models
- [ ] Ollama with local models
- [ ] Groq with fast models
- [ ] Verify error messages are helpful
- [ ] Test both structured and unstructured outputs

## Summary of Issues

| Issue | Current State | Impact | Priority |
|-------|---------------|--------|----------|
| `/v1/responses` endpoint usage | Used by default `OpenAIClient` | **BREAKS** all non-OpenAI providers | High |
| `OpenAIGenericClient` not exported | Hidden from public API | Users can't easily use it | High |
| Poor error messages | Generic 404 errors | Confusing for users | High |
| No auto-detection | Must manually choose client | Poor DX | Medium |
| Limited docs | Only Ollama example | Users don't know how to configure other providers | High |
| No structured output in Generic client | Only supports loose JSON | Reduced type safety for custom endpoints | Medium |
| No provider-specific helpers | Generic configuration only | More setup required | Low |

## Implementation Roadmap

### Phase 1: Quick Fixes (1-2 days)

1. Export `OpenAIGenericClient` in public API
2. Add documentation section for custom endpoints
3. Improve error messages in `OpenAIClient`
4. Add examples for OpenRouter, Together.ai

### Phase 2: Enhanced Support (3-5 days)

1. Add auto-detection logic to `LLMConfig`
2. Create factory function for client selection
3. Enhance `OpenAIGenericClient` with better JSON schema support
4. Add comprehensive tests

### Phase 3: Polish (2-3 days)

1. Create provider-specific client classes
2. Build compatibility matrix documentation
3. Add integration tests with real providers
4. Update all examples and guides

## References

- OpenAI Responses API (SDK v1.91.0+): https://platform.openai.com/docs/api-reference/responses
- OpenAI Chat Completions API: https://platform.openai.com/docs/api-reference/chat
- OpenRouter API: https://openrouter.ai/docs
- Together.ai API: https://docs.together.ai/docs/openai-api-compatibility
- Ollama OpenAI compatibility: https://github.com/ollama/ollama/blob/main/docs/openai.md

## Contributing

If you're implementing these changes, please ensure:

1. All changes follow the repository guidelines in `AGENTS.md`
2. Run `make format` before committing
3. Run `make lint` and `make test` to verify changes
4. Update documentation for any new public APIs
5. Add examples demonstrating the new functionality

## Questions or Issues?

- Open an issue: https://github.com/getzep/graphiti/issues
- Discussion: https://github.com/getzep/graphiti/discussions
- Documentation: https://help.getzep.com/graphiti