CRITICAL FIX - Database Parameter (graphiti_core):
- Fixed graphiti_core/driver/neo4j_driver.py execute_query method
- database_ parameter was incorrectly added to params dict instead of kwargs
- Now correctly passed as keyword argument to Neo4j driver
- Impact: All queries now execute in configured database (not default 'neo4j')
- Root cause: Violated Neo4j Python driver API contract
Technical Details:
Previous code (BROKEN):
params.setdefault('database_', self._database) # Wrong - in params dict
result = await self.client.execute_query(cypher_query_, parameters_=params, **kwargs)
Fixed code (CORRECT):
kwargs.setdefault('database_', self._database) # Correct - in kwargs
result = await self.client.execute_query(cypher_query_, parameters_=params, **kwargs)
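For reference, the driver contract this relies on (illustrative only; assumes the neo4j 5.x Python driver):
async def count_nodes(driver, database: str) -> int:
    # database_ selects the target database and must be a keyword argument of
    # execute_query; anything inside parameters_ is treated as a Cypher parameter.
    records, _, _ = await driver.execute_query(
        'MATCH (n) RETURN count(n) AS c',
        parameters_={},
        database_=database,
    )
    return records[0]['c']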
FIX - Index Creation Error Handling (MCP server):
- Added graceful handling for Neo4j IF NOT EXISTS bug
- Prevents MCP server crash when indices already exist
- Logs warning instead of failing initialization
- Handles EquivalentSchemaRuleAlreadyExists error gracefully
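Illustrative shape of the handling (a sketch, not the exact committed code; assumes the server builds indices via graphiti.build_indices_and_constraints()):
try:
    await graphiti.build_indices_and_constraints()
except Exception as e:
    if 'EquivalentSchemaRuleAlreadyExists' in str(e):
        logger.warning(f'Indices/constraints already exist, continuing: {e}')
    else:
        raise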
Files Modified:
- graphiti_core/driver/neo4j_driver.py (3 lines changed)
- mcp_server/src/graphiti_mcp_server.py (12 lines of error handling added)
- mcp_server/pyproject.toml (version bump to 1.0.5)
Testing:
- Python syntax validation: PASSED
- Ruff formatting: PASSED
- Ruff linting: PASSED
Closes issues with:
- Data being stored in wrong Neo4j database
- MCP server crashing on startup with EquivalentSchemaRuleAlreadyExists
- NEO4J_DATABASE environment variable being ignored
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
OpenAI-Compatible Custom Endpoint Support in Graphiti
Overview
This document analyzes how Graphiti handles OpenAI-compatible custom endpoints (like OpenRouter, NagaAI, Together.ai, etc.) and provides recommendations for improving support.
Current Architecture
Graphiti has three main OpenAI-compatible client implementations:
1. OpenAIClient (Default)
File: graphiti_core/llm_client/openai_client.py
- Extends `BaseOpenAIClient`
- Uses the new OpenAI Responses API (`/v1/responses` endpoint)
- Uses `client.responses.parse()` for structured outputs (OpenAI SDK v1.91+)
- This is the default client exported in the public API
```python
response = await self.client.responses.parse(
    model=model,
    input=messages,
    temperature=temperature,
    max_output_tokens=max_tokens,
    text_format=response_model,
    reasoning={'effort': reasoning},
    text={'verbosity': verbosity},
)
```
2. OpenAIGenericClient (Legacy)
File: graphiti_core/llm_client/openai_generic_client.py
- Uses the standard Chat Completions API (`/v1/chat/completions`)
- Uses `client.chat.completions.create()`
- Only supports unstructured JSON responses (not Pydantic schemas)
- Currently not exported in `__init__.py` (hidden from public API)
```python
response = await self.client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=temperature,
    max_tokens=max_tokens,
    response_format={'type': 'json_object'},
)
```
3. AzureOpenAILLMClient
File: graphiti_core/llm_client/azure_openai_client.py
- Azure-specific implementation
- Also uses `responses.parse()` like `OpenAIClient`
- Handles Azure-specific authentication and endpoints
The Root Problem
Issue Description
When users configure Graphiti with custom OpenAI-compatible endpoints, they encounter errors because:
- `OpenAIClient` uses the new `/v1/responses` endpoint via `client.responses.parse()`
  - This is a new OpenAI API (introduced in OpenAI SDK v1.91.0) for structured outputs
  - This endpoint is proprietary to OpenAI and not part of the standard OpenAI-compatible API specification
- Most OpenAI-compatible services (OpenRouter, NagaAI, Ollama, Together.ai, etc.) only implement the standard `/v1/chat/completions` endpoint
  - They do NOT implement `/v1/responses`
- When you configure a `base_url` pointing to these services, Graphiti tries to call `https://your-custom-endpoint.com/v1/responses` instead of the expected `https://your-custom-endpoint.com/v1/chat/completions`
Example Error Scenario
```python
from graphiti_core import Graphiti
from graphiti_core.llm_client import OpenAIClient, LLMConfig

config = LLMConfig(
    api_key="sk-or-v1-...",
    model="meta-llama/llama-3-8b-instruct",
    base_url="https://openrouter.ai/api/v1"
)

llm_client = OpenAIClient(config=config)
graphiti = Graphiti(uri, user, password, llm_client=llm_client)

# This will fail because OpenRouter doesn't have a /v1/responses endpoint
# Error: 404 Not Found - https://openrouter.ai/api/v1/responses
```
Current Workaround (Documented)
The README documents using OpenAIGenericClient with Ollama:
```python
from graphiti_core.llm_client.openai_generic_client import OpenAIGenericClient
from graphiti_core.llm_client.config import LLMConfig

llm_config = LLMConfig(
    api_key="ollama",
    model="deepseek-r1:7b",
    base_url="http://localhost:11434/v1"
)

llm_client = OpenAIGenericClient(config=llm_config)
```
Limitations of Current Workaround
- `OpenAIGenericClient` doesn't support structured outputs with Pydantic models
- It only returns raw JSON and manually validates schemas
- It's not the recommended/default client
- It's not exported in the public API (`graphiti_core.llm_client`)
- Users must know to import from the internal module path
Recommended Solutions
Priority 1: Quick Wins (High Priority)
1.1 Export OpenAIGenericClient in Public API
File: graphiti_core/llm_client/__init__.py
Current:
```python
from .client import LLMClient
from .config import LLMConfig
from .errors import RateLimitError
from .openai_client import OpenAIClient

__all__ = ['LLMClient', 'OpenAIClient', 'LLMConfig', 'RateLimitError']
```
Proposed:
```python
from .client import LLMClient
from .config import LLMConfig
from .errors import RateLimitError
from .openai_client import OpenAIClient
from .openai_generic_client import OpenAIGenericClient

__all__ = ['LLMClient', 'OpenAIClient', 'OpenAIGenericClient', 'LLMConfig', 'RateLimitError']
1.2 Add Clear Documentation
File: README.md
Add a dedicated section:
### Using OpenAI-Compatible Endpoints (OpenRouter, NagaAI, Together.ai, etc.)
Most OpenAI-compatible services only support the standard Chat Completions API,
not OpenAI's newer Responses API. Use `OpenAIGenericClient` for these services:
**OpenRouter Example**:
```python
from graphiti_core import Graphiti
from graphiti_core.llm_client import OpenAIGenericClient, LLMConfig
config = LLMConfig(
    api_key="sk-or-v1-...",
    model="meta-llama/llama-3-8b-instruct",
    base_url="https://openrouter.ai/api/v1"
)

llm_client = OpenAIGenericClient(config=config)
graphiti = Graphiti(uri, user, password, llm_client=llm_client)
```

**Together.ai Example**:

```python
config = LLMConfig(
    api_key="your-together-api-key",
    model="meta-llama/Llama-3-70b-chat-hf",
    base_url="https://api.together.xyz/v1"
)

llm_client = OpenAIGenericClient(config=config)
```

**Note**: `OpenAIGenericClient` has limited structured output support compared to
the default `OpenAIClient`. It uses JSON mode instead of Pydantic schema validation.
1.3 Add Better Error Messages
File: graphiti_core/llm_client/openai_client.py
Add error handling that detects the issue:
```python
async def _create_structured_completion(self, ...):
    try:
        response = await self.client.responses.parse(...)
        return response
    except openai.NotFoundError as e:
        if self.config.base_url and "api.openai.com" not in self.config.base_url:
            raise Exception(
                f"The OpenAI Responses API (/v1/responses) is not available at {self.config.base_url}. "
                f"Most OpenAI-compatible services only support /v1/chat/completions. "
                f"Please use OpenAIGenericClient instead of OpenAIClient for custom endpoints. "
                f"See: https://help.getzep.com/graphiti/guides/custom-endpoints"
            ) from e
        raise
```
Priority 2: Better UX (Medium Priority)
2.1 Add Auto-Detection Logic
File: graphiti_core/llm_client/config.py
```python
class LLMConfig:
    def __init__(
        self,
        api_key: str | None = None,
        model: str | None = None,
        base_url: str | None = None,
        temperature: float = DEFAULT_TEMPERATURE,
        max_tokens: int = DEFAULT_MAX_TOKENS,
        small_model: str | None = None,
        use_responses_api: bool | None = None,  # NEW: Auto-detect if None
    ):
        self.base_url = base_url
        self.api_key = api_key
        self.model = model
        self.small_model = small_model
        self.temperature = temperature
        self.max_tokens = max_tokens

        # Auto-detect API style based on base_url
        if use_responses_api is None:
            self.use_responses_api = self._should_use_responses_api()
        else:
            self.use_responses_api = use_responses_api

    def _should_use_responses_api(self) -> bool:
        """Determine if we should use the Responses API based on base_url."""
        if self.base_url is None:
            return True  # Default OpenAI
        # Known services that support the Responses API
        supported_services = ["api.openai.com", "azure.com"]
        return any(service in self.base_url for service in supported_services)
```
2.2 Create a Unified Smart Client
Option A: Modify OpenAIClient to Fall Back
```python
class OpenAIClient(BaseOpenAIClient):
    def __init__(self, config: LLMConfig | None = None, ...):
        # Default the config before using it, so use_responses_api is always set
        if config is None:
            config = LLMConfig()
        super().__init__(config, ...)
        self.use_responses_api = config.use_responses_api
        self.client = AsyncOpenAI(api_key=config.api_key, base_url=config.base_url)

    async def _create_structured_completion(self, ...):
        if self.use_responses_api:
            # Use responses.parse() for OpenAI native
            return await self.client.responses.parse(...)
        else:
            # Fall back to chat.completions with JSON schema for compatibility
            return await self.client.chat.completions.create(
                model=model,
                messages=messages,
                temperature=temperature,
                max_tokens=max_tokens,
                response_format={
                    "type": "json_schema",
                    "json_schema": {
                        "name": response_model.__name__,
                        "schema": response_model.model_json_schema(),
                        "strict": False
                    }
                }
            )
```
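One detail Option A leaves open: `responses.parse()` returns an already-validated Pydantic object, while the chat-completions fallback returns a JSON string, so the fallback path still has to validate the payload itself. A minimal sketch of that step (names are illustrative, not existing Graphiti helpers):

```python
from pydantic import BaseModel, ValidationError


def validate_fallback_payload(response, response_model: type[BaseModel]) -> BaseModel:
    """Validate the JSON string returned by the chat-completions fallback path."""
    raw = response.choices[0].message.content or '{}'
    try:
        return response_model.model_validate_json(raw)
    except ValidationError:
        # With strict=False the provider may not honor the schema exactly,
        # so validation failures have to be handled here (e.g. retried).
        raise
```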
Option B: Create a Factory Function
```python
# graphiti_core/llm_client/__init__.py

def create_openai_client(
    config: LLMConfig | None = None,
    cache: bool = False,
    **kwargs
) -> LLMClient:
    """
    Factory to create the appropriate OpenAI-compatible client.

    Automatically selects between OpenAIClient (for native OpenAI)
    and OpenAIGenericClient (for OpenAI-compatible services).

    Args:
        config: LLM configuration including base_url
        cache: Whether to enable caching
        **kwargs: Additional arguments passed to the client

    Returns:
        LLMClient: Either OpenAIClient or OpenAIGenericClient

    Example:
        >>> # Automatically uses OpenAIGenericClient for OpenRouter
        >>> config = LLMConfig(
        ...     api_key="sk-or-v1-...",
        ...     model="meta-llama/llama-3-8b-instruct",
        ...     base_url="https://openrouter.ai/api/v1"
        ... )
        >>> client = create_openai_client(config)
    """
    if config is None:
        config = LLMConfig()

    # Auto-detect based on base_url
    if config.base_url is None or "api.openai.com" in config.base_url:
        return OpenAIClient(config, cache, **kwargs)
    else:
        return OpenAIGenericClient(config, cache, **kwargs)
```
2.3 Enhance OpenAIGenericClient with Better Structured Output Support
File: graphiti_core/llm_client/openai_generic_client.py
```python
async def _generate_response(
    self,
    messages: list[Message],
    response_model: type[BaseModel] | None = None,
    max_tokens: int = DEFAULT_MAX_TOKENS,
    model_size: ModelSize = ModelSize.medium,
) -> dict[str, typing.Any]:
    openai_messages: list[ChatCompletionMessageParam] = []
    for m in messages:
        m.content = self._clean_input(m.content)
        if m.role == 'user':
            openai_messages.append({'role': 'user', 'content': m.content})
        elif m.role == 'system':
            openai_messages.append({'role': 'system', 'content': m.content})
    try:
        # Try to use json_schema format (supported by more providers)
        if response_model:
            response = await self.client.chat.completions.create(
                model=self.model or DEFAULT_MODEL,
                messages=openai_messages,
                temperature=self.temperature,
                max_tokens=max_tokens or self.max_tokens,
                response_format={
                    "type": "json_schema",
                    "json_schema": {
                        "name": response_model.__name__,
                        "schema": response_model.model_json_schema(),
                        "strict": False  # Most providers don't support strict mode
                    }
                }
            )
        else:
            response = await self.client.chat.completions.create(
                model=self.model or DEFAULT_MODEL,
                messages=openai_messages,
                temperature=self.temperature,
                max_tokens=max_tokens or self.max_tokens,
                response_format={'type': 'json_object'},
            )
        result = response.choices[0].message.content or '{}'
        return json.loads(result)
    except Exception as e:
        logger.error(f'Error in generating LLM response: {e}')
        raise
```
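Providers also vary in whether they accept `response_format={'type': 'json_schema', ...}` at all. If desired, the enhancement could be made defensive by degrading to plain JSON mode when the schema format is rejected; a sketch under that assumption (not part of the proposal above, helper name is illustrative):

```python
import openai


async def _structured_or_json_mode(self, openai_messages, response_model, max_tokens):
    """Try json_schema first, then degrade to json_object for providers that reject it."""
    schema_format = {
        'type': 'json_schema',
        'json_schema': {
            'name': response_model.__name__,
            'schema': response_model.model_json_schema(),
            'strict': False,
        },
    }
    try:
        return await self.client.chat.completions.create(
            model=self.model or DEFAULT_MODEL,
            messages=openai_messages,
            temperature=self.temperature,
            max_tokens=max_tokens or self.max_tokens,
            response_format=schema_format,
        )
    except openai.BadRequestError:
        # Provider does not understand json_schema; fall back to loose JSON mode.
        return await self.client.chat.completions.create(
            model=self.model or DEFAULT_MODEL,
            messages=openai_messages,
            temperature=self.temperature,
            max_tokens=max_tokens or self.max_tokens,
            response_format={'type': 'json_object'},
        )
```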
Priority 3: Nice to Have (Low Priority)
3.1 Provider-Specific Clients
Create convenience clients for popular providers:
```python
# graphiti_core/llm_client/openrouter_client.py

class OpenRouterClient(OpenAIGenericClient):
    """Pre-configured client for OpenRouter.

    Example:
        >>> client = OpenRouterClient(
        ...     api_key="sk-or-v1-...",
        ...     model="meta-llama/llama-3-8b-instruct"
        ... )
    """

    def __init__(
        self,
        api_key: str,
        model: str,
        temperature: float = DEFAULT_TEMPERATURE,
        max_tokens: int = DEFAULT_MAX_TOKENS,
        **kwargs
    ):
        config = LLMConfig(
            api_key=api_key,
            model=model,
            base_url="https://openrouter.ai/api/v1",
            temperature=temperature,
            max_tokens=max_tokens
        )
        super().__init__(config=config, **kwargs)
```

```python
# graphiti_core/llm_client/together_client.py

class TogetherClient(OpenAIGenericClient):
    """Pre-configured client for Together.ai.

    Example:
        >>> client = TogetherClient(
        ...     api_key="your-together-key",
        ...     model="meta-llama/Llama-3-70b-chat-hf"
        ... )
    """

    def __init__(
        self,
        api_key: str,
        model: str,
        temperature: float = DEFAULT_TEMPERATURE,
        max_tokens: int = DEFAULT_MAX_TOKENS,
        **kwargs
    ):
        config = LLMConfig(
            api_key=api_key,
            model=model,
            base_url="https://api.together.xyz/v1",
            temperature=temperature,
            max_tokens=max_tokens
        )
        super().__init__(config=config, **kwargs)
```
3.2 Provider Compatibility Matrix
Add to documentation:
| Provider | Standard Client | Generic Client | Structured Outputs | Notes |
|---|---|---|---|---|
| OpenAI | ✅ `OpenAIClient` | ✅ | ✅ Full (Responses API) | Recommended: use `OpenAIClient` |
| Azure OpenAI | ✅ `AzureOpenAILLMClient` | ✅ | ✅ Full (Responses API) | Requires API version 2024-08-01-preview+ |
| OpenRouter | ❌ | ✅ `OpenAIGenericClient` | ⚠️ Limited (JSON Schema) | Use `OpenAIGenericClient` |
| Together.ai | ❌ | ✅ `OpenAIGenericClient` | ⚠️ Limited (JSON Schema) | Use `OpenAIGenericClient` |
| Ollama | ❌ | ✅ `OpenAIGenericClient` | ⚠️ Limited (JSON mode) | Local deployment |
| Groq | ❌ | ✅ `OpenAIGenericClient` | ⚠️ Limited (JSON Schema) | Very fast inference |
| Perplexity | ❌ | ✅ `OpenAIGenericClient` | ⚠️ Limited (JSON mode) | Primarily for search |
Testing Recommendations
Unit Tests
- Endpoint detection logic:

```python
def test_should_use_responses_api():
    # OpenAI native should use Responses API
    config = LLMConfig(base_url="https://api.openai.com/v1")
    assert config.use_responses_api is True

    # Custom endpoints should not
    config = LLMConfig(base_url="https://openrouter.ai/api/v1")
    assert config.use_responses_api is False
```

- Client selection:

```python
def test_create_openai_client_auto_selection():
    # Should return OpenAIClient for OpenAI
    config = LLMConfig(api_key="test")
    client = create_openai_client(config)
    assert isinstance(client, OpenAIClient)

    # Should return OpenAIGenericClient for others
    config = LLMConfig(api_key="test", base_url="https://openrouter.ai/api/v1")
    client = create_openai_client(config)
    assert isinstance(client, OpenAIGenericClient)
```
Integration Tests
- Mock server tests with mocked responses for both endpoints (see the sketch after this list)
- Real provider tests (optional, may require API keys):
  - OpenRouter
  - Together.ai
  - Ollama (local)
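A sketch of such a mock-server test, assuming `pytest-asyncio` and `respx` as dev dependencies; the `Message` import path and the `_generate_response` call mirror the snippets above and may need adjusting:

```python
import pytest
import respx
from httpx import Response

from graphiti_core.llm_client.config import LLMConfig
from graphiti_core.llm_client.openai_generic_client import OpenAIGenericClient
from graphiti_core.prompts.models import Message  # assumed location of Message

COMPLETION_BODY = {
    'id': 'chatcmpl-test',
    'object': 'chat.completion',
    'created': 0,
    'model': 'test-model',
    'choices': [{
        'index': 0,
        'finish_reason': 'stop',
        'message': {'role': 'assistant', 'content': '{"ok": true}'},
    }],
}


@pytest.mark.asyncio
@respx.mock
async def test_generic_client_hits_chat_completions():
    # The generic client should call /chat/completions on the custom base_url,
    # never /responses.
    route = respx.post('https://openrouter.ai/api/v1/chat/completions').mock(
        return_value=Response(200, json=COMPLETION_BODY)
    )
    client = OpenAIGenericClient(
        config=LLMConfig(
            api_key='test',
            model='test-model',
            base_url='https://openrouter.ai/api/v1',
        )
    )
    result = await client._generate_response([Message(role='user', content='ping')])
    assert route.called
    assert result == {'ok': True}
```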
Manual Testing Checklist
- OpenRouter with Llama models
- Together.ai with various models
- Ollama with local models
- Groq with fast models
- Verify error messages are helpful
- Test both structured and unstructured outputs
Summary of Issues
| Issue | Current State | Impact | Priority |
|---|---|---|---|
| `/v1/responses` endpoint usage | Used by default `OpenAIClient` | BREAKS all non-OpenAI providers | High |
| `OpenAIGenericClient` not exported | Hidden from public API | Users can't easily use it | High |
| Poor error messages | Generic 404 errors | Confusing for users | High |
| No auto-detection | Must manually choose client | Poor DX | Medium |
| Limited docs | Only Ollama example | Users don't know how to configure other providers | High |
| No structured output in Generic client | Only supports loose JSON | Reduced type safety for custom endpoints | Medium |
| No provider-specific helpers | Generic configuration only | More setup required | Low |
Implementation Roadmap
Phase 1: Quick Fixes (1-2 days)
- Export `OpenAIGenericClient` in public API
- Add documentation section for custom endpoints
- Improve error messages in `OpenAIClient`
- Add examples for OpenRouter, Together.ai
Phase 2: Enhanced Support (3-5 days)
- Add auto-detection logic to `LLMConfig`
- Create factory function for client selection
- Enhance `OpenAIGenericClient` with better JSON schema support
- Add comprehensive tests
Phase 3: Polish (2-3 days)
- Create provider-specific client classes
- Build compatibility matrix documentation
- Add integration tests with real providers
- Update all examples and guides
References
- OpenAI SDK v1.91.0+ Responses API: https://platform.openai.com/docs/api-reference/responses
- OpenAI Chat Completions API: https://platform.openai.com/docs/api-reference/chat
- OpenRouter API: https://openrouter.ai/docs
- Together.ai API: https://docs.together.ai/docs/openai-api-compatibility
- Ollama OpenAI compatibility: https://github.com/ollama/ollama/blob/main/docs/openai.md
Contributing
If you're implementing these changes, please ensure:
- All changes follow the repository guidelines in `AGENTS.md`
- Run `make format` before committing
- Run `make lint` and `make test` to verify changes
- Update documentation for any new public APIs
- Add examples demonstrating the new functionality
Questions or Issues?
- Open an issue: https://github.com/getzep/graphiti/issues
- Discussion: https://github.com/getzep/graphiti/discussions
- Documentation: https://help.getzep.com/graphiti