feat: eliminate ghost variables with configurable provider defaults system
Replace hardcoded DEFAULT_MODEL and DEFAULT_SMALL_MODEL constants across all LLM clients with a centralized, configurable provider defaults system.
Key changes:
- Created provider_defaults.py with centralized configuration for all providers
- Added environment variable support for easy customization (e.g., GEMINI_DEFAULT_MODEL)
- Updated all LLM clients to use configurable defaults instead of hardcoded constants
- Made edge operations max_tokens configurable via EXTRACT_EDGES_MAX_TOKENS
- Updated cross-encoder reranker clients to use provider defaults
- Maintained full backward compatibility with existing configurations
This resolves the issue where Gemini's flash-lite model has location constraints in Vertex AI that differ from the regular flash model, and users couldn't easily override these without editing source code.
Environment variables now supported:
- {PROVIDER}_DEFAULT_MODEL
- {PROVIDER}_DEFAULT_SMALL_MODEL
- {PROVIDER}_DEFAULT_MAX_TOKENS
- {PROVIDER}_DEFAULT_TEMPERATURE
- {PROVIDER}_EXTRACT_EDGES_MAX_TOKENS
- EXTRACT_EDGES_MAX_TOKENS (global fallback)
Fixes #681
Co-authored-by: Daniel Chalef <danielchalef@users.noreply.github.com>
Parent: 183471c179
Commit: 93ab7375cd
10 changed files with 357 additions and 29 deletions
PROVIDER_CONFIGURATION.md (new file, 167 lines)
# Provider Configuration System

This document describes the new provider configuration system that replaces hardcoded "ghost variables" with configurable defaults.

## Overview

Previously, each LLM provider client had hardcoded model names and configuration values that could not be customized without modifying the source code. This created several issues:

1. **Maintenance burden**: Updating to newer models required code changes
2. **Limited flexibility**: Users couldn't easily switch to different models
3. **Provider constraints**: Some models (like Gemini's flash-lite) have specific location constraints that differ from the defaults
4. **Hidden configuration**: Token limits and other operational parameters were buried in the code

## New Configuration System

The new system introduces a centralized `provider_defaults.py` module that:

1. **Centralizes all provider defaults** in a single location
2. **Supports environment variable overrides** for easy customization
3. **Maintains backward compatibility** with existing configurations
4. **Provides provider-specific configurations** for each supported LLM provider

## Environment Variables

You can now override any provider default using environment variables that follow this pattern:

```bash
# For OpenAI
export OPENAI_DEFAULT_MODEL="gpt-4"
export OPENAI_DEFAULT_SMALL_MODEL="gpt-4-mini"
export OPENAI_DEFAULT_MAX_TOKENS="8192"
export OPENAI_DEFAULT_TEMPERATURE="0.0"
export OPENAI_EXTRACT_EDGES_MAX_TOKENS="16384"

# For Gemini
export GEMINI_DEFAULT_MODEL="gemini-2.5-flash"
export GEMINI_DEFAULT_SMALL_MODEL="gemini-2.5-flash-lite"
export GEMINI_DEFAULT_MAX_TOKENS="8192"
export GEMINI_DEFAULT_TEMPERATURE="0.0"
export GEMINI_EXTRACT_EDGES_MAX_TOKENS="16384"

# For Anthropic
export ANTHROPIC_DEFAULT_MODEL="claude-3-5-sonnet-latest"
export ANTHROPIC_DEFAULT_SMALL_MODEL="claude-3-5-haiku-latest"
export ANTHROPIC_DEFAULT_MAX_TOKENS="8192"
export ANTHROPIC_DEFAULT_TEMPERATURE="0.0"
export ANTHROPIC_EXTRACT_EDGES_MAX_TOKENS="16384"

# For Groq
export GROQ_DEFAULT_MODEL="llama-3.1-70b-versatile"
export GROQ_DEFAULT_SMALL_MODEL="llama-3.1-8b-instant"
export GROQ_DEFAULT_MAX_TOKENS="8192"
export GROQ_DEFAULT_TEMPERATURE="0.0"
export GROQ_EXTRACT_EDGES_MAX_TOKENS="16384"

# Global fallback for edge extraction operations
export EXTRACT_EDGES_MAX_TOKENS="16384"
```

## Supported Providers

The system currently supports the following providers:

- **openai**: OpenAI GPT models
- **gemini**: Google Gemini models
- **anthropic**: Anthropic Claude models
- **groq**: Groq models
- **azure_openai**: Azure OpenAI models

## Usage Examples

### Basic Usage

The configuration system works transparently with existing code:

```python
from graphiti_core.llm_client import OpenAIClient
from graphiti_core.llm_client.config import LLMConfig

# Uses default models (configurable via environment variables)
client = OpenAIClient()

# Or with explicit configuration (provider defaults are still used as a fallback)
config = LLMConfig(model="gpt-4", small_model="gpt-4-mini")
client = OpenAIClient(config)
```

### Customizing Model Defaults

Instead of hardcoding model names in your application, you can now use environment variables:

```bash
# Set up your preferred models
export OPENAI_DEFAULT_MODEL="gpt-4"
export OPENAI_DEFAULT_SMALL_MODEL="gpt-4-mini"

# Your application will automatically use these defaults
python your_app.py
```

### Provider-Specific Configuration

Each provider can have different default models and configurations:

```python
from graphiti_core.llm_client.provider_defaults import get_provider_defaults

# Get defaults for a specific provider
openai_defaults = get_provider_defaults('openai')
print(f"OpenAI default model: {openai_defaults.model}")
print(f"OpenAI small model: {openai_defaults.small_model}")

gemini_defaults = get_provider_defaults('gemini')
print(f"Gemini default model: {gemini_defaults.model}")
print(f"Gemini small model: {gemini_defaults.small_model}")
```

## Migration Guide

### Before (with ghost variables)

```python
# In gemini_client.py
DEFAULT_MODEL = 'gemini-2.5-flash'
DEFAULT_SMALL_MODEL = 'models/gemini-2.5-flash-lite-preview-06-17'

def _get_model_for_size(self, model_size: ModelSize) -> str:
    if model_size == ModelSize.small:
        return self.small_model or DEFAULT_SMALL_MODEL
    else:
        return self.model or DEFAULT_MODEL
```

### After (with configurable defaults)

```python
# Configuration is now externalized
from .provider_defaults import get_model_for_size

def _get_model_for_size(self, model_size: ModelSize) -> str:
    return get_model_for_size(
        provider='gemini',
        model_size=model_size.value,
        user_model=self.model,
        user_small_model=self.small_model
    )
```

## Benefits

1. **No More Ghost Variables**: All defaults are now configurable
2. **Easy Model Updates**: Update models via environment variables
3. **Provider Flexibility**: Each provider can have optimized defaults
4. **Backward Compatibility**: Existing code continues to work unchanged
5. **Environment-Specific Configuration**: Different environments can use different models
6. **Reduced Maintenance**: No need to modify source code for model updates

## Implementation Details

The new system is implemented in `graphiti_core/llm_client/provider_defaults.py` and includes:

- `ProviderDefaults`: a dataclass defining the configuration structure
- `get_provider_defaults()`: defaults lookup with environment variable support
- `get_model_for_size()`: centralized model selection logic
- `get_extract_edges_max_tokens_default()`: configurable default for edge extraction

All existing LLM client implementations have been updated to use this new system while maintaining full backward compatibility.
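The resolution order described above (environment variable first, then built-in default) can be sketched in isolation. This is a simplified stand-in, not the actual `provider_defaults` module; `Defaults`, `BUILTIN`, and `resolve` are hypothetical names used only for illustration:

```python
import os
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Defaults:
    model: str
    small_model: str
    max_tokens: int = 8192

# Hypothetical built-in table; mirrors the shape of the real PROVIDER_DEFAULTS.
BUILTIN = {'openai': Defaults('gpt-4.1-mini', 'gpt-4.1-nano')}

def resolve(provider: str) -> Defaults:
    # Environment variables win over the built-in defaults.
    prefix = provider.upper()
    d = BUILTIN[provider]
    return replace(
        d,
        model=os.getenv(f'{prefix}_DEFAULT_MODEL', d.model),
        small_model=os.getenv(f'{prefix}_DEFAULT_SMALL_MODEL', d.small_model),
        max_tokens=int(os.getenv(f'{prefix}_DEFAULT_MAX_TOKENS', str(d.max_tokens))),
    )

os.environ['OPENAI_DEFAULT_MODEL'] = 'gpt-4o'
print(resolve('openai').model)        # overridden by the environment
print(resolve('openai').small_model)  # falls back to the built-in default
```

Because the lookup happens at call time, changing an environment variable takes effect on the next resolution without restarting or reconfiguring clients.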
@@ -20,6 +20,7 @@ from typing import TYPE_CHECKING

 from ..helpers import semaphore_gather
 from ..llm_client import LLMConfig, RateLimitError
+from ..llm_client.provider_defaults import get_provider_defaults
 from .client import CrossEncoderClient

 if TYPE_CHECKING:
@@ -37,8 +38,6 @@ else:

 logger = logging.getLogger(__name__)

-DEFAULT_MODEL = 'gemini-2.5-flash-lite-preview-06-17'
-

 class GeminiRerankerClient(CrossEncoderClient):
     """
@@ -103,7 +102,7 @@ Provide only a number between 0 and 100 (no explanation, just the number):"""
         responses = await semaphore_gather(
             *[
                 self.client.aio.models.generate_content(
-                    model=self.config.model or DEFAULT_MODEL,
+                    model=self.config.model or get_provider_defaults('gemini').model,
                     contents=prompt_messages,  # type: ignore
                     config=types.GenerateContentConfig(
                         system_instruction='You are an expert at rating passage relevance. Respond with only a number from 0-100.',
@@ -23,13 +23,12 @@ from openai import AsyncAzureOpenAI, AsyncOpenAI

 from ..helpers import semaphore_gather
 from ..llm_client import LLMConfig, OpenAIClient, RateLimitError
+from ..llm_client.provider_defaults import get_provider_defaults
 from ..prompts import Message
 from .client import CrossEncoderClient

 logger = logging.getLogger(__name__)

-DEFAULT_MODEL = 'gpt-4.1-nano'
-

 class OpenAIRerankerClient(CrossEncoderClient):
     def __init__(
@@ -84,7 +83,7 @@ class OpenAIRerankerClient(CrossEncoderClient):
         responses = await semaphore_gather(
             *[
                 self.client.chat.completions.create(
-                    model=DEFAULT_MODEL,
+                    model=self.config.model or get_provider_defaults('openai').model,
                     messages=openai_messages,
                     temperature=0,
                     max_tokens=1,
@@ -27,6 +27,7 @@ from ..prompts.models import Message
 from .client import LLMClient
 from .config import DEFAULT_MAX_TOKENS, LLMConfig, ModelSize
 from .errors import RateLimitError, RefusalError
+from .provider_defaults import get_provider_defaults

 if TYPE_CHECKING:
     import anthropic
@@ -62,8 +63,6 @@ AnthropicModel = Literal[
     'claude-2.0',
 ]

-DEFAULT_MODEL: AnthropicModel = 'claude-3-7-sonnet-latest'
-

 class AnthropicClient(LLMClient):
     """
@@ -99,7 +98,7 @@ class AnthropicClient(LLMClient):
             config.max_tokens = max_tokens

         if config.model is None:
-            config.model = DEFAULT_MODEL
+            config.model = get_provider_defaults('anthropic').model

         super().__init__(config, cache)
         # Explicitly set the instance model to the config model to prevent type checking errors
@@ -25,6 +25,7 @@ from ..prompts.models import Message
 from .client import MULTILINGUAL_EXTRACTION_RESPONSES, LLMClient
 from .config import DEFAULT_MAX_TOKENS, LLMConfig, ModelSize
 from .errors import RateLimitError
+from .provider_defaults import get_model_for_size

 if TYPE_CHECKING:
     from google import genai
@@ -43,9 +44,6 @@ else:

 logger = logging.getLogger(__name__)

-DEFAULT_MODEL = 'gemini-2.5-flash'
-DEFAULT_SMALL_MODEL = 'models/gemini-2.5-flash-lite-preview-06-17'
-

 class GeminiClient(LLMClient):
     """
@@ -141,10 +139,12 @@ class GeminiClient(LLMClient):

     def _get_model_for_size(self, model_size: ModelSize) -> str:
         """Get the appropriate model name based on the requested size."""
-        if model_size == ModelSize.small:
-            return self.small_model or DEFAULT_SMALL_MODEL
-        else:
-            return self.model or DEFAULT_MODEL
+        return get_model_for_size(
+            provider='gemini',
+            model_size=model_size.value,
+            user_model=self.model,
+            user_small_model=self.small_model
+        )

     async def _generate_response(
         self,
@@ -38,10 +38,10 @@ from ..prompts.models import Message
 from .client import LLMClient
 from .config import LLMConfig, ModelSize
 from .errors import RateLimitError
+from .provider_defaults import get_provider_defaults

 logger = logging.getLogger(__name__)

-DEFAULT_MODEL = 'llama-3.1-70b-versatile'
 DEFAULT_MAX_TOKENS = 2048
@@ -69,8 +69,9 @@ class GroqClient(LLMClient):
             elif m.role == 'system':
                 msgs.append({'role': 'system', 'content': m.content})
         try:
+            model = self.model or get_provider_defaults('groq').model
             response = await self.client.chat.completions.create(
-                model=self.model or DEFAULT_MODEL,
+                model=model,
                 messages=msgs,
                 temperature=self.temperature,
                 max_tokens=max_tokens or self.max_tokens,
@@ -28,12 +28,10 @@ from ..prompts.models import Message
 from .client import MULTILINGUAL_EXTRACTION_RESPONSES, LLMClient
 from .config import DEFAULT_MAX_TOKENS, LLMConfig, ModelSize
 from .errors import RateLimitError, RefusalError
+from .provider_defaults import get_model_for_size

 logger = logging.getLogger(__name__)

-DEFAULT_MODEL = 'gpt-4.1-mini'
-DEFAULT_SMALL_MODEL = 'gpt-4.1-nano'
-

 class BaseOpenAIClient(LLMClient):
     """
@@ -100,10 +98,12 @@ class BaseOpenAIClient(LLMClient):

     def _get_model_for_size(self, model_size: ModelSize) -> str:
         """Get the appropriate model name based on the requested size."""
-        if model_size == ModelSize.small:
-            return self.small_model or DEFAULT_SMALL_MODEL
-        else:
-            return self.model or DEFAULT_MODEL
+        return get_model_for_size(
+            provider='openai',
+            model_size=model_size.value,
+            user_model=self.model,
+            user_small_model=self.small_model
+        )

     def _handle_structured_response(self, response: Any) -> dict[str, Any]:
         """Handle structured response parsing and validation."""
@@ -28,11 +28,10 @@ from ..prompts.models import Message
 from .client import MULTILINGUAL_EXTRACTION_RESPONSES, LLMClient
 from .config import DEFAULT_MAX_TOKENS, LLMConfig, ModelSize
 from .errors import RateLimitError, RefusalError
+from .provider_defaults import get_provider_defaults

 logger = logging.getLogger(__name__)

-DEFAULT_MODEL = 'gpt-4.1-mini'
-

 class OpenAIGenericClient(LLMClient):
     """
@@ -99,8 +98,9 @@ class OpenAIGenericClient(LLMClient):
             elif m.role == 'system':
                 openai_messages.append({'role': 'system', 'content': m.content})
         try:
+            model = self.model or get_provider_defaults('openai').model
             response = await self.client.chat.completions.create(
-                model=self.model or DEFAULT_MODEL,
+                model=model,
                 messages=openai_messages,
                 temperature=self.temperature,
                 max_tokens=self.max_tokens,
graphiti_core/llm_client/provider_defaults.py (new file, 162 lines)
"""
Copyright 2024, Zep Software, Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""

import os
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ProviderDefaults:
    """
    Configuration for provider-specific model defaults.

    This class replaces hardcoded DEFAULT_MODEL constants with configurable
    provider-specific defaults that can be overridden via environment variables.
    """

    model: str
    small_model: str
    max_tokens: int = 8192
    temperature: float = 0.0
    extract_edges_max_tokens: int = 16384


# Provider-specific default configurations
# These can be overridden via environment variables (see get_provider_defaults)
PROVIDER_DEFAULTS: Dict[str, ProviderDefaults] = {
    'openai': ProviderDefaults(
        model='gpt-4.1-mini',
        small_model='gpt-4.1-nano',
        extract_edges_max_tokens=16384,
    ),
    'gemini': ProviderDefaults(
        model='gemini-2.5-flash',
        small_model='models/gemini-2.5-flash-lite-preview-06-17',
        extract_edges_max_tokens=16384,
    ),
    'anthropic': ProviderDefaults(
        model='claude-3-7-sonnet-latest',
        small_model='claude-3-7-haiku-latest',
        extract_edges_max_tokens=16384,
    ),
    'groq': ProviderDefaults(
        model='llama-3.1-70b-versatile',
        small_model='llama-3.1-8b-instant',
        extract_edges_max_tokens=16384,
    ),
    'azure_openai': ProviderDefaults(
        model='gpt-4.1-mini',
        small_model='gpt-4.1-nano',
        extract_edges_max_tokens=16384,
    ),
}


def get_provider_defaults(provider: str) -> ProviderDefaults:
    """
    Get provider-specific defaults with optional environment variable overrides.

    Environment variables can override defaults using the pattern:
    - {PROVIDER}_DEFAULT_MODEL
    - {PROVIDER}_DEFAULT_SMALL_MODEL
    - {PROVIDER}_DEFAULT_MAX_TOKENS
    - {PROVIDER}_DEFAULT_TEMPERATURE
    - {PROVIDER}_EXTRACT_EDGES_MAX_TOKENS

    Args:
        provider: The provider name (e.g., 'openai', 'gemini', 'anthropic', etc.)

    Returns:
        ProviderDefaults object with defaults for the specified provider

    Raises:
        ValueError: If the provider is not supported
    """
    if provider not in PROVIDER_DEFAULTS:
        raise ValueError(
            f'Unsupported provider: {provider}. '
            f'Supported providers: {list(PROVIDER_DEFAULTS.keys())}'
        )

    defaults = PROVIDER_DEFAULTS[provider]

    # Check for environment variable overrides
    env_prefix = provider.upper()

    model = os.getenv(f'{env_prefix}_DEFAULT_MODEL', defaults.model)
    small_model = os.getenv(f'{env_prefix}_DEFAULT_SMALL_MODEL', defaults.small_model)
    max_tokens = int(os.getenv(f'{env_prefix}_DEFAULT_MAX_TOKENS', str(defaults.max_tokens)))
    temperature = float(os.getenv(f'{env_prefix}_DEFAULT_TEMPERATURE', str(defaults.temperature)))
    extract_edges_max_tokens = int(
        os.getenv(f'{env_prefix}_EXTRACT_EDGES_MAX_TOKENS', str(defaults.extract_edges_max_tokens))
    )

    return ProviderDefaults(
        model=model,
        small_model=small_model,
        max_tokens=max_tokens,
        temperature=temperature,
        extract_edges_max_tokens=extract_edges_max_tokens,
    )


def get_model_for_size(
    provider: str,
    model_size: str,
    user_model: Optional[str] = None,
    user_small_model: Optional[str] = None,
) -> str:
    """
    Get the appropriate model name based on the requested size and provider.

    This function replaces the _get_model_for_size methods in individual clients
    with a centralized implementation that uses configurable provider defaults.

    Args:
        provider: The provider name (e.g., 'openai', 'gemini', 'anthropic', etc.)
        model_size: The size of the model requested ('small' or 'medium')
        user_model: User-configured model override
        user_small_model: User-configured small model override

    Returns:
        The appropriate model name for the requested size
    """
    defaults = get_provider_defaults(provider)

    if model_size == 'small':
        return user_small_model or defaults.small_model
    else:
        return user_model or defaults.model


def get_extract_edges_max_tokens(provider: str) -> int:
    """
    Get the maximum tokens for edge extraction operations.

    This function replaces hardcoded extract_edges_max_tokens constants
    with configurable provider-specific defaults.

    Args:
        provider: The provider name (e.g., 'openai', 'gemini', 'anthropic', etc.)

    Returns:
        The maximum tokens for edge extraction operations
    """
    defaults = get_provider_defaults(provider)
    return defaults.extract_edges_max_tokens


def get_extract_edges_max_tokens_default() -> int:
    """
    Get the default maximum tokens for edge extraction operations.

    This function provides a configurable default that can be overridden
    via the EXTRACT_EDGES_MAX_TOKENS environment variable.

    Returns:
        The maximum tokens for edge extraction operations
    """
    return int(os.getenv('EXTRACT_EDGES_MAX_TOKENS', '16384'))
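The model-selection precedence in `get_model_for_size` (an explicit user model wins; otherwise the provider default for the requested size applies) can be illustrated with a minimal, self-contained sketch. `pick_model` and `DEFAULTS` here are hypothetical stand-ins, not the module's actual names:

```python
from typing import Optional

# Hypothetical defaults table keyed by provider: (model, small_model).
DEFAULTS = {'openai': ('gpt-4.1-mini', 'gpt-4.1-nano')}

def pick_model(
    provider: str,
    model_size: str,
    user_model: Optional[str] = None,
    user_small_model: Optional[str] = None,
) -> str:
    # A user-supplied model always takes precedence; otherwise fall back
    # to the default matching the requested size.
    model, small_model = DEFAULTS[provider]
    if model_size == 'small':
        return user_small_model or small_model
    return user_model or model

print(pick_model('openai', 'small'))                        # gpt-4.1-nano
print(pick_model('openai', 'medium', user_model='gpt-4o'))  # gpt-4o
```

The `or` fallback means an empty-string user model also falls through to the default, which matches the behavior of the original per-client `_get_model_for_size` methods being replaced.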
@@ -32,6 +32,7 @@ from graphiti_core.graphiti_types import GraphitiClients
 from graphiti_core.helpers import DEFAULT_DATABASE, MAX_REFLEXION_ITERATIONS, semaphore_gather
 from graphiti_core.llm_client import LLMClient
 from graphiti_core.llm_client.config import ModelSize
+from graphiti_core.llm_client.provider_defaults import get_extract_edges_max_tokens_default
 from graphiti_core.nodes import CommunityNode, EntityNode, EpisodicNode
 from graphiti_core.prompts import prompt_library
 from graphiti_core.prompts.dedupe_edges import EdgeDuplicate, UniqueFacts
@@ -114,7 +115,7 @@ async def extract_edges(
 ) -> list[EntityEdge]:
     start = time()

-    extract_edges_max_tokens = 16384
+    extract_edges_max_tokens = get_extract_edges_max_tokens_default()
     llm_client = clients.llm_client

     edge_type_signature_map: dict[str, tuple[str, str]] = {