Graphiti Domain-Agnostic Improvement Plan

Date: 2025-11-30
Status: Draft
Last Pull: 422558d (main branch)


Executive Summary

This document outlines a strategic plan to make Graphiti more domain-agnostic and adaptable to diverse use cases beyond conversational AI. The current architecture, while powerful, embeds several domain-specific assumptions (primarily around messaging/conversational data) that limit its applicability to other domains such as scientific research, legal documents, IoT data, healthcare records, and financial transactions.


Current Architecture Analysis

Key Components Review

  1. NER & Entity Extraction (graphiti_core/utils/maintenance/node_operations.py, graphiti_core/prompts/extract_nodes.py)

    • Hardcoded prompts for three episode types: message, text, JSON
    • Domain-specific language (e.g., "speaker", "conversation")
    • Entity type classification tightly coupled with extraction logic
  2. LLM Client Configuration (graphiti_core/llm_client/config.py, graphiti_core/graphiti.py)

    • Defaults to OpenAI across all components
    • No centralized model selection strategy
    • Temperature (1.0) and max_tokens (8192) hardcoded as defaults
  3. Episode Types (graphiti_core/nodes.py)

    • Limited to: message, text, JSON
    • Each type requires separate prompt functions
    • No extensibility mechanism for custom episode types
  4. Prompt System (graphiti_core/prompts/)

    • Prompts are Python functions, not configurable data
    • No template engine or override mechanism
    • Domain assumptions embedded in prompt text
  5. Search & Retrieval (graphiti_core/search/)

    • Flexible but complex configuration
    • Limited domain-specific search recipes
    • No semantic domain adapters

Identified Issues from GitHub (Top 10)

  1. #1087: Embedding truncation reduces retrieval quality for text-embedding-3-small
  2. #1074: Neo4j quickstart returns no results with OpenAI-compatible LLM + Ollama embeddings
  3. #1007: OpenAIGenericClient outputs unstable for vllm serving gpt-oss-20b
  4. #1006: OpenAIRerankerClient does not support AzureOpenAILLMClient
  5. #1004: Azure OpenAI is not supported
  6. #995: Docker container does not support Azure OpenAI
  7. #1077: Support for Google Cloud Spanner Graph
  8. #947: Support for Apache AGE as Graph DB
  9. #1016: Support episode vector
  10. #961: Improve Episodes API - return UUID, support GET by ID, custom metadata

Improvement Directives

1. Configurable Prompt System 🔴 Priority: CRITICAL

Objective

Replace hardcoded prompt functions with a templatable, extensible prompt system that supports domain customization.

Implementation Plan

Phase 1: Prompt Template Engine

  • Create PromptTemplate class with variable interpolation
  • Support multiple template formats (Jinja2, mustache, or custom)
  • Add prompt registry for registration and lookup
# Example API
from typing import Any


class PromptTemplate:
    def __init__(self, template: str, variables: dict[str, str]):
        self.template = template
        self.variables = variables  # variable name -> human-readable description

    def render(self, context: dict[str, Any]) -> str:
        # Minimal rendering: interpolate context values into the template.
        # A real implementation could delegate to Jinja2 or mustache.
        return self.template.format(**context)


class PromptRegistry:
    def __init__(self) -> None:
        self._templates: dict[str, PromptTemplate] = {}

    def register(self, name: str, template: PromptTemplate) -> None:
        self._templates[name] = template

    def get(self, name: str) -> PromptTemplate:
        return self._templates[name]

    def override(self, name: str, template: PromptTemplate) -> None:
        # Alias of register, kept separate so call sites signal intent
        self._templates[name] = template
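
For illustration, overriding the default entity-extraction prompt with a legal-domain variant might look like this (the prompt name and template text are hypothetical):

# Hypothetical usage of the sketch above
registry = PromptRegistry()
registry.register(
    "extract_entities",
    PromptTemplate(
        template="Extract all named parties and statutes from:\n{episode_content}",
        variables={"episode_content": "raw text of the episode"},
    ),
)
prompt = registry.get("extract_entities").render(
    {"episode_content": "Smith v. Jones was decided under 15 U.S.C. ..."}
)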

Phase 2: Refactor Existing Prompts

  • Convert all prompt functions in graphiti_core/prompts/ to templates
  • Maintain backward compatibility with existing API
  • Add domain-specific prompt overrides

Phase 3: Documentation & Examples

  • Create prompt customization guide
  • Provide domain-specific examples (legal, scientific, financial)
  • Add prompt testing utilities

Priority Rationale

  • Impact: Enables all domain customization downstream
  • Complexity: Medium - requires careful refactoring
  • Dependencies: None - can be done independently

Blockers

  • Breaking Changes: Need to maintain backward compatibility
  • LLM Provider Compatibility: Different providers may require different prompt formats
  • Testing: Need comprehensive test suite for prompt variations

Success Metrics

  • Users can customize prompts without code changes
  • 5+ domain-specific prompt examples documented
  • No regression in existing use cases

2. Pluggable NER & Entity Extraction Pipeline 🔴 Priority: CRITICAL

Objective

Make the entity extraction pipeline modular and extensible for different domain requirements.

Implementation Plan

Phase 1: Extraction Strategy Interface

  • Define ExtractionStrategy protocol/abstract class
  • Support custom entity extractors (LLM-based, rule-based, hybrid)
  • Allow domain-specific entity type systems
from typing import Any, Protocol, runtime_checkable

from pydantic import BaseModel

from graphiti_core.edges import EntityEdge
from graphiti_core.nodes import EntityNode, EpisodicNode


@runtime_checkable  # lets implementations be validated with isinstance()
class ExtractionStrategy(Protocol):
    async def extract_entities(
        self,
        episode: EpisodicNode,
        context: dict[str, Any],
        entity_types: dict[str, type[BaseModel]] | None = None,
    ) -> list[EntityNode]:
        ...

    async def extract_relations(
        self,
        episode: EpisodicNode,
        entities: list[EntityNode],
        context: dict[str, Any],
    ) -> list[EntityEdge]:
        ...
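
As a sketch, a rule-based extractor satisfying this protocol could look like the following (the regex and labels are illustrative, and the EntityNode fields assume the current graphiti_core.nodes model; imports from the protocol sketch above are reused):

import re
from datetime import datetime, timezone


class RegexCitationExtractor:
    """Toy extractor: pulls 'Name v. Name' case citations out of an episode."""

    CASE_PATTERN = re.compile(r"[A-Z][a-z]+ v\. [A-Z][a-z]+")

    async def extract_entities(self, episode, context, entity_types=None):
        return [
            EntityNode(
                name=match,
                group_id=episode.group_id,
                labels=["Entity", "LegalCase"],
                created_at=datetime.now(timezone.utc),
            )
            for match in self.CASE_PATTERN.findall(episode.content)
        ]

    async def extract_relations(self, episode, entities, context):
        return []  # this toy extractor only finds entities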

Phase 2: Domain-Specific Extractors

  • Create extractors for common domains:
    • ScientificPaperExtractor: Extracts researchers, institutions, findings, citations
    • LegalDocumentExtractor: Extracts parties, cases, statutes, precedents
    • FinancialExtractor: Extracts companies, transactions, indicators
    • IoTEventExtractor: Extracts devices, sensors, readings, locations
    • HealthcareExtractor: Extracts patients, conditions, treatments, providers

Phase 3: Extractor Composition

  • Allow chaining multiple extractors (see the sketch after this list)
  • Support fallback strategies
  • Enable parallel extraction with merging
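
A minimal composition sketch, reusing the ExtractionStrategy protocol and EntityNode import above (conflict resolution is simplified here to first-wins deduplication by entity name):

class ChainedExtractor:
    """Runs extractors in sequence and merges their entity lists."""

    def __init__(self, extractors: list[ExtractionStrategy]):
        self.extractors = extractors

    async def extract_entities(self, episode, context, entity_types=None):
        merged: dict[str, EntityNode] = {}
        for extractor in self.extractors:
            for entity in await extractor.extract_entities(
                episode, context, entity_types
            ):
                # First extractor to produce a name wins; real conflict
                # resolution would be pluggable
                merged.setdefault(entity.name, entity)
        return list(merged.values())

Fallback is the degenerate case of the same loop: stop at the first extractor that returns a non-empty result.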

Priority Rationale

  • Impact: Directly addresses domain specificity in core extraction
  • Complexity: High - touches critical path
  • Dependencies: Depends on Directive #1 (prompts)

Blockers

  • Performance: Multiple extractors may impact latency
  • Conflict Resolution: Different extractors may produce conflicting entities
  • Schema Validation: Need flexible validation for diverse entity types

Success Metrics

  • 3+ domain-specific extractors implemented
  • 50%+ reduction in domain customization code
  • No performance degradation for default use case

3. Centralized Configuration Management 🟡 Priority: HIGH

Objective

Create a unified configuration system for LLM clients, embedders, and other components.

Implementation Plan

Phase 1: Configuration Schema

  • Create GraphitiConfig with hierarchical structure
  • Support environment variables, config files (YAML/TOML), and programmatic config
  • Add validation with Pydantic
from typing import Literal

from pydantic import BaseModel


class LLMProviderConfig(BaseModel):
    provider: Literal["openai", "anthropic", "gemini", "groq", "custom"]
    model: str
    small_model: str | None = None
    api_key: str | None = None
    base_url: str | None = None
    temperature: float = 1.0
    max_tokens: int = 8192


class EmbedderConfig(BaseModel):
    provider: Literal["openai", "voyage", "gemini", "custom"]
    model: str
    api_key: str | None = None
    embedding_dim: int | None = None


class DatabaseConfig(BaseModel):
    ...  # driver selection, connection settings (elided)


class ExtractionConfig(BaseModel):
    ...  # entity types, extractor pipeline (elided)


class SearchConfig(BaseModel):
    ...  # search methods, rerankers (elided)


class GraphitiConfig(BaseModel):
    llm: LLMProviderConfig
    embedder: EmbedderConfig
    database: DatabaseConfig
    extraction: ExtractionConfig
    search: SearchConfig

Phase 2: Config Loading & Merging

  • Support config file discovery (.graphiti.yaml, graphiti.config.toml)
  • Merge configs from multiple sources (file < env < code); see the sketch after this list
  • Add config validation and helpful error messages
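
A sketch of the precedence rule, assuming each source is first loaded into a plain nested dict (the example values are illustrative):

from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge override into base; override wins on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Precedence: file < environment < programmatic overrides
file_cfg = {"llm": {"provider": "openai", "model": "gpt-4.1", "temperature": 1.0}}
env_cfg = {"llm": {"api_key": "sk-..."}}  # e.g. read from os.environ
code_cfg = {"llm": {"temperature": 0.3}}

merged = deep_merge(deep_merge(file_cfg, env_cfg), code_cfg)
# config = GraphitiConfig.model_validate({**merged, ...})  # plus embedder, database, ...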

Phase 3: Domain-Specific Presets

  • Create preset configs for common use cases
  • Support config inheritance and composition
# Example: .graphiti.yaml
extends: "presets/scientific-research"

llm:
  provider: anthropic
  model: claude-sonnet-4-5-latest
  temperature: 0.3

extraction:
  entity_types:
    - Researcher
    - Institution
    - Finding
    - Methodology

  extractors:
    - type: llm
      prompt: prompts/scientific_entities.yaml
    - type: regex
      patterns: prompts/scientific_patterns.yaml

Priority Rationale

  • Impact: Simplifies deployment and customization
  • Complexity: Medium
  • Dependencies: None

Blockers

  • Backward Compatibility: Must support existing initialization patterns
  • Security: API keys and credentials management
  • Validation: Complex validation rules across providers

Success Metrics

  • Single config file for complete setup
  • Zero hardcoded defaults in core code
  • 10+ domain preset configs available

4. Extensible Episode Type System 🟡 Priority: HIGH

Objective

Allow users to define custom episode types with associated extraction logic.

Implementation Plan

Phase 1: Episode Type Registry

  • Create EpisodeTypeRegistry for dynamic episode types
  • Support custom episode type definitions with Pydantic
from pydantic import BaseModel, ConfigDict

# ExtractionStrategy is the protocol defined in Directive #2


class EpisodeTypeDefinition(BaseModel):
    # Allow the ExtractionStrategy protocol to appear as a field type
    model_config = ConfigDict(arbitrary_types_allowed=True)

    name: str
    description: str
    content_schema: type[BaseModel] | None = None
    extraction_strategy: str | ExtractionStrategy
    prompt_template: str | None = None


class EpisodeTypeRegistry:
    def __init__(self) -> None:
        self._types: dict[str, EpisodeTypeDefinition] = {}

    def register(self, episode_type: EpisodeTypeDefinition) -> None:
        self._types[episode_type.name] = episode_type

    def get(self, name: str) -> EpisodeTypeDefinition:
        # Raises KeyError for unknown types; dispatch (Phase 2) catches this
        return self._types[name]

Phase 2: Dynamic Dispatch

  • Modify extract_nodes() to dispatch based on episode type
  • Support fallback to default extraction for undefined types
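
Dispatch itself can stay small; a sketch using the registry above (this mirrors, but is not, the existing extract_nodes() implementation):

async def extract_nodes_dispatched(episode, registry: EpisodeTypeRegistry):
    try:
        # episode.source is assumed to be a plain string under this proposal
        definition = registry.get(episode.source)
    except KeyError:
        # Fallback to default extraction for undefined types
        definition = registry.get("default")
    strategy = definition.extraction_strategy
    # If extraction_strategy is a name, resolve it via a strategy registry (omitted)
    return await strategy.extract_entities(episode, context={})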

Phase 3: Common Episode Types

  • Provide built-in types for common domains:
    • scientific_paper
    • legal_document
    • financial_report
    • iot_event
    • healthcare_record
    • email
    • api_log

Priority Rationale

  • Impact: Removes major extensibility bottleneck
  • Complexity: Medium
  • Dependencies: Depends on Directive #2 (extractors)

Blockers

  • Type Safety: Ensuring type safety with dynamic types
  • Validation: Schema validation for custom content
  • Migration: Migrating existing message/text/JSON types

Success Metrics

  • Users can add episode types without code changes
  • 5+ built-in episode types for different domains
  • Clear migration path from existing types

5. Domain-Specific Search Strategies 🟢 Priority: MEDIUM

Objective

Provide domain-optimized search configurations and strategies.

Implementation Plan

Phase 1: Search Strategy Templates

  • Create domain-specific search configs in search_config_recipes.py
  • Optimize for domain characteristics (e.g., temporal for financial, spatial for IoT)
# Examples (imports assume the current graphiti_core.search module layout)
from graphiti_core.search.search_config import (
    EdgeReranker,
    EdgeSearchConfig,
    EdgeSearchMethod,
    SearchConfig,
)

FINANCIAL_TEMPORAL_SEARCH = SearchConfig(
    edge_config=EdgeSearchConfig(
        search_methods=[
            EdgeSearchMethod.cosine_similarity,
            EdgeSearchMethod.bm25,
        ],
        reranker=EdgeReranker.episode_mentions,
    ),
    # Prioritize recent events
    # ... domain-specific configuration
)

SCIENTIFIC_CITATION_SEARCH = SearchConfig(
    # Optimize for citation networks
    # ... domain-specific configuration
)

Phase 2: Semantic Domain Adapters

  • Create domain-specific query expansion (sketched after this list)
  • Add domain vocabulary mapping
  • Support domain-specific relevance scoring
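
For instance, query expansion with a domain vocabulary could run before the hybrid search (the synonym map below is illustrative):

FINANCIAL_VOCABULARY = {
    "acquisition": ["merger", "takeover", "buyout"],
    "earnings": ["revenue", "profit", "EPS"],
}


def expand_query(query: str, vocabulary: dict[str, list[str]]) -> str:
    terms = [query]
    for keyword, synonyms in vocabulary.items():
        if keyword in query.lower():
            terms.extend(synonyms)
    return " ".join(terms)


# expand_query("Q3 earnings for Acme", FINANCIAL_VOCABULARY)
# -> "Q3 earnings for Acme revenue profit EPS"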

Phase 3: Search Analytics

  • Track search performance by domain
  • Provide domain-specific search insights
  • Auto-tune search configs based on usage

Priority Rationale

  • Impact: Improves search quality for specific domains
  • Complexity: Low-Medium
  • Dependencies: None - additive feature

Blockers

  • Domain Expertise: Requires deep understanding of each domain
  • Evaluation: Need domain-specific test datasets
  • Maintenance: Each domain strategy needs ongoing optimization

Success Metrics

  • 5+ domain-optimized search strategies
  • Measurable improvement in domain-specific retrieval quality
  • Search strategy recommendation system

6. Multi-Provider LLM & Embedder Support Enhancement 🟢 Priority: MEDIUM

Objective

Improve support for diverse LLM and embedding providers, addressing current issues with Azure, Anthropic, and local models.

Implementation Plan

Phase 1: Provider Abstraction Improvements

  • Enhance LLMClient interface for provider-specific features
  • Better handling of structured output across providers (#1007); see the sketch after this list
  • Unified error handling and retries
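
One pattern for stabilizing structured output across providers is local validation with retries; a sketch assuming an LLMClient-like interface and Pydantic response models (not the current implementation):

from pydantic import BaseModel, ValidationError


async def generate_structured(
    client, messages, response_model: type[BaseModel], retries: int = 2
):
    """Ask for JSON and validate locally when native structured output is unavailable."""
    last_error: ValidationError | None = None
    for _ in range(retries + 1):
        raw = await client.generate_response(messages)  # plain JSON completion
        try:
            return response_model.model_validate_json(raw)
        except ValidationError as exc:
            # Re-prompting with the validation error appended is a common refinement
            last_error = exc
    raise last_error  # all attempts failed validation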

Phase 2: Provider-Specific Optimizations

  • Azure OpenAI full support (#1004, #995, #1006)
  • Anthropic optimization for structured output
  • Local model support (Ollama, vLLM) (#1074, #1007)
  • Google Cloud Vertex AI integration

Phase 3: Embedder Flexibility

  • Support mixed embedding strategies (different models for nodes vs edges)
  • Domain-specific embedding fine-tuning
  • Embedding dimension adaptation (#1087)
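
For the truncation issue (#1087) specifically, one option that fits Matryoshka-style models such as text-embedding-3 is truncate-and-renormalize; a minimal sketch:

import math


def adapt_dimension(embedding: list[float], target_dim: int) -> list[float]:
    """Truncate to target_dim, then re-normalize to unit length."""
    truncated = embedding[:target_dim]
    norm = math.sqrt(sum(x * x for x in truncated)) or 1.0
    return [x / norm for x in truncated]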

Priority Rationale

  • Impact: Addresses multiple GitHub issues, improves flexibility
  • Complexity: Medium-High (provider-specific quirks)
  • Dependencies: Related to Directive #3 (config)

Blockers

  • Provider API Changes: External dependencies on provider APIs
  • Testing: Requires access to multiple provider accounts
  • Cost: Testing across providers can be expensive

Success Metrics

  • All providers in CLAUDE.md fully supported
  • Resolution of issues #1004, #1006, #1007, #1074, #995
  • Provider switching with zero code changes

7. Enhanced Metadata & Custom Attributes 🟢 Priority: MEDIUM

Objective

Support domain-specific metadata on all graph elements (nodes, edges, episodes).

Implementation Plan

Phase 1: Flexible Metadata Schema

  • Add custom_metadata: dict[str, Any] to all core types (sketched after this list)
  • Support typed metadata with Pydantic models
  • Index metadata for searchability
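
A sketch of the proposed field on a simplified stand-in model (custom_metadata matches the name proposed above; this is not graphiti_core code):

from typing import Any

from pydantic import BaseModel, Field


class NodeWithMetadata(BaseModel):
    name: str
    custom_metadata: dict[str, Any] = Field(default_factory=dict)


node = NodeWithMetadata(
    name="Acme Corp",
    custom_metadata={"ticker": "ACME", "jurisdiction": "Delaware"},
)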

Phase 2: Domain-Specific Attributes

  • Support custom attributes per domain
  • Attribute extraction from episodes
  • Attribute-based filtering in search

Phase 3: Metadata API Improvements

  • Episode API enhancements (#961)
  • Metadata update operations
  • Bulk metadata operations

Priority Rationale

  • Impact: Enables rich domain modeling
  • Complexity: Low-Medium
  • Dependencies: Database schema changes

Blockers

  • Schema Migration: Existing graphs need migration
  • Index Performance: Metadata indexing may impact performance
  • Validation: Complex validation for diverse metadata

Success Metrics

  • Custom metadata on all graph elements
  • Metadata-based search and filtering
  • Resolution of issue #961

8. Database Provider Expansion 🔵 Priority: LOW

Objective

Support additional graph databases to meet diverse deployment requirements.

Implementation Plan

Phase 1: Abstract Driver Interface

  • Enhance GraphDriver abstraction
  • Standardize query translation layer
  • Support for property graph vs RDF models
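
The surface a new backend would implement is roughly as follows (method names mirror the existing driver, but treat this as a sketch rather than the exact graphiti_core interface):

from typing import Any, Protocol


class GraphDriverLike(Protocol):
    provider: str  # e.g. "neo4j", "falkordb", "spanner"

    async def execute_query(self, cypher_query: str, **kwargs: Any) -> Any:
        ...

    async def close(self) -> None:
        ...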

Phase 2: New Drivers

  • Google Cloud Spanner Graph (#1077)
  • Apache AGE (#947)
  • Amazon Neptune improvements (#1082)
  • TigerGraph, NebulaGraph

Phase 3: Driver Selection Guide

  • Performance comparison matrix
  • Use case recommendations
  • Migration tools between drivers

Priority Rationale

  • Impact: Addresses specific GitHub requests, increases deployment options
  • Complexity: High (each driver is significant work)
  • Dependencies: None

Blockers

  • Maintenance Burden: Each driver requires ongoing support
  • Feature Parity: Different databases have different capabilities
  • Testing: Complex integration testing for each database

Success Metrics

  • 2+ new database drivers
  • Resolution of issues #1077, #947
  • Database migration tools

9. Documentation & Examples for Domain Adaptation 🟡 Priority: HIGH

Objective

Comprehensive documentation showing how to adapt Graphiti to different domains.

Implementation Plan

Phase 1: Domain Adaptation Guide

  • Step-by-step guide for domain customization
  • Decision tree for configuration choices
  • Best practices for each domain type

Phase 2: Complete Domain Examples

  • Scientific Research knowledge graph
  • Legal Document analysis
  • Financial Transaction network
  • IoT Event processing
  • Healthcare Records integration

Phase 3: Tutorial Series

  • Video walkthroughs
  • Interactive Jupyter notebooks
  • Code generation tools for domain setup

Priority Rationale

  • Impact: Critical for adoption in new domains
  • Complexity: Medium (requires domain expertise)
  • Dependencies: Depends on implementation of above directives

Blockers

  • Domain Expertise: Need experts for each domain
  • Maintenance: Examples need to stay current with codebase
  • Quality: Need real-world datasets and validation

Success Metrics

  • 5+ complete domain examples
  • Documentation coverage >80%
  • User-contributed domain examples

10. Testing & Evaluation Framework for Domains 🟢 Priority: MEDIUM

Objective

Create domain-specific test datasets and evaluation metrics.

Implementation Plan

Phase 1: Domain Test Datasets

  • Curate/generate test data for each domain
  • Include ground truth annotations
  • Support for evaluation benchmarks

Phase 2: Evaluation Metrics

  • Domain-specific quality metrics
  • Extraction accuracy measurements
  • Search relevance evaluation

Phase 3: Continuous Evaluation

  • Automated testing across domains
  • Performance regression detection
  • Quality dashboards

Priority Rationale

  • Impact: Ensures quality across domains
  • Complexity: Medium
  • Dependencies: Depends on domain implementations

Blockers

  • Data Acquisition: Domain datasets can be hard to obtain
  • Annotation: Ground truth annotation is expensive
  • Standardization: Metrics vary significantly by domain

Success Metrics

  • Test coverage >70% across domains
  • Automated evaluation pipeline
  • Public benchmark results

Implementation Roadmap

Phase 1: Foundation (Months 1-3)

Critical Infrastructure

  • Directive #1: Configurable Prompt System
  • Directive #3: Centralized Configuration Management
  • Directive #9: Initial documentation framework

Estimated Effort: 2-3 engineers, 3 months

Phase 2: Core Extensibility (Months 4-6)

Domain Adaptation

  • Directive #2: Pluggable NER Pipeline
  • Directive #4: Extensible Episode Types
  • Directive #7: Enhanced Metadata

Estimated Effort: 2-3 engineers, 3 months

Phase 3: Provider & Database Support (Months 7-9)

Infrastructure Expansion

  • Directive #6: Multi-Provider LLM Support
  • Directive #8: Database Provider Expansion (Phase 1)

Estimated Effort: 2 engineers, 3 months

Phase 4: Domain Optimization (Months 10-12)

Domain-Specific Features

  • Directive #5: Domain-Specific Search
  • Directive #10: Testing & Evaluation Framework
  • Directive #9: Complete domain examples

Estimated Effort: 2-3 engineers, 3 months


Risk Assessment

High Risk

  1. Breaking Changes: Refactoring may break existing integrations

    • Mitigation: Semantic versioning, deprecation warnings, migration guides
  2. Performance Regression: More abstraction may impact performance

    • Mitigation: Continuous benchmarking, performance budgets
  3. Complexity Creep: Too much configurability can confuse users

    • Mitigation: Sensible defaults, progressive disclosure, presets

Medium Risk

  1. Provider API Changes: External dependencies may change

    • Mitigation: Abstract interfaces, version pinning, adapter pattern
  2. Maintenance Burden: More features = more maintenance

    • Mitigation: Automated testing, clear ownership, deprecation policy
  3. Documentation Debt: Fast development may outpace docs

    • Mitigation: Docs-as-code, automated doc generation, examples as tests

Low Risk

  1. Community Adoption: Users may not need all domains
    • Mitigation: Modular architecture, optional components

Success Criteria

Technical Metrics

  • Zero hardcoded domain assumptions in core library
  • 5+ domain-specific configurations available
  • All GitHub issues (#1004, #1006, #1007, #1074, #995, #1077, #947, #961) resolved
  • Test coverage >75% across all domains
  • Performance within 10% of current baseline

User Experience Metrics

  • Domain setup time <30 minutes (from docs)
  • Config-driven customization (no code changes for 80% of use cases)
  • 3+ community-contributed domain adaptations

Business Metrics

  • Adoption in 3+ new domains (outside conversational AI)
  • 50%+ reduction in customization support requests
  • Documentation satisfaction >4.0/5.0

Appendix A: Affected Files

Core Files Requiring Changes

High Priority

  • graphiti_core/graphiti.py - Main class, initialization
  • graphiti_core/llm_client/config.py - Configuration system
  • graphiti_core/prompts/extract_nodes.py - NER prompts
  • graphiti_core/prompts/extract_edges.py - Relation extraction prompts
  • graphiti_core/utils/maintenance/node_operations.py - Extraction logic

Medium Priority

  • graphiti_core/nodes.py - Episode type definitions
  • graphiti_core/search/search_config.py - Search configuration
  • graphiti_core/search/search_config_recipes.py - Search recipes
  • server/graph_service/config.py - Server configuration

Low Priority

  • graphiti_core/driver/*.py - Database drivers
  • graphiti_core/embedder/*.py - Embedder clients

Appendix B: Related GitHub Issues

Directly Addressed

  • #1087: Embedding truncation
  • #1074: No results with Ollama embeddings
  • #1007: Unstable outputs with vLLM
  • #1006: AzureOpenAI reranker support
  • #1004: Azure OpenAI support
  • #995: Docker Azure OpenAI support
  • #1077: Google Cloud Spanner Graph support
  • #947: Apache AGE support
  • #961: Episodes API improvements
  • #1082: Neptune driver issues

Indirectly Improved

  • #1083: Orphaned entities cleanup
  • #1062: Stale data in MCP server
  • #1021: Incomplete graph structure
  • #1018: Search with group_ids
  • #1012: group_id and Anthropic issues
  • #992: OOM in build_communities
  • #963: Duplicate entities

Appendix C: Backward Compatibility Strategy

Deprecation Policy

  1. Feature Deprecation: 2 minor versions notice
  2. API Changes: Maintain old API with deprecation warnings
  3. Configuration: Support both old and new config formats during transition
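
For item 2, a lightweight pattern is a keyword-argument shim that keeps the old API working while emitting a warning (a generic sketch, not existing Graphiti code):

import warnings
from functools import wraps


def deprecated_kwarg(old: str, new: str):
    """Accept a renamed keyword argument during the transition, with a warning."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            if old in kwargs:
                warnings.warn(
                    f"'{old}' is deprecated; use '{new}' instead",
                    DeprecationWarning,
                    stacklevel=2,
                )
                kwargs.setdefault(new, kwargs.pop(old))
            return func(*args, **kwargs)

        return wrapper

    return decorator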

Migration Support

  • Automated migration scripts for major changes
  • Detailed migration guides for each release
  • Migration validation tools

Version Support

  • LTS releases for enterprise users
  • Security patches for N-2 versions
  • Clear EOL policy

Next Steps

  1. Review & Approval: Circulate this plan for stakeholder feedback
  2. Prioritization: Finalize directive priorities based on business needs
  3. Resource Allocation: Assign engineering teams to Phase 1 directives
  4. Kickoff: Begin implementation of Directive #1 (Prompt System)

Document Maintainer: Claude (AI Assistant)
Last Updated: 2025-11-30
Next Review: After Phase 1 completion