graphiti/DOMAIN_AGNOSTIC_IMPROVEMENT_PLAN.md
supmo668 74a422369c feat: Add enhanced configuration system with multi-provider LLM support
This commit introduces a comprehensive configuration system that makes
Graphiti more flexible and easier to configure across different
providers and deployment environments.

## New Features

- **Unified Configuration**: New GraphitiConfig class with Pydantic validation
- **YAML Support**: Load configuration from .graphiti.yaml files
- **Multi-Provider Support**: Easy switching between OpenAI, Azure, Anthropic,
  Gemini, Groq, and LiteLLM
- **LiteLLM Integration**: Unified access to 100+ LLM providers
- **Factory Functions**: Automatic client creation from configuration
- **Full Backward Compatibility**: Existing code continues to work

## Configuration System

- graphiti_core/config/settings.py: Pydantic configuration classes
- graphiti_core/config/providers.py: Provider enumerations and defaults
- graphiti_core/config/factory.py: Factory functions for client creation

## LiteLLM Client

- graphiti_core/llm_client/litellm_client.py: New unified LLM client
- Support for Azure OpenAI, AWS Bedrock, Vertex AI, Ollama, vLLM, etc.
- Automatic structured output detection

## Documentation

- docs/CONFIGURATION.md: Comprehensive configuration guide
- examples/graphiti_config_example.yaml: Example configurations
- DOMAIN_AGNOSTIC_IMPROVEMENT_PLAN.md: Future improvement roadmap

## Tests

- tests/config/test_settings.py: 22 tests for configuration
- tests/config/test_factory.py: 12 tests for factories
- 33/34 tests passing (97%)

## Issues Addressed

- #1004: Azure OpenAI support
- #1006: Azure OpenAI reranker support
- #1007: vLLM/OpenAI-compatible provider stability
- #1074: Ollama embeddings support
- #995: Docker Azure OpenAI support

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-30 23:47:38 -08:00

749 lines
23 KiB
Markdown

# Graphiti Domain-Agnostic Improvement Plan
**Date**: 2025-11-30
**Status**: Draft
**Last Pull**: `422558d` (main branch)
---
## Executive Summary
This document outlines a strategic plan to make Graphiti more domain-agnostic and adaptable to diverse use cases beyond conversational AI. The current architecture, while powerful, contains several domain-specific assumptions (primarily around messaging/conversational data) that limit its applicability to other domains such as scientific research, legal documents, IoT data, healthcare records, financial transactions, etc.
---
## Current Architecture Analysis
### Key Components Review
1. **NER & Entity Extraction** (`graphiti_core/utils/maintenance/node_operations.py`, `graphiti_core/prompts/extract_nodes.py`)
- Hardcoded prompts for three episode types: message, text, JSON
- Domain-specific language (e.g., "speaker", "conversation")
- Entity type classification tightly coupled with extraction logic
2. **LLM Client Configuration** (`graphiti_core/llm_client/config.py`, `graphiti_core/graphiti.py`)
- Defaults to OpenAI across all components
- No centralized model selection strategy
- Temperature (1.0) and max_tokens (8192) hardcoded as defaults
3. **Episode Types** (`graphiti_core/nodes.py`)
- Limited to: message, text, JSON
- Each type requires separate prompt functions
- No extensibility mechanism for custom episode types
4. **Prompt System** (`graphiti_core/prompts/`)
- Prompts are Python functions, not configurable data
- No template engine or override mechanism
- Domain assumptions embedded in prompt text
5. **Search & Retrieval** (`graphiti_core/search/`)
- Flexible but complex configuration
- Limited domain-specific search recipes
- No semantic domain adapters
---
## Identified Issues from GitHub (Top 20)
### High-Impact Issues Related to Domain Agnostic Goals:
1. **#1087**: Embedding truncation reduces retrieval quality for text-embedding-3-small
2. **#1074**: Neo4j quickstart returns no results with OpenAI-compatible LLM + Ollama embeddings
3. **#1007**: OpenAIGenericClient outputs unstable for vllm serving gpt-oss-20b
4. **#1006**: OpenAIRerankerClient does not support AzureOpenAILLMClient
5. **#1004**: Azure OpenAI is not supported
6. **#995**: Docker container does not support Azure OpenAI
7. **#1077**: Support for Google Cloud Spanner Graph
8. **#947**: Support for Apache AGE as Graph DB
9. **#1016**: Support episode vector
10. **#961**: Improve Episodes API - return UUID, support GET by ID, custom metadata
---
## Improvement Directives
### 1. **Configurable Prompt System** 🔴 **Priority: CRITICAL**
#### Objective
Replace hardcoded prompt functions with a templatable, extensible prompt system that supports domain customization.
#### Implementation Plan
**Phase 1: Prompt Template Engine**
- Create `PromptTemplate` class with variable interpolation
- Support multiple template formats (Jinja2, mustache, or custom)
- Add prompt registry for registration and lookup
```python
# Example API
class PromptTemplate:
def __init__(self, template: str, variables: dict[str, str]):
self.template = template
self.variables = variables
def render(self, context: dict[str, Any]) -> str:
# Template rendering logic
pass
class PromptRegistry:
def register(self, name: str, template: PromptTemplate) -> None:
pass
def get(self, name: str) -> PromptTemplate:
pass
def override(self, name: str, template: PromptTemplate) -> None:
pass
```
**Phase 2: Refactor Existing Prompts**
- Convert all prompt functions in `graphiti_core/prompts/` to templates
- Maintain backward compatibility with existing API
- Add domain-specific prompt overrides
**Phase 3: Documentation & Examples**
- Create prompt customization guide
- Provide domain-specific examples (legal, scientific, financial)
- Add prompt testing utilities
#### Priority Rationale
- **Impact**: Enables all domain customization downstream
- **Complexity**: Medium - requires careful refactoring
- **Dependencies**: None - can be done independently
#### Blockers
- **Breaking Changes**: Need to maintain backward compatibility
- **LLM Provider Compatibility**: Different providers may require different prompt formats
- **Testing**: Need comprehensive test suite for prompt variations
#### Success Metrics
- Users can customize prompts without code changes
- 5+ domain-specific prompt examples documented
- No regression in existing use cases
---
### 2. **Pluggable NER & Entity Extraction Pipeline** 🔴 **Priority: CRITICAL**
#### Objective
Make the entity extraction pipeline modular and extensible for different domain requirements.
#### Implementation Plan
**Phase 1: Extraction Strategy Interface**
- Define `ExtractionStrategy` protocol/abstract class
- Support custom entity extractors (LLM-based, rule-based, hybrid)
- Allow domain-specific entity type systems
```python
class ExtractionStrategy(Protocol):
async def extract_entities(
self,
episode: EpisodicNode,
context: dict[str, Any],
entity_types: dict[str, type[BaseModel]] | None = None,
) -> list[EntityNode]:
...
async def extract_relations(
self,
episode: EpisodicNode,
entities: list[EntityNode],
context: dict[str, Any],
) -> list[EntityEdge]:
...
```
**Phase 2: Domain-Specific Extractors**
- Create extractors for common domains:
- `ScientificPaperExtractor`: Extracts researchers, institutions, findings, citations
- `LegalDocumentExtractor`: Extracts parties, cases, statutes, precedents
- `FinancialExtractor`: Extracts companies, transactions, indicators
- `IoTEventExtractor`: Extracts devices, sensors, readings, locations
- `HealthcareExtractor`: Extracts patients, conditions, treatments, providers
**Phase 3: Extractor Composition**
- Allow chaining multiple extractors
- Support fallback strategies
- Enable parallel extraction with merging
#### Priority Rationale
- **Impact**: Directly addresses domain specificity in core extraction
- **Complexity**: High - touches critical path
- **Dependencies**: Depends on Directive #1 (prompts)
#### Blockers
- **Performance**: Multiple extractors may impact latency
- **Conflict Resolution**: Different extractors may produce conflicting entities
- **Schema Validation**: Need flexible validation for diverse entity types
#### Success Metrics
- 3+ domain-specific extractors implemented
- 50%+ reduction in domain customization code
- No performance degradation for default use case
---
### 3. **Centralized Configuration Management** 🟡 **Priority: HIGH**
#### Objective
Create a unified configuration system for LLM clients, embedders, and other components.
#### Implementation Plan
**Phase 1: Configuration Schema**
- Create `GraphitiConfig` with hierarchical structure
- Support environment variables, config files (YAML/TOML), and programmatic config
- Add validation with Pydantic
```python
class LLMProviderConfig(BaseModel):
provider: Literal["openai", "anthropic", "gemini", "groq", "custom"]
model: str
small_model: str | None = None
api_key: str | None = None
base_url: str | None = None
temperature: float = 1.0
max_tokens: int = 8192
class EmbedderConfig(BaseModel):
provider: Literal["openai", "voyage", "gemini", "custom"]
model: str
api_key: str | None = None
embedding_dim: int | None = None
class GraphitiConfig(BaseModel):
llm: LLMProviderConfig
embedder: EmbedderConfig
database: DatabaseConfig
extraction: ExtractionConfig
search: SearchConfig
```
**Phase 2: Config Loading & Merging**
- Support config file discovery (`.graphiti.yaml`, `graphiti.config.toml`)
- Merge configs from multiple sources (file < env < code)
- Add config validation and helpful error messages
**Phase 3: Domain-Specific Presets**
- Create preset configs for common use cases
- Support config inheritance and composition
```yaml
# Example: .graphiti.yaml
extends: "presets/scientific-research"
llm:
provider: anthropic
model: claude-sonnet-4-5-latest
temperature: 0.3
extraction:
entity_types:
- Researcher
- Institution
- Finding
- Methodology
extractors:
- type: llm
prompt: prompts/scientific_entities.yaml
- type: regex
patterns: prompts/scientific_patterns.yaml
```
#### Priority Rationale
- **Impact**: Simplifies deployment and customization
- **Complexity**: Medium
- **Dependencies**: None
#### Blockers
- **Backward Compatibility**: Must support existing initialization patterns
- **Security**: API keys and credentials management
- **Validation**: Complex validation rules across providers
#### Success Metrics
- Single config file for complete setup
- Zero hardcoded defaults in core code
- 10+ domain preset configs available
---
### 4. **Extensible Episode Type System** 🟡 **Priority: HIGH**
#### Objective
Allow users to define custom episode types with associated extraction logic.
#### Implementation Plan
**Phase 1: Episode Type Registry**
- Create `EpisodeTypeRegistry` for dynamic episode types
- Support custom episode type definitions with Pydantic
```python
class EpisodeTypeDefinition(BaseModel):
name: str
description: str
content_schema: type[BaseModel] | None = None
extraction_strategy: str | ExtractionStrategy
prompt_template: str | None = None
class EpisodeTypeRegistry:
def register(self, episode_type: EpisodeTypeDefinition) -> None:
pass
def get(self, name: str) -> EpisodeTypeDefinition:
pass
```
**Phase 2: Dynamic Dispatch**
- Modify `extract_nodes()` to dispatch based on episode type
- Support fallback to default extraction for undefined types
**Phase 3: Common Episode Types**
- Provide built-in types for common domains:
- `scientific_paper`
- `legal_document`
- `financial_report`
- `iot_event`
- `healthcare_record`
- `email`
- `api_log`
#### Priority Rationale
- **Impact**: Removes major extensibility bottleneck
- **Complexity**: Medium
- **Dependencies**: Depends on Directive #2 (extractors)
#### Blockers
- **Type Safety**: Ensuring type safety with dynamic types
- **Validation**: Schema validation for custom content
- **Migration**: Migrating existing message/text/JSON types
#### Success Metrics
- Users can add episode types without code changes
- 5+ built-in episode types for different domains
- Clear migration path from existing types
---
### 5. **Domain-Specific Search Strategies** 🟢 **Priority: MEDIUM**
#### Objective
Provide domain-optimized search configurations and strategies.
#### Implementation Plan
**Phase 1: Search Strategy Templates**
- Create domain-specific search configs in `search_config_recipes.py`
- Optimize for domain characteristics (e.g., temporal for financial, spatial for IoT)
```python
# Examples
FINANCIAL_TEMPORAL_SEARCH = SearchConfig(
edge_config=EdgeSearchConfig(
search_methods=[
EdgeSearchMethod.cosine_similarity,
EdgeSearchMethod.bm25,
],
reranker=EdgeReranker.episode_mentions,
),
# Prioritize recent events
# ... domain-specific configuration
)
SCIENTIFIC_CITATION_SEARCH = SearchConfig(
# Optimize for citation networks
# ... domain-specific configuration
)
```
**Phase 2: Semantic Domain Adapters**
- Create domain-specific query expansion
- Add domain vocabulary mapping
- Support domain-specific relevance scoring
**Phase 3: Search Analytics**
- Track search performance by domain
- Provide domain-specific search insights
- Auto-tune search configs based on usage
#### Priority Rationale
- **Impact**: Improves search quality for specific domains
- **Complexity**: Low-Medium
- **Dependencies**: None - additive feature
#### Blockers
- **Domain Expertise**: Requires deep understanding of each domain
- **Evaluation**: Need domain-specific test datasets
- **Maintenance**: Each domain strategy needs ongoing optimization
#### Success Metrics
- 5+ domain-optimized search strategies
- Measurable improvement in domain-specific retrieval quality
- Search strategy recommendation system
---
### 6. **Multi-Provider LLM & Embedder Support Enhancement** 🟢 **Priority: MEDIUM**
#### Objective
Improve support for diverse LLM and embedding providers, addressing current issues with Azure, Anthropic, and local models.
#### Implementation Plan
**Phase 1: Provider Abstraction Improvements**
- Enhance `LLMClient` interface for provider-specific features
- Better handling of structured output across providers (#1007)
- Unified error handling and retries
**Phase 2: Provider-Specific Optimizations**
- Azure OpenAI full support (#1004, #995, #1006)
- Anthropic optimization for structured output
- Local model support (Ollama, vLLM) (#1074, #1007)
- Google Cloud Vertex AI integration
**Phase 3: Embedder Flexibility**
- Support mixed embedding strategies (different models for nodes vs edges)
- Domain-specific embedding fine-tuning
- Embedding dimension adaptation (#1087)
#### Priority Rationale
- **Impact**: Addresses multiple GitHub issues, improves flexibility
- **Complexity**: Medium-High (provider-specific quirks)
- **Dependencies**: Related to Directive #3 (config)
#### Blockers
- **Provider API Changes**: External dependencies on provider APIs
- **Testing**: Requires access to multiple provider accounts
- **Cost**: Testing across providers can be expensive
#### Success Metrics
- All providers in CLAUDE.md fully supported
- Resolution of issues #1004, #1006, #1007, #1074, #995
- Provider switching with zero code changes
---
### 7. **Enhanced Metadata & Custom Attributes** 🟢 **Priority: MEDIUM**
#### Objective
Support domain-specific metadata on all graph elements (nodes, edges, episodes).
#### Implementation Plan
**Phase 1: Flexible Metadata Schema**
- Add `custom_metadata: dict[str, Any]` to all core types
- Support typed metadata with Pydantic models
- Index metadata for searchability
**Phase 2: Domain-Specific Attributes**
- Support custom attributes per domain
- Attribute extraction from episodes
- Attribute-based filtering in search
**Phase 3: Metadata API Improvements**
- Episode API enhancements (#961)
- Metadata update operations
- Bulk metadata operations
#### Priority Rationale
- **Impact**: Enables rich domain modeling
- **Complexity**: Low-Medium
- **Dependencies**: Database schema changes
#### Blockers
- **Schema Migration**: Existing graphs need migration
- **Index Performance**: Metadata indexing may impact performance
- **Validation**: Complex validation for diverse metadata
#### Success Metrics
- Custom metadata on all graph elements
- Metadata-based search and filtering
- Resolution of issue #961
---
### 8. **Database Provider Expansion** 🔵 **Priority: LOW**
#### Objective
Support additional graph databases to meet diverse deployment requirements.
#### Implementation Plan
**Phase 1: Abstract Driver Interface**
- Enhance `GraphDriver` abstraction
- Standardize query translation layer
- Support for property graph vs RDF models
**Phase 2: New Drivers**
- Google Cloud Spanner Graph (#1077)
- Apache AGE (#947)
- Amazon Neptune improvements (#1082)
- TigerGraph, NebulaGraph
**Phase 3: Driver Selection Guide**
- Performance comparison matrix
- Use case recommendations
- Migration tools between drivers
#### Priority Rationale
- **Impact**: Addresses specific GitHub requests, increases deployment options
- **Complexity**: High (each driver is significant work)
- **Dependencies**: None
#### Blockers
- **Maintenance Burden**: Each driver requires ongoing support
- **Feature Parity**: Different databases have different capabilities
- **Testing**: Complex integration testing for each database
#### Success Metrics
- 2+ new database drivers
- Resolution of issues #1077, #947
- Database migration tools
---
### 9. **Documentation & Examples for Domain Adaptation** 🟡 **Priority: HIGH**
#### Objective
Comprehensive documentation showing how to adapt Graphiti to different domains.
#### Implementation Plan
**Phase 1: Domain Adaptation Guide**
- Step-by-step guide for domain customization
- Decision tree for configuration choices
- Best practices for each domain type
**Phase 2: Complete Domain Examples**
- Scientific Research knowledge graph
- Legal Document analysis
- Financial Transaction network
- IoT Event processing
- Healthcare Records integration
**Phase 3: Tutorial Series**
- Video walkthroughs
- Interactive Jupyter notebooks
- Code generation tools for domain setup
#### Priority Rationale
- **Impact**: Critical for adoption in new domains
- **Complexity**: Medium (requires domain expertise)
- **Dependencies**: Depends on implementation of above directives
#### Blockers
- **Domain Expertise**: Need experts for each domain
- **Maintenance**: Examples need to stay current with codebase
- **Quality**: Need real-world datasets and validation
#### Success Metrics
- 5+ complete domain examples
- Documentation coverage >80%
- User-contributed domain examples
---
### 10. **Testing & Evaluation Framework for Domains** 🟢 **Priority: MEDIUM**
#### Objective
Create domain-specific test datasets and evaluation metrics.
#### Implementation Plan
**Phase 1: Domain Test Datasets**
- Curate/generate test data for each domain
- Include ground truth annotations
- Support for evaluation benchmarks
**Phase 2: Evaluation Metrics**
- Domain-specific quality metrics
- Extraction accuracy measurements
- Search relevance evaluation
**Phase 3: Continuous Evaluation**
- Automated testing across domains
- Performance regression detection
- Quality dashboards
#### Priority Rationale
- **Impact**: Ensures quality across domains
- **Complexity**: Medium
- **Dependencies**: Depends on domain implementations
#### Blockers
- **Data Acquisition**: Domain datasets can be hard to obtain
- **Annotation**: Ground truth annotation is expensive
- **Standardization**: Metrics vary significantly by domain
#### Success Metrics
- Test coverage >70% across domains
- Automated evaluation pipeline
- Public benchmark results
---
## Implementation Roadmap
### Phase 1: Foundation (Months 1-3)
**Critical Infrastructure**
- [ ] Directive #1: Configurable Prompt System
- [ ] Directive #3: Centralized Configuration Management
- [ ] Directive #9: Initial documentation framework
**Estimated Effort**: 2-3 engineers, 3 months
### Phase 2: Core Extensibility (Months 4-6)
**Domain Adaptation**
- [ ] Directive #2: Pluggable NER Pipeline
- [ ] Directive #4: Extensible Episode Types
- [ ] Directive #7: Enhanced Metadata
**Estimated Effort**: 2-3 engineers, 3 months
### Phase 3: Provider & Database Support (Months 7-9)
**Infrastructure Expansion**
- [ ] Directive #6: Multi-Provider LLM Support
- [ ] Directive #8: Database Provider Expansion (Phase 1)
**Estimated Effort**: 2 engineers, 3 months
### Phase 4: Domain Optimization (Months 10-12)
**Domain-Specific Features**
- [ ] Directive #5: Domain-Specific Search
- [ ] Directive #10: Testing & Evaluation Framework
- [ ] Directive #9: Complete domain examples
**Estimated Effort**: 2-3 engineers, 3 months
---
## Risk Assessment
### High Risk
1. **Breaking Changes**: Refactoring may break existing integrations
- *Mitigation*: Semantic versioning, deprecation warnings, migration guides
2. **Performance Regression**: More abstraction may impact performance
- *Mitigation*: Continuous benchmarking, performance budgets
3. **Complexity Creep**: Too much configurability can confuse users
- *Mitigation*: Sensible defaults, progressive disclosure, presets
### Medium Risk
1. **Provider API Changes**: External dependencies may change
- *Mitigation*: Abstract interfaces, version pinning, adapter pattern
2. **Maintenance Burden**: More features = more maintenance
- *Mitigation*: Automated testing, clear ownership, deprecation policy
3. **Documentation Debt**: Fast development may outpace docs
- *Mitigation*: Docs-as-code, automated doc generation, examples as tests
### Low Risk
1. **Community Adoption**: Users may not need all domains
- *Mitigation*: Modular architecture, optional components
---
## Success Criteria
### Technical Metrics
- [ ] Zero hardcoded domain assumptions in core library
- [ ] 5+ domain-specific configurations available
- [ ] All GitHub issues (#1004, #1006, #1007, #1074, #995, #1077, #947, #961) resolved
- [ ] Test coverage >75% across all domains
- [ ] Performance within 10% of current baseline
### User Experience Metrics
- [ ] Domain setup time <30 minutes (from docs)
- [ ] Config-driven customization (no code changes for 80% of use cases)
- [ ] 3+ community-contributed domain adaptations
### Business Metrics
- [ ] Adoption in 3+ new domains (outside conversational AI)
- [ ] 50%+ reduction in customization support requests
- [ ] Documentation satisfaction >4.0/5.0
---
## Appendix A: Affected Files
### Core Files Requiring Changes
**High Priority**
- `graphiti_core/graphiti.py` - Main class, initialization
- `graphiti_core/llm_client/config.py` - Configuration system
- `graphiti_core/prompts/extract_nodes.py` - NER prompts
- `graphiti_core/prompts/extract_edges.py` - Relation extraction prompts
- `graphiti_core/utils/maintenance/node_operations.py` - Extraction logic
**Medium Priority**
- `graphiti_core/nodes.py` - Episode type definitions
- `graphiti_core/search/search_config.py` - Search configuration
- `graphiti_core/search/search_config_recipes.py` - Search recipes
- `server/graph_service/config.py` - Server configuration
**Low Priority**
- `graphiti_core/driver/*.py` - Database drivers
- `graphiti_core/embedder/*.py` - Embedder clients
---
## Appendix B: Related GitHub Issues
### Directly Addressed
- #1087: Embedding truncation
- #1074: No results with Ollama embeddings
- #1007: Unstable outputs with vLLM
- #1006: AzureOpenAI reranker support
- #1004: Azure OpenAI support
- #995: Docker Azure OpenAI support
- #1077: Google Cloud Spanner Graph support
- #947: Apache AGE support
- #961: Episodes API improvements
- #1082: Neptune driver issues
### Indirectly Improved
- #1083: Orphaned entities cleanup
- #1062: Stale data in MCP server
- #1021: Incomplete graph structure
- #1018: Search with group_ids
- #1012: group_id and Anthropic issues
- #992: OOM in build_communities
- #963: Duplicate entities
---
## Appendix C: Backward Compatibility Strategy
### Deprecation Policy
1. **Feature Deprecation**: 2 minor versions notice
2. **API Changes**: Maintain old API with deprecation warnings
3. **Configuration**: Support both old and new config formats during transition
### Migration Support
- Automated migration scripts for major changes
- Detailed migration guides for each release
- Migration validation tools
### Version Support
- LTS releases for enterprise users
- Security patches for N-2 versions
- Clear EOL policy
---
## Next Steps
1. **Review & Approval**: Circulate this plan for stakeholder feedback
2. **Prioritization**: Finalize directive priorities based on business needs
3. **Resource Allocation**: Assign engineering teams to Phase 1 directives
4. **Kickoff**: Begin implementation of Directive #1 (Prompt System)
---
**Document Maintainer**: Claude (AI Assistant)
**Last Updated**: 2025-11-30
**Next Review**: After Phase 1 completion