ADR 001: Multi-Tenant, Multi-Knowledge-Base Architecture for LightRAG
Status: Proposed
Context
Current State
LightRAG is a retrieval-augmented generation system that currently operates as a single-instance system with basic workspace-level data isolation. The existing architecture uses:
- Workspace concept: Directory-based or database-field-based isolation for file/database storage
- Single LightRAG instance: One RAG system per server process, configured at startup
- Basic authentication: JWT tokens and API key support without tenant/knowledge-base awareness
- Shared configuration: All data uses the same LLM, embedding, and storage configurations
Limitations of Current Architecture
- No true multi-tenancy: Cannot serve multiple independent tenants securely
- No knowledge base isolation: All data belongs to a single knowledge base
- Shared compute resources: LLM and embedding calls are shared across all workspaces
- Static configuration: All tenants must use the same models and settings
- Cross-tenant data leak risk: Workspace isolation is not cryptographically enforced
- No resource quotas: No limits on storage, compute, or API usage per tenant
- Authentication limitations: JWT tokens don't support fine-grained access control
Existing Code Evidence
- Workspace in base.py: StorageNameSpace class (line 176) includes a workspace field for basic isolation
- Namespace concept: NameSpace class in namespace.py defines storage categories but no tenant/KB concept
- Storage implementations: Each storage type (PostgreSQL, JSON, Neo4j) implements workspace filtering:
  - PostgreSQLDB constructor accepts a workspace parameter (line 56 in postgres_impl.py)
  - JsonKVStorage creates workspace directories (lines 30-39 in json_kv_impl.py)
- API configuration: lightrag_server.py accepts a --workspace flag but no tenant/KB parameters
- Authentication: auth.py provides JWT tokens with roles but no tenant/KB scoping
Business Requirements
Organizations deploying LightRAG need to:
- Serve multiple independent customers (tenants) from a single instance
- Support multiple knowledge bases per tenant for different use cases
- Enforce complete data isolation between tenants
- Manage per-tenant resource quotas and billing
- Support per-tenant configuration (models, parameters, API keys)
- Provide audit trails and access logs per tenant
Decision
High-Level Architecture
Implement a multi-tenant, multi-knowledge-base (MT-MKB) architecture that:
- Adds tenant abstraction layer above the current workspace concept
- Introduces knowledge base concept as a first-class entity
- Implements tenant-aware routing at the API level
- Enforces data isolation through composite keys and access control
- Supports per-tenant/KB configuration for models and parameters
- Adds role-based access control (RBAC) for fine-grained permissions
Core Design Principles
- Backward Compatibility: Existing single-workspace setups continue to work
- Layered Isolation: Tenant > Knowledge Base > Document > Chunk/Entity
- Zero Trust: All data access requires explicit tenant/KB context
- Default Deny: Cross-tenant access is explicitly blocked unless authorized
- Audit Trail: All operations logged with tenant/KB context
- Resource Aware: Quotas and limits per tenant/KB
Architecture Overview
┌─────────────────────────────────────────────────────────────────┐
│                FastAPI Server (Single Instance)                 │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐  │
│ │ API Router       │ │ Auth/Middleware  │ │ Request Handler  │  │
│ │ Layer            │ │ (Tenant Extract) │ │ Layer            │  │
│ └──────┬───────────┘ └──────┬───────────┘ └──────┬───────────┘  │
│        │                    │                    │              │
│ ┌──────▼────────────────────▼────────────────────▼───────────┐  │
│ │  Tenant Context (TenantID + KnowledgeBaseID)               │  │
│ │  Injected via Dependency Injection / Middleware            │  │
│ └──────┬─────────────────────────────────────────────────────┘  │
│        │                                                        │
│ ┌──────▼─────────────────────────────────────────────────────┐  │
│ │  Tenant-Aware LightRAG Instance Manager                    │  │
│ │  (Caches instances per tenant)                             │  │
│ └──────┬─────────────────────────────────────────────────────┘  │
│        │                                                        │
│ ┌──────▼─────────────────────────────────────────────────────┐  │
│ │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐         │  │
│ │  │ Tenant 1    │  │ Tenant 2    │  │ Tenant N    │         │  │
│ │  │ KB1, KB2    │  │ KB1, KB3    │  │ KB1, ...    │         │  │
│ │  └─────────────┘  └─────────────┘  └─────────────┘         │  │
│ │                                                            │  │
│ │  Multiple LightRAG Instances (per tenant or cached)        │  │
│ └──────┬─────────────────────────────────────────────────────┘  │
│        │                                                        │
│ ┌──────▼─────────────────────────────────────────────────────┐  │
│ │  Storage Access Layer with Tenant Filtering                │  │
│ │  (Adds tenant/KB filters to all queries)                   │  │
│ └──────┬─────────────────────────────────────────────────────┘  │
│        │                                                        │
│ ┌──────▼─────────────────────────────────────────────────────┐  │
│ │                                                            │  │
│ │  ┌────────────────┐  ┌────────────┐  ┌────────────────┐    │  │
│ │  │ PostgreSQL     │  │ Neo4j      │  │ Redis/Milvus   │    │  │
│ │  │ (Shared DB)    │  │ (Shared)   │  │ (Shared)       │    │  │
│ │  └────────────────┘  └────────────┘  └────────────────┘    │  │
│ │                                                            │  │
│ │  All queries filtered by tenant/KB at storage layer        │  │
│ └────────────────────────────────────────────────────────────┘  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
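The "Tenant-Aware LightRAG Instance Manager" box in the diagram could be sketched as a small LRU cache. This is only an illustration under stated assumptions: the class name `TenantInstanceManager`, the `factory` callback, and the choice to key the cache on `(tenant_id, kb_id)` are hypothetical and not part of the existing LightRAG code base.

```python
from collections import OrderedDict
from typing import Callable, Generic, Tuple, TypeVar

T = TypeVar("T")


class TenantInstanceManager(Generic[T]):
    """LRU cache of per-(tenant, KB) instances (hypothetical helper)."""

    def __init__(self, factory: Callable[[str, str], T], max_instances: int = 100):
        self._factory = factory      # builds an instance for (tenant_id, kb_id)
        self._max = max_instances    # upper bound on cached instances
        self._cache: "OrderedDict[Tuple[str, str], T]" = OrderedDict()

    def get(self, tenant_id: str, kb_id: str) -> T:
        key = (tenant_id, kb_id)
        if key in self._cache:
            self._cache.move_to_end(key)     # mark as most recently used
            return self._cache[key]
        instance = self._factory(tenant_id, kb_id)
        self._cache[key] = instance
        if len(self._cache) > self._max:
            # Evict the least recently used entry. A real manager would also
            # need to release that instance's storage connections here.
            self._cache.popitem(last=False)
        return instance

    def __len__(self) -> int:
        return len(self._cache)
```

Bounding the cache is one way to keep the memory cost of multiple instances predictable; the right bound depends on the per-instance footprint, which this sketch does not measure.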
Key Components
1. Tenant Model
- TenantID: Unique identifier (UUID or slug)
- TenantName: Human-readable name
- Configuration: Per-tenant LLM, embedding, and rerank model configs
- ResourceQuotas: Storage, API calls, concurrent requests limits
- CreatedAt/UpdatedAt: Audit timestamps
2. Knowledge Base Model
- KnowledgeBaseID: Unique within tenant
- TenantID: Parent tenant reference
- KBName: Display name
- Description: Purpose and content overview
- Configuration: Per-KB indexing and query parameters
- Status: Active/Archived
- Metadata: Custom fields for tenant-specific data
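The two models above might be expressed as plain dataclasses. The sketch below is an assumption about shape only: the field names mirror the lists above, while `KBStatus`, `ResourceQuotas`, and the default quota values are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Dict


class KBStatus(Enum):
    ACTIVE = "active"
    ARCHIVED = "archived"


@dataclass
class ResourceQuotas:
    # Illustrative defaults only; the ADR does not specify quota values.
    max_storage_bytes: int = 10 * 1024**3
    max_api_calls_per_day: int = 10_000
    max_concurrent_requests: int = 10


def _utc_now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class Tenant:
    tenant_id: str                       # UUID or slug
    tenant_name: str                     # human-readable name
    configuration: Dict[str, Any] = field(default_factory=dict)  # LLM/embedding/rerank
    quotas: ResourceQuotas = field(default_factory=ResourceQuotas)
    created_at: datetime = field(default_factory=_utc_now)
    updated_at: datetime = field(default_factory=_utc_now)


@dataclass
class KnowledgeBase:
    kb_id: str                           # unique within the tenant
    tenant_id: str                       # parent tenant reference
    kb_name: str
    description: str = ""
    configuration: Dict[str, Any] = field(default_factory=dict)  # indexing/query params
    status: KBStatus = KBStatus.ACTIVE
    metadata: Dict[str, Any] = field(default_factory=dict)
```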
3. Storage Isolation Strategy
All storage operations will include tenant/KB filters:
- Document storage: workspace = f"{tenant_id}_{kb_id}"
- Vector storage: Add tenant_id and kb_id metadata fields
- Graph storage: Store tenant/KB info as node/edge attributes
- KV storage: Prefix keys with tenant_id:kb_id:entity_id
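The naming conventions above can be captured in a few helper functions. The helper names are hypothetical. One caveat worth checking during design: if IDs may contain `_` or `:`, two different tenant/KB pairs can map to the same string (for example `a_b` + `c` and `a` + `b_c`). The sketch rejects such IDs, which is an assumption rather than something this ADR specifies.

```python
import re
from typing import Any, Dict

# Assumed ID format: letters, digits, and hyphens (covers UUIDs and simple slugs).
_ID_RE = re.compile(r"[A-Za-z0-9-]{1,64}")


def _check_id(value: str) -> str:
    """Reject IDs containing separators or other unexpected characters."""
    if not _ID_RE.fullmatch(value):
        raise ValueError(f"invalid identifier: {value!r}")
    return value


def workspace_name(tenant_id: str, kb_id: str) -> str:
    """Document storage: composite workspace name."""
    return f"{_check_id(tenant_id)}_{_check_id(kb_id)}"


def kv_key(tenant_id: str, kb_id: str, entity_id: str) -> str:
    """KV storage: key prefixed with tenant and KB."""
    return f"{_check_id(tenant_id)}:{_check_id(kb_id)}:{entity_id}"


def with_tenant_metadata(metadata: Dict[str, Any], tenant_id: str, kb_id: str) -> Dict[str, Any]:
    """Vector/graph storage: attach tenant and KB as metadata or attributes."""
    return {**metadata, "tenant_id": _check_id(tenant_id), "kb_id": _check_id(kb_id)}


def belongs_to(record: Dict[str, Any], tenant_id: str, kb_id: str) -> bool:
    """Filter predicate applied to every record returned by a query."""
    return record.get("tenant_id") == tenant_id and record.get("kb_id") == kb_id
```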
4. API Routing
POST /api/v1/tenants/{tenant_id}/knowledge-bases/{kb_id}/documents/add
GET /api/v1/tenants/{tenant_id}/knowledge-bases/{kb_id}/documents/{doc_id}
POST /api/v1/tenants/{tenant_id}/knowledge-bases/{kb_id}/query
GET /api/v1/tenants/{tenant_id}/knowledge-bases/{kb_id}/graph
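As a rough illustration of how the tenant and KB scope could be read from these paths, here is a standard-library sketch. In a FastAPI application the same values would normally arrive as declared path parameters; `parse_scope` and `RouteScope` are hypothetical names used only for this example.

```python
import re
from typing import NamedTuple, Optional

_ROUTE_RE = re.compile(
    r"^/api/v1/tenants/(?P<tenant_id>[^/]+)"
    r"/knowledge-bases/(?P<kb_id>[^/]+)(?P<rest>/.*)?$"
)


class RouteScope(NamedTuple):
    tenant_id: str
    kb_id: str
    resource: str   # remainder of the path, e.g. "/query"


def parse_scope(path: str) -> Optional[RouteScope]:
    """Return the tenant/KB scope of a request path, or None if unscoped."""
    match = _ROUTE_RE.match(path)
    if match is None:
        return None
    return RouteScope(match["tenant_id"], match["kb_id"], match["rest"] or "/")
```

A path that does not carry both IDs yields `None`, so a caller can refuse the request instead of guessing a scope.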
5. Authentication & Authorization
# JWT Token Payload
{
  "sub": "user_id",                       # User identifier
  "tenant_id": "tenant_uuid",             # Assigned tenant
  "knowledge_base_ids": ["kb1", "kb2"],   # Accessible KBs
  "role": "admin|editor|viewer",          # Role within tenant
  "exp": 1234567890,                      # Expiration
  "permissions": {
    "create_kb": true,
    "delete_documents": true,
    "run_queries": true
  }
}
6. Dependency Injection for Tenant Context
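# FastAPI dependency to extract and validate tenant context.
# get_auth_token and TenantContext are placeholders to be defined in the API layer.
from fastapi import Depends

async def get_tenant_context(
    tenant_id: str,
    kb_id: str,
    token: str = Depends(get_auth_token),
) -> TenantContext:
    # Verify that the user can access this tenant/KB, then
    # return the validated context object.
    raise NotImplementedError("tenant/KB validation not yet implemented")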
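The validation step left open in the dependency above could look roughly like the following. It assumes the JWT signature has already been verified and the payload decoded into a dict with the claims shown in section 5. `build_tenant_context`, `AccessDenied`, and the `TenantContext` fields are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Any, Mapping, Optional


class AccessDenied(Exception):
    """Raised when the token does not grant the requested access."""


@dataclass(frozen=True)
class TenantContext:
    user_id: str
    tenant_id: str
    kb_id: str
    role: str


def build_tenant_context(
    claims: Mapping[str, Any],
    tenant_id: str,
    kb_id: str,
    permission: Optional[str] = None,
    now: Optional[float] = None,
) -> TenantContext:
    """Default-deny check of decoded JWT claims against the requested scope."""
    now = time.time() if now is None else now
    if not claims.get("sub"):
        raise AccessDenied("token has no subject")
    if claims.get("exp", 0) <= now:
        raise AccessDenied("token expired")
    if claims.get("tenant_id") != tenant_id:
        raise AccessDenied("token is not scoped to this tenant")
    if kb_id not in claims.get("knowledge_base_ids", []):
        raise AccessDenied("knowledge base not accessible")
    if permission is not None and not claims.get("permissions", {}).get(permission, False):
        raise AccessDenied(f"missing permission: {permission}")
    return TenantContext(claims["sub"], tenant_id, kb_id, claims.get("role", "viewer"))
```

Every branch denies unless the claim is present and matches, which follows the Default Deny principle stated earlier.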
Consequences
Positive
- True Multi-Tenancy: Complete data isolation between tenants
- Scalability: Support hundreds of tenants in single instance
- Cost Efficiency: Shared infrastructure reduces per-tenant costs
- Flexibility: Per-tenant model and parameter configuration
- Security: Fine-grained access control and audit trails
- Resource Management: Per-tenant quotas prevent resource abuse
- Operational Simplicity: Single instance to manage
Negative/Tradeoffs
- Increased Complexity: More code, more testing required (~2-3x development effort)
- Performance Overhead: Tenant/KB filtering on every query (~5-10% latency impact)
- Storage Overhead: Tenant/KB metadata increases storage footprint (~2-3%)
- Operational Complexity: More configuration options, training needed
- Breaking Changes: API endpoints change, requires migration scripts
- Backward Compatibility: Existing workspaces need migration strategy
Security Considerations
- Data Isolation: Tenant-aware queries prevent cross-tenant leaks
- Authentication: JWT tokens must include tenant scope
- Authorization: RBAC prevents unauthorized access to KBs
- Audit Trail: All operations logged for compliance
- Key Management: Per-tenant API keys need separate management
- Potential Vulnerabilities:
  - Parameter injection in tenant/KB IDs (mitigate: strict validation)
  - JWT token hijacking (mitigate: short expiry, rate limiting)
  - Side-channel attacks via timing (mitigate: constant-time comparisons)
  - Resource exhaustion (mitigate: quotas and rate limiting)
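Two of the mitigations listed above, constant-time comparison and rate limiting, can be sketched with the standard library. This is a minimal illustration, not the proposed implementation: storing API keys as SHA-256 hashes and using a token bucket per tenant are assumptions made for the example, and all names are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional


def hash_api_key(api_key: str) -> str:
    """Hash a per-tenant API key so the plaintext need not be stored."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def api_key_matches(presented: str, stored_hash: str) -> bool:
    """Compare in constant time to avoid leaking information via timing."""
    return hmac.compare_digest(hash_api_key(presented), stored_hash)


class TokenBucket:
    """Per-tenant limiter: bursts up to `capacity`, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, capacity: int, now: Optional[float] = None):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        elapsed = now - self.updated
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

One bucket per tenant (for example, held in a dict keyed by tenant ID) keeps one tenant's burst from consuming another tenant's allowance.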
Performance Impact
- Query Latency: +5-10% from additional filtering
- Storage Size: +2-3% for tenant/KB metadata
- Memory Usage: +20-30% from maintaining multiple LightRAG instances
- CPU Usage: +10-15% from authentication/authorization checks
Migration Path for Existing Deployments
- Phase 1: Deploy with backward compatibility (single tenant = existing workspace)
- Phase 2: Provide migration script to convert workspaces to tenants
- Phase 3: Support hybrid mode (legacy workspaces + new tenants)
- Phase 4: Deprecate workspace mode in favor of tenant mode
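The backward-compatible and hybrid phases imply a rule for choosing between a legacy workspace and a tenant/KB scope. A minimal sketch of such a rule follows. The function names, the `"default"` tenant and KB IDs, and the mapping of one workspace to one knowledge base are all assumptions for illustration.

```python
from typing import Optional, Tuple

DEFAULT_TENANT_ID = "default"   # hypothetical tenant that owns migrated workspaces
DEFAULT_KB_ID = "default"       # hypothetical KB for an empty legacy workspace


def resolve_workspace(
    tenant_id: Optional[str],
    kb_id: Optional[str],
    legacy_workspace: str = "",
) -> str:
    """Phases 1 and 3: choose the storage workspace for a request."""
    if tenant_id is None and kb_id is None:
        return legacy_workspace          # legacy mode: workspace is unchanged
    if not tenant_id or not kb_id:
        raise ValueError("tenant_id and kb_id must be provided together")
    return f"{tenant_id}_{kb_id}"        # tenant mode: composite workspace


def migrate_workspace(workspace: str, tenant_id: str = DEFAULT_TENANT_ID) -> Tuple[str, str]:
    """Phase 2: map a legacy workspace to a (tenant_id, kb_id) pair."""
    return tenant_id, workspace or DEFAULT_KB_ID
```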
Implementation Plan (Summary)
See 002-implementation-strategy.md for a detailed, step-by-step implementation guide.
High-Level Phases
- Phase 1 (2-3 weeks): Core infrastructure
  - Database schema changes
  - Tenant/KB models
  - Storage access layer updates
- Phase 2 (2-3 weeks): API layer
  - Tenant-aware routing
  - Request/response models
  - Authentication/authorization
- Phase 3 (1-2 weeks): LightRAG integration
  - Instance manager
  - Per-tenant configurations
  - Query execution
- Phase 4 (1 week): Testing & deployment
  - Unit/integration tests
  - Migration scripts
  - Documentation
Alternatives Considered
1. Separate Database Per Tenant
- Approach: Each tenant gets its own database/storage instance
- Rejected because:
  - Massive operational overhead (n×database connections, backups, upgrades)
  - Expensive (n×database licensing)
  - Complex to manage tenants across instances
  - Makes sharing resources impossible
2. Dedicated Server Instance Per Tenant
- Approach: Each tenant runs their own LightRAG instance
- Rejected because:
  - Massive resource waste (minimum resources per instance)
  - Very expensive at scale (n×server costs)
  - Difficult to manage and monitor
  - Cannot share LLM/embedding infrastructure
3. Simple Workspace Extension
- Approach: Just rename "workspace" to "tenant"
- Rejected because:
  - No knowledge base concept (a tenant cannot have multiple KBs)
  - Cannot enforce cross-tenant access prevention
  - No RBAC or fine-grained permissions
  - Cannot manage per-tenant configuration
  - No resource quotas
4. Sharding by Tenant Hash
- Approach: Hash tenant ID to determine shard, send queries to correct shard
- Rejected because:
  - Breaks operational simplicity (multiple instances to manage)
  - Rebalancing is complex when adding/removing tenants
  - Doesn't reduce resource overhead
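To make the rebalancing cost concrete, the sketch below assumes simple modulo placement (`hash(tenant_id) % num_shards`). The ADR does not say which hashing scheme was considered, so this is an assumption. Under uniform hashing, going from 4 to 5 shards moves roughly 80% of tenants, because a tenant keeps its shard only when its hash has the same remainder modulo 4 and modulo 5.

```python
import hashlib
from typing import Sequence


def shard_for(tenant_id: str, num_shards: int) -> int:
    """Stable shard assignment by hashing the tenant ID (modulo placement)."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def fraction_moved(tenant_ids: Sequence[str], old_shards: int, new_shards: int) -> float:
    """Share of tenants whose shard changes when the shard count changes."""
    moved = sum(
        1 for t in tenant_ids
        if shard_for(t, old_shards) != shard_for(t, new_shards)
    )
    return moved / len(tenant_ids)


if __name__ == "__main__":
    tenants = [f"tenant-{i}" for i in range(1000)]
    print(f"moved: {fraction_moved(tenants, 4, 5):.0%}")
```

Consistent hashing would reduce the share of tenants that move, but it would not remove the need to operate several instances.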
Evidence/References
Code References
- Storage base class: lightrag/base.py:176-185 (StorageNameSpace)
- Namespace constants: lightrag/namespace.py (NameSpace class)
- Workspace implementation: lightrag/kg/json_kv_impl.py:28-39 (JsonKVStorage)
- PostgreSQL workspace support: lightrag/kg/postgres_impl.py:44-59
- API server architecture: lightrag/api/lightrag_server.py:1-300
- Authentication: lightrag/api/auth.py (JWT token management)
- Config: lightrag/api/config.py:200-220 (workspace argument)
Related Documentation
- Current workspace isolation documented in lightrag/api/README-zh.md:165-173
- Storage implementations in the lightrag/kg/ directory
Next Steps
- Review and approve this ADR
- Create detailed design documents for each component (see ADR 002-007)
- Conduct security review of proposed architecture
- Estimate development effort and allocate resources
- Create implementation tickets and sprint planning
Document Version: 1.0
Last Updated: 2025-11-20
Author: Architecture Design Process
Status: Proposed - Awaiting Review and Approval