Raphael MANSUY 2b292d4924 docs: Enterprise Edition & Multi-tenancy attribution (#5 ) * Remove outdated documentation files: Quick Start Guide, Apache AGE Analysis, and Scratchpad. * Add multi-tenant testing strategy and ADR index documentation - Introduced ADR 008 detailing the multi-tenant testing strategy for the ./starter environment, covering compatibility and multi-tenant modes, testing scenarios, and implementation details. - Created a comprehensive ADR index (README.md) summarizing all architecture decision records related to the multi-tenant implementation, including purpose, key sections, and reading paths for different roles. * feat(docs): Add comprehensive multi-tenancy guide and README for LightRAG Enterprise - Introduced `0008-multi-tenancy.md` detailing multi-tenancy architecture, key concepts, roles, permissions, configuration, and API endpoints. - Created `README.md` as the main documentation index, outlining features, quick start, system overview, and deployment options. - Documented the LightRAG architecture, storage backends, LLM integrations, and query modes. - Established a task log (`2025-01-21-lightrag-documentation-log.md`) summarizing documentation creation actions, decisions, and insights.		2025-12-04 18:09:15 +08:00

README.md

LightRAG Multi-Tenant Architecture - Complete ADR Index

Document Overview

This collection of 7 Architecture Decision Records provides comprehensive guidance for implementing a multi-tenant, multi-knowledge-base system in LightRAG. All recommendations are grounded in actual codebase analysis and include detailed implementation specifications.

📋 Complete Document Index

ADR 001: Multi-Tenant Architecture Overview

Purpose: Establish the core architectural decision and rationale
Length: ~400 lines
Key Sections:

Current state analysis (single-instance, workspace-level isolation)
Architectural decision (multi-tenant with per-KB scoping)
Consequences (complexity, performance, security trade-offs)
Code evidence (6 direct references to existing patterns)
Alternative approaches evaluated (4 alternatives considered)

When to Read: First - understand why multi-tenant is necessary
For Roles: Architects, Tech Leads, Decision Makers
Decision Status: Proposed (Ready for stakeholder approval)

ADR 002: Implementation Strategy

Purpose: Detailed roadmap for implementation across 4 phases
Length: ~800 lines
Key Sections:

Phase 1 (2-3 weeks): Database schema, tenant models, core infrastructure
Phase 2 (2-3 weeks): API layer, tenant routing, permission checking
Phase 3 (1-2 weeks): LightRAG integration, instance caching, query modification
Phase 4 (1 week): Testing, migration, deployment
Configuration examples with real environment variables
Performance targets and success metrics
Known limitations and future work

Total Effort: ~160 developer hours across 4 weeks
When to Read: Second - use for sprint planning and task breakdown
For Roles: Engineering Leads, Project Managers, Developers
Implementation Detail: High-level code examples (not pseudo-code)

ADR 003: Data Models and Storage Design

Purpose: Complete specification of data models and storage schema
Length: ~700 lines
Key Sections:

Core data models with Python dataclass definitions
PostgreSQL schema with 8 tables, composite indexes, and migration scripts
Neo4j schema with Cypher examples
MongoDB/Vector DB schema with partition strategies
Access control lists and role-based permissions
Data validation rules and constraints
Backward compatibility mapping for workspace-to-tenant migration

When to Read: Before database migration work begins
For Roles: Database Engineers, Backend Developers
Schema Completeness: 100% (Production-ready SQL)

ADR 004: API Design and Routing

Purpose: Complete REST API specification for multi-tenant system
Length: ~900 lines
Key Sections:

API versioning and base URL structure (/api/v1/tenants/{tenant_id}/...)
Authentication mechanisms (JWT RS256, API keys with rotation)
Tenant management endpoints (CRUD operations)
Knowledge base endpoints (lifecycle management)
Document endpoints (upload, status, deletion)
Query endpoints (standard, streaming, with data)
Error handling with 8 error codes and examples
Rate limiting configuration per tenant
10+ cURL examples for all operations
OpenAPI/Swagger documentation structure

Endpoint Count: 30+ endpoints defined
When to Read: Before API development begins
For Roles: API Developers, Frontend Engineers, QA
Specification Completeness: 100% (Ready to implement)

ADR 005: Security Analysis and Mitigation

Purpose: Comprehensive security analysis with threat modeling
Length: ~900 lines
Key Sections:

Security principles (Zero Trust, Defense in Depth, Complete Mediation)
Threat model with 7 attack vectors:
1. Unauthorized cross-tenant access → Dependency injection validation
2. Authentication bypass → Strong JWT signature verification
3. Parameter injection/path traversal → UUID validation + parameterized queries
4. Information disclosure → Generic errors + log sanitization
5. DoS via resource exhaustion → Per-tenant rate limits
6. Data leakage via logs → Field redaction + PII hashing
7. Replay attacks → JTI tracking + idempotency keys
JWT security configuration (RS256 recommended)
API key security (bcrypt hashing, rotation policy)
CORS and TLS/HTTPS configuration
Audit logging structure with 14 event types
Vulnerability scanning strategy
Compliance considerations (GDPR, SOC 2, ISO 27001, HIPAA)
Security checklist with 13 verification items

When to Read: Before security implementation phase
For Roles: Security Engineers, Backend Developers, Compliance Officers
Threat Coverage: Comprehensive (All major attack vectors)
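
As a rough illustration of the JTI tracking named under threat 7, the sketch below remembers each token ID until the token expires and rejects a second presentation. The class name `JtiTracker`, its method, and the in-memory dictionary are assumptions made for this example, not the design specified in ADR 005; a multi-process deployment would need a shared store instead.

```python
import time
from typing import Callable, Dict


class JtiTracker:
    """Hypothetical replay guard: a JWT ID (jti) is accepted at most once."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._seen: Dict[str, float] = {}  # jti -> token expiry (epoch seconds)

    def accept(self, jti: str, expires_at: float) -> bool:
        """Return True the first time an unexpired token is seen, else False."""
        now = self._clock()
        # Expired tokens fail validation anyway, so their IDs can be forgotten.
        self._seen = {j: exp for j, exp in self._seen.items() if exp > now}
        if expires_at <= now:
            return False  # token already expired
        if jti in self._seen:
            return False  # same token presented again: treat as a replay
        self._seen[jti] = expires_at
        return True
```

The clock is injectable only so the behaviour can be exercised in tests without waiting for real time to pass.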

ADR 006: Architecture Diagrams and Alternatives

Purpose: Visual representation of architecture and detailed alternatives analysis
Length: ~700 lines
Key Sections:

Full system architecture ASCII diagram (6 layers)
Query execution flow diagram (10 steps)
Document upload flow diagram (7 steps)
5 alternative approaches with pros/cons:
1. Database per Tenant (Rejected: 100x cost, operational nightmare)
2. Server per Tenant (Rejected: Resource waste, uneconomical)
3. Workspace Rename (Rejected: No KB isolation, weak security)
4. Shared Single Instance (Rejected: Data isolation risk too high)
5. Sharding by Hash (Rejected: Complexity without sufficient benefit)
Comparison matrix showing why proposed approach wins
Risk assessment for each alternative

When to Read: For architectural validation and decision support
For Roles: Architects, Tech Leads, Stakeholders
Visualization Quality: High (ASCII diagrams suitable for documentation/slides)

ADR 007: Deployment Guide and Quick Reference

Purpose: Practical guide for deployment, testing, and operations
Length: ~800 lines
Key Sections:

Quick start for developers (setup, testing, manual testing)
Docker Compose configuration for complete stack
Environment variable reference
Backward compatibility and migration from workspace model
Monitoring and observability setup
Prometheus queries for key metrics
Rollout strategy (4-phase soft launch to production)
Troubleshooting guide with solutions
Success criteria checklist
Support resources and documentation index

When to Read: During deployment and operational phases
For Roles: DevOps Engineers, Operators, Support Teams
Operational Readiness: Complete (All runbooks provided)

🎯 Reading Paths by Role

👨‍💼 For Executives/Product Managers

Executive Summary (this document, sections below)
ADR 001 - Sections: Decision, Consequences, Alternatives
ADR 002 - Sections: Timeline, Effort, Success Metrics
ADR 007 - Sections: Rollout Strategy, Success Criteria

Time Investment: 45 minutes
Key Takeaway: What we're building, why it matters, and when it ships

🏗️ For Architects/Tech Leads

ADR 001 - Complete
ADR 006 - Complete (diagrams + alternatives)
ADR 003 - Sections: Core Models, Storage Strategy
ADR 002 - Sections: Phase Overview, Configuration
ADR 005 - Sections: Threat Model, Security Checklist

Time Investment: 3 hours
Key Takeaway: Complete architectural vision with design justification

👨‍💻 For Developers (API/Backend)

ADR 002 - Complete (detailed code examples)
ADR 004 - Complete (endpoint specifications)
ADR 003 - Sections: Core Models, PostgreSQL Schema
ADR 005 - Sections: Threat Mitigations (code-level)
ADR 007 - Sections: Quick Start, Testing

Time Investment: 6 hours
Key Takeaway: Exact code changes needed, APIs to implement, test strategy

🔐 For Security/DevOps

ADR 005 - Complete (threat model, mitigations, compliance)
ADR 007 - Complete (monitoring, troubleshooting)
ADR 004 - Sections: Authentication, Error Handling
ADR 002 - Sections: Configuration, Testing
ADR 001 - Sections: Consequences (security)

Time Investment: 4 hours
Key Takeaway: Security architecture, deployment checklist, monitoring strategy

📊 For Database Engineers

ADR 003 - Complete
ADR 002 - Sections: Phase 1 (Database changes)
ADR 001 - Sections: Current Architecture
ADR 005 - Sections: Parameter Injection Mitigation

Time Investment: 4 hours
Key Takeaway: Schema changes, migration scripts, storage isolation strategy

📌 Executive Summary

The Opportunity

LightRAG currently supports single-instance deployments with basic workspace-level isolation. To serve multiple organizations and knowledge domains (SaaS model), we need true multi-tenancy with knowledge base-level isolation.

The Decision

Implement multi-tenant architecture with multi-knowledge-base support using:

Tenant abstraction layer (UUID-based isolation)
Knowledge bases as first-class entities
Composite key strategy (tenant_id:kb_id:entity_id)
Storage layer automatic filtering (defense in depth)
Per-tenant RAG instance caching (performance optimization)
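
The composite key strategy can be sketched as follows. This is a minimal illustration, assuming that tenant and KB identifiers are UUIDs and that `:` is the separator; the `CompositeKey` class and its methods are hypothetical names, not the models defined in ADR 003.

```python
from dataclasses import dataclass
from uuid import UUID

SEPARATOR = ":"


@dataclass(frozen=True)
class CompositeKey:
    """Hypothetical key type mirroring tenant_id:kb_id:entity_id."""

    tenant_id: UUID
    kb_id: UUID
    entity_id: str

    def encode(self) -> str:
        if not self.entity_id:
            raise ValueError("entity_id must not be empty")
        return SEPARATOR.join((str(self.tenant_id), str(self.kb_id), self.entity_id))

    @classmethod
    def decode(cls, raw: str) -> "CompositeKey":
        # Split at most twice so an entity_id that contains ':' survives a round trip.
        parts = raw.split(SEPARATOR, 2)
        if len(parts) != 3 or not parts[2]:
            raise ValueError(f"malformed composite key: {raw!r}")
        # UUID() raises ValueError for anything that is not a well-formed UUID.
        return cls(UUID(parts[0]), UUID(parts[1]), parts[2])
```

Putting the tenant first means every key for one tenant shares a common prefix, which keeps prefix scans and per-tenant deletion simple in key-value style backends.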

Investment Required

Effort: ~160 developer-hours
Timeline: 4 weeks (1 week per phase)
Team Size: 4 developers + 1 tech lead
Infrastructure: Database migration, Redis for caching

Business Impact

Enables: Multi-customer SaaS model
Reduces: Per-customer hosting costs by 10-50x
Improves: Data isolation and security posture
Provides: RBAC and audit logging for compliance
Supports: Future expansion to 100+ concurrent tenants

Risk Assessment

Risk	Severity	Mitigation
Cross-tenant data access	Critical	Defense-in-depth filters + automated tests
Performance degradation	High	Instance caching, indexed queries, monitoring
Migration failures	Medium	Dual-write period, rollback plan, testing
Operational complexity	Medium	Comprehensive monitoring, runbooks, training

Success Metrics

✓ Functional: All API endpoints working with tenant isolation
✓ Security: Zero cross-tenant data access in production
✓ Performance: Query latency < 200ms p99, cache hit rate > 90%
✓ Operational: 99.5% uptime, <5min incident response time
✓ Business: Support 50+ active tenants on single instance

🚀 Quick Implementation Checklist

Pre-Implementation (Week 0)

Review all 7 ADRs with team (30-45 minutes)
Secure stakeholder approval
Create detailed Jira tickets from ADR 002
Set up development databases (PostgreSQL, Redis)
Brief security team on threat model (ADR 005)

Phase 1: Core Infrastructure (Week 1-2)

Create database schema (ADR 003)
Implement tenant models (dataclasses)
Create TenantService for CRUD
Add tenant/KB columns to storage base classes
Run unit tests on isolation

Phase 2: API Layer (Week 2-3)

Implement tenant routes (CRUD)
Implement KB routes (CRUD)
Create dependency injection for TenantContext
Update document/query routes with tenant filtering
Test with API examples from ADR 004

Phase 3: RAG Integration (Week 3)

Implement TenantRAGManager (instance caching)
Modify LightRAG.query() to accept tenant context
Modify LightRAG.insert() to accept tenant context
Set up monitoring (Prometheus metrics)
Run integration tests

Phase 4: Deployment (Week 4)

Run security audit against ADR 005 checklist
Run load tests with multiple tenants
Prepare migration script for existing workspaces
Deploy to staging (1 week soak test)
Deploy to production (4-phase rollout)
Run incident response drills

adr/
├── 001-multi-tenant-architecture-overview.md      [START HERE - Why]
├── 002-implementation-strategy.md                 [Then read - How & When]
├── 003-data-models-and-storage.md                [Reference - Database design]
├── 004-api-design.md                              [Reference - API specs]
├── 005-security-analysis.md                       [Reference - Security checklist]
├── 006-architecture-diagrams-alternatives.md     [Reference - Visual overview]
├── 007-deployment-guide-quick-reference.md       [Reference - Operations]
└── README.md                                      [This file - Navigation]

🔄 Decision Record Details

Aspect	Details
Decision	Multi-tenant, multi-KB architecture
Status	Proposed (Awaiting approval)
Stakeholders	Engineering, Security, Product, Operations
Effort Estimate	160 developer-hours over 4 weeks
Risk Level	Medium (Well-scoped, tested patterns)
Alternatives	5 considered, all rejected with justification (see ADR 006)
Security Review	Required before Phase 1 start
Rollout Plan	4-phase soft launch (25%→50%→75%→100%)
Success Criteria	13 items in ADR 007
Contingency	2-week delay buffer, rollback to v1.0 if needed

❓ Frequently Asked Questions

Q: Why multi-tenant and not just multi-workspace?

A: The current workspace mechanism is implicit and lacks KB-level isolation. Multi-tenancy provides explicit isolation, RBAC, audit logging, and SaaS-readiness. See ADR 001 and ADR 006 (alternatives) for a detailed comparison.

Q: Will this break existing installations?

A: No. Legacy workspace deployments continue working: each workspace automatically becomes a tenant with a KB named "default". See ADR 003 (Backward Compatibility) for migration details.
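
One possible shape for that workspace-to-tenant mapping is sketched below. Deriving the tenant ID deterministically with `uuid5` is an assumption made for this example, as are the namespace value and the names `TenantScope` and `scope_for_workspace`; ADR 003 holds the actual mapping rules.

```python
import uuid
from dataclasses import dataclass

# Hypothetical fixed namespace so the same workspace always maps to the same tenant.
LEGACY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "legacy-workspace.example")
DEFAULT_KB_NAME = "default"


@dataclass(frozen=True)
class TenantScope:
    tenant_id: uuid.UUID
    kb_name: str


def scope_for_workspace(workspace: str) -> TenantScope:
    """Map a legacy workspace name to a tenant that owns a single 'default' KB."""
    return TenantScope(uuid.uuid5(LEGACY_NAMESPACE, workspace), DEFAULT_KB_NAME)
```

A deterministic ID lets the mapping be recomputed at any time without a lookup table, at the cost of tying tenant identity to the workspace name.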

Q: What's the performance impact?

A: Approximately 5-10% latency overhead (tenant filtering in queries) offset by instance caching (>90% hit rate). Net impact: negligible for most workloads. See ADR 002 (Performance Targets) for details.

Q: How do we ensure data isolation?

A: Defense in depth:

API Layer: TenantContext dependency validates token and extracts tenant_id
Storage Layer: All queries auto-filtered by WHERE tenant_id = ? AND kb_id = ?
Testing: Automated tests verify cross-tenant access is denied

See ADR 005 (Threat Model) for complete security analysis.
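
The storage-layer filter can be illustrated with a small sketch. It uses the standard-library `sqlite3` module as a stand-in for PostgreSQL, and the `ScopedStore` class, the `documents` table, and its columns are hypothetical; the point is only that the tenant and KB are bound once and every query carries them as parameters.

```python
import sqlite3
from typing import Optional


class ScopedStore:
    """Hypothetical storage wrapper: every read is bound to one tenant and one KB."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str, kb_id: str) -> None:
        self._conn = conn
        self._tenant_id = tenant_id
        self._kb_id = kb_id

    def get_document(self, doc_id: str) -> Optional[str]:
        # Callers cannot omit the scope: it is added here, as bound parameters.
        row = self._conn.execute(
            "SELECT content FROM documents "
            "WHERE tenant_id = ? AND kb_id = ? AND doc_id = ?",
            (self._tenant_id, self._kb_id, doc_id),
        ).fetchone()
        return row[0] if row else None


# Two tenants store a document under the same doc_id.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (tenant_id TEXT, kb_id TEXT, doc_id TEXT, content TEXT)"
)
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?, ?)",
    [
        ("tenant-a", "kb-1", "doc-1", "alpha"),
        ("tenant-b", "kb-1", "doc-1", "beta"),
    ],
)
store_a = ScopedStore(conn, "tenant-a", "kb-1")
```

Because the filter lives in the store rather than in each route handler, a handler that forgets to check the tenant still cannot read another tenant's rows.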

Q: Can we support 100+ tenants on one instance?

A: Yes. The architecture supports ~100 concurrently cached instances (configurable). Beyond 100 tenants, rely on instance caching for the active tenants, database scaling (PostgreSQL replication), and monitoring. See ADR 002 (Known Limitations) for scaling guidance.
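
A bounded instance cache of this kind might look like the sketch below. It assumes least-recently-used eviction, which the ADRs may or may not choose, and `InstanceCache`, `factory`, and `max_instances` are hypothetical names rather than the TenantRAGManager API. The sketch is also single-threaded; a real server would need locking or an async-safe variant.

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, TypeVar

T = TypeVar("T")


class InstanceCache(Generic[T]):
    """Hypothetical LRU cache holding one instance per tenant, up to a fixed limit."""

    def __init__(self, factory: Callable[[Hashable], T], max_instances: int = 100) -> None:
        if max_instances < 1:
            raise ValueError("max_instances must be at least 1")
        self._factory = factory
        self._max = max_instances
        self._items: "OrderedDict[Hashable, T]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, tenant_id: Hashable) -> T:
        if tenant_id in self._items:
            self._items.move_to_end(tenant_id)  # mark as most recently used
            self.hits += 1
            return self._items[tenant_id]
        self.misses += 1
        instance = self._factory(tenant_id)
        self._items[tenant_id] = instance
        if len(self._items) > self._max:
            self._items.popitem(last=False)  # evict the least recently used tenant
        return instance

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    def __contains__(self, tenant_id: Hashable) -> bool:
        return tenant_id in self._items

    def __len__(self) -> int:
        return len(self._items)
```

Tenants evicted from the cache are not lost; their instance is simply rebuilt by the factory on the next request, which is why the cache limit bounds memory rather than the number of tenants.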

Q: What if a tenant hits the storage quota?

A: System enforces ResourceQuota (configurable per tenant). Exceeding quota returns 429 (Too Many Requests). Tenant admin receives alerts. See ADR 003 (ResourceQuota Model) and ADR 004 (Error Handling).
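
A quota check along these lines could be sketched as follows. Only the `ResourceQuota` name and the 429 status come from the text above; the `max_storage_bytes` field, the `QuotaExceeded` exception, and `check_upload` are assumptions made for illustration.

```python
from dataclasses import dataclass
from http import HTTPStatus


@dataclass(frozen=True)
class ResourceQuota:
    max_storage_bytes: int  # hypothetical field; see ADR 003 for the real model


class QuotaExceeded(Exception):
    """Raised when an upload would push a tenant past its storage quota."""

    status_code = HTTPStatus.TOO_MANY_REQUESTS  # 429, as stated in the answer above


def check_upload(quota: ResourceQuota, used_bytes: int, upload_bytes: int) -> int:
    """Return the new usage if the upload fits, otherwise raise QuotaExceeded."""
    if used_bytes < 0 or upload_bytes < 0:
        raise ValueError("byte counts must be non-negative")
    new_usage = used_bytes + upload_bytes
    if new_usage > quota.max_storage_bytes:
        raise QuotaExceeded(
            f"upload of {upload_bytes} bytes exceeds quota of {quota.max_storage_bytes} bytes"
        )
    return new_usage
```

Checking before the write, rather than after, keeps a rejected upload from consuming storage at all.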

Q: Can we migrate from workspace without downtime?

A: Yes, with dual-write period:

Deploy v1.5 (supports both models)
Activate background migration job
Verify all data migrated
Remove workspace support

Total downtime: 0 minutes. See ADR 007 (Migration Strategy).
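
The dual-write idea behind these steps can be sketched with plain dictionaries standing in for the two storage models. Everything here (`DualWriteStore`, `backfill`, `is_fully_migrated`, and the key-mapping callable) is hypothetical and only illustrates the sequence: new writes go to both models, a background job copies older data, and a verification pass gates the removal of workspace support.

```python
from typing import Callable, Dict, Hashable, Tuple

LegacyKey = Tuple[str, str]  # (workspace, doc_id)
KeyMapper = Callable[[str, str], Hashable]


class DualWriteStore:
    """Hypothetical writer used during migration: every write lands in both models."""

    def __init__(self, legacy: Dict[LegacyKey, str], tenant: Dict[Hashable, str],
                 to_tenant_key: KeyMapper) -> None:
        self._legacy = legacy
        self._tenant = tenant
        self._to_tenant_key = to_tenant_key

    def put(self, workspace: str, doc_id: str, value: str) -> None:
        self._legacy[(workspace, doc_id)] = value
        self._tenant[self._to_tenant_key(workspace, doc_id)] = value


def backfill(legacy: Dict[LegacyKey, str], tenant: Dict[Hashable, str],
             to_tenant_key: KeyMapper) -> int:
    """Copy records written before dual-write began; return how many were copied."""
    copied = 0
    for (workspace, doc_id), value in legacy.items():
        key = to_tenant_key(workspace, doc_id)
        if key not in tenant:  # never overwrite newer dual-written data
            tenant[key] = value
            copied += 1
    return copied


def is_fully_migrated(legacy: Dict[LegacyKey, str], tenant: Dict[Hashable, str],
                      to_tenant_key: KeyMapper) -> bool:
    """True when every legacy record has an identical copy in the tenant model."""
    return all(
        to_tenant_key(w, d) in tenant and tenant[to_tenant_key(w, d)] == v
        for (w, d), v in legacy.items()
    )
```

Reads can keep using the legacy model until the verification pass succeeds, which is what allows the cutover to happen without downtime.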

📞 Getting Help

Questions about Architecture?
→ Review ADR 001 and ADR 006, or ask the technical lead

Need Implementation Details?
→ See ADR 002 (phased approach) or ADR 003/004 (specs)

Security Concerns?
→ Review ADR 005 (threat model) or contact security team

Deployment/Operations?
→ See ADR 007 (deployment guide, troubleshooting)

Want to See Alternatives?
→ Review ADR 006 (5 alternatives with pros/cons)

Document Set Version: 1.0
Last Updated: 2025-11-20
Total Length: ~5,200 lines across 7 documents (sum of the per-ADR length estimates above)
Status: ✅ Ready for Review and Implementation
Next Step: Schedule ADR review meeting with stakeholders