ADR 006: Architecture Diagrams and Alternatives Analysis
Status: Proposed
Proposed Architecture Diagram
┌─────────────────────────────────────────────────────────────────────────────┐
│ LightRAG Multi-Tenant System │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ FastAPI Application │ │
│ ├──────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────┐ │ │
│ │ │ Request Middleware Layer │ │ │
│ │ ├─────────────────────────────────────────────────────────┤ │ │
│ │ │ • CORS Middleware │ │ │
│ │ │ • HTTPS Redirect │ │ │
│ │ │ • Rate Limiting (per tenant) │ │ │
│ │ │ • Request Logging & Audit │ │ │
│ │ │ • Idempotency Key Handling │ │ │
│ │ └─────────────────────────────────────────────────────────┘ │ │
│ │ ↓ │ │
│ │ ┌─────────────────────────────────────────────────────────┐ │ │
│ │ │ Authentication & Tenant Context Extraction │ │ │
│ │ ├─────────────────────────────────────────────────────────┤ │ │
│ │ │ 1. Parse JWT token or API key from headers │ │ │
│ │ │ 2. Validate signature and expiration │ │ │
│ │ │ 3. Extract tenant_id, kb_id, user_id, permissions │ │ │
│ │ │ 4. Verify token.tenant_id == path.tenant_id │ │ │
│ │ │ 5. Verify user can access kb_id │ │ │
│ │ │ → Returns TenantContext object │ │ │
│ │ └─────────────────────────────────────────────────────────┘ │ │
│ │ ↓ │ │
│ │ ┌─────────────────────────────────────────────────────────┐ │ │
│ │ │ API Routing Layer │ │ │
│ │ ├─────────────────────────────────────────────────────────┤ │ │
│ │ │ /api/v1/tenants/{tenant_id}/ │ │ │
│ │ │ ├─ knowledge-bases/{kb_id}/documents/* │ │ │
│ │ │ ├─ knowledge-bases/{kb_id}/query* │ │ │
│ │ │ ├─ knowledge-bases/{kb_id}/graph/* │ │ │
│ │ │ ├─ knowledge-bases/* │ │ │
│ │ │ └─ api-keys/* │ │ │
│ │ └─────────────────────────────────────────────────────────┘ │ │
│ │ ↓ │ │
│ │ ┌─────────────────────────────────────────────────────────┐ │ │
│ │ │ Request Handlers (with TenantContext injected) │ │ │
│ │ ├─────────────────────────────────────────────────────────┤ │ │
│ │ │ • Validate permissions on TenantContext │ │ │
│ │ │ • Get tenant-specific RAG instance │ │ │
│ │ │ • Pass context to business logic │ │ │
│ │ │ • Return response with audit trail │ │ │
│ │ └─────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ Tenant-Aware LightRAG Instance Manager │ │
│ ├──────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ Instance Cache: │ │
│ │ ┌─────────────────────────────────────────────────────────┐ │ │
│ │ │ (tenant_1, kb_1) → LightRAG@memory │ │ │
│ │ │ (tenant_1, kb_2) → LightRAG@memory │ │ │
│ │ │ (tenant_2, kb_1) → LightRAG@memory │ │ │
│ │ │ (tenant_3, kb_1) → LightRAG@memory │ │ │
│ │ │ ... │ │ │
│ │ │ Max: 100 instances (configurable) │ │ │
│ │ └─────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ Each LightRAG instance: │ │
│ │ • Uses tenant-specific configuration (LLM, embedding models) │ │
│ │ • Works with dedicated namespace: {tenant_id}_{kb_id} │ │
│ │ • Isolated storage connections │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ Storage Access Layer (with Tenant Filtering) │ │
│ ├──────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ Query Modification: │ │
│ │ Before: SELECT * FROM documents WHERE doc_id = 'abc' │ │
│ │ After: SELECT * FROM documents │ │
│ │ WHERE tenant_id = 'acme' AND kb_id = 'docs' │ │
│ │ AND doc_id = 'abc' │ │
│ │ │ │
│ │ • All queries automatically scoped to current tenant/KB │ │
│ │ • Prevents accidental cross-tenant data access │ │
│ │ • Storage layer enforces isolation (defense in depth) │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ Storage Backends (Shared) │ │
│ ├──────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ ┌─────────────────┐ ┌─────────────┐ ┌────────────────────┐ │ │
│ │ │ PostgreSQL │ │ Neo4j │ │ Milvus/Qdrant │ │ │
│ │ │ (Shared DB) │ │ (Shared) │ │ (Vector Store) │ │ │
│ │ ├─────────────────┤ ├─────────────┤ ├────────────────────┤ │ │
│ │ │ • Documents │ │ • Entities │ │ • Embeddings │ │ │
│ │ │ • Chunks │ │ • Relations │ │ • Entity vectors │ │ │
│ │ │ • Entities │ │ │ │ │ │ │
│ │ │ • API Keys │ │ Each node │ │ Each vector │ │ │
│ │ │ • Tenants │ │ tagged with │ │ tagged with │ │ │
│ │ │ • KBs │ │ tenant_id + │ │ tenant_id + kb_id │ │ │
│ │ │ │ │ kb_id │ │ │ │ │
│ │ │ Filtered by: │ │ │ │ Filtered by: │ │ │
│ │ │ tenant_id, │ │ Filtered by:│ │ tenant_id, │ │ │
│ │ │ kb_id in WHERE │ │ tenant_id + │ │ kb_id in query │ │ │
│ │ │ │ │ kb_id │ │ │ │ │
│ │ └─────────────────┘ └─────────────┘ └────────────────────┘ │ │
│ │ │ │
│ │ All with tenant/KB isolation at schema/data level │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
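The "Authentication & Tenant Context Extraction" layer in the diagram above maps naturally onto a FastAPI dependency. The following is a minimal sketch under assumptions: `TenantContext`, `get_tenant_context`, the JWT claim names, and `JWT_SECRET` are illustrative, not the project's actual identifiers.

```python
# Illustrative sketch: TenantContext, the claim names, and JWT_SECRET are
# assumptions, not the project's actual identifiers.
from dataclasses import dataclass, field

import jwt  # PyJWT
from fastapi import Header, HTTPException

JWT_SECRET = "change-me"  # assumed shared secret for this sketch


@dataclass
class TenantContext:
    tenant_id: str
    kb_id: str
    user_id: str
    role: str
    permissions: dict[str, bool] = field(default_factory=dict)


async def get_tenant_context(
    tenant_id: str,                    # taken from the URL path by FastAPI
    kb_id: str,                        # taken from the URL path by FastAPI
    authorization: str = Header(...),  # "Bearer <token>"
) -> TenantContext:
    token = authorization.removeprefix("Bearer ").strip()
    try:
        # Step 2 in the diagram: validate signature and expiration.
        claims = jwt.decode(token, JWT_SECRET, algorithms=["HS256"])
    except jwt.PyJWTError:
        raise HTTPException(status_code=401, detail="Invalid or expired token")

    # Step 4: the tenant in the token must match the tenant in the path.
    if claims.get("tenant_id") != tenant_id:
        raise HTTPException(status_code=403, detail="Tenant mismatch")

    # Step 5: the caller must be allowed to access this knowledge base.
    if kb_id not in claims.get("kb_ids", []):
        raise HTTPException(status_code=403, detail="No access to this knowledge base")

    return TenantContext(
        tenant_id=tenant_id,
        kb_id=kb_id,
        user_id=claims.get("sub", ""),
        role=claims.get("role", "viewer"),
        permissions=claims.get("permissions", {}),
    )
```

A route handler would then declare `ctx: TenantContext = Depends(get_tenant_context)` to receive the context that step 3 of the query flow below injects.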
Data Flow Diagrams
Query Execution Flow
1. Client Request
├─ POST /api/v1/tenants/acme/knowledge-bases/docs/query
├─ Body: {"query": "What is..."}
└─ Header: Authorization: Bearer <token>
│
▼
2. Middleware Validation
├─ Extract tenant_id, kb_id from URL path
├─ Extract token from Authorization header
├─ Validate token signature and expiration
├─ Extract user_id, tenant_id_in_token, permissions
└─ VERIFY: tenant_id (path) == tenant_id_in_token
│
▼
3. Dependency Injection
├─ Create TenantContext(
│ tenant_id="acme",
│ kb_id="docs",
│ user_id="john",
│ role="editor",
│ permissions={"query:run": true}
└─ )
│
▼
4. Handler Authorization
├─ Check TenantContext.permissions["query:run"] == true
└─ If false → 403 Forbidden
│
▼
5. Get RAG Instance
├─ RAGManager.get_instance(tenant_id="acme", kb_id="docs")
├─ Check cache → Found → Use cached instance
└─ (If not cached: create new with tenant config)
│
▼
6. Execute Query
├─ RAG.aquery(query="What is...", tenant_context=context)
│ └─ All internal queries will include tenant/kb filters:
│ └─ Storage layer automatically adds:
│ WHERE tenant_id='acme' AND kb_id='docs'
│
▼
7. Storage Layer Filtering
├─ Vector search: Find embeddings WHERE tenant_id='acme' AND kb_id='docs'
├─ Graph query: Match entities {tenant_id:'acme', kb_id:'docs'}
├─ KV lookup: Get items with key prefix 'acme:docs:'
└─ Returns only acme/docs data (NO cross-tenant leakage possible)
│
▼
8. Response Generation
├─ RAG generates response from filtered data
├─ Response object created
└─ Handler receives response with TenantContext
│
▼
9. Audit Logging
├─ Log: {
│ user_id: "john",
│ tenant_id: "acme",
│ kb_id: "docs",
│ action: "query_executed",
│ status: "success",
│ timestamp: <now>
└─ }
│
▼
10. Response Returned to Client
└─ HTTP 200 with query result
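Steps 4–6 of this flow can be sketched as below. `RAGManager`, its LRU eviction policy, and the `aquery(..., tenant_context=...)` signature are assumptions made for illustration; the real instance manager and LightRAG API may differ.

```python
# Sketch only: RAGManager, its cache policy, and the aquery signature are assumed,
# as is the TenantContext object produced by the authentication dependency.
from collections import OrderedDict
from typing import Any

MAX_INSTANCES = 100  # mirrors the "Max: 100 instances (configurable)" note above


class RAGManager:
    """Caches one LightRAG instance per (tenant_id, kb_id), evicting the oldest."""

    def __init__(self, max_instances: int = MAX_INSTANCES) -> None:
        self._max = max_instances
        self._cache: "OrderedDict[tuple[str, str], Any]" = OrderedDict()

    def get_instance(self, tenant_id: str, kb_id: str) -> Any:
        key = (tenant_id, kb_id)
        if key in self._cache:
            self._cache.move_to_end(key)      # LRU: refresh on access
            return self._cache[key]
        instance = self._build_instance(tenant_id, kb_id)
        self._cache[key] = instance
        if len(self._cache) > self._max:
            self._cache.popitem(last=False)   # evict the least recently used entry
        return instance

    def _build_instance(self, tenant_id: str, kb_id: str) -> Any:
        # Would construct LightRAG with the tenant's LLM/embedding configuration
        # and the dedicated namespace f"{tenant_id}_{kb_id}".
        raise NotImplementedError


async def run_query(ctx, query: str, manager: RAGManager):
    # Step 4: handler-level authorization against the injected TenantContext.
    if not ctx.permissions.get("query:run"):
        raise PermissionError("Missing query:run permission")

    # Step 5: tenant/KB-scoped RAG instance from the cache.
    rag = manager.get_instance(ctx.tenant_id, ctx.kb_id)

    # Step 6: the storage layer is expected to add tenant_id/kb_id filters.
    return await rag.aquery(query, tenant_context=ctx)
```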
Document Upload Flow
1. Client uploads document
├─ POST /api/v1/tenants/acme/knowledge-bases/docs/documents/add
├─ File: document.pdf
└─ Header: Authorization: Bearer <token>
│
▼
2. Authentication & Authorization
├─ Validate token, extract TenantContext
├─ Check permission: document:create
└─ Verify tenant_id matches path and token
│
▼
3. File Validation
├─ Check file type (PDF, DOCX, etc.)
├─ Check file size < quota
├─ Sanitize filename
└─ Generate unique doc_id
│
▼
4. Queue Document Processing
├─ Store temp file: /{working_dir}/{tenant_id}/{kb_id}/__tmp__/{doc_id}
├─ Create DocStatus record with status="processing"
├─ Return to client: {status: "processing", track_id: "..."}
└─ Start async processing task
│
▼
5. Async Document Processing (background task)
├─ Get RAG instance for (acme, docs)
├─ Insert document:
│ └─ RAG.ainsert(file_path, tenant_id="acme", kb_id="docs")
│ └─ Internal processing automatically tags data with:
│ └─ tenant_id="acme", kb_id="docs"
│
├─ Update DocStatus:
│ ├─ status → "success"
│ ├─ chunks_processed → 42
│ └─ entities_extracted → 15
│
└─ Move file: __tmp__ → {kb_id}/documents/
│
▼
6. Storage Writes (tenant-scoped)
├─ PostgreSQL:
│ └─ INSERT INTO chunks (tenant_id, kb_id, doc_id, content)
│ VALUES ('acme', 'docs', 'doc-123', '...')
│
├─ Neo4j:
│ └─ CREATE (e:Entity {tenant_id:'acme', kb_id:'docs', name:'...'})-[:IN_KB]->(kb)
│
└─ Milvus:
└─ Insert vector with metadata: {tenant_id:'acme', kb_id:'docs'}
│
▼
7. Client Polls for Status
├─ GET /api/v1/tenants/acme/knowledge-bases/docs/documents/{doc_id}/status
├─ Returns: {status: "success", chunks: 42, entities: 15}
└─ Client confirms upload complete
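Step 6 of this flow (tenant-scoped storage writes) comes down to carrying tenant_id and kb_id into every write and repeating them in every read. A minimal asyncpg sketch, assuming a chunks table with tenant_id/kb_id columns; table and column names are illustrative, not the actual schema:

```python
# Illustrative only: table and column names are assumptions, not the actual schema.
import asyncpg


async def insert_chunk(pool: asyncpg.Pool, ctx, doc_id: str,
                       chunk_id: str, content: str) -> None:
    # Every write carries the tenant/KB scope; no chunk row exists without it.
    await pool.execute(
        """
        INSERT INTO chunks (tenant_id, kb_id, doc_id, chunk_id, content)
        VALUES ($1, $2, $3, $4, $5)
        """,
        ctx.tenant_id, ctx.kb_id, doc_id, chunk_id, content,
    )


async def fetch_chunks(pool: asyncpg.Pool, ctx, doc_id: str):
    # Every read is filtered by the same scope (defense in depth: even a leaked
    # doc_id cannot reach another tenant's rows).
    return await pool.fetch(
        """
        SELECT chunk_id, content
        FROM chunks
        WHERE tenant_id = $1 AND kb_id = $2 AND doc_id = $3
        """,
        ctx.tenant_id, ctx.kb_id, doc_id,
    )
```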
Alternatives Considered
Alternative 1: Separate Database Per Tenant
Architecture:
- Each tenant gets dedicated PostgreSQL database
- Separate Neo4j instances per tenant
- Separate Milvus collections per tenant
Tenant A Server → PostgreSQL A
→ Neo4j A
→ Milvus A
Tenant B Server → PostgreSQL B
→ Neo4j B
→ Milvus B
Pros:
- Maximum isolation (physical separation)
- Easier compliance (HIPAA, GDPR)
- Better disaster recovery per tenant
- Easier scaling (scale out per tenant)
Cons:
- ❌ Massive operational overhead
- Each database needs separate backup, upgrade, monitoring
- 100 tenants = 100 databases to manage
- Database licensing costs multiply (100x more expensive)
- ❌ Complex deployment & maintenance
- Infrastructure-as-Code becomes complex
- Database credentials management nightmare
- Harder debugging with distributed databases
- ❌ Impossible resource sharing
- Cannot leverage shared compute resources
- Cannot optimize resource usage globally
- Waste of resources (each DB has minimum overhead)
- ❌ Cross-tenant features impossible
- Data sharing between tenants difficult
- Consolidated reporting/analytics hard to implement
Decision: REJECTED. Too expensive and operationally complex for moderate scale.
Alternative 2: Dedicated Server Per Tenant
Architecture:
- Each tenant runs own LightRAG instance
- Own Python process, own configurations
- Own memory/CPU allocation
Tenant A → LightRAG Process A (port 9621)
Tenant B → LightRAG Process B (port 9622)
Tenant C → LightRAG Process C (port 9623)
Pros:
- Complete isolation (separate processes)
- Easy to manage per-tenant configs
- Can use different models per tenant
Cons:
- ❌ Massive resource waste
- Minimum ~500MB RAM per instance × 100 tenants = 50GB+ RAM
- Minimum CPU overhead per process
- ❌ Extremely expensive at scale
- 100 tenants × 4GB allocated = 400GB RAM needed
- Infrastructure costs prohibitive
- ❌ Operational nightmare
- 100 processes to monitor
- 100 upgrades/patches to manage
- Complex deployment orchestration
- ❌ Poor utilization
- Most tenants underutilize their resources
- Cannot rebalance resources dynamically
- Peak loads unpredictable per tenant
Decision: REJECTED. Not economically viable for enterprise deployments.
Alternative 3: Simple Workspace Rename (No Knowledge Base)
Architecture:
- Rename "workspace" to "tenant"
- No KB concept
- Assume 1 KB per tenant
POST /api/v1/workspaces/{workspace_id}/query
→ becomes
POST /api/v1/tenants/{tenant_id}/query
Pros:
- Minimal code changes
- Backward compatible
- Quick implementation (1 week)
Cons:
- ❌ No knowledge base isolation
- Tenant with multiple unrelated KBs must share config
- Cannot have tenant-specific KB settings
- All data mixed together
- ❌ Cannot enforce cross-tenant access prevention
- Workspace is just a directory/field
- No API-level enforcement
- Easy to make mistakes
- ❌ No RBAC
- Cannot grant access to specific KBs
- All-or-nothing tenant access
- No fine-grained permissions
- ❌ No tenant-specific configuration
- All tenants must use same LLM/embedding models
- Cannot customize per tenant needs
- ❌ Limited compliance features
- No audit trails of who accessed what
- Difficult to enforce data residency
- No resource quotas
Decision: REJECTED. Does not meet the business requirements for true multi-tenancy.
Alternative 4: Shared Single LightRAG for All Tenants
Architecture:
- One LightRAG instance for all tenants
- Single namespace, single graph
- Tenant filtering only at API layer
API Layer → Filters query by tenant → Single LightRAG Instance
Pros:
- Minimal resource usage
- Single deployment
- Simple to maintain
Cons:
- ❌ Data isolation risk is CRITICAL
- Single point of failure for all tenants
- One query mistake → cross-tenant data leak
- Cannot be patched without affecting all
- ❌ Performance bottleneck
- Single instance cannot scale with tenants
- All LLM calls compete for resources
- All embedding calls serialized
- ❌ Tenant-specific configuration impossible
- All tenants share same models
- Cannot customize chunk size, top_k, etc. per tenant
- ❌ No blast radius isolation
- One tenant's bad data can corrupt all
- One tenant's quota exhaustion affects all
- ❌ Compliance impossible
- Data residency requirements: cannot guarantee where data is
- GDPR right to deletion: must delete entire system
- Audit requirements: cannot track per-tenant operations
Decision: REJECTED. Unacceptable security and operational risks.
Alternative 5: Sharding by Tenant Hash
Architecture:
- Hash tenant ID
- Route to specific shard server
- Multiple instances with different tenant ranges
Tenant Hash % 3
├─ Shard 0: LightRAG A (tenants 0, 3, 6, 9...)
├─ Shard 1: LightRAG B (tenants 1, 4, 7, 10...)
└─ Shard 2: LightRAG C (tenants 2, 5, 8, 11...)
Pros:
- Distributes load across instances
- Better than single instance
- Can grow to 3+ instances
Cons:
- ❌ Breaks operational simplicity
- Need load balancer + routing logic
- Shards must be preconfigured
- Adding tenant requires determining shard
- ❌ Rebalancing is complex
- Adding new shard requires data migration
- Tenant addition might change shard assignment
- Hotspots impossible to fix dynamically
- ❌ Doesn't reduce fundamental costs
- Still need multiple instances
- Each instance has full overhead
- Only slightly better than per-tenant instances
- ❌ More complex than multi-tenant single instance
- Routing logic adds latency
- Debugging harder (data could be on any shard)
- Cross-shard features harder to implement
Decision: REJECTED. Adds complexity without enough benefit over the proposed multi-tenant single-instance approach.
Comparison Table
| Approach | Isolation | Cost | Complexity | Scalability | Selected |
|---|---|---|---|---|---|
| Proposed: Single Instance Multi-Tenant | ✓ Good | ✓ Low | ✓ Medium | ✓ Excellent | ✓ YES |
| Alt 1: DB Per Tenant | ✓✓ Perfect | ✗✗ 100x | ✗✗ Very High | ✗ Limited | ✗ |
| Alt 2: Server Per Tenant | ✓ Good | ✗✗ 50x | ✗ High | ✗ Limited | ✗ |
| Alt 3: Workspace Rename | ~ Weak | ✓ Very Low | ✓ Very Low | ✓ Good | ✗ |
| Alt 4: Single Instance | ✗ Poor | ✓ Very Low | ✓ Very Low | ✗ Poor | ✗ |
| Alt 5: Sharding | ✓ Good | ✗ 10-20x | ✗✗ High | ✓ Good | ✗ |
Why This Approach Wins
The proposed single instance, multi-tenant, multi-KB architecture offers the optimal balance:
- Security: Complete tenant isolation through multiple layers
- Cost: Efficient resource sharing (100 tenants ≈ 1.1x cost of single tenant)
- Complexity: Manageable (dependency injection handles most complexity)
- Scalability: Single instance can serve 100s of tenants, scales vertically well
- Compliance: Audit trails and data isolation support compliance needs
- Features: Supports RBAC, per-tenant config, resource quotas
Document Version: 1.0
Last Updated: 2025-11-20
Related Files: 001-multi-tenant-architecture-overview.md