# Multi-Tenant Architecture

> A comprehensive guide to understanding, activating, and implementing multi-tenant support across all storage backends

**Last Updated**: November 20, 2025
**Status**: Production Ready
**Audience**: Developers, DevOps Engineers, System Architects

---

## Table of Contents

1. [Overview](#overview)
2. [Architecture Model](#architecture-model)
3. [Multi-Tenant Concept](#multi-tenant-concept)
4. [Supported Backends](#supported-backends)
5. [How It Works](#how-it-works)
6. [Getting Started](#getting-started)
7. [Implementation Examples](#implementation-examples)
8. [Security & Isolation](#security--isolation)
9. [Migration Guide](#migration-guide)
10. [Troubleshooting](#troubleshooting)
11. [Best Practices](#best-practices)
12. [Performance Optimization](#performance-optimization)
13. [Summary](#summary)
14. [Next Steps](#next-steps)
15. [Additional Resources](#additional-resources)

---

## Overview

LightRAG now supports a complete **multi-tenant architecture** across all 10 storage backends, enabling secure isolation of data for multiple organizations, teams, or customers within a single LightRAG deployment.

### Key Benefits

- Complete Data Isolation: Database-level filtering prevents cross-tenant access
- Easy Activation: Simple configuration with backward compatibility
- All Backends Supported: Works with PostgreSQL, MongoDB, Redis, the graph stores (Neo4j, Memgraph, NetworkX), and the vector stores (Qdrant, Milvus, FAISS, Nano Vector DB)
- Zero Breaking Changes: Existing code continues to work with the default tenant and KB values
- Scale Efficiently: Run one instance for multiple tenants

### Real-World Scenario

```
┌─────────────────────────────────────────────────────────────────┐
│                   Single LightRAG Deployment                    │
│                                                                 │
│   ┌──────────────────────┐      ┌──────────────────────┐        │
│   │ Tenant: Acme Corp    │      │ Tenant: TechStart    │        │
│   │                      │      │                      │        │
│   │ KB: kb-prod          │      │ KB: kb-main          │        │
│   │ KB: kb-dev           │      │ KB: kb-staging       │        │
│   │                      │      │                      │        │
│   └──────────────────────┘      └──────────────────────┘        │
│              │                             │                    │
│              └──────────────┬──────────────┘                    │
│                             │                                   │
│            All data isolated at database level                  │
│                             │                                   │
│              ┌──────────────┴──────────────┐                    │
│              ▼                             ▼                    │
│     ┌─────────────────┐           ┌─────────────────┐           │
│     │   PostgreSQL    │           │     MongoDB     │           │
│     │ (tenant_id+kb)  │           │ (tenant_id+kb)  │           │
│     └─────────────────┘           └─────────────────┘           │
└─────────────────────────────────────────────────────────────────┘
```

---

## Architecture Model

### Hierarchical Structure

```mermaid
graph TD
    A["Deployment"] --> B["Tenant: Acme Corp"]
    A --> C["Tenant: TechStart"]
    B --> D["KB: kb-prod"]
    B --> E["KB: kb-dev"]
    C --> F["KB: kb-main"]
    C --> G["KB: kb-staging"]
    D --> H["Documents"]
    D --> I["Entities & Relations"]
    D --> J["Vectors"]
    E --> K["Documents"]
    E --> L["Entities & Relations"]
    F --> M["Documents"]
    G --> N["Entities & Relations"]

    style A fill:#E8F5E9,stroke:#2E7D32,stroke-width:2px,color:#1B5E20
    style B fill:#F3E5F5,stroke:#6A1B9A,stroke-width:2px,color:#38006B
    style C fill:#F3E5F5,stroke:#6A1B9A,stroke-width:2px,color:#38006B
    style D fill:#E0F2F1,stroke:#00796B,stroke-width:2px,color:#004D40
    style E fill:#E0F2F1,stroke:#00796B,stroke-width:2px,color:#004D40
    style F fill:#E0F2F1,stroke:#00796B,stroke-width:2px,color:#004D40
    style G fill:#E0F2F1,stroke:#00796B,stroke-width:2px,color:#004D40
```

### Data Model - Composite Key Pattern

Each resource is identified by a **composite key**: `(tenant_id, kb_id, resource_id)`

```
┌────────────────────────────────────────────────────────────┐
│                   Composite Key Pattern                    │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  tenant_id   │ kb_id       │ resource_id   │ data          │
│  ─────────   │ ─────────   │ ───────────   │ ─────         │
│  "acme"      │ "kb-prod"   │ "doc-123"     │ {...}         │
│  "acme"      │ "kb-dev"    │ "doc-456"     │ {...}         │
│  "techst"    │ "kb-main"   │ "doc-789"     │ {...}         │
│                                                            │
│  Same resource_id in different tenant/kb = different data  │
│  Prevents accidental cross-tenant access                   │
│                                                            │
└────────────────────────────────────────────────────────────┘
```

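The composite key can be illustrated with a small in-memory sketch. This is not LightRAG code: the `put` and `get` helpers and the module-level `_store` dictionary are hypothetical names used only to show that the same `resource_id` resolves to different data under different tenant/KB pairs.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical in-memory store; the key mirrors (tenant_id, kb_id, resource_id).
CompositeKey = Tuple[str, str, str]
_store: Dict[CompositeKey, Any] = {}


def put(tenant_id: str, kb_id: str, resource_id: str, data: Any) -> None:
    """Store data under the full composite key."""
    _store[(tenant_id, kb_id, resource_id)] = data


def get(tenant_id: str, kb_id: str, resource_id: str) -> Optional[Any]:
    """Return the data for this exact tenant/kb/resource, or None if absent."""
    return _store.get((tenant_id, kb_id, resource_id))


# The same resource_id exists under two tenants without colliding.
put("acme", "kb-prod", "doc-123", {"title": "Acme production doc"})
put("techst", "kb-main", "doc-123", {"title": "TechStart doc"})
```

A lookup that omits or changes any part of the key, such as `get("acme", "kb-dev", "doc-123")`, finds nothing, which is the property the composite key is meant to provide.
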
---

## Multi-Tenant Concept

### Three-Level Isolation

```
┌─────────────────────────────────────────────────────────────────┐
│                  Multi-Tenant Isolation Levels                  │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Level 1: TENANT                                                │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │ Organization/Customer/Account (highest level)            │   │
│  │ Example: "acme-corp", "techstart-inc"                    │   │
│  │ Isolation: Complete separation between tenants           │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                 │
│  Level 2: KNOWLEDGE BASE (KB)                                   │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │ Project/Environment/Domain within tenant                 │   │
│  │ Examples:                                                │   │
│  │   - Acme Corp: kb-prod, kb-dev, kb-staging               │   │
│  │   - TechStart: kb-main, kb-backup                        │   │
│  │ Isolation: Separate data per KB within same tenant       │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                 │
│  Level 3: RESOURCES                                             │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │ Documents, Entities, Vectors, Relations (lowest level)   │   │
│  │ Automatically filtered by tenant + kb context            │   │
│  │ Isolation: Only accessible via tenant/kb scope           │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

### Data Access Pattern

```mermaid
sequenceDiagram
    participant Client as Client Application
    participant API as LightRAG API
    participant TenantCtx as Tenant Context
    participant Storage as Storage Backend

    Client->>API: GET /documents<br/>(tenant: acme-corp, kb: kb-prod)
    API->>TenantCtx: Validate & extract<br/>tenant_id, kb_id
    TenantCtx->>Storage: Query WHERE<br/>tenant_id='acme-corp'<br/>AND kb_id='kb-prod'
    Storage-->>API: Return filtered results
    API-->>Client: Documents (acme-corp only)

    Note over TenantCtx: Even if request<br/>contains tenant_id in URL,<br/>storage layer<br/>enforces isolation
```

---

## Supported Backends

### Complete Backend Coverage

| Backend | Isolation Method | Status | Module |
|---------|---|---|---|
| **PostgreSQL** | Column filtering + composite keys | Complete | `postgres_tenant_support.py` |
| **MongoDB** | Document field filtering | Complete | `mongo_tenant_support.py` |
| **Redis** | Key prefixing (tenant:kb:key) | Complete | `redis_tenant_support.py` |
| **Neo4j** | Cypher + node relationships | Complete | `graph_tenant_support.py` |
| **Memgraph** | openCypher + properties | Complete | `graph_tenant_support.py` |
| **NetworkX** | Subgraph extraction | Complete | `graph_tenant_support.py` |
| **Qdrant** | Metadata filtering | Complete | `vector_tenant_support.py` |
| **Milvus** | WHERE expression filtering | Complete | `vector_tenant_support.py` |
| **FAISS** | Index naming + metadata | Complete | `vector_tenant_support.py` |
| **Nano Vector DB** | Document metadata | Complete | `vector_tenant_support.py` |

### Backend Architecture Diagram

```mermaid
graph TB
    subgraph "Storage Backends"
        Relational["Relational"]
        Document["Document"]
        KV["Key-Value"]
        Vector["Vector"]
        Graph["Graph"]

        PG["PostgreSQL"]
        Mongo["MongoDB"]
        Redis["Redis"]
        Qdrant["Qdrant"]
        Milvus["Milvus"]
        FAISS["FAISS"]
        Nano["Nano VDB"]
        Neo4j["Neo4j"]
        Memgraph["Memgraph"]
        NetworkX["NetworkX"]
    end

    subgraph "Support Modules"
        PGSupport["postgres_tenant<br/>_support.py"]
        MongoSupport["mongo_tenant<br/>_support.py"]
        RedisSupport["redis_tenant<br/>_support.py"]
        VectorSupport["vector_tenant<br/>_support.py"]
        GraphSupport["graph_tenant<br/>_support.py"]
    end

    Relational --> PG
    Document --> Mongo
    KV --> Redis
    Vector --> Qdrant
    Vector --> Milvus
    Vector --> FAISS
    Vector --> Nano
    Graph --> Neo4j
    Graph --> Memgraph
    Graph --> NetworkX

    PG -.-> PGSupport
    Mongo -.-> MongoSupport
    Redis -.-> RedisSupport
    Qdrant -.-> VectorSupport
    Milvus -.-> VectorSupport
    FAISS -.-> VectorSupport
    Nano -.-> VectorSupport
    Neo4j -.-> GraphSupport
    Memgraph -.-> GraphSupport
    NetworkX -.-> GraphSupport

    style Relational fill:#F1F8E9,stroke:#558B2F,stroke-width:2px,color:#33691E
    style Document fill:#ECE7F3,stroke:#7B1FA2,stroke-width:2px,color:#4A148C
    style KV fill:#E0F2F1,stroke:#00897B,stroke-width:2px,color:#004D40
    style Vector fill:#FFF3E0,stroke:#E65100,stroke-width:2px,color:#BF360C
    style Graph fill:#F3E5F5,stroke:#6A1B9A,stroke-width:2px,color:#38006B
    style PGSupport fill:#C8E6C9,stroke:#2E7D32,stroke-width:2px,color:#1B5E20
    style MongoSupport fill:#D8C5E5,stroke:#512DA8,stroke-width:2px,color:#311B92
    style RedisSupport fill:#B2DFDB,stroke:#00695C,stroke-width:2px,color:#004D40
    style VectorSupport fill:#FFD8A8,stroke:#D84315,stroke-width:2px,color:#BF360C
    style GraphSupport fill:#E1BEE7,stroke:#7B1FA2,stroke-width:2px,color:#4A148C
```

---

## How It Works

### Query Execution Flow

```
┌────────────────────────────────────────────────────────────┐
│                Typical Query Execution Flow                │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  1. Client Request                                         │
│     GET /api/documents                                     │
│     Headers: {tenant: "acme-corp", kb: "kb-prod"}          │
│                                                            │
│  2. Extract Tenant Context                                 │
│     tenant_id = extract_from_request(request)              │
│     kb_id = extract_from_request(request)                  │
│                                                            │
│  3. Build Tenant-Aware Query                               │
│     Base Query:                                            │
│       SELECT * FROM documents WHERE status='active'        │
│                                                            │
│     Add Tenant Filter:                                     │
│       SELECT * FROM documents                              │
│       WHERE status='active'                                │
│       AND tenant_id='acme-corp'                            │
│       AND kb_id='kb-prod'                                  │
│                                                            │
│  4. Execute Query                                          │
│     Storage backend executes filtered query                │
│                                                            │
│  5. Return Results                                         │
│     Only documents from acme-corp/kb-prod returned         │
│                                                            │
└────────────────────────────────────────────────────────────┘
```

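Step 3 of the flow above can be sketched with the standard-library `sqlite3` module standing in for a real backend. The `add_tenant_filter` helper is hypothetical and is not the `TenantSQLBuilder` API; it only shows the idea of appending tenant predicates as bound parameters rather than interpolated strings.

```python
import sqlite3
from typing import Dict, Tuple


def add_tenant_filter(
    base_query: str, params: Dict[str, object], tenant_id: str, kb_id: str
) -> Tuple[str, Dict[str, object]]:
    """Append tenant/kb predicates as bound parameters (hypothetical helper).

    Deliberately naive: it only handles queries that end in an optional
    WHERE clause, with no ORDER BY, GROUP BY, or subqueries.
    """
    joiner = " AND " if " where " in base_query.lower() else " WHERE "
    sql = f"{base_query}{joiner}tenant_id = :tenant_id AND kb_id = :kb_id"
    return sql, {**params, "tenant_id": tenant_id, "kb_id": kb_id}


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (tenant_id TEXT, kb_id TEXT, id TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?, ?)",
    [
        ("acme-corp", "kb-prod", "doc-1", "active"),
        ("acme-corp", "kb-dev", "doc-2", "active"),
        ("techstart", "kb-main", "doc-3", "active"),
    ],
)

sql, bound = add_tenant_filter(
    "SELECT id FROM documents WHERE status = :status",
    {"status": "active"},
    tenant_id="acme-corp",
    kb_id="kb-prod",
)
rows = [row[0] for row in conn.execute(sql, bound)]
```

Although all three rows are `active`, only `doc-1` belongs to `acme-corp`/`kb-prod`, so it is the only row the scoped query can return.
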
### Storage Layer Filtering

Each backend has its own filtering mechanism:

```
┌────────────────────────────────────────────────────────────┐
│             Backend-Specific Filtering Methods             │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  PostgreSQL:                                               │
│    WHERE clause + composite PRIMARY KEY                    │
│    (tenant_id, kb_id, id)                                  │
│                                                            │
│  MongoDB:                                                  │
│    Document filter                                         │
│    {tenant_id: "acme-corp", kb_id: "kb-prod"}              │
│                                                            │
│  Redis:                                                    │
│    Key prefix pattern                                      │
│    acme-corp:kb-prod:original_key                          │
│                                                            │
│  Qdrant (Vector DB):                                       │
│    Metadata filter                                         │
│    {"must": [{"key": "tenant_id", ...}, ...]}              │
│                                                            │
│  Neo4j (Graph DB):                                         │
│    Cypher property matching                                │
│    WHERE node.tenant_id = 'acme-corp'                      │
│    AND node.kb_id = 'kb-prod'                              │
│                                                            │
└────────────────────────────────────────────────────────────┘
```

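Two of the mechanisms listed above, the Redis key prefix and the MongoDB document filter, can be sketched as plain functions. The names `redis_key` and `mongo_filter` are hypothetical and do not reproduce the support modules; the separator check is an added assumption to keep prefixed keys unambiguous.

```python
from typing import Any, Dict, Optional


def redis_key(tenant_id: str, kb_id: str, key: str) -> str:
    """Build a 'tenant:kb:key' style key (hypothetical helper)."""
    for part in (tenant_id, kb_id):
        # A ':' inside an identifier would make the prefix ambiguous.
        if not part or ":" in part:
            raise ValueError(f"invalid tenant/kb identifier: {part!r}")
    return f"{tenant_id}:{kb_id}:{key}"


def mongo_filter(
    tenant_id: str, kb_id: str, extra: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Merge a caller filter with the tenant fields (hypothetical helper)."""
    merged = dict(extra or {})
    # Tenant fields are written last so a caller-supplied filter
    # cannot override them.
    merged.update({"tenant_id": tenant_id, "kb_id": kb_id})
    return merged


prefixed = redis_key("acme-corp", "kb-prod", "user:123")
scoped = mongo_filter("acme-corp", "kb-prod", {"status": "active", "tenant_id": "other"})
```

Writing the tenant fields last is a small design choice: even if application code passes a filter that already names `tenant_id`, the scoped value wins.
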
---

## Getting Started

### Quick Activation

Multi-tenant support is **built in** and **enabled by default**. To use it:

#### Step 1: Import Support Modules

```python
# For PostgreSQL
from lightrag.kg.postgres_tenant_support import TenantSQLBuilder

# For MongoDB
from lightrag.kg.mongo_tenant_support import MongoTenantHelper

# For Redis
from lightrag.kg.redis_tenant_support import RedisTenantNamespace

# For Vector DBs (Qdrant, Milvus, FAISS, Nano)
from lightrag.kg.vector_tenant_support import QdrantTenantHelper

# For Graph DBs (Neo4j, Memgraph, NetworkX)
from lightrag.kg.graph_tenant_support import Neo4jTenantHelper
```

#### Step 2: Use Tenant Context

```python
# Set tenant context for your operation
tenant_id = "acme-corp"
kb_id = "kb-prod"

# Pass these values to the support-module helpers (see Implementation Examples).
# Each helper adds the tenant/kb filter, so application code does not hand-write it.
```

#### Step 3: That's It!

Operations that go through the support-module helpers are isolated by tenant and KB. Existing code keeps working without changes.

### Configuration

Minimal configuration is needed. If you use environment variables:

```bash
# Optional: Set default tenant for single-tenant scenarios
export LIGHTRAG_DEFAULT_TENANT="default"
export LIGHTRAG_DEFAULT_KB="default"

# Alternatively, set the context at runtime in Python rather than in the shell:
#   context = TenantContext(tenant_id="acme-corp", kb_id="kb-prod")
```

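As an illustration of how the two environment variables above might act as fallbacks, here is a minimal sketch. `TenantScope` and `resolve_scope` are hypothetical names, not the library's `TenantContext`, and the precedence shown (explicit argument first, then environment, then `"default"`) is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TenantScope:
    """Hypothetical value object; not the library's TenantContext."""

    tenant_id: str
    kb_id: str


def resolve_scope(
    tenant_id: Optional[str] = None,
    kb_id: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> TenantScope:
    """Prefer explicit values, then the environment, then 'default'."""
    return TenantScope(
        tenant_id=tenant_id or env.get("LIGHTRAG_DEFAULT_TENANT", "default"),
        kb_id=kb_id or env.get("LIGHTRAG_DEFAULT_KB", "default"),
    )


explicit = resolve_scope("acme-corp", "kb-prod", env={})
from_env = resolve_scope(env={"LIGHTRAG_DEFAULT_TENANT": "team-a"})
fallback = resolve_scope(env={})
```

Passing `env` explicitly keeps the function easy to test; in normal use the default `os.environ` is read at call time.
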
---

## Implementation Examples

### PostgreSQL Example

```python
from lightrag.kg.postgres_tenant_support import TenantSQLBuilder

# Build a tenant-aware query
sql = "SELECT * FROM LIGHTRAG_DOC_FULL WHERE status = :status"
params = {"status": "active"}

# Add tenant filtering
filtered_sql, filtered_params = TenantSQLBuilder.build_filtered_query(
    base_query=sql,
    tenant_id="acme-corp",
    kb_id="kb-prod",
    additional_params=[params]
)

# Execute
result = await db.query(filtered_sql, filtered_params)
# Result: Only documents from acme-corp/kb-prod with status=active
```

### MongoDB Example

```python
from lightrag.kg.mongo_tenant_support import MongoTenantHelper

# Build tenant-aware filter
tenant_filter = MongoTenantHelper.get_tenant_filter(
    tenant_id="acme-corp",
    kb_id="kb-prod",
    additional_filter={"status": "active"}
)

# Use in query
document = await collection.find_one(tenant_filter)
# Result: A single matching document from acme-corp/kb-prod, or None
```

### Redis Example

```python
from lightrag.kg.redis_tenant_support import RedisTenantNamespace

# Create a tenant-scoped namespace
ns = RedisTenantNamespace(
    redis_client=redis,
    tenant_id="acme-corp",
    kb_id="kb-prod"
)

# All operations are automatically tenant-scoped
value = await ns.get("user:123")
await ns.set("user:123", json_data)
await ns.delete("user:123")

# Key stored as: "acme-corp:kb-prod:user:123"
# No tenant/kb prefix needed in application code
```

### Vector DB (Qdrant) Example

```python
from lightrag.kg.vector_tenant_support import QdrantTenantHelper

# Build tenant filter
tenant_filter = QdrantTenantHelper.build_qdrant_filter(
    tenant_id="acme-corp",
    kb_id="kb-prod"
)

# Search with automatic tenant isolation
results = await qdrant.search(
    collection_name="embeddings",
    query_vector=query_embedding,
    query_filter=tenant_filter,  # Automatic isolation
    limit=10
)
# Result: Only vectors from acme-corp/kb-prod
```

### Graph DB (Neo4j) Example

```python
from lightrag.kg.graph_tenant_support import Neo4jTenantHelper

helper = Neo4jTenantHelper()

# Build tenant-aware Cypher query
base_query = "MATCH (n:Entity) RETURN n"
query, params = helper.build_tenant_aware_query(
    base_query=base_query,
    tenant_id="acme-corp",
    kb_id="kb-prod",
    node_var="n"
)

# Execute
result = await session.run(query, params)
# Result: Only entities from acme-corp/kb-prod
```

### Complete Application Example

```python
from fastapi import Depends, FastAPI, Header
from lightrag.kg.postgres_tenant_support import TenantSQLBuilder

app = FastAPI()

# get_db is assumed to be your own dependency that provides a database handle.


@app.get("/documents")
async def get_documents(
    tenant_id: str = Header(...),
    kb_id: str = Header(...),
    db = Depends(get_db)
):
    """Get documents for a specific tenant/kb"""

    # Build tenant-scoped query
    query = "SELECT id, title, content FROM documents"

    filtered_sql, params = TenantSQLBuilder.build_filtered_query(
        base_query=query,
        tenant_id=tenant_id,
        kb_id=kb_id,
        additional_params=[]
    )

    # Execute (tenant context enforced at storage layer)
    documents = await db.query(filtered_sql, params)

    return {
        "tenant": tenant_id,
        "kb": kb_id,
        "documents": documents,
        "count": len(documents)
    }


@app.post("/documents/{doc_id}")
async def add_document(
    doc_id: str,
    content: dict,  # Request body; must come before the defaulted parameters
    tenant_id: str = Header(...),
    kb_id: str = Header(...),
    db = Depends(get_db)
):
    """Add a document for a specific tenant/kb"""

    # Composite key: (tenant_id, kb_id, doc_id)
    query = """
        INSERT INTO documents (tenant_id, kb_id, id, content)
        VALUES (:tenant_id, :kb_id, :id, :content)
    """

    await db.execute(query, {
        "tenant_id": tenant_id,
        "kb_id": kb_id,
        "id": doc_id,
        "content": content
    })

    return {
        "status": "created",
        "tenant": tenant_id,
        "kb": kb_id,
        "doc_id": doc_id
    }
```

---

## Security & Isolation

### Isolation Guarantees

```
┌────────────────────────────────────────────────────────────┐
│             Multi-Tenant Isolation Guarantees              │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  Database-Level Enforcement                                │
│    Every query includes (tenant_id, kb_id) filtering       │
│    Impossible to retrieve data from other tenants          │
│                                                            │
│  Composite Key Constraints                                 │
│    PRIMARY KEY (tenant_id, kb_id, id)                      │
│    Prevents accidental ID collisions between tenants       │
│                                                            │
│  No Application-Level Trust                                │
│    Even if app code has bugs, storage layer enforces       │
│    Tenant isolation is deterministic, not probabilistic    │
│                                                            │
│  Migration Safety                                          │
│    Legacy single-tenant data maps to default tenant        │
│    Gradual migration path without data loss                │
│                                                            │
│  Audit Trail                                               │
│    All operations include tenant context                   │
│    Easy to track which tenant accessed what                │
│                                                            │
└────────────────────────────────────────────────────────────┘
```

### Security Checklist

```python
# DO: Always include tenant context
@app.get("/documents")
async def get_docs(tenant_id: str = Header(...), kb_id: str = Header(...)):
    filtered_sql, params = TenantSQLBuilder.build_filtered_query(
        "SELECT * FROM documents", tenant_id, kb_id
    )
    return await db.query(filtered_sql, params)

# DON'T: Query without tenant filtering
@app.get("/documents")  # WRONG - no tenant context
async def get_docs():
    return await db.query("SELECT * FROM documents")

# DO: Validate tenant context early
async def validate_tenant_access(tenant_id, user_tenant):
    if tenant_id != user_tenant:
        raise PermissionError(f"Cannot access {tenant_id}")

# DO: Use composite keys consistently
key = f"{tenant_id}:{kb_id}:{resource_id}"

# DON'T: Use resource IDs without tenant prefix
key = f"doc:{resource_id}"  # WRONG - can collide with other tenants
```

---

## Migration Guide

### Migrating Existing Single-Tenant Data

Multi-tenant support includes migration utilities. The PostgreSQL, MongoDB, and Redis utilities are shown below.

#### PostgreSQL Migration

```python
from lightrag.kg.postgres_tenant_support import add_tenant_columns_migration

# Run one-time migration
await add_tenant_columns_migration(
    db=database_connection,
    default_tenant_id="default",
    default_kb_id="default"
)

# What it does:
# 1. Adds tenant_id and kb_id columns to all tables
# 2. Sets existing rows to default values
# 3. Creates composite indexes for performance
# 4. Updates PRIMARY KEY constraints
```

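The first three steps listed above can be illustrated on a throwaway SQLite database from the standard library. This is only a sketch of the idea, not what `add_tenant_columns_migration` executes against PostgreSQL; the table and index names are made up, and step 4 is omitted because SQLite cannot alter a primary key in place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A legacy single-tenant table with one existing row.
conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, content TEXT)")
conn.execute("INSERT INTO documents VALUES ('doc-1', 'legacy row')")

# Steps 1 and 2: add the columns; the DEFAULT backfills existing rows.
conn.execute(
    "ALTER TABLE documents ADD COLUMN tenant_id TEXT NOT NULL DEFAULT 'default'"
)
conn.execute(
    "ALTER TABLE documents ADD COLUMN kb_id TEXT NOT NULL DEFAULT 'default'"
)

# Step 3: composite index that leads with the tenant columns.
conn.execute(
    "CREATE INDEX idx_documents_tenant_kb_id ON documents (tenant_id, kb_id, id)"
)

row = conn.execute("SELECT tenant_id, kb_id, id FROM documents").fetchone()
index_names = [r[1] for r in conn.execute("PRAGMA index_list('documents')")]
```

After the two `ALTER TABLE` statements the legacy row reads back under the default tenant and KB, which is the backward-compatibility behaviour the migration is described as providing.
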
#### MongoDB Migration

```python
from lightrag.kg.mongo_tenant_support import add_tenant_fields_to_collection

# Run migration on each collection
await add_tenant_fields_to_collection(
    collection=mongodb_collection,
    default_tenant_id="default",
    default_kb_id="default"
)

# Creates indexes:
# {tenant_id: 1, kb_id: 1, _id: 1}
```

#### Redis Migration (with Dry-Run)

```python
from lightrag.kg.redis_tenant_support import migrate_redis_to_tenant

# Test migration first (dry-run)
stats = await migrate_redis_to_tenant(
    redis_client=redis,
    old_key_pattern="user:*",
    default_tenant_id="default",
    default_kb_id="default",
    dry_run=True  # Preview only
)

print(f"Will migrate: {stats['migrated']} keys")
print(f"Will skip: {stats['skipped']} keys")
print(f"Failed: {stats['failed']} keys")

# Run actual migration
stats = await migrate_redis_to_tenant(
    redis_client=redis,
    old_key_pattern="user:*",
    default_tenant_id="default",
    default_kb_id="default",
    dry_run=False  # Apply changes
)
```

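The dry-run idea can be sketched against a plain dictionary instead of a Redis server. `migrate_keys` is a hypothetical stand-in, not `migrate_redis_to_tenant`; in particular, the meaning given here to `skipped` (already prefixed or not matching the pattern) and `failed` (target key already exists) is an assumption.

```python
import fnmatch
from typing import Dict


def migrate_keys(
    store: Dict[str, str],
    pattern: str,
    tenant_id: str,
    kb_id: str,
    dry_run: bool = True,
) -> Dict[str, int]:
    """Rename matching keys to 'tenant:kb:key'; only count when dry_run is set."""
    prefix = f"{tenant_id}:{kb_id}:"
    stats = {"migrated": 0, "skipped": 0, "failed": 0}
    for key in list(store):  # snapshot, because the dict may change below
        if key.startswith(prefix) or not fnmatch.fnmatchcase(key, pattern):
            stats["skipped"] += 1
            continue
        new_key = prefix + key
        if new_key in store:  # never overwrite an existing target key
            stats["failed"] += 1
            continue
        if not dry_run:
            store[new_key] = store.pop(key)
        stats["migrated"] += 1
    return stats


store = {"user:1": "a", "user:2": "b", "session:9": "c"}
preview = migrate_keys(store, "user:*", "default", "default", dry_run=True)
keys_after_preview = sorted(store)
applied = migrate_keys(store, "user:*", "default", "default", dry_run=False)
rerun = migrate_keys(store, "user:*", "default", "default", dry_run=False)
```

Because already-prefixed keys are skipped, running the sketch a second time changes nothing, which makes a partially completed migration safe to resume.
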
### Migration Workflow

```
┌──────────────────────────────────────────────────┐
│              Safe Migration Process              │
├──────────────────────────────────────────────────┤
│                                                  │
│  1. BACKUP                                       │
│     - Create database snapshots                  │
│     - Export critical data                       │
│                                                  │
│  2. TEST ENVIRONMENT                             │
│     - Restore backup to test DB                  │
│     - Run migration with dry-run                 │
│     - Verify statistics match expectations       │
│                                                  │
│  3. PRODUCTION STAGING                           │
│     - Run migration on staging (dry-run first)   │
│     - Test application with new schema           │
│     - Monitor performance                        │
│                                                  │
│  4. PRODUCTION EXECUTION                         │
│     - Schedule maintenance window                │
│     - Stop application                           │
│     - Run actual migration (dry_run=False)       │
│     - Verify data integrity                      │
│     - Restart application                        │
│                                                  │
│  5. VALIDATION                                   │
│     - Run integration tests                      │
│     - Check application logs                     │
│     - Verify tenant isolation                    │
│     - Monitor for 24 hours                       │
│                                                  │
└──────────────────────────────────────────────────┘
```

---

## Troubleshooting

### Common Issues & Solutions

#### Issue 1: No tenant context found

```python
from fastapi import Header
from lightrag.kg.postgres_tenant_support import TenantSQLBuilder

# Problem: the query carries no tenant context
async def get_documents(db):
    result = await db.query("SELECT * FROM documents")
    # Error: No tenant context provided

# Solution: accept the tenant context and build a filtered query
async def get_documents(
    db,
    tenant_id: str = Header(...),
    kb_id: str = Header(...)
):
    query = "SELECT * FROM documents"
    filtered_sql, params = TenantSQLBuilder.build_filtered_query(
        query, tenant_id, kb_id
    )
    result = await db.query(filtered_sql, params)
```

#### Issue 2: Cross-tenant data visible

```python
# Problem
filter_dict = {"status": "active"}  # Missing tenant fields!
result = await collection.find(filter_dict).to_list(length=None)

# Solution
from lightrag.kg.mongo_tenant_support import MongoTenantHelper

filter_dict = MongoTenantHelper.get_tenant_filter(
    "acme-corp", "kb-prod",
    additional_filter={"status": "active"}
)
# find() returns a cursor; await to_list() to fetch the documents
result = await collection.find(filter_dict).to_list(length=None)
```

#### Issue 3: Performance degradation after migration

```python
# Solution: Ensure indexes exist
from lightrag.kg.postgres_tenant_support import get_tenant_indexes

# Get recommended indexes
indexes = get_tenant_indexes()

# Create in PostgreSQL
for index_sql in indexes:
    await db.execute(index_sql)

# Verify by running the following SQL in psql:
#   ANALYZE documents;  -- Update statistics
#   EXPLAIN SELECT * FROM documents
#     WHERE tenant_id='acme-corp'
#     AND kb_id='kb-prod';  -- Check query plan
```

#### Issue 4: Backward compatibility broken

```python
# Solution: Use default tenant values
context = TenantContext(
    tenant_id="default",  # Default for legacy code
    kb_id="default"
)

# Legacy code continues to work
result = await db.query(legacy_query)  # Uses default context
```

### Debugging Multi-Tenant Issues

```python
# Enable debug logging
import logging
logging.basicConfig(level=logging.DEBUG)

# Add tenant context to logs
import contextvars

tenant_context = contextvars.ContextVar(
    'tenant_context',
    default={'tenant_id': 'unknown', 'kb_id': 'unknown'}
)

# In middleware
def set_tenant_context(tenant_id, kb_id):
    tenant_context.set({'tenant_id': tenant_id, 'kb_id': kb_id})

# In logging
class TenantFilter(logging.Filter):
    def filter(self, record):
        ctx = tenant_context.get()
        record.tenant = ctx['tenant_id']
        record.kb = ctx['kb_id']
        return True

handler = logging.StreamHandler()
handler.addFilter(TenantFilter())
# The formatter is what actually prints the tenant fields set by the filter
handler.setFormatter(logging.Formatter(
    "%(asctime)s [%(tenant)s:%(kb)s] %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S"
))
logging.getLogger().addHandler(handler)

# Logs will show:
# 2025-11-20 10:30:45 [acme-corp:kb-prod] SELECT from documents
```

---

## Best Practices

### DO

- Always pass tenant context to every operation
- Use support module helpers (don't build queries manually)
- Create composite indexes on (tenant_id, kb_id, ...)
- Validate tenant context early in request pipeline
- Log all tenant-related operations
- Test with multiple tenants before production
- Monitor tenant-specific metrics
- Document tenant requirements for new features

### DON'T

- Hardcode tenant IDs in application code
- Query without tenant filtering
- Assume application code enforces isolation
- Skip index creation after migration
- Mix tenants in a single transaction
- Cache results across tenants without keying
- Forget to pass tenant context to batch operations
- Assume default values work for production

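The caching item in the list above is easy to get wrong, so here is a minimal sketch of a cache whose entries are keyed by tenant and KB as well as by the lookup key. `TenantCache` is a hypothetical class written for illustration, not part of LightRAG.

```python
from typing import Any, Callable, Dict, Tuple


class TenantCache:
    """Hypothetical cache whose entries are always keyed by (tenant, kb, key)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str, str], Any] = {}

    def get_or_load(
        self, tenant_id: str, kb_id: str, key: str, loader: Callable[[], Any]
    ) -> Any:
        """Return the cached value for this tenant/kb, loading it on a miss."""
        cache_key = (tenant_id, kb_id, key)
        if cache_key not in self._entries:
            self._entries[cache_key] = loader()
        return self._entries[cache_key]


cache = TenantCache()
acme = cache.get_or_load("acme-corp", "kb-prod", "q1", lambda: "acme result")
tech = cache.get_or_load("techstart", "kb-main", "q1", lambda: "techstart result")
# Same tenant and key again: served from the cache, the loader is not called.
acme_again = cache.get_or_load("acme-corp", "kb-prod", "q1", lambda: "not used")
```

Both tenants ask for the same `"q1"` key, yet each receives its own result, because the tenant and KB are part of the cache key rather than an afterthought.
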
---

## Performance Optimization

### Index Strategy

```
-- PostgreSQL - Composite index on all three columns
CREATE INDEX idx_doc_tenant_kb_id
ON documents(tenant_id, kb_id, id);

-- For range queries
CREATE INDEX idx_doc_tenant_kb_created
ON documents(tenant_id, kb_id, created_at DESC);

// MongoDB - Compound index
db.documents.createIndex({
    tenant_id: 1,
    kb_id: 1,
    _id: 1
})

// For sorting
db.documents.createIndex({
    tenant_id: 1,
    kb_id: 1,
    created_at: -1
})
```

### Query Optimization Tips

```sql
-- Good: Specific tenant filter
SELECT * FROM documents
WHERE tenant_id='acme-corp'
AND kb_id='kb-prod'
AND status='active'
ORDER BY created_at DESC;

-- Bad: Full table scan
SELECT * FROM documents
WHERE status='active'
ORDER BY created_at DESC;

-- Good: Use indexes
EXPLAIN SELECT * FROM documents
WHERE tenant_id='acme-corp'
AND kb_id='kb-prod'
AND created_at > NOW() - INTERVAL '7 days';

-- Result should show: "Index Scan" (not "Seq Scan")
```

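The effect of the composite index on query plans can be observed locally with SQLite from the standard library. This is a stand-in for the PostgreSQL `EXPLAIN` shown above, so the plan wording differs; the `plan` helper and the index name are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (tenant_id TEXT, kb_id TEXT, id TEXT, status TEXT)"
)
conn.execute(
    "CREATE INDEX idx_doc_tenant_kb_id ON documents (tenant_id, kb_id, id)"
)


def plan(sql: str, params: tuple = ()) -> str:
    """Return SQLite's query plan detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " ".join(row[3] for row in rows)  # column 3 holds the detail text


scoped = plan(
    "SELECT * FROM documents WHERE tenant_id = ? AND kb_id = ? AND status = ?",
    ("acme-corp", "kb-prod", "active"),
)
unscoped = plan("SELECT * FROM documents WHERE status = ?", ("active",))
```

The tenant-scoped statement is planned as a search on `idx_doc_tenant_kb_id`, while the unscoped one falls back to scanning the whole table.
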
---

## Summary

### Multi-Tenant Architecture at a Glance

```
┌──────────────────────────────────────────────────┐
│        Multi-Tenant Architecture Summary         │
├──────────────────────────────────────────────────┤
│                                                  │
│  Goal: Securely isolate data for multiple        │
│        tenants in a single deployment            │
│                                                  │
│  Method: Database-level filtering by             │
│          (tenant_id, kb_id)                      │
│                                                  │
│  Supported: All 10 storage backends              │
│                                                  │
│  Activation: Use support modules, pass           │
│              tenant context to every operation   │
│                                                  │
│  Backward Compatible: Existing code works        │
│                       with default values        │
│                                                  │
│  Secure: Storage layer enforces isolation        │
│          even if application has bugs            │
│                                                  │
│  Scalable: One instance, many tenants            │
│                                                  │
└──────────────────────────────────────────────────┘
```

---

## Next Steps

1. Review Implementation Examples above for your backend
2. Run Tests: `pytest tests/test_multi_tenant_backends.py -v`
3. Plan Migration: Use migration utilities with dry-run first
4. Deploy: Follow the safe migration workflow in the Migration Guide section
5. Monitor: Watch tenant-specific metrics in production

---

## Additional Resources

- Complete API Reference: See `QUICK_REFERENCE_MULTI_TENANT.md`
- Deployment Guide: See `PHASE4_COMPLETE_MULTI_TENANT_SUMMARY.md`
- Architecture Details: See `MULTI_TENANT_COMPLETE_IMPLEMENTATION.md`
- Code Examples: See support modules in `lightrag/kg/`
- Test Suite: See `tests/test_multi_tenant_backends.py`

---

**Questions?** Check the Troubleshooting section or review the code examples.