LightRAG/docs/archives/0001-multi-tenant-architecture.md
Raphael MANSUY 2b292d4924
docs: Enterprise Edition & Multi-tenancy attribution (#5)
* Remove outdated documentation files: Quick Start Guide, Apache AGE Analysis, and Scratchpad.

* Add multi-tenant testing strategy and ADR index documentation

- Introduced ADR 008 detailing the multi-tenant testing strategy for the ./starter environment, covering compatibility and multi-tenant modes, testing scenarios, and implementation details.
- Created a comprehensive ADR index (README.md) summarizing all architecture decision records related to the multi-tenant implementation, including purpose, key sections, and reading paths for different roles.

* feat(docs): Add comprehensive multi-tenancy guide and README for LightRAG Enterprise

- Introduced `0008-multi-tenancy.md` detailing multi-tenancy architecture, key concepts, roles, permissions, configuration, and API endpoints.
- Created `README.md` as the main documentation index, outlining features, quick start, system overview, and deployment options.
- Documented the LightRAG architecture, storage backends, LLM integrations, and query modes.
- Established a task log (`2025-01-21-lightrag-documentation-log.md`) summarizing documentation creation actions, decisions, and insights.
2025-12-04 18:09:15 +08:00

973 lines
36 KiB
Markdown

# Multi-Tenant Architecture
> A comprehensive guide to understanding, activating, and implementing multi-tenant support across all storage backends
**Last Updated**: November 20, 2025
**Status**: Production Ready
**Audience**: Developers, DevOps Engineers, System Architects
---
## Table of Contents
1. [Overview](#overview)
2. [Architecture Model](#architecture-model)
3. [Multi-Tenant Concept](#multi-tenant-concept)
4. [Supported Backends](#supported-backends)
5. [How It Works](#how-it-works)
6. [Getting Started](#getting-started)
7. [Implementation Examples](#implementation-examples)
8. [Security & Isolation](#security--isolation)
9. [Migration Guide](#migration-guide)
10. [Troubleshooting](#troubleshooting)
---
## Overview
LightRAG now supports complete **multi-tenant architecture** across all 10 storage backends, enabling secure isolation of data for multiple organizations, teams, or customers within a single LightRAG deployment.
### Key Benefits
- Complete Data Isolation: Database-level filtering prevents cross-tenant access
- Easy Activation: Simple configuration with backward compatibility
- All Backends Supported: Works with PostgreSQL, MongoDB, Redis, Neo4j, and vector/graph databases
- Zero Breaking Changes: Existing code continues to work with defaults
- Scale Efficiently: Run one instance for multiple tenants
### Real-World Scenario
```
┌─────────────────────────────────────────────────────────────────┐
│ Single LightRAG Deployment │
│ │
│ ┌──────────────────────┐ ┌──────────────────────┐ │
│ │ Tenant: Acme Corp │ │ Tenant: TechStart │ │
│ │ │ │ │ │
│ │ KB: kb-prod ─────┼─────>│ KB: kb-main ────┐ │ │
│ │ KB: kb-dev │ │ KB: kb-staging │ │ │
│ │ │ │ │ │ │
│ └──────────────────────┘ └──────────────────────┘ │
│ │ │ │
│ └─────────────┬───────────────┘ │
│ │ │
│ All data isolated at database level │
│ │ │
│ ┌──────────────┴──────────────┐ │
│ ▼ ▼ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ PostgreSQL │ │ MongoDB │ │
│ │ (tenant_id+kb) │ │ (tenant_id+kb) │ │
│ └─────────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```
---
## Architecture Model
### Hierarchical Structure
```mermaid
graph TD
A["Deployment"] --> B["Tenant: Acme Corp"]
A --> C["Tenant: TechStart"]
B --> D["KB: kb-prod"]
B --> E["KB: kb-dev"]
C --> F["KB: kb-main"]
C --> G["KB: kb-staging"]
D --> H["Documents"]
D --> I["Entities & Relations"]
D --> J["Vectors"]
E --> K["Documents"]
E --> L["Entities & Relations"]
F --> M["Documents"]
G --> N["Entities & Relations"]
style A fill:#E8F5E9,stroke:#2E7D32,stroke-width:2px,color:#1B5E20
style B fill:#F3E5F5,stroke:#6A1B9A,stroke-width:2px,color:#38006B
style C fill:#F3E5F5,stroke:#6A1B9A,stroke-width:2px,color:#38006B
style D fill:#E0F2F1,stroke:#00796B,stroke-width:2px,color:#004D40
style E fill:#E0F2F1,stroke:#00796B,stroke-width:2px,color:#004D40
style F fill:#E0F2F1,stroke:#00796B,stroke-width:2px,color:#004D40
style G fill:#E0F2F1,stroke:#00796B,stroke-width:2px,color:#004D40
```
### Data Model - Composite Key Pattern
Each resource is identified by a **composite key**: `(tenant_id, kb_id, resource_id)`
```
┌────────────────────────────────────────────────────────────┐
│ Composite Key Pattern │
├────────────────────────────────────────────────────────────┤
│ │
│ tenant_id │ kb_id │ resource_id │ data │
│ ───────── │ ───────── │ ───────── │ ──── │
│ "acme" │ "kb-prod" │ "doc-123" │ {...} │
│ "acme" │ "kb-dev" │ "doc-456" │ {...} │
│ "techst" │ "kb-main" │ "doc-789" │ {...} │
│ │
│ Same resource_id in different tenant/kb = different data │
│ Prevents accidental cross-tenant access │
│ │
└────────────────────────────────────────────────────────────┘
```
---
## Multi-Tenant Concept
### Three-Level Isolation
```
┌─────────────────────────────────────────────────────────────────┐
│ Multi-Tenant Isolation Levels │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Level 1: TENANT │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Organization/Customer/Account (highest level) │ │
│ │ Example: "acme-corp", "techstart-inc" │ │
│ │ Isolation: Complete separation between tenants │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ Level 2: KNOWLEDGE BASE (KB) │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Project/Environment/Domain within tenant │ │
│ │ Examples: │ │
│ │ - Acme Corp: kb-prod, kb-dev, kb-staging │ │
│ │ - TechStart: kb-main, kb-backup │ │
│ │ Isolation: Separate data per KB within same tenant │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ Level 3: RESOURCES │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Documents, Entities, Vectors, Relations (lowest level) │ │
│ │ Automatically filtered by tenant + kb context │ │
│ │ Isolation: Only accessible via tenant/kb scope │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
### Data Access Pattern
```mermaid
sequenceDiagram
participant Client as Client Application
participant API as LightRAG API
participant TenantCtx as Tenant Context
participant Storage as Storage Backend
Client->>API: GET /documents<br/>(tenant: acme-corp, kb: kb-prod)
API->>TenantCtx: Validate & extract<br/>tenant_id, kb_id
TenantCtx->>Storage: Query WHERE<br/>tenant_id='acme-corp'<br/>AND kb_id='kb-prod'
Storage-->>API: Return filtered results
API-->>Client: Documents (acme-corp only)
Note over TenantCtx: Even if request<br/>contains tenant_id in URL,<br/>storage layer<br/>enforces isolation
```
---
## Supported Backends
### Complete Backend Coverage
| Backend | Isolation Method | Status | Module |
|---------|---|---|---|
| **PostgreSQL** | Column filtering + composite keys | Complete | `postgres_tenant_support.py` |
| **MongoDB** | Document field filtering | Complete | `mongo_tenant_support.py` |
| **Redis** | Key prefixing (tenant:kb:key) | Complete | `redis_tenant_support.py` |
| **Neo4j** | Cypher + node relationships | Complete | `graph_tenant_support.py` |
| **Memgraph** | openCypher + properties | Complete | `graph_tenant_support.py` |
| **NetworkX** | Subgraph extraction | Complete | `graph_tenant_support.py` |
| **Qdrant** | Metadata filtering | Complete | `vector_tenant_support.py` |
| **Milvus** | WHERE expression filtering | Complete | `vector_tenant_support.py` |
| **FAISS** | Index naming + metadata | Complete | `vector_tenant_support.py` |
| **Nano Vector DB** | Document metadata | Complete | `vector_tenant_support.py` |
### Backend Architecture Diagram
```mermaid
graph TB
subgraph "Storage Backends"
Relational["Relational"]
Document["Document"]
KV["Key-Value"]
Vector["Vector"]
Graph["Graph"]
PG["PostgreSQL"]
Mongo["MongoDB"]
Redis["Redis"]
Qdrant["Qdrant"]
Milvus["Milvus"]
FAISS["FAISS"]
Nano["Nano VDB"]
Neo4j["Neo4j"]
Memgraph["Memgraph"]
NetworkX["NetworkX"]
end
subgraph "Support Modules"
PGSupport["postgres_tenant<br/>_support.py"]
MongoSupport["mongo_tenant<br/>_support.py"]
RedisSupport["redis_tenant<br/>_support.py"]
VectorSupport["vector_tenant<br/>_support.py"]
GraphSupport["graph_tenant<br/>_support.py"]
end
Relational --> PG
Document --> Mongo
KV --> Redis
Vector --> Qdrant
Vector --> Milvus
Vector --> FAISS
Vector --> Nano
Graph --> Neo4j
Graph --> Memgraph
Graph --> NetworkX
PG -.-> PGSupport
Mongo -.-> MongoSupport
Redis -.-> RedisSupport
Qdrant -.-> VectorSupport
Milvus -.-> VectorSupport
FAISS -.-> VectorSupport
Nano -.-> VectorSupport
Neo4j -.-> GraphSupport
Memgraph -.-> GraphSupport
NetworkX -.-> GraphSupport
style Relational fill:#F1F8E9,stroke:#558B2F,stroke-width:2px,color:#33691E
style Document fill:#ECE7F3,stroke:#7B1FA2,stroke-width:2px,color:#4A148C
style KV fill:#E0F2F1,stroke:#00897B,stroke-width:2px,color:#004D40
style Vector fill:#FFF3E0,stroke:#E65100,stroke-width:2px,color:#BF360C
style Graph fill:#F3E5F5,stroke:#6A1B9A,stroke-width:2px,color:#38006B
style PGSupport fill:#C8E6C9,stroke:#2E7D32,stroke-width:2px,color:#1B5E20
style MongoSupport fill:#D8C5E5,stroke:#512DA8,stroke-width:2px,color:#311B92
style RedisSupport fill:#B2DFDB,stroke:#00695C,stroke-width:2px,color:#004D40
style VectorSupport fill:#FFD8A8,stroke:#D84315,stroke-width:2px,color:#BF360C
style GraphSupport fill:#E1BEE7,stroke:#7B1FA2,stroke-width:2px,color:#4A148C
```
---
## How It Works
### Query Execution Flow
```
┌──────────────────────────────────────────────────────────────┐
│ Typical Query Execution Flow │
├──────────────────────────────────────────────────────────────┤
│ │
│ 1. Client Request │
│ GET /api/documents │
│ Headers: {tenant: "acme-corp", kb: "kb-prod"} │
│ │
│ 2. Extract Tenant Context │
│ tenant_id = extract_from_request(request) │
│ kb_id = extract_from_request(request) │
│ │
│ 3. Build Tenant-Aware Query │
│ Base Query: │
│ SELECT * FROM documents WHERE status='active' │
│ │
│ Add Tenant Filter: │
│ SELECT * FROM documents │
│ WHERE status='active' │
│ AND tenant_id='acme-corp' │
│ AND kb_id='kb-prod' │
│ │
│ 4. Execute Query │
│ Storage backend executes filtered query │
│ │
│ 5. Return Results │
│ Only documents from acme-corp/kb-prod returned │
│ │
└──────────────────────────────────────────────────────────────┘
```
### Storage Layer Filtering
Each backend has its own filtering mechanism:
```
┌──────────────────────────────────────────────────────────────┐
│ Backend-Specific Filtering Methods │
├──────────────────────────────────────────────────────────────┤
│ │
│ PostgreSQL: │
│ WHERE clause + composite PRIMARY KEY │
│ (tenant_id, kb_id, id) │
│ │
│ MongoDB: │
│ Document filter │
│ {tenant_id: "acme-corp", kb_id: "kb-prod"} │
│ │
│ Redis: │
│ Key prefix pattern │
│ acme-corp:kb-prod:original_key │
│ │
│ Qdrant (Vector DB): │
│ Metadata filter │
│ {"must": [{"key": "tenant_id", ...}, ...]} │
│ │
│ Neo4j (Graph DB): │
│ Cypher property matching │
│ WHERE node.tenant_id = 'acme-corp' │
│ AND node.kb_id = 'kb-prod' │
│ │
└──────────────────────────────────────────────────────────────┘
```
---
## Getting Started
### Quick Activation
Multi-tenant support is **built-in** and **automatically enabled**. Here's how to use it:
#### Step 1: Import Support Modules
```python
# For PostgreSQL
from lightrag.kg.postgres_tenant_support import TenantSQLBuilder
# For MongoDB
from lightrag.kg.mongo_tenant_support import MongoTenantHelper
# For Redis
from lightrag.kg.redis_tenant_support import RedisTenantNamespace
# For Vector DBs (Qdrant, Milvus, FAISS, Nano)
from lightrag.kg.vector_tenant_support import QdrantTenantHelper
# For Graph DBs (Neo4j, Memgraph, NetworkX)
from lightrag.kg.graph_tenant_support import Neo4jTenantHelper
```
#### Step 2: Use Tenant Context
```python
# Set tenant context for your operation
tenant_id = "acme-corp"
kb_id = "kb-prod"
# All subsequent queries will be automatically scoped to this tenant/kb
# No additional filtering needed in application code!
```
#### Step 3: That's It!
All database operations are automatically isolated. No breaking changes to existing code.
### Configuration
Minimal configuration needed. If using environment variables:
```bash
# Optional: Set default tenant for single-tenant scenarios
export LIGHTRAG_DEFAULT_TENANT="default"
export LIGHTRAG_DEFAULT_KB="default"
# Or use at runtime
context = TenantContext(tenant_id="acme-corp", kb_id="kb-prod")
```
---
## Implementation Examples
### PostgreSQL Example
```python
from lightrag.kg.postgres_tenant_support import TenantSQLBuilder
# Build a tenant-aware query
sql = "SELECT * FROM LIGHTRAG_DOC_FULL WHERE status = :status"
params = {"status": "active"}
# Add tenant filtering
filtered_sql, filtered_params = TenantSQLBuilder.build_filtered_query(
base_query=sql,
tenant_id="acme-corp",
kb_id="kb-prod",
additional_params=[params]
)
# Execute
result = await db.query(filtered_sql, filtered_params)
# Result: Only documents from acme-corp/kb-prod with status=active
```
### MongoDB Example
```python
from lightrag.kg.mongo_tenant_support import MongoTenantHelper
# Build tenant-aware filter
tenant_filter = MongoTenantHelper.get_tenant_filter(
tenant_id="acme-corp",
kb_id="kb-prod",
additional_filter={"status": "active"}
)
# Use in query
document = await collection.find_one(tenant_filter)
# Result: Only returns documents from acme-corp/kb-prod
```
### Redis Example
```python
from lightrag.kg.redis_tenant_support import RedisTenantNamespace
# Create a tenant-scoped namespace
ns = RedisTenantNamespace(
redis_client=redis,
tenant_id="acme-corp",
kb_id="kb-prod"
)
# All operations are automatically tenant-scoped
value = await ns.get("user:123")
await ns.set("user:123", json_data)
await ns.delete("user:123")
# Key stored as: "acme-corp:kb-prod:user:123"
# No tenant/kb prefix needed in application code
```
### Vector DB (Qdrant) Example
```python
from lightrag.kg.vector_tenant_support import QdrantTenantHelper
# Build tenant filter
tenant_filter = QdrantTenantHelper.build_qdrant_filter(
tenant_id="acme-corp",
kb_id="kb-prod"
)
# Search with automatic tenant isolation
results = await qdrant.search(
collection_name="embeddings",
query_vector=query_embedding,
query_filter=tenant_filter, # Automatic isolation
limit=10
)
# Result: Only vectors from acme-corp/kb-prod
```
### Graph DB (Neo4j) Example
```python
from lightrag.kg.graph_tenant_support import Neo4jTenantHelper
helper = Neo4jTenantHelper()
# Build tenant-aware Cypher query
base_query = "MATCH (n:Entity) RETURN n"
query, params = helper.build_tenant_aware_query(
base_query=base_query,
tenant_id="acme-corp",
kb_id="kb-prod",
node_var="n"
)
# Execute
result = await session.run(query, params)
# Result: Only entities from acme-corp/kb-prod
```
### Complete Application Example
```python
from fastapi import FastAPI, Header
from lightrag.kg.postgres_tenant_support import TenantSQLBuilder
app = FastAPI()
@app.get("/documents")
async def get_documents(
tenant_id: str = Header(...),
kb_id: str = Header(...),
db = Depends(get_db)
):
"""Get documents for a specific tenant/kb"""
# Build tenant-scoped query
query = "SELECT id, title, content FROM documents"
filtered_sql, params = TenantSQLBuilder.build_filtered_query(
base_query=query,
tenant_id=tenant_id,
kb_id=kb_id,
additional_params=[]
)
# Execute (tenant context enforced at storage layer)
documents = await db.query(filtered_sql, params)
return {
"tenant": tenant_id,
"kb": kb_id,
"documents": documents,
"count": len(documents)
}
@app.post("/documents/{doc_id}")
async def add_document(
doc_id: str,
tenant_id: str = Header(...),
kb_id: str = Header(...),
content: dict,
db = Depends(get_db)
):
"""Add a document for a specific tenant/kb"""
# Composite key: (tenant_id, kb_id, doc_id)
query = """
INSERT INTO documents (tenant_id, kb_id, id, content)
VALUES (:tenant_id, :kb_id, :id, :content)
"""
result = await db.execute(query, {
"tenant_id": tenant_id,
"kb_id": kb_id,
"id": doc_id,
"content": content
})
return {
"status": "created",
"tenant": tenant_id,
"kb": kb_id,
"doc_id": doc_id
}
```
---
## Security & Isolation
### Isolation Guarantees
```
┌─────────────────────────────────────────────────────────────┐
│ Multi-Tenant Isolation Guarantees │
├─────────────────────────────────────────────────────────────┤
│ │
│ Database-Level Enforcement │
│ Every query includes (tenant_id, kb_id) filtering │
│ Impossible to retrieve data from other tenants │
│ │
│ Composite Key Constraints │
│ PRIMARY KEY (tenant_id, kb_id, id) │
│ Prevents accidental ID collisions between tenants │
│ │
│ No Application-Level Trust │
│ Even if app code has bugs, storage layer enforces │
│ Tenant isolation is deterministic, not probabilistic │
│ │
│ Migration Safety │
│ Legacy single-tenant data maps to default tenant │
│ Gradual migration path without data loss │
│ │
│ Audit Trail │
│ All operations include tenant context │
│ Easy to track which tenant accessed what │
│ │
└─────────────────────────────────────────────────────────────┘
```
### Security Checklist
```python
# DO: Always include tenant context
@app.get("/documents")
async def get_docs(tenant_id: str = Header(...), kb_id: str = Header(...)):
query = TenantSQLBuilder.build_filtered_query(
query, tenant_id, kb_id
)
return await db.query(query)
# DON'T: Query without tenant filtering
@app.get("/documents") # WRONG - no tenant context
async def get_docs():
return await db.query("SELECT * FROM documents")
# DO: Validate tenant context early
async def validate_tenant_access(tenant_id, user_tenant):
if tenant_id != user_tenant:
raise PermissionError(f"Cannot access {tenant_id}")
# DO: Use composite keys consistently
key = f"{tenant_id}:{kb_id}:{resource_id}"
# DON'T: Use resource IDs without tenant prefix
key = f"doc:{resource_id}" # WRONG - can collide with other tenants
```
---
## Migration Guide
### Migrating Existing Single-Tenant Data
Multi-tenant support includes automatic migration utilities for each backend.
#### PostgreSQL Migration
```python
from lightrag.kg.postgres_tenant_support import add_tenant_columns_migration
# Run one-time migration
await add_tenant_columns_migration(
db=database_connection,
default_tenant_id="default",
default_kb_id="default"
)
# What it does:
# 1. Adds tenant_id and kb_id columns to all tables
# 2. Sets existing rows to default values
# 3. Creates composite indexes for performance
# 4. Updates PRIMARY KEY constraints
```
#### MongoDB Migration
```python
from lightrag.kg.mongo_tenant_support import add_tenant_fields_to_collection
# Run migration on each collection
await add_tenant_fields_to_collection(
collection=mongodb_collection,
default_tenant_id="default",
default_kb_id="default"
)
# Creates indexes:
# {tenant_id: 1, kb_id: 1, _id: 1}
```
#### Redis Migration (with Dry-Run)
```python
from lightrag.kg.redis_tenant_support import migrate_redis_to_tenant
# Test migration first (dry-run)
stats = await migrate_redis_to_tenant(
redis_client=redis,
old_key_pattern="user:*",
default_tenant_id="default",
default_kb_id="default",
dry_run=True # Preview only
)
print(f"Will migrate: {stats['migrated']} keys")
print(f"Will skip: {stats['skipped']} keys")
print(f"Failed: {stats['failed']} keys")
# Run actual migration
stats = await migrate_redis_to_tenant(
redis_client=redis,
old_key_pattern="user:*",
default_tenant_id="default",
default_kb_id="default",
dry_run=False # Apply changes
)
```
### Migration Workflow
```
┌──────────────────────────────────────────────────────┐
│ Safe Migration Process │
├──────────────────────────────────────────────────────┤
│ │
│ 1. BACKUP │
│ - Create database snapshots │
│ - Export critical data │
│ │
│ 2. TEST ENVIRONMENT │
│ - Restore backup to test DB │
│ - Run migration with dry-run │
│ - Verify statistics match expectations │
│ │
│ 3. PRODUCTION STAGING │
│ - Run migration on staging with dry-run │
│ - Test application with new schema │
│ - Monitor performance │
│ │
│ 4. PRODUCTION EXECUTION │
│ - Schedule maintenance window │
│ - Stop application │
│ - Run actual migration (dry_run=False) │
│ - Verify data integrity │
│ - Restart application │
│ │
│ 5. VALIDATION │
│ - Run integration tests │
│ - Check application logs │
│ - Verify tenant isolation │
│ - Monitor for 24 hours │
│ │
└──────────────────────────────────────────────────────┘
```
---
## Troubleshooting
### Common Issues & Solutions
#### Issue 1: No tenant context found
```python
# Problem
async def get_documents(db):
result = await db.query("SELECT * FROM documents")
# Error: No tenant context provided
# Solution
async def get_documents(db, tenant_id: str = Header(...)):
from lightrag.kg.postgres_tenant_support import TenantSQLBuilder
query = "SELECT * FROM documents"
filtered_sql, params = TenantSQLBuilder.build_filtered_query(
query, tenant_id, "kb-prod"
)
result = await db.query(filtered_sql, params)
```
#### Issue 2: Cross-tenant data visible
```python
# Problem
filter_dict = {"status": "active"} # Missing tenant fields!
result = await collection.find(filter_dict)
# Solution
from lightrag.kg.mongo_tenant_support import MongoTenantHelper
filter_dict = MongoTenantHelper.get_tenant_filter(
"acme-corp", "kb-prod",
additional_filter={"status": "active"}
)
result = await collection.find(filter_dict)
```
#### Issue 3: Performance degradation after migration
```python
# Solution: Ensure indexes exist
from lightrag.kg.postgres_tenant_support import get_tenant_indexes
# Get recommended indexes
indexes = get_tenant_indexes()
# Create in PostgreSQL
for index_sql in indexes:
await db.execute(index_sql)
# Verify
ANALYZE documents; -- Update statistics
EXPLAIN SELECT * FROM documents
WHERE tenant_id='acme-corp'
AND kb_id='kb-prod'; -- Check query plan
```
#### Issue 4: Backward compatibility broken
```python
# Solution: Use default tenant values
context = TenantContext(
tenant_id="default", # Default for legacy code
kb_id="default"
)
# Legacy code continues to work
result = await db.query(legacy_query) # Uses default context
```
### Debugging Multi-Tenant Issues
```python
# Enable debug logging
import logging
logging.basicConfig(level=logging.DEBUG)
# Add tenant context to logs
import contextvars
tenant_context = contextvars.ContextVar(
'tenant_context',
default={'tenant_id': 'unknown', 'kb_id': 'unknown'}
)
# In middleware
def set_tenant_context(tenant_id, kb_id):
tenant_context.set({'tenant_id': tenant_id, 'kb_id': kb_id})
# In logging
class TenantFilter(logging.Filter):
def filter(self, record):
ctx = tenant_context.get()
record.tenant = ctx['tenant_id']
record.kb = ctx['kb_id']
return True
handler = logging.StreamHandler()
handler.addFilter(TenantFilter())
logging.getLogger().addHandler(handler)
# Logs will show:
# 2025-11-20 10:30:45 [acme-corp:kb-prod] SELECT from documents
```
---
## Best Practices
### DO
- Always pass tenant context to every operation
- Use support module helpers (don't build queries manually)
- Create composite indexes on (tenant_id, kb_id, ...)
- Validate tenant context early in request pipeline
- Log all tenant-related operations
- Test with multiple tenants before production
- Monitor tenant-specific metrics
- Document tenant requirements for new features
### DON'T
- Hardcode tenant IDs in application code
- Query without tenant filtering
- Assume application code enforces isolation
- Skip index creation after migration
- Mix tenants in a single transaction
- Cache results across tenants without keying
- Forget to pass tenant context to batch operations
- Assume default values work for production
---
## Performance Optimization
### Index Strategy
```python
# PostgreSQL - Composite index on all three columns
CREATE INDEX idx_doc_tenant_kb_id
ON documents(tenant_id, kb_id, id);
# For range queries
CREATE INDEX idx_doc_tenant_kb_created
ON documents(tenant_id, kb_id, created_at DESC);
# MongoDB - Compound index
db.documents.createIndex({
tenant_id: 1,
kb_id: 1,
_id: 1
})
# For sorting
db.documents.createIndex({
tenant_id: 1,
kb_id: 1,
created_at: -1
})
```
### Query Optimization Tips
```python
# Good: Specific tenant filter
SELECT * FROM documents
WHERE tenant_id='acme-corp'
AND kb_id='kb-prod'
AND status='active'
ORDER BY created_at DESC;
# Bad: Full table scan
SELECT * FROM documents
WHERE status='active'
ORDER BY created_at DESC;
# Good: Use indexes
EXPLAIN SELECT * FROM documents
WHERE tenant_id='acme-corp'
AND kb_id='kb-prod'
AND created_at > NOW() - INTERVAL '7 days';
# Result should show: "Index Scan" (not "Seq Scan")
```
---
## Summary
### Multi-Tenant Architecture at a Glance
```
┌─────────────────────────────────────────────────────┐
│ Multi-Tenant Architecture Summary │
├─────────────────────────────────────────────────────┤
│ │
│ Goal: Securely isolate data for multiple │
│ tenants in a single deployment │
│ │
│ Method: Database-level filtering by │
│ (tenant_id, kb_id) │
│ │
│ Supported: All 10 storage backends │
│ │
│ Activation: Use support modules, pass │
│ tenant context to every operation │
│ │
│ Backward Compatible: Existing code works │
│ with default values │
│ │
│ Secure: Storage layer enforces isolation │
│ even if application has bugs │
│ │
│ Scalable: One instance, unlimited tenants │
│ │
└─────────────────────────────────────────────────────┘
```
---
## Next Steps
1. Review Implementation Examples above for your backend
2. Run Tests: `pytest tests/test_multi_tenant_backends.py -v`
3. Plan Migration: Use migration utilities with dry-run first
4. Deploy: Follow safe migration workflow in Troubleshooting section
5. Monitor: Watch tenant-specific metrics in production
---
## Additional Resources
- Complete API Reference: See `QUICK_REFERENCE_MULTI_TENANT.md`
- Deployment Guide: See `PHASE4_COMPLETE_MULTI_TENANT_SUMMARY.md`
- Architecture Details: See `MULTI_TENANT_COMPLETE_IMPLEMENTATION.md`
- Code Examples: See support modules in `lightrag/kg/`
- Test Suite: See `tests/test_multi_tenant_backends.py`
---
**Status**: Production Ready
**Last Updated**: November 20, 2025
**Questions?** Check the Troubleshooting section or review code examples