
2025-12-04 18:09:15 +08:00


Multi-Tenant Architecture

A comprehensive guide to understanding, activating, and implementing multi-tenant support across all storage backends

Last Updated: November 20, 2025
Status: Production Ready
Audience: Developers, DevOps Engineers, System Architects

Overview
Architecture Model
Multi-Tenant Concept
Supported Backends
How It Works
Getting Started
Implementation Examples
Security & Isolation
Migration Guide
Troubleshooting

Overview

LightRAG now supports a complete multi-tenant architecture across all 10 storage backends, enabling secure data isolation for multiple organizations, teams, or customers within a single LightRAG deployment.

Key Benefits

Complete Data Isolation: Database-level filtering prevents cross-tenant access
Easy Activation: Simple configuration with backward compatibility
All Backends Supported: Works with PostgreSQL, MongoDB, Redis, Neo4j, and vector/graph databases
Zero Breaking Changes: Existing code continues to work with defaults
Scale Efficiently: Run one instance for multiple tenants

Real-World Scenario

┌─────────────────────────────────────────────────────────────────┐
│                    Single LightRAG Deployment                   │
│                                                                 │
│  ┌──────────────────────┐      ┌──────────────────────┐         │
│  │  Tenant: Acme Corp   │      │  Tenant: TechStart   │         │
│  │                      │      │                      │         │
│  │  KB: kb-prod         │      │  KB: kb-main         │         │
│  │  KB: kb-dev          │      │  KB: kb-staging      │         │
│  └───────────┬──────────┘      └───────────┬──────────┘         │
│              │                             │                    │
│              └──────────────┬──────────────┘                    │
│                             │                                   │
│            All data isolated at database level                  │
│                             │                                   │
│               ┌─────────────┴─────────────┐                     │
│               ▼                           ▼                     │
│      ┌─────────────────┐         ┌─────────────────┐            │
│      │   PostgreSQL    │         │     MongoDB     │            │
│      │  (tenant_id+kb) │         │  (tenant_id+kb) │            │
│      └─────────────────┘         └─────────────────┘            │
└─────────────────────────────────────────────────────────────────┘

Architecture Model

Hierarchical Structure

graph TD
    A["Deployment"] --> B["Tenant: Acme Corp"]
    A --> C["Tenant: TechStart"]
    B --> D["KB: kb-prod"]
    B --> E["KB: kb-dev"]
    C --> F["KB: kb-main"]
    C --> G["KB: kb-staging"]
    D --> H["Documents"]
    D --> I["Entities & Relations"]
    D --> J["Vectors"]
    E --> K["Documents"]
    E --> L["Entities & Relations"]
    F --> M["Documents"]
    G --> N["Entities & Relations"]
    
    style A fill:#E8F5E9,stroke:#2E7D32,stroke-width:2px,color:#1B5E20
    style B fill:#F3E5F5,stroke:#6A1B9A,stroke-width:2px,color:#38006B
    style C fill:#F3E5F5,stroke:#6A1B9A,stroke-width:2px,color:#38006B
    style D fill:#E0F2F1,stroke:#00796B,stroke-width:2px,color:#004D40
    style E fill:#E0F2F1,stroke:#00796B,stroke-width:2px,color:#004D40
    style F fill:#E0F2F1,stroke:#00796B,stroke-width:2px,color:#004D40
    style G fill:#E0F2F1,stroke:#00796B,stroke-width:2px,color:#004D40

Data Model - Composite Key Pattern

Each resource is identified by a composite key: (tenant_id, kb_id, resource_id)

┌────────────────────────────────────────────────────────────┐
│                    Composite Key Pattern                   │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  tenant_id  │   kb_id    │   resource_id   │   data       │
│  ─────────  │  ─────────  │   ─────────     │   ────       │
│  "acme"     │  "kb-prod"  │  "doc-123"      │   {...}      │
│  "acme"     │  "kb-dev"   │  "doc-456"      │   {...}      │
│  "techst"   │  "kb-main"  │  "doc-789"      │   {...}      │
│                                                            │
│  Same resource_id in different tenant/kb = different data │
│  Prevents accidental cross-tenant access                  │
│                                                            │
└────────────────────────────────────────────────────────────┘
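
The composite key idea can be illustrated with a minimal in-memory sketch. It does not use any LightRAG storage class; `ResourceKey` and `store` are hypothetical names introduced only for this illustration.

```python
from typing import NamedTuple


class ResourceKey(NamedTuple):
    """Hypothetical composite key: (tenant_id, kb_id, resource_id)."""
    tenant_id: str
    kb_id: str
    resource_id: str


# In-memory stand-in for a storage backend, keyed by the composite key.
store: dict = {}

store[ResourceKey("acme", "kb-prod", "doc-123")] = {"title": "Acme prod doc"}
store[ResourceKey("techst", "kb-main", "doc-123")] = {"title": "TechStart doc"}

# The same resource_id under a different tenant/kb is a different record.
acme_doc = store[ResourceKey("acme", "kb-prod", "doc-123")]
techst_doc = store[ResourceKey("techst", "kb-main", "doc-123")]

# A lookup with a mismatched tenant/kb pair finds nothing
# rather than returning another tenant's data.
missing = store.get(ResourceKey("techst", "kb-prod", "doc-123"))
```

Because the tenant and KB are part of the key itself, a caller cannot address a record without naming the scope it belongs to.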

Multi-Tenant Concept

Three-Level Isolation

┌─────────────────────────────────────────────────────────────────┐
│                    Multi-Tenant Isolation Levels                │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Level 1: TENANT                                               │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ Organization/Customer/Account (highest level)            │  │
│  │ Example: "acme-corp", "techstart-inc"                    │  │
│  │ Isolation: Complete separation between tenants           │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                 │
│  Level 2: KNOWLEDGE BASE (KB)                                  │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ Project/Environment/Domain within tenant                 │  │
│  │ Examples:                                                │  │
│  │   - Acme Corp: kb-prod, kb-dev, kb-staging              │  │
│  │   - TechStart: kb-main, kb-backup                        │  │
│  │ Isolation: Separate data per KB within same tenant       │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                 │
│  Level 3: RESOURCES                                            │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ Documents, Entities, Vectors, Relations (lowest level)   │  │
│  │ Automatically filtered by tenant + kb context            │  │
│  │ Isolation: Only accessible via tenant/kb scope           │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Data Access Pattern

sequenceDiagram
    participant Client as Client Application
    participant API as LightRAG API
    participant TenantCtx as Tenant Context
    participant Storage as Storage Backend

    Client->>API: GET /documents<br/>(tenant: acme-corp, kb: kb-prod)
    API->>TenantCtx: Validate & extract<br/>tenant_id, kb_id
    TenantCtx->>Storage: Query WHERE<br/>tenant_id='acme-corp'<br/>AND kb_id='kb-prod'
    Storage-->>API: Return filtered results
    API-->>Client: Documents (acme-corp only)

    Note over TenantCtx: Even if request<br/>contains tenant_id in URL,<br/>storage layer<br/>enforces isolation

Supported Backends

Complete Backend Coverage

Backend	Isolation Method	Status	Module
PostgreSQL	Column filtering + composite keys	Complete	`postgres_tenant_support.py`
MongoDB	Document field filtering	Complete	`mongo_tenant_support.py`
Redis	Key prefixing (tenant:kb:key)	Complete	`redis_tenant_support.py`
Neo4j	Cypher + node relationships	Complete	`graph_tenant_support.py`
Memgraph	openCypher + properties	Complete	`graph_tenant_support.py`
NetworkX	Subgraph extraction	Complete	`graph_tenant_support.py`
Qdrant	Metadata filtering	Complete	`vector_tenant_support.py`
Milvus	WHERE expression filtering	Complete	`vector_tenant_support.py`
FAISS	Index naming + metadata	Complete	`vector_tenant_support.py`
Nano Vector DB	Document metadata	Complete	`vector_tenant_support.py`

Backend Architecture Diagram

graph TB
    subgraph "Storage Backends"
        Relational["Relational"]
        Document["Document"]
        KV["Key-Value"]
        Vector["Vector"]
        Graph["Graph"]
        
        PG["PostgreSQL"]
        Mongo["MongoDB"]
        Redis["Redis"]
        Qdrant["Qdrant"]
        Milvus["Milvus"]
        FAISS["FAISS"]
        Nano["Nano VDB"]
        Neo4j["Neo4j"]
        Memgraph["Memgraph"]
        NetworkX["NetworkX"]
    end
    
    subgraph "Support Modules"
        PGSupport["postgres_tenant<br/>_support.py"]
        MongoSupport["mongo_tenant<br/>_support.py"]
        RedisSupport["redis_tenant<br/>_support.py"]
        VectorSupport["vector_tenant<br/>_support.py"]
        GraphSupport["graph_tenant<br/>_support.py"]
    end
    
    Relational --> PG
    Document --> Mongo
    KV --> Redis
    Vector --> Qdrant
    Vector --> Milvus
    Vector --> FAISS
    Vector --> Nano
    Graph --> Neo4j
    Graph --> Memgraph
    Graph --> NetworkX
    
    PG -.-> PGSupport
    Mongo -.-> MongoSupport
    Redis -.-> RedisSupport
    Qdrant -.-> VectorSupport
    Milvus -.-> VectorSupport
    FAISS -.-> VectorSupport
    Nano -.-> VectorSupport
    Neo4j -.-> GraphSupport
    Memgraph -.-> GraphSupport
    NetworkX -.-> GraphSupport
    
    style Relational fill:#F1F8E9,stroke:#558B2F,stroke-width:2px,color:#33691E
    style Document fill:#ECE7F3,stroke:#7B1FA2,stroke-width:2px,color:#4A148C
    style KV fill:#E0F2F1,stroke:#00897B,stroke-width:2px,color:#004D40
    style Vector fill:#FFF3E0,stroke:#E65100,stroke-width:2px,color:#BF360C
    style Graph fill:#F3E5F5,stroke:#6A1B9A,stroke-width:2px,color:#38006B
    style PGSupport fill:#C8E6C9,stroke:#2E7D32,stroke-width:2px,color:#1B5E20
    style MongoSupport fill:#D8C5E5,stroke:#512DA8,stroke-width:2px,color:#311B92
    style RedisSupport fill:#B2DFDB,stroke:#00695C,stroke-width:2px,color:#004D40
    style VectorSupport fill:#FFD8A8,stroke:#D84315,stroke-width:2px,color:#BF360C
    style GraphSupport fill:#E1BEE7,stroke:#7B1FA2,stroke-width:2px,color:#4A148C

How It Works

Query Execution Flow

┌──────────────────────────────────────────────────────────────┐
│             Typical Query Execution Flow                     │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  1. Client Request                                          │
│     GET /api/documents                                      │
│     Headers: {tenant: "acme-corp", kb: "kb-prod"}          │
│                                                              │
│  2. Extract Tenant Context                                  │
│     tenant_id = extract_from_request(request)              │
│     kb_id = extract_from_request(request)                  │
│                                                              │
│  3. Build Tenant-Aware Query                                │
│     Base Query:                                             │
│       SELECT * FROM documents WHERE status='active'         │
│                                                              │
│     Add Tenant Filter:                                      │
│       SELECT * FROM documents                               │
│       WHERE status='active'                                 │
│       AND tenant_id='acme-corp'                             │
│       AND kb_id='kb-prod'                                   │
│                                                              │
│  4. Execute Query                                           │
│     Storage backend executes filtered query                 │
│                                                              │
│  5. Return Results                                          │
│     Only documents from acme-corp/kb-prod returned          │
│                                                              │
└──────────────────────────────────────────────────────────────┘
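
Steps 3 to 5 of this flow can be tried end to end with SQLite from the Python standard library, used here as a stand-in for a real backend. `add_tenant_filter` is a hypothetical helper written for this sketch. It is not the library's `TenantSQLBuilder`, and its WHERE detection is deliberately naive.

```python
import sqlite3


def add_tenant_filter(base_query: str, tenant_id: str, kb_id: str, params: dict):
    """Append tenant/kb predicates to a simple SELECT (hypothetical helper).

    Assumes base_query has no ORDER BY/LIMIT suffix. Values are bound as
    named parameters and never interpolated into the SQL text.
    """
    joiner = " AND " if " where " in base_query.lower() else " WHERE "
    sql = f"{base_query}{joiner}tenant_id = :tenant_id AND kb_id = :kb_id"
    return sql, {**params, "tenant_id": tenant_id, "kb_id": kb_id}


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (tenant_id TEXT, kb_id TEXT, id TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?, ?)",
    [
        ("acme-corp", "kb-prod", "doc-1", "active"),
        ("acme-corp", "kb-dev", "doc-2", "active"),
        ("techstart", "kb-main", "doc-3", "active"),
    ],
)

# Step 3: build the tenant-aware query from the base query.
sql, bound = add_tenant_filter(
    "SELECT id FROM documents WHERE status = :status",
    tenant_id="acme-corp",
    kb_id="kb-prod",
    params={"status": "active"},
)

# Steps 4-5: execute and return only rows in the requested scope.
rows = conn.execute(sql, bound).fetchall()
```

All three rows are active, but only the row in `acme-corp/kb-prod` satisfies the added predicates.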

Storage Layer Filtering

Each backend has its own filtering mechanism:

┌──────────────────────────────────────────────────────────────┐
│           Backend-Specific Filtering Methods                 │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  PostgreSQL:                                                │
│  WHERE clause + composite PRIMARY KEY                       │
│    (tenant_id, kb_id, id)                                   │
│                                                              │
│  MongoDB:                                                   │
│  Document filter                                            │
│    {tenant_id: "acme-corp", kb_id: "kb-prod"}             │
│                                                              │
│  Redis:                                                     │
│  Key prefix pattern                                         │
│    acme-corp:kb-prod:original_key                          │
│                                                              │
│  Qdrant (Vector DB):                                        │
│  Metadata filter                                            │
│    {"must": [{"key": "tenant_id", ...}, ...]}              │
│                                                              │
│  Neo4j (Graph DB):                                          │
│  Cypher property matching                                   │
│    WHERE node.tenant_id = 'acme-corp'                      │
│    AND node.kb_id = 'kb-prod'                              │
│                                                              │
└──────────────────────────────────────────────────────────────┘
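
The MongoDB and Qdrant filter shapes above can be written out as plain dictionaries. The two functions below are hypothetical sketches, not the library's helpers. The Qdrant dictionary follows the JSON shape of Qdrant's payload filters (`must` with `key`/`match` conditions).

```python
from typing import Optional


def mongo_tenant_filter(tenant_id: str, kb_id: str, extra: Optional[dict] = None) -> dict:
    """Merge caller filters with tenant fields; the tenant fields always win."""
    return {**(extra or {}), "tenant_id": tenant_id, "kb_id": kb_id}


def qdrant_tenant_filter(tenant_id: str, kb_id: str) -> dict:
    """Qdrant-style payload filter expressed as a plain dict."""
    return {
        "must": [
            {"key": "tenant_id", "match": {"value": tenant_id}},
            {"key": "kb_id", "match": {"value": kb_id}},
        ]
    }


# A caller-supplied tenant_id cannot override the enforced scope.
mongo_filter = mongo_tenant_filter(
    "acme-corp", "kb-prod", {"status": "active", "tenant_id": "techstart"}
)
qdrant_filter = qdrant_tenant_filter("acme-corp", "kb-prod")
```

Applying the tenant fields last in the merge is a design choice: a `tenant_id` smuggled into the caller's filter is overwritten instead of widening the query.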

Getting Started

Quick Activation

Multi-tenant support is built-in and automatically enabled. Here's how to use it:

Step 1: Import Support Modules

# For PostgreSQL
from lightrag.kg.postgres_tenant_support import TenantSQLBuilder

# For MongoDB  
from lightrag.kg.mongo_tenant_support import MongoTenantHelper

# For Redis
from lightrag.kg.redis_tenant_support import RedisTenantNamespace

# For Vector DBs (Qdrant, Milvus, FAISS, Nano)
from lightrag.kg.vector_tenant_support import QdrantTenantHelper

# For Graph DBs (Neo4j, Memgraph, NetworkX)
from lightrag.kg.graph_tenant_support import Neo4jTenantHelper

Step 2: Use Tenant Context

# Set tenant context for your operation
tenant_id = "acme-corp"
kb_id = "kb-prod"

# Pass these values to the support-module helpers shown under
# Implementation Examples; each helper adds the tenant/kb filter for you.

Step 3: That's It!

Operations that go through the helpers are isolated per tenant/kb. Existing code needs no changes and keeps working with the defaults.

Configuration

Minimal configuration is needed. If you use environment variables:

# Shell (optional): set a default tenant and KB for single-tenant scenarios
export LIGHTRAG_DEFAULT_TENANT="default"
export LIGHTRAG_DEFAULT_KB="default"

# Python: or set the context at runtime
context = TenantContext(tenant_id="acme-corp", kb_id="kb-prod")
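
One way such defaults could be resolved is sketched below. The variable names come from the snippet above. `TenantScope` and `scope_from_env` are hypothetical, and the fallback to "default" is an assumption about how LightRAG reads these variables, not a description of it.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantScope:
    """Hypothetical stand-in for a tenant context object."""
    tenant_id: str
    kb_id: str


def scope_from_env(env=os.environ) -> TenantScope:
    """Build a scope from the environment, falling back to 'default'."""
    return TenantScope(
        tenant_id=env.get("LIGHTRAG_DEFAULT_TENANT", "default"),
        kb_id=env.get("LIGHTRAG_DEFAULT_KB", "default"),
    )


# No variables set: the legacy single-tenant scope.
legacy_scope = scope_from_env({})

# Variables set explicitly (a plain dict stands in for os.environ here).
explicit_scope = scope_from_env(
    {"LIGHTRAG_DEFAULT_TENANT": "acme-corp", "LIGHTRAG_DEFAULT_KB": "kb-prod"}
)
```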

Implementation Examples

PostgreSQL Example

from lightrag.kg.postgres_tenant_support import TenantSQLBuilder

# Build a tenant-aware query
sql = "SELECT * FROM LIGHTRAG_DOC_FULL WHERE status = :status"
params = {"status": "active"}

# Add tenant filtering
filtered_sql, filtered_params = TenantSQLBuilder.build_filtered_query(
    base_query=sql,
    tenant_id="acme-corp",
    kb_id="kb-prod",
    additional_params=[params]
)

# Execute
result = await db.query(filtered_sql, filtered_params)
# Result: Only documents from acme-corp/kb-prod with status=active

MongoDB Example

from lightrag.kg.mongo_tenant_support import MongoTenantHelper

# Build tenant-aware filter
tenant_filter = MongoTenantHelper.get_tenant_filter(
    tenant_id="acme-corp",
    kb_id="kb-prod",
    additional_filter={"status": "active"}
)

# Use in query
document = await collection.find_one(tenant_filter)
# Result: Only returns documents from acme-corp/kb-prod

Redis Example

from lightrag.kg.redis_tenant_support import RedisTenantNamespace

# Create a tenant-scoped namespace
ns = RedisTenantNamespace(
    redis_client=redis,
    tenant_id="acme-corp",
    kb_id="kb-prod"
)

# All operations are automatically tenant-scoped
value = await ns.get("user:123")
await ns.set("user:123", json_data)
await ns.delete("user:123")

# Key stored as: "acme-corp:kb-prod:user:123"
# No tenant/kb prefix needed in application code
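
The key-prefixing behaviour can be sketched without a Redis server by wrapping a plain dict. `TenantNamespace` below is a hypothetical, synchronous illustration of the idea, not the `RedisTenantNamespace` class itself.

```python
class TenantNamespace:
    """Hypothetical key-prefixing wrapper over a dict-like store."""

    def __init__(self, backend: dict, tenant_id: str, kb_id: str):
        self._backend = backend
        self._prefix = f"{tenant_id}:{kb_id}:"

    def _key(self, key: str) -> str:
        # Every operation goes through this single prefixing point.
        return self._prefix + key

    def set(self, key: str, value) -> None:
        self._backend[self._key(key)] = value

    def get(self, key: str):
        return self._backend.get(self._key(key))

    def delete(self, key: str) -> None:
        self._backend.pop(self._key(key), None)


shared = {}  # one shared store, as with a single Redis instance
acme = TenantNamespace(shared, "acme-corp", "kb-prod")
tech = TenantNamespace(shared, "techstart", "kb-main")

acme.set("user:123", "acme data")
tech.set("user:123", "techstart data")
tech.delete("user:123")  # removes only the TechStart entry
```

Note that a prefix scheme like this is only unambiguous if tenant and KB identifiers cannot themselves contain the `:` separator.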

Vector DB (Qdrant) Example

from lightrag.kg.vector_tenant_support import QdrantTenantHelper

# Build tenant filter
tenant_filter = QdrantTenantHelper.build_qdrant_filter(
    tenant_id="acme-corp",
    kb_id="kb-prod"
)

# Search with automatic tenant isolation
results = await qdrant.search(
    collection_name="embeddings",
    query_vector=query_embedding,
    query_filter=tenant_filter,  # Automatic isolation
    limit=10
)
# Result: Only vectors from acme-corp/kb-prod

Graph DB (Neo4j) Example

from lightrag.kg.graph_tenant_support import Neo4jTenantHelper

helper = Neo4jTenantHelper()

# Build tenant-aware Cypher query
base_query = "MATCH (n:Entity) RETURN n"
query, params = helper.build_tenant_aware_query(
    base_query=base_query,
    tenant_id="acme-corp",
    kb_id="kb-prod",
    node_var="n"
)

# Execute
result = await session.run(query, params)
# Result: Only entities from acme-corp/kb-prod

Complete Application Example

from fastapi import Depends, FastAPI, Header
from lightrag.kg.postgres_tenant_support import TenantSQLBuilder

app = FastAPI()
# get_db is your application's database dependency (definition not shown)

@app.get("/documents")
async def get_documents(
    tenant_id: str = Header(...),
    kb_id: str = Header(...),
    db = Depends(get_db)
):
    """Get documents for a specific tenant/kb"""
    
    # Build tenant-scoped query
    query = "SELECT id, title, content FROM documents"
    
    filtered_sql, params = TenantSQLBuilder.build_filtered_query(
        base_query=query,
        tenant_id=tenant_id,
        kb_id=kb_id,
        additional_params=[]
    )
    
    # Execute (tenant context enforced at storage layer)
    documents = await db.query(filtered_sql, params)
    
    return {
        "tenant": tenant_id,
        "kb": kb_id,
        "documents": documents,
        "count": len(documents)
    }


@app.post("/documents/{doc_id}")
async def add_document(
    doc_id: str,
    content: dict,  # request body; must precede the parameters with defaults
    tenant_id: str = Header(...),
    kb_id: str = Header(...),
    db = Depends(get_db)
):
    """Add a document for a specific tenant/kb"""
    
    # Composite key: (tenant_id, kb_id, doc_id)
    query = """
        INSERT INTO documents (tenant_id, kb_id, id, content)
        VALUES (:tenant_id, :kb_id, :id, :content)
    """
    
    result = await db.execute(query, {
        "tenant_id": tenant_id,
        "kb_id": kb_id,
        "id": doc_id,
        "content": content
    })
    
    return {
        "status": "created",
        "tenant": tenant_id,
        "kb": kb_id,
        "doc_id": doc_id
    }

Security & Isolation

Isolation Guarantees

┌─────────────────────────────────────────────────────────────┐
│          Multi-Tenant Isolation Guarantees                  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Database-Level Enforcement                               │
│  Every query includes (tenant_id, kb_id) filtering         │
│  Impossible to retrieve data from other tenants            │
│                                                             │
│  Composite Key Constraints                                 │
│  PRIMARY KEY (tenant_id, kb_id, id)                        │
│  Prevents accidental ID collisions between tenants         │
│                                                             │
│  No Application-Level Trust                                │
│  Even if app code has bugs, storage layer enforces        │
│  Tenant isolation is deterministic, not probabilistic      │
│                                                             │
│  Migration Safety                                           │
│  Legacy single-tenant data maps to default tenant          │
│  Gradual migration path without data loss                  │
│                                                             │
│  Audit Trail                                                │
│  All operations include tenant context                     │
│  Easy to track which tenant accessed what                  │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Security Checklist

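# DO: Always include tenant context
@app.get("/documents")
async def get_docs(tenant_id: str = Header(...), kb_id: str = Header(...)):
    # build_filtered_query returns (sql, params), as in the examples above
    filtered_sql, params = TenantSQLBuilder.build_filtered_query(
        "SELECT * FROM documents", tenant_id, kb_id
    )
    return await db.query(filtered_sql, params)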

# DON'T: Query without tenant filtering
@app.get("/documents")  # WRONG - no tenant context
async def get_docs():
    return await db.query("SELECT * FROM documents")

# DO: Validate tenant context early
async def validate_tenant_access(tenant_id, user_tenant):
    if tenant_id != user_tenant:
        raise PermissionError(f"Cannot access {tenant_id}")

# DO: Use composite keys consistently
key = f"{tenant_id}:{kb_id}:{resource_id}"

# DON'T: Use resource IDs without tenant prefix
key = f"doc:{resource_id}"  # WRONG - can collide with other tenants
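
The "validate early" and "composite key" items can be combined in one small check. This is a sketch under an assumed identifier format (lowercase letters, digits, hyphens); the source does not specify which characters LightRAG accepts, and `validate_scope` is a hypothetical name.

```python
import re

# Assumed ID format. Excluding ":" keeps "tenant:kb:resource" keys unambiguous.
_ID_PATTERN = re.compile(r"[a-z0-9][a-z0-9-]{0,62}")


def validate_scope(tenant_id: str, kb_id: str, user_tenant: str) -> str:
    """Validate both IDs and the caller's tenant, then return the key prefix."""
    for name, value in (("tenant_id", tenant_id), ("kb_id", kb_id)):
        if not _ID_PATTERN.fullmatch(value):
            raise ValueError(f"Invalid {name}: {value!r}")
    if tenant_id != user_tenant:
        raise PermissionError(f"Cannot access {tenant_id}")
    return f"{tenant_id}:{kb_id}:"


prefix = validate_scope("acme-corp", "kb-prod", user_tenant="acme-corp")
key = prefix + "doc-123"
```

Rejecting malformed identifiers before building keys prevents a value such as `acme:kb-prod` from being read as a tenant plus a KB.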

Migration Guide

Migrating Existing Single-Tenant Data

Multi-tenant support includes automatic migration utilities for each backend.

PostgreSQL Migration

from lightrag.kg.postgres_tenant_support import add_tenant_columns_migration

# Run one-time migration
await add_tenant_columns_migration(
    db=database_connection,
    default_tenant_id="default",
    default_kb_id="default"
)

# What it does:
# 1. Adds tenant_id and kb_id columns to all tables
# 2. Sets existing rows to default values
# 3. Creates composite indexes for performance
# 4. Updates PRIMARY KEY constraints
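
Steps 1 to 3 of that list can be reproduced on a throwaway SQLite database to see their effect. This is an illustration of the general technique, not the code of `add_tenant_columns_migration`. Step 4 is omitted because SQLite cannot alter a primary key in place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, content TEXT)")
conn.execute("INSERT INTO documents VALUES ('doc-1', 'legacy row')")

# Steps 1-2: add the columns; the DEFAULT backfills existing rows.
for column in ("tenant_id", "kb_id"):
    conn.execute(
        f"ALTER TABLE documents ADD COLUMN {column} TEXT NOT NULL DEFAULT 'default'"
    )

# Step 3: composite index for tenant-scoped lookups.
conn.execute(
    "CREATE INDEX idx_documents_tenant_kb ON documents (tenant_id, kb_id, id)"
)

# The legacy row now belongs to the default tenant and KB.
row = conn.execute("SELECT tenant_id, kb_id, id FROM documents").fetchone()
index_names = [r[1] for r in conn.execute("PRAGMA index_list('documents')")]
```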

MongoDB Migration

from lightrag.kg.mongo_tenant_support import add_tenant_fields_to_collection

# Run migration on each collection
await add_tenant_fields_to_collection(
    collection=mongodb_collection,
    default_tenant_id="default",
    default_kb_id="default"
)

# Creates indexes:
# {tenant_id: 1, kb_id: 1, _id: 1}

Redis Migration (with Dry-Run)

from lightrag.kg.redis_tenant_support import migrate_redis_to_tenant

# Test migration first (dry-run)
stats = await migrate_redis_to_tenant(
    redis_client=redis,
    old_key_pattern="user:*",
    default_tenant_id="default",
    default_kb_id="default",
    dry_run=True  # Preview only
)

print(f"Will migrate: {stats['migrated']} keys")
print(f"Will skip: {stats['skipped']} keys")
print(f"Failed: {stats['failed']} keys")

# Run actual migration
stats = await migrate_redis_to_tenant(
    redis_client=redis,
    old_key_pattern="user:*",
    default_tenant_id="default",
    default_kb_id="default",
    dry_run=False  # Apply changes
)
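
The dry-run pattern itself is simple to sketch over a plain dict. `migrate_keys` below is hypothetical: it mirrors the `migrated`/`skipped`/`failed` counters printed above, but the exact meaning of those counters in `migrate_redis_to_tenant` is an assumption.

```python
def migrate_keys(store: dict, old_prefix: str, tenant_id: str, kb_id: str,
                 dry_run: bool = True) -> dict:
    """Rename matching keys to 'tenant:kb:key'; with dry_run, only count."""
    stats = {"migrated": 0, "skipped": 0, "failed": 0}
    new_prefix = f"{tenant_id}:{kb_id}:"
    for key in list(store):  # snapshot, because the dict is mutated below
        if not key.startswith(old_prefix):
            stats["skipped"] += 1
            continue
        new_key = new_prefix + key
        if new_key in store:
            stats["failed"] += 1  # never overwrite an existing target key
            continue
        if not dry_run:
            store[new_key] = store.pop(key)
        stats["migrated"] += 1
    return stats


data = {"user:1": "a", "user:2": "b", "session:9": "c"}

preview = migrate_keys(data, "user:", "default", "default", dry_run=True)
keys_after_preview = sorted(data)

applied = migrate_keys(data, "user:", "default", "default", dry_run=False)
keys_after_apply = sorted(data)
```

Running the same code path for both modes, with only the write guarded by `dry_run`, keeps the preview statistics comparable to the real run.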

Migration Workflow

┌──────────────────────────────────────────────────────┐
│         Safe Migration Process                       │
├──────────────────────────────────────────────────────┤
│                                                      │
│  1. BACKUP                                          │
│     - Create database snapshots                     │
│     - Export critical data                          │
│                                                      │
│  2. TEST ENVIRONMENT                                │
│     - Restore backup to test DB                     │
│     - Run migration with dry-run                    │
│     - Verify statistics match expectations          │
│                                                      │
│  3. PRODUCTION STAGING                              │
│     - Run migration on staging (dry-run first)      │
│     - Test application with new schema              │
│     - Monitor performance                           │
│                                                      │
│  4. PRODUCTION EXECUTION                            │
│     - Schedule maintenance window                   │
│     - Stop application                              │
│     - Run actual migration (dry_run=False)          │
│     - Verify data integrity                         │
│     - Restart application                           │
│                                                      │
│  5. VALIDATION                                      │
│     - Run integration tests                         │
│     - Check application logs                        │
│     - Verify tenant isolation                       │
│     - Monitor for 24 hours                          │
│                                                      │
└──────────────────────────────────────────────────────┘

Troubleshooting

Common Issues & Solutions

Issue 1: No tenant context found

# Problem
async def get_documents(db):
    result = await db.query("SELECT * FROM documents")
    # Error: No tenant context provided

# Solution
from lightrag.kg.postgres_tenant_support import TenantSQLBuilder

async def get_documents(db, tenant_id: str = Header(...), kb_id: str = Header(...)):
    query = "SELECT * FROM documents"
    # Take both IDs from the request rather than hardcoding the KB
    filtered_sql, params = TenantSQLBuilder.build_filtered_query(
        query, tenant_id, kb_id
    )
    result = await db.query(filtered_sql, params)

Issue 2: Cross-tenant data visible

# Problem
filter_dict = {"status": "active"}  # Missing tenant fields!
result = await collection.find(filter_dict).to_list(length=None)

# Solution
from lightrag.kg.mongo_tenant_support import MongoTenantHelper

filter_dict = MongoTenantHelper.get_tenant_filter(
    "acme-corp", "kb-prod",
    additional_filter={"status": "active"}
)
# find() returns a cursor; to_list() is what gets awaited
result = await collection.find(filter_dict).to_list(length=None)

Issue 3: Performance degradation after migration

# Solution: Ensure indexes exist
from lightrag.kg.postgres_tenant_support import get_tenant_indexes

# Get recommended indexes
indexes = get_tenant_indexes()

# Create in PostgreSQL
for index_sql in indexes:
    await db.execute(index_sql)

-- Verify in psql (the statements below are SQL, not Python)
ANALYZE documents;  -- Update statistics
EXPLAIN SELECT * FROM documents 
    WHERE tenant_id='acme-corp' 
    AND kb_id='kb-prod';  -- Check query plan

Issue 4: Backward compatibility broken

# Solution: Use default tenant values
context = TenantContext(
    tenant_id="default",  # Default for legacy code
    kb_id="default"
)

# Legacy code continues to work
result = await db.query(legacy_query)  # Uses default context

Debugging Multi-Tenant Issues

# Enable debug logging
import logging
logging.basicConfig(level=logging.DEBUG)

# Add tenant context to logs
import contextvars

tenant_context = contextvars.ContextVar(
    'tenant_context',
    default={'tenant_id': 'unknown', 'kb_id': 'unknown'}
)

# In middleware
def set_tenant_context(tenant_id, kb_id):
    tenant_context.set({'tenant_id': tenant_id, 'kb_id': kb_id})

# In logging
class TenantFilter(logging.Filter):
    def filter(self, record):
        ctx = tenant_context.get()
        record.tenant = ctx['tenant_id']
        record.kb = ctx['kb_id']
        return True

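handler = logging.StreamHandler()
handler.addFilter(TenantFilter())
# The formatter must reference the injected fields for them to appear
handler.setFormatter(logging.Formatter(
    "%(asctime)s [%(tenant)s:%(kb)s] %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
))
logging.getLogger().addHandler(handler)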

# Logs will show:
# 2025-11-20 10:30:45 [acme-corp:kb-prod] SELECT from documents

Best Practices

DO

Always pass tenant context to every operation
Use support module helpers (don't build queries manually)
Create composite indexes on (tenant_id, kb_id, ...)
Validate tenant context early in request pipeline
Log all tenant-related operations
Test with multiple tenants before production
Monitor tenant-specific metrics
Document tenant requirements for new features

DON'T

Hardcode tenant IDs in application code
Query without tenant filtering
Assume application code enforces isolation
Skip index creation after migration
Mix tenants in a single transaction
Cache results across tenants without tenant-scoped cache keys
Forget to pass tenant context to batch operations
Assume default values work for production
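
The caching item is the easiest one to get wrong in practice. A minimal sketch of a tenant-keyed cache follows; `cached_search` and `fake_search` are hypothetical names, and a real deployment would also need eviction and invalidation.

```python
cache: dict = {}
calls: list = []  # records backend calls so cache hits are observable


def fake_search(tenant_id: str, kb_id: str, query: str) -> list:
    """Stand-in for a real backend query."""
    calls.append((tenant_id, kb_id, query))
    return [f"{tenant_id}/{kb_id}: result for {query!r}"]


def cached_search(tenant_id: str, kb_id: str, query: str) -> list:
    # The cache key includes the full scope, not just the query text.
    key = (tenant_id, kb_id, query)
    if key not in cache:
        cache[key] = fake_search(tenant_id, kb_id, query)
    return cache[key]


acme_hits = cached_search("acme-corp", "kb-prod", "pricing")
tech_hits = cached_search("techstart", "kb-main", "pricing")
acme_again = cached_search("acme-corp", "kb-prod", "pricing")  # served from cache
```

Keying on the query alone would have returned Acme's cached result to TechStart for the identical query string.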

Performance Optimization

Index Strategy

-- PostgreSQL: composite index on all three columns
CREATE INDEX idx_doc_tenant_kb_id 
ON documents(tenant_id, kb_id, id);

-- For range queries
CREATE INDEX idx_doc_tenant_kb_created
ON documents(tenant_id, kb_id, created_at DESC);

// MongoDB (shell): compound index
db.documents.createIndex({
    tenant_id: 1,
    kb_id: 1,
    _id: 1
})

// For sorting
db.documents.createIndex({
    tenant_id: 1,
    kb_id: 1,
    created_at: -1
})

Query Optimization Tips

-- Good: Specific tenant filter
SELECT * FROM documents 
WHERE tenant_id='acme-corp' 
AND kb_id='kb-prod'
AND status='active'
ORDER BY created_at DESC;

-- Bad: No tenant filter, so the composite indexes cannot be used
SELECT * FROM documents 
WHERE status='active'
ORDER BY created_at DESC;

-- Good: Use indexes
EXPLAIN SELECT * FROM documents 
WHERE tenant_id='acme-corp' 
AND kb_id='kb-prod' 
AND created_at > NOW() - INTERVAL '7 days';

-- On a large table the plan should show an index-based scan
-- ("Index Scan" or "Bitmap Index Scan"), not "Seq Scan"
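
The same check can be scripted. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` from the standard library as a stand-in. PostgreSQL's `EXPLAIN` output looks different, but the comparison between a tenant-scoped and an unscoped query is the same idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (tenant_id TEXT, kb_id TEXT, id TEXT, status TEXT)"
)
conn.execute(
    "CREATE INDEX idx_doc_tenant_kb_id ON documents (tenant_id, kb_id, id)"
)


def plan(sql: str) -> str:
    """Return the plan detail text (last column of each plan row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Tenant-scoped query: can use the composite index.
scoped_plan = plan(
    "SELECT * FROM documents WHERE tenant_id = 'acme-corp' AND kb_id = 'kb-prod'"
)

# Unscoped query: no usable index, so the whole table is scanned.
unscoped_plan = plan("SELECT * FROM documents WHERE status = 'active'")
```

A check like this could run in a test suite so that a dropped or renamed index is caught before it shows up as a latency regression.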

Summary